That is correct: the sentences file does not need annotations, and the other two files
are one name per line.
The model builder uses the names file to annotate the sentences, and won't annotate anything
that's in the blacklist file.
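To make the three formats concrete, here is a rough toy sketch. The file names, the class NameAnnotatorSketch and its annotate() method are hypothetical and written only for this illustration. The annotate() logic is a simplified stand-in, NOT how DefaultModelBuilderUtil works internally (the real builder iterates, training a model and re-annotating). The sketch only shows the idea: plain sentences in, names from the names file wrapped in <START:person> ... <END> tags, and anything on the blacklist left alone. It assumes whitespace-tokenized text and single-token names, and it needs Java 9+ for List.of.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class NameAnnotatorSketch {

    // Toy approximation (NOT the addon's real algorithm): wrap each
    // whitespace-separated token found in the names set, unless it is
    // also blacklisted, in OpenNLP-style <START:type> ... <END> tags.
    // Only single-token names are handled.
    static String annotate(String sentence, Set<String> names,
                           Set<String> blacklist, String type) {
        List<String> out = new ArrayList<>();
        for (String token : sentence.trim().split("\\s+")) {
            if (names.contains(token) && !blacklist.contains(token)) {
                out.add("<START:" + type + "> " + token + " <END>");
            } else {
                out.add(token);
            }
        }
        return String.join(" ", out);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical file names; every file is plain text, one entry per line.
        Path dir = Files.createTempDirectory("modelbuilder");
        Path sentences = dir.resolve("sentences.txt"); // unannotated, one sentence per line
        Path names = dir.resolve("names.txt");         // one known name per line
        Path blacklist = dir.resolve("blacklist.txt"); // one entry per line NOT to annotate

        Files.write(sentences, List.of(
            "Archimedes used the method of exhaustion .",
            "Archimedes met Socrates ."));
        Files.write(names, List.of("Archimedes", "Socrates"));
        // In this toy, a blacklist entry overrides the names list.
        Files.write(blacklist, List.of("Socrates"));

        Set<String> nameSet = new HashSet<>(Files.readAllLines(names));
        Set<String> blackSet = new HashSet<>(Files.readAllLines(blacklist));
        for (String s : Files.readAllLines(sentences)) {
            System.out.println(annotate(s, nameSet, blackSet, "person"));
        }
    }
}
```

Running main should print the first sentence with Archimedes tagged, and the second with Archimedes tagged but Socrates left untagged because it is blacklisted.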

Let me know how it goes!

> On May 20, 2014, at 6:08 AM, Carlos Scheidecker <nando....@gmail.com> wrote:
> 
> Hello all,
> 
> I am putting this question on its own thread so that it doesn't get lost.
> 
> Question is about the proper usage of DefaultModelBuilderUtil.
> 
> I have not figured out the proper format of the files. Here's what I think,
> based on what I have been reading. Tell me if I am right.
> 
> From class DefaultModelBuilderUtil method generateModel
> 
> @param sentences   a file that contains one sentence per line.
>    *               There should be at least 15K sentences
>    *               consisting of a representative sample from
>    *               user data
> 
> This seems to be a text file where each sentence is on one line.
> I wonder if it has to be annotated, for instance:
> 
> <START:person> Archimedes <END> used the method of exhaustion to
> approximate the value of π. Archimedes ( 287–212 BC ) was the first
> to estimate π rigorously .
> 
> Or just:
> 
> Archimedes used the method of exhaustion to approximate the value of
> π. Archimedes ( 287–212 BC ) was the first to estimate π rigorously .
> 
> 
> @param knownEntities   a file consisting of a simple list of
>   *                    unambiguous entities, one entry per line.
>   *                    For instance, if one was trying to build a
>   *                    person NER model then this file would be a
>   *                    list of person names that are unambiguous
>   *                    and are known to exist in the sentences
> 
> So this would be a plain text file?
> 
> Something like one name per line?
> 
> Archimedes
> Socrates
> ....
> 
> 
> * @param knownEntitiesBlacklist   This file contains a list of known bad hits
>   *                               that the NER phase of this processing might
>   *                               catch early on before the model iterates
>   *                               to maturity
> 
> Same as the knownEntities but a list of what NOT to mark as an entity?
> 
> 
> The rest seemed quite straightforward.
> 
> Thanks,
