That is correct: the sentences file does not need annotations, and the other files
are one name per line.
It uses the names file to annotate the sentences, and won't annotate anything 
that's in the blacklist file.
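The annotation rule described above can be sketched roughly as follows. This is a hypothetical illustration, not the add-on's code. It assumes whitespace-tokenized sentences, single-token names, and the entity type `person`, and the class and method names are made up.

```java
import java.util.Set;
import java.util.StringJoiner;

public class SentenceAnnotatorSketch {

    /**
     * Wraps each token that is a known entity (and not blacklisted) in
     * OpenNLP name-finder markup; all other tokens are copied unchanged.
     * Hypothetical helper: assumes single-token names and whitespace tokens.
     */
    public static String annotate(String sentence, Set<String> knownEntities,
                                  Set<String> blacklist, String type) {
        StringJoiner out = new StringJoiner(" ");
        for (String token : sentence.trim().split("\\s+")) {
            if (knownEntities.contains(token) && !blacklist.contains(token)) {
                out.add("<START:" + type + "> " + token + " <END>");
            } else {
                out.add(token);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Set<String> known = Set.of("Archimedes", "Socrates");
        Set<String> blacklist = Set.of("Socrates"); // treated as a known bad hit for this demo
        System.out.println(annotate("Archimedes and Socrates never met .", known, blacklist, "person"));
        // prints: <START:person> Archimedes <END> and Socrates never met .
    }
}
```

The blacklist check wins over the names list, so a name that appears in both files is left unannotated.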

Let me know how it goes!



> On May 20, 2014, at 4:16 AM, Carlos Scheidecker <[email protected]> wrote:
> 
> I have not moved forward on it yet, but yes, Mark, I want to use it.
> 
> I have seen one of your examples.
> 
> But I have not figured out the proper format of the files. Here's what I
> think from what I have been reading. Tell me if I am right.
> 
> From class DefaultModelBuilderUtil, method generateModel:
> 
> @param sentences        a file that contains one sentence per line.
>    *                      There should be at least 15K sentences
>    *                      consisting of a representative sample from
>    *                      user data
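A quick pre-flight check for the size recommendation above might look like this. It is only a sketch: counting non-blank lines as sentences, reading the file as UTF-8, and the class and method names are assumptions, not part of DefaultModelBuilderUtil.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

public class SentenceFileCheckSketch {

    // Threshold from the generateModel Javadoc: "at least 15K sentences".
    public static final int MIN_SENTENCES = 15_000;

    /** Counts non-blank lines, treating each one as a sentence. */
    public static long countSentences(List<String> lines) {
        return lines.stream().filter(line -> !line.trim().isEmpty()).count();
    }

    /** True if the content meets the recommended minimum number of sentences. */
    public static boolean hasEnoughSentences(List<String> lines) {
        return countSentences(lines) >= MIN_SENTENCES;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java SentenceFileCheckSketch <sentences-file>");
            return;
        }
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        long count = countSentences(lines);
        System.out.println(count + " sentences"
                + (hasEnoughSentences(lines) ? "" : " (fewer than the recommended 15K)"));
    }
}
```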
> 
> This seems to be a text file where each sentence is on one line.
> I wonder if it has to be annotated, for instance:
> 
> <START:person> Archimedes <END> used the method of exhaustion to
> approximate the value of π. Archimedes ( 287–212 BC ) was the first
> to estimate π rigorously .
> 
> Or just:
> 
> Archimedes used the method of exhaustion to approximate the value of
> π. Archimedes ( 287–212 BC ) was the first to estimate π rigorously .
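Mark's answer at the top is that the sentences file takes the plain form. If the only data at hand is already annotated, a sketch like the following could strip the markup back to plain text. It assumes the `<START:type> ... <END>` markup shown above, with a space on each side of the tags; the class name is hypothetical.

```java
import java.util.regex.Pattern;

public class StripNameMarkupSketch {

    // Matches an opening tag such as "<START:person>" (or a bare "<START>") plus one
    // trailing whitespace character, or one leading whitespace character plus "<END>".
    private static final Pattern MARKUP = Pattern.compile("<START(:[^>]*)?>\\s?|\\s?<END>");

    /** Removes name-finder markup, leaving the plain, tokenized sentence. */
    public static String strip(String annotated) {
        return MARKUP.matcher(annotated).replaceAll("").trim();
    }

    public static void main(String[] args) {
        String annotated = "<START:person> Archimedes <END> used the method of exhaustion .";
        System.out.println(strip(annotated));
        // prints: Archimedes used the method of exhaustion .
    }
}
```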
> 
> 
> @param knownEntities            a file consisting of a simple list of
>   *                               unambiguous entities, one entry per line.
>   *                               For instance, if one was trying to build a
>   *                               person NER model then this file would be a
>   *                               list of person names that are unambiguous
>   *                               and are known to exist in the sentences
> 
> So this would be a plain text file?
> 
> Something like one name per line?
> 
> Archimedes
> Socrates
> ....
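Loading such a list could be as simple as the sketch below. Trimming each line, skipping blank lines, and reading as UTF-8 are assumptions here, not documented behavior of the add-on, and the class name is hypothetical. The same loader would also serve for the blacklist file, since it uses the same one-entry-per-line layout.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashSet;
import java.util.Set;

public class EntityListLoaderSketch {

    /** Reads a one-entry-per-line file, trimming whitespace and skipping blank lines. */
    public static Set<String> load(Path file) throws IOException {
        Set<String> entries = new LinkedHashSet<>(); // keeps file order, drops duplicates
        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            String entry = line.trim();
            if (!entry.isEmpty()) {
                entries.add(entry);
            }
        }
        return entries;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("knownEntities", ".txt");
        Files.write(tmp, "Archimedes\nSocrates\n\n  Plato  \n".getBytes(StandardCharsets.UTF_8));
        System.out.println(load(tmp)); // prints: [Archimedes, Socrates, Plato]
        Files.delete(tmp);
    }
}
```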
> 
> 
> * @param knownEntitiesBlacklist   This file contains a list of known bad hits
>   *                                 that the NER phase of this processing might
>   *                                 catch early on before the model iterates
>   *                                 to maturity
> 
> Same format as knownEntities, but a list of what NOT to mark as an entity?
> 
> 
> The rest seemed quite straightforward.
> 
> Thanks,
> 
> Carlos.
> 
> 
> 
> 
>> On Mon, May 19, 2014 at 5:34 PM, Mark G <[email protected]> wrote:
>> 
>> No problem. Carlos, are you using the model builder add-on?
>> 
>> 
>> Mg
>> 
>>> On May 19, 2014, at 6:29 PM, Carlos Scheidecker <[email protected]> wrote:
>>> 
>>> Thanks mate! Saw you updated the code. Cheers.
>>> 
>>> 
>>>> On Mon, May 19, 2014 at 3:24 PM, Mark G <[email protected]> wrote:
>>>> 
>>>> OK, thanks Carlos. I think I will commit the change; it seems like it
>>>> wouldn't hurt. Anybody else?
>>>> 
>>>> 
>>>> On Mon, May 19, 2014 at 5:07 PM, Carlos Scheidecker <[email protected]> wrote:
>>>> 
>>>>> I am having the same issue, Mark.
>>>>> 
>>>>> The class is not public, so it is not visible from
>>>>> opennlp.addons.modelbuilder.impls.GenericModelableImpl; therefore it
>>>>> cannot be built with Maven or resolved inside Eclipse.
>>>>> 
>>>>> I have also been looking for new commits that fix this, and there were none.
>>>>> 
>>>>> 
>>>>>> On Mon, May 12, 2014 at 1:03 PM, Mark G <[email protected]> wrote:
>>>>>> 
>>>>>> Does MarkableFileInputStreamFactory need to be package private? I am
>>>>> using
>>>>>> it in an addon (modelbuilder-addon), I would like to either move it or
>>>>> make
>>>>>> it a public class. Perhaps I should be using a different class
>>>>> altogether?
>>>>>> 
>>>>>> I am using it like this:
>>>>>> 
>>>>>>     ObjectStream<String> lineStream = new PlainTextByLineStream(
>>>>>>         new MarkableFileInputStreamFactory(params.getAnnotatedTrainingDataFile()),
>>>>>>         charset);
>>>>>>     ObjectStream<NameSample> sampleStream = new NameSampleDataStream(lineStream);
>>>>>> 
>>>>>> where getAnnotatedTrainingDataFile() returns a java.io.File object.
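One workaround for the visibility problem is a public factory owned by the add-on itself. The sketch below uses only the JDK: `StreamFactory` is a local stand-in for the factory interface that `PlainTextByLineStream` accepts (check the exact interface in your OpenNLP version), and `MarkableFileStreamFactory` is a hypothetical copy, not the OpenNLP class. A factory rather than a single stream is used so that each training pass can re-open the file from the start.

```java
import java.io.BufferedInputStream;
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;

public class PublicStreamFactorySketch {

    /** Local stand-in (assumed shape): one method returning a fresh InputStream. */
    public interface StreamFactory {
        InputStream createInputStream() throws IOException;
    }

    /** Public factory that opens a new, mark-supporting stream over the same file on every call. */
    public static final class MarkableFileStreamFactory implements StreamFactory {
        private final File file;

        public MarkableFileStreamFactory(File file) throws FileNotFoundException {
            if (!file.isFile()) {
                throw new FileNotFoundException("File '" + file + "' does not exist!");
            }
            this.file = file;
        }

        @Override
        public InputStream createInputStream() throws IOException {
            // BufferedInputStream supports mark()/reset(), unlike a bare FileInputStream.
            return new BufferedInputStream(new FileInputStream(file));
        }
    }

    /** Reads all lines from a fresh stream; each call starts again at the top of the file. */
    public static List<String> readLines(StreamFactory factory, Charset charset) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(factory.createInputStream(), charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("training", ".txt");
        tmp.deleteOnExit();
        Files.write(tmp.toPath(), "<START:person> Archimedes <END> was here .\n".getBytes(StandardCharsets.UTF_8));
        StreamFactory factory = new MarkableFileStreamFactory(tmp);
        System.out.println(readLines(factory, StandardCharsets.UTF_8).size()); // prints: 1
        System.out.println(readLines(factory, StandardCharsets.UTF_8).size()); // prints: 1 (second pass)
    }
}
```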
>>>>>> 
>>>>>> thanks
>> 
