I have not moved forward on it yet, but yes Mark, I want to use it.

I have seen one of your examples.

But I have not figured out the proper format of the files. Here's what I
think from what I have been reading. Tell me if I am right.

From class DefaultModelBuilderUtil, method generateModel:

 * @param sentences  a file that contains one sentence per line.
 *                   There should be at least 15K sentences
 *                   consisting of a representative sample from
 *                   user data

This seems to be a text file where each sentence is on one line.
I wonder if it has to be annotated, for instance:

<START:person> Archimedes <END> used the method of exhaustion to
approximate the value of π. Archimedes ( 287–212 BC ) was the first
to estimate π rigorously .

Or just:

Archimedes used the method of exhaustion to approximate the value of
π. Archimedes ( 287–212 BC ) was the first to estimate π rigorously .
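To make the difference between the two candidate formats concrete, here is a minimal Java sketch that turns a line in the annotated name finder training format (`<START:type> ... <END>`) into the plain form. The class name `AnnotationStripper` and its `strip` method are hypothetical helpers of mine, not part of OpenNLP or the add-on, and the sketch does not settle which of the two formats generateModel expects.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of OpenNLP: removes name annotations
// so that an annotated line can be compared with its plain form.
final class AnnotationStripper {
    // Matches "<START>" or "<START:type>" opening tags and "<END>" closing
    // tags, together with any whitespace that directly follows them.
    private static final Pattern TAG =
            Pattern.compile("<START(:[^>\\s]+)?>\\s*|<END>\\s*");

    static String strip(String annotatedLine) {
        return TAG.matcher(annotatedLine).replaceAll("").trim();
    }

    public static void main(String[] args) {
        String annotated =
                "<START:person> Archimedes <END> used the method of exhaustion .";
        // Prints: Archimedes used the method of exhaustion .
        System.out.println(strip(annotated));
    }
}
```

If the sentences file turns out to need plain text, something like this could convert existing annotated data, one line at a time.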


 * @param knownEntities  a file consisting of a simple list of
 *                       unambiguous entities, one entry per line.
 *                       For instance, if one was trying to build a
 *                       person NER model then this file would be a
 *                       list of person names that are unambiguous
 *                       and are known to exist in the sentences

This would be a plain text file, something like one name per line?

Archimedes
Socrates
....


 * @param knownEntitiesBlacklist  This file contains a list of known bad hits
 *                                that the NER phase of this processing might
 *                                catch early one before the model iterates
 *                                to maturity

Same format as knownEntities, but a list of what NOT to mark as an entity?
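Here is how I picture those two list files, as a small Java sketch. It assumes both are plain UTF-8 text with one entry per line, which is only my reading of the Javadoc. The class `EntityLists` and its methods are hypothetical names of mine, and the way the blacklist is applied to candidate hits is a guess, not the add-on's actual logic.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: models a "one entry per line" list file.
final class EntityLists {
    // Trims each line and skips blank ones; keeps first-seen order.
    static Set<String> fromLines(List<String> lines) {
        Set<String> entries = new LinkedHashSet<>();
        for (String line : lines) {
            String entry = line.trim();
            if (!entry.isEmpty()) {
                entries.add(entry);
            }
        }
        return entries;
    }

    // Reads a list file, assuming UTF-8 and one entry per line.
    static Set<String> fromFile(Path file) throws IOException {
        return fromLines(Files.readAllLines(file, StandardCharsets.UTF_8));
    }

    // Assumed use of the blacklist: discard candidate hits that appear in it.
    static List<String> dropBlacklisted(List<String> hits, Set<String> blacklist) {
        List<String> kept = new ArrayList<>();
        for (String hit : hits) {
            if (!blacklist.contains(hit)) {
                kept.add(hit);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Set<String> blacklist = fromLines(Arrays.asList("Pi", "", " BC "));
        List<String> hits = Arrays.asList("Archimedes", "Pi", "Socrates", "BC");
        // Prints: [Archimedes, Socrates]
        System.out.println(dropBlacklisted(hits, blacklist));
    }
}
```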


The rest seemed quite straightforward.

Thanks,

Carlos.




On Mon, May 19, 2014 at 5:34 PM, Mark G <[email protected]> wrote:

> No problem, Carlos are you using the model builder add on ?
>
>
> Mg
>
> > On May 19, 2014, at 6:29 PM, Carlos Scheidecker <[email protected]> wrote:
> >
> > Thanks mate! Saw you updated the code. Cheers.
> >
> >
> >> On Mon, May 19, 2014 at 3:24 PM, Mark G <[email protected]> wrote:
> >>
> >> OK, thanks Carlos, I think I will commit the change, seems like it wouldn't
> >> hurt. Anybody else?
> >>
> >>
> >> On Mon, May 19, 2014 at 5:07 PM, Carlos Scheidecker <[email protected]> wrote:
> >>
> >>> I am having the same issue Mark.
> >>>
> >>> The class is not public so it has no visibility
> >>> inside opennlp.addons.modelbuilder.impls.GenericModelableImpl therefore it
> >>> cannot be built with Maven or resolved inside Eclipse.
> >>>
> >>> I have also been looking at new commits to fix that and there were none.
> >>>
> >>>
> >>>> On Mon, May 12, 2014 at 1:03 PM, Mark G <[email protected]> wrote:
> >>>>
> >>>> Does MarkableFileInputStreamFactory need to be package private? I am using
> >>>> it in an addon (modelbuilder-addon), I would like to either move it or make
> >>>> it a public class. Perhaps I should be using a different class altogether?
> >>>>
> >>>> I am using it like this
> >>>>
> >>>>     ObjectStream<String> lineStream = new PlainTextByLineStream(
> >>>>         new MarkableFileInputStreamFactory(params.getAnnotatedTrainingDataFile()),
> >>>>         charset);
> >>>>     ObjectStream<NameSample> sampleStream = new NameSampleDataStream(lineStream);
> >>>>
> >>>> where getAnnotatedTrainingDataFile returns a java File object.
> >>>>
> >>>> thanks
> >>
>
