I have not moved forward on it yet, but yes, Mark, I want to use it.
I have seen one of your examples.
But I have not figured out the proper format of the files. Here's what I
think from what I have been reading. Tell me if I am right.
From class DefaultModelBuilderUtil, method generateModel:
* @param sentences a file that contains one sentence per line.
*                  There should be at least 15K sentences
*                  consisting of a representative sample from
*                  user data
This seems to be a text file where each sentence is on one line.
I wonder if it has to be annotated, for instance:
<START:person> Archimedes <END> used the method of exhaustion to
approximate the value of π. Archimedes ( 287–212 BC ) was the first
to estimate π rigorously .
Or just:
Archimedes used the method of exhaustion to approximate the value of
π. Archimedes ( 287–212 BC ) was the first to estimate π rigorously .
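To make the difference between the two candidates concrete, here is a minimal sketch of my own (not part of the addon) that turns an annotated line into the plain form by stripping the name tags. The class and method names are hypothetical, and it assumes the annotation uses whitespace-separated <START:type> and <END> tokens as in my example above.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of OpenNLP or the modelbuilder addon.
final class TagStripper {
    // Assumed tag format: "<START:type>" or "<START>" before a name, "<END>" after it.
    private static final Pattern START = Pattern.compile("<START(:[^>\\s]+)?>\\s*");
    private static final Pattern END = Pattern.compile("\\s*<END>");

    /** Removes the name tags and leaves the tokenized sentence text untouched. */
    static String strip(String annotated) {
        String withoutStart = START.matcher(annotated).replaceAll("");
        return END.matcher(withoutStart).replaceAll("");
    }

    public static void main(String[] args) {
        String annotated =
            "<START:person> Archimedes <END> used the method of exhaustion .";
        System.out.println(strip(annotated));
    }
}
```

If the sentences file is meant to be plain text, a pass like this would let me reuse data I had already annotated.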
* @param knownEntities a file consisting of a simple list of
*                      unambiguous entities, one entry per line.
*                      For instance, if one was trying to build a
*                      person NER model then this file would be a
*                      list of person names that are unambiguous
*                      and are known to exist in the sentences
This would be a text file list?
Something like one name per line?
Archimedes
Socrates
....
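Since the Javadoc says the entities must be "known to exist in the sentences", I would probably sanity-check my list before running the builder. A minimal sketch, assuming both files are plain text with one entry per line and that a simple substring match is good enough; the class and method names are hypothetical and nothing here comes from the addon itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical pre-check, not part of the modelbuilder addon.
final class KnownEntityCheck {
    /** Returns the entities that never occur, as a plain substring, in any sentence. */
    static List<String> missing(List<String> sentences, List<String> entities) {
        List<String> result = new ArrayList<>();
        for (String entity : entities) {
            String name = entity.trim();
            if (name.isEmpty()) {
                continue; // skip blank lines in the entity list
            }
            boolean found = false;
            for (String sentence : sentences) {
                if (sentence.contains(name)) {
                    found = true;
                    break;
                }
            }
            if (!found) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> sentences = Arrays.asList(
            "Archimedes used the method of exhaustion .");
        List<String> entities = Arrays.asList("Archimedes", "Socrates");
        // Lists the names that the sentences never mention.
        System.out.println(missing(sentences, entities));
    }
}
```

In real use the two lists would come from something like Files.readAllLines on the sentences and knownEntities files.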
* @param knownEntitiesBlacklist This file contains a list of known bad hits
*                               that the NER phase of this processing might
*                               catch early on before the model iterates
*                               to maturity
Same as the knownEntities but a list of what NOT to mark as an entity?
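In other words, I picture the blacklist working roughly like the filter below. This is only my reading of the Javadoc, not the addon's actual logic; the class and method names are hypothetical and it assumes exact, whitespace-trimmed matches.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of my understanding, not the addon's implementation.
final class BlacklistFilter {
    /** Keeps only the candidate names that do not appear in the blacklist. */
    static List<String> filter(List<String> candidates, Collection<String> blacklist) {
        Set<String> blocked = new HashSet<>();
        for (String entry : blacklist) {
            blocked.add(entry.trim());
        }
        List<String> kept = new ArrayList<>();
        for (String candidate : candidates) {
            String name = candidate.trim();
            if (!blocked.contains(name)) {
                kept.add(name);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> hits = Arrays.asList("Archimedes", "Method", "Socrates");
        List<String> blacklist = Arrays.asList("Method");
        System.out.println(filter(hits, blacklist));
    }
}
```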
The rest seemed quite straightforward.
Thanks,
Carlos.
On Mon, May 19, 2014 at 5:34 PM, Mark G <[email protected]> wrote:
> No problem, Carlos are you using the model builder add on ?
>
>
> Mg
>
> > On May 19, 2014, at 6:29 PM, Carlos Scheidecker <[email protected]>
> wrote:
> >
> > Thanks mate! Saw you updated the code. Cheers.
> >
> >
> >> On Mon, May 19, 2014 at 3:24 PM, Mark G <[email protected]> wrote:
> >>
> >> OK, thanks Carlos, I think I will commit the change, seems like it
> wouldn't
> >> hurt. Anybody else?
> >>
> >>
> >> On Mon, May 19, 2014 at 5:07 PM, Carlos Scheidecker <
> [email protected]
> >>> wrote:
> >>
> >>> I am having the same issue Mark.
> >>>
> >>> The class is not public so it has no visibility
> >>> inside opennlp.addons.modelbuilder.impls.GenericModelableImpl therefore
> >> it
> >>> cannot be built with Maven or resolved inside Eclipse.
> >>>
> >>> I have also been looking at new commits to fix that and there were
> none.
> >>>
> >>>
> >>>> On Mon, May 12, 2014 at 1:03 PM, Mark G <[email protected]> wrote:
> >>>>
> >>>> Does MarkableFileInputStreamFactory need to be package private? I am
> >>> using
> >>>> it in an addon (modelbuilder-addon), I would like to either move it or
> >>> make
> >>>> it a public class. Perhaps I should be using a different class
> >>> altogether?
> >>>>
> >>>> I am using it like this
> >>>>
> >>>> ObjectStream<String> lineStream = new PlainTextByLineStream(
> >>>>     new MarkableFileInputStreamFactory(params.getAnnotatedTrainingDataFile()),
> >>>>     charset);
> >>>> ObjectStream<NameSample> sampleStream =
> >>>>     new NameSampleDataStream(lineStream);
> >>>>
> >>>> where getAnnotatedTrainingDataFile returns a java File object.
> >>>>
> >>>> thanks
> >>
>