Re: Document Categorizer - Classifying: Help

Jörn Kottmann Fri, 30 Mar 2012 07:27:05 -0700

On 03/30/2012 04:24 PM, Adriano Santos wrote:

The first time, I used this file:


GMDecrease Major acquisitions that have a lower gross margin than the
existing network also
GMIncrease The upward movement of gross margin resulted from amounts
pursuant to adjustments

Second, this:

GMDecrease Major acquisitions that have a lower gross margin than the
existing network also \
            had a negative impact on the overall gross margin, but it
should improve following \
            the implementation of its integration strategies .
GMIncrease The upward movement of gross margin resulted from amounts
pursuant to adjustments \
            to obligations towards dealers .

as in the documentation sample,

where GMDecrease and GMIncrease are the classes. OK?


I saw that I must use more documents in training, correct? So how can I
represent many documents in one class? Like this:

GMDecrease Major acquisitions that have a lower gross margin than the
existing network also
GMDecrease To perform classification you will need a maxent model - these
are encapsulated in the DoccatModel class of OpenNLP tool
GMDecrease First you need to grab the bytes from the serialized model on an
InputStream - we'll leave it you to do that, since you were the one who
serialized it to begin with. Now for the easy part
GMDecrease The Document Categorizer can be trained on annotated training
material. The data must be in OpenNLP Document Categorizer training format.
...

GMIncrease The upward movement of gross margin resulted from amounts
pursuant to adjustments
GMIncrease The tags array contains one part-of-speech tag for each token in
the input array
GMIncrease Looks like the mailing list sever removed your attachment.
Anyway, the output indicates
...


Yes, that looks good. The format is one document per line: the class label
first, followed by the document text. The document is whitespace tokenized.

Jörn
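
To illustrate the format described above, here is a minimal sketch in plain
Python (not OpenNLP code) that reads such lines, takes the first
whitespace-separated token as the class label, and treats the remaining
tokens as the document. The function names parse_training_line and
group_by_class are hypothetical helpers made up for this example, and the
sketch assumes each document really sits on a single line, without the line
wrapping the mail client introduced above.

```python
from collections import defaultdict


def parse_training_line(line):
    """Split one training line into (class label, document tokens)."""
    tokens = line.split()  # whitespace tokenization, as in the reply above
    if len(tokens) < 2:
        raise ValueError("expected a class label followed by document text")
    return tokens[0], tokens[1:]


def group_by_class(lines):
    """Collect the documents of each class; blank lines are skipped."""
    documents = defaultdict(list)
    for line in lines:
        if line.strip():
            label, tokens = parse_training_line(line)
            documents[label].append(tokens)
    return dict(documents)


# Sample lines taken from the message above, one document per line.
sample = [
    "GMDecrease Major acquisitions that have a lower gross margin than the existing network also",
    "GMDecrease The Document Categorizer can be trained on annotated training material.",
    "GMIncrease The upward movement of gross margin resulted from amounts pursuant to adjustments",
]

grouped = group_by_class(sample)
for label, docs in grouped.items():
    print(label, len(docs))
```

A class simply gets as many lines as it has documents; running a check like
this over a training file before training is one way to confirm that every
line starts with one of the intended labels.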
