On 03/30/2012 04:24 PM, Adriano Santos wrote:
The first time, I used this file:

GMDecrease Major acquisitions that have a lower gross margin than the existing network also
GMIncrease The upward movement of gross margin resulted from amounts pursuant to adjustments

The second time, this one:

GMDecrease Major acquisitions that have a lower gross margin than the existing network also \
had a negative impact on the overall gross margin, but it should improve following \
the implementation of its integration strategies .
GMIncrease The upward movement of gross margin resulted from amounts pursuant to adjustments \
to obligations towards dealers .

as a document sample, where GMDecrease and GMIncrease are the classes. OK?

I saw that I must use more documents in training, correct? So how can I represent many documents in one class? Like this:

GMDecrease Major acquisitions that have a lower gross margin than the existing network also
GMDecrease To perform classification you will need a maxent model - these are encapsulated in the DoccatModel class of OpenNLP tool
GMDecrease First you need to grab the bytes from the serialized model on an InputStream - we'll leave it you to do that, since you were the one who serialized it to begin with. Now for the easy part
GMDecrease The Document Categorizer can be trained on annotated training material. The data must be in OpenNLP Document Categorizer training format. ...
GMIncrease The upward movement of gross margin resulted from amounts pursuant to adjustments
GMIncrease The tags array contains one part-of-speech tag for each token in the input array
GMIncrease Looks like the mailing list sever removed your attachment. Anyway, the output indicates ...
Yes, looks good. The format is class label + document on one line. The document is whitespace tokenized.

Jörn
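To make the line format concrete, here is a minimal sketch in plain Java of how one training line breaks down into a class label and whitespace-separated tokens. It is only an illustration of the format described above, not OpenNLP's own parser; the class name DoccatLineSketch and the helper parseLine are hypothetical, and the sample lines are taken from the question.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.Map;

public class DoccatLineSketch {

    // Hypothetical helper (not part of OpenNLP): splits one training line into
    // its class label (first token) and the document tokens (everything after it).
    static Map.Entry<String, String[]> parseLine(String line) {
        // The document is whitespace tokenized, so split on runs of whitespace.
        String[] parts = line.trim().split("\\s+");
        if (parts.length < 2) {
            throw new IllegalArgumentException(
                "Expected a class label followed by document text: " + line);
        }
        String label = parts[0];
        String[] tokens = Arrays.copyOfRange(parts, 1, parts.length);
        return new SimpleEntry<>(label, tokens);
    }

    public static void main(String[] args) {
        // Several documents of the same class are simply several lines
        // that start with the same label.
        String[] lines = {
            "GMDecrease Major acquisitions that have a lower gross margin than the existing network also",
            "GMDecrease The Document Categorizer can be trained on annotated training material.",
            "GMIncrease The upward movement of gross margin resulted from amounts pursuant to adjustments"
        };
        for (String line : lines) {
            Map.Entry<String, String[]> sample = parseLine(line);
            System.out.println(sample.getKey() + " -> " + sample.getValue().length + " tokens");
        }
    }
}
```

Since each line is one complete document, a document that spans several physical lines would need to be joined into a single line before training.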
