Hi benglish,

I see your point: you don't have an index yet. The first thing to do is
create a Lucene index. While creating it, don't worry about training.

To create an index, please take a look at this:

http://lucene.apache.org/core/4_7_2/core/overview-summary.html#overview_description

Once you've got an index, you can call train() (outside the loop, of
course).
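
A minimal sketch of that order of operations, assuming Lucene 4.7.x with the
lucene-core, lucene-analyzers-common and lucene-classification jars (plus
their dependencies) on the classpath. The class name, the field names "text"
and "category", the in-memory RAMDirectory and the sample strings are only
placeholders for illustration. Note that train() is given the names of the
fields, not the content of each file:

```java
import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.classification.ClassificationResult;
import org.apache.lucene.classification.SimpleNaiveBayesClassifier;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.AtomicReader;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.SlowCompositeReaderWrapper;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.Version;

public class TrainOutsideLoop {
    // Hypothetical field names; any names work as long as train() gets the same ones.
    static final String TEXT_FIELD = "text";
    static final String CATEGORY_FIELD = "category";

    /** Step 1: the loop only adds documents to the index; no training happens here. */
    static int buildIndex(Directory dir, Analyzer analyzer, String[][] files) throws IOException {
        IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_47, analyzer);
        IndexWriter writer = new IndexWriter(dir, config);
        try {
            for (String[] file : files) {
                Document doc = new Document();
                // file[0] is the category, file[1] is the content of the file.
                doc.add(new StringField(CATEGORY_FIELD, file[0], Field.Store.YES));
                doc.add(new TextField(TEXT_FIELD, file[1], Field.Store.YES));
                writer.addDocument(doc);
            }
            return writer.numDocs();
        } finally {
            writer.close();
        }
    }

    public static void main(String[] args) throws IOException {
        // Made-up training data standing in for the 100 files and 5 categories.
        String[][] files = {
            {"A", "apache lucene search index"},
            {"B", "football match goal score"},
            {"A", "query analyzer index writer"},
        };
        Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_47);
        Directory dir = new RAMDirectory();
        buildIndex(dir, analyzer, files);

        // Step 2: train once, outside the loop, on the whole index.
        DirectoryReader reader = DirectoryReader.open(dir);
        try {
            AtomicReader ar = SlowCompositeReaderWrapper.wrap(reader);
            SimpleNaiveBayesClassifier classifier = new SimpleNaiveBayesClassifier();
            classifier.train(ar, TEXT_FIELD, CATEGORY_FIELD, analyzer);

            ClassificationResult<BytesRef> result = classifier.assignClass("index search");
            System.out.println(result.getAssignedClass().utf8ToString());
        } finally {
            reader.close();
        }
    }
}
```

For files on disk, the RAMDirectory could be swapped for an FSDirectory, and
the loop body would read each file's content and category before adding the
document.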

Koji
--
http://soleami.com/blog/comparing-document-classification-functions-of-lucene-and-mahout.html


(2014/06/22 23:14), benglish wrote:
Dear Koji,
Since I am a newbie to Lucene, I unfortunately still don't understand the
.xml file you talked about in your post!
Let's imagine I have 5 categories named {A, B, C, D, E} and 100 files named
from 1 to 100. In my case it seems impossible to train the classifier outside
a loop, because I have to extract the content of each file and its category
and then add it to the training set. So it must be in a loop. Could you please
tell me if I am right with the following pseudocode:

directory = directory of training files
trainingNumber = number of training files
for(int i = 0; i < trainingNumber; i++)
{
     String category = category of ith file
     String text = content of ith file
     classifier.train(ar, text, category, new SomeAnalyzer(Version.LUCENE_46));
}

If it is wrong, please let me know how I should train the classifier outside
the loop.

Yours Sincerely,
benglish



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Train-Lucene-with-topic-defined-files-tp4141979p4143318.html
Sent from the Lucene - General mailing list archive at Nabble.com.



