Hi benglish,

I see your point. You haven't got an index yet, so the first thing to do is create a Lucene index. While creating it, don't worry about training.

To create an index, please take a look at this:
http://lucene.apache.org/core/4_7_2/core/overview-summary.html#overview_description

Once you've got an index, you can call train() (outside the loop, of course).

Koji
--
http://soleami.com/blog/comparing-document-classification-functions-of-lucene-and-mahout.html

(2014/06/22 23:14), benglish wrote:
Dear Koji,

Since I am a newbie to Lucene, I unfortunately still have no idea about the .xml file you talked about in your post!

Let's imagine I have 5 categories named {A, B, C, D, E} and 100 files named from 1 to 100. It is impossible in my case to train the classifier outside a loop, because I have to extract the content of each file and its category and then add it to the training set. So it must be in a loop. Could you please tell me if I am right with the following pseudocode:

    directory = directory of training files
    trainingNumber = number of training files
    for (int i = 0; i < trainingNumber; i++) {
        String category = category of ith file
        String text = content of ith file
        classifier.train(ar, text, category, new SomeAnalyzer(Version.LUCENE_46));
    }

If it is wrong, please let me know how I should train the classifier outside the loop.

Yours Sincerely,
benglish

--
View this message in context: http://lucene.472066.n3.nabble.com/Train-Lucene-with-topic-defined-files-tp4141979p4143318.html
Sent from the Lucene - General mailing list archive at Nabble.com.
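
For reference, here is a minimal sketch of how the pseudocode above could be rearranged so that the loop only adds documents to an index and train() is called once afterwards. It assumes Lucene 4.6/4.7 with the lucene-core, lucene-analyzers-common and lucene-classification jars on the classpath. The class name TrainOutsideLoop, the method indexTrainClassify, the field names "text" and "category", and the use of RAMDirectory, StandardAnalyzer and SimpleNaiveBayesClassifier are all illustrative choices, not something prescribed in this thread. Note that the second and third arguments of train() are field names, not the text and category of a single file.

```java
import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.classification.ClassificationResult;
import org.apache.lucene.classification.SimpleNaiveBayesClassifier;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.AtomicReader;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.SlowCompositeReaderWrapper;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.Version;

public class TrainOutsideLoop {

    // Hypothetical helper: texts[i] is the content of the ith training file,
    // categories[i] its category. Returns the category assigned to `unseen`.
    static String indexTrainClassify(String[] texts, String[] categories, String unseen)
            throws IOException {
        Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_46);
        Directory dir = new RAMDirectory(); // in-memory index, for illustration only

        // Step 1: the loop only builds the index; no training happens here.
        IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(Version.LUCENE_46, analyzer));
        for (int i = 0; i < texts.length; i++) {
            Document doc = new Document();
            doc.add(new TextField("text", texts[i], Field.Store.NO));            // analyzed content
            doc.add(new StringField("category", categories[i], Field.Store.YES)); // single-token label
            writer.addDocument(doc);
        }
        writer.close();

        // Step 2: train once, outside the loop, on the whole index.
        AtomicReader reader = SlowCompositeReaderWrapper.wrap(DirectoryReader.open(dir));
        try {
            SimpleNaiveBayesClassifier classifier = new SimpleNaiveBayesClassifier();
            classifier.train(reader, "text", "category", analyzer);
            ClassificationResult<BytesRef> result = classifier.assignClass(unseen);
            return result.getAssignedClass().utf8ToString();
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws IOException {
        String[] texts = {
            "lucene index search query analyzer",
            "football match goal stadium team"
        };
        String[] categories = {"A", "B"};
        System.out.println(indexTrainClassify(texts, categories, "search query on an index"));
    }
}
```

With real data, the two arrays would be filled by reading the 100 files, and an FSDirectory on disk would probably be a better fit than RAMDirectory.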