I am interested in searching at the sentence level. The data is a parallel corpus: each sentence in the first language is equivalent to a sentence in the second language. I want to index each sentence and give it an id, so that when I retrieve a sentence I can easily retrieve its equivalent in the second language.
I did this by splitting the file and treating each sentence as a separate text file. However, this takes a very long time when there are many huge text files.
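To make the idea concrete, here is a minimal sketch of sentence-level indexing that does not need one file per sentence. It assumes both texts hold one sentence per line and are line-aligned, so the line number can serve as the shared id. The class `ParallelSentenceIndex` and its methods are hypothetical names for illustration, and the in-memory map only stands in for a real index. With Lucene, the analogous step would presumably be to add one Document per line and keep the line number in a stored field.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;

/** Hypothetical in-memory stand-in for a sentence-level index over a parallel corpus. */
final class ParallelSentenceIndex {
    private final List<String> source = new ArrayList<>();
    private final List<String> target = new ArrayList<>();
    // word (lower-cased) -> ids of the source sentences that contain it
    private final Map<String, TreeSet<Integer>> postings = new HashMap<>();

    /** Reads both texts in lockstep; assumes line N of each is the same sentence pair. */
    void load(BufferedReader src, BufferedReader tgt) throws IOException {
        String s;
        String t;
        while ((s = src.readLine()) != null && (t = tgt.readLine()) != null) {
            add(s, t);
        }
    }

    /** Stores one sentence pair and returns its id (0-based position in the corpus). */
    int add(String srcSentence, String tgtSentence) {
        int id = source.size();
        source.add(srcSentence);
        target.add(tgtSentence);
        // Split on anything that is not a letter or digit; only the source side is indexed.
        for (String token : srcSentence.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, k -> new TreeSet<>()).add(id);
            }
        }
        return id;
    }

    /** Returns the ids of source sentences containing the given word, in ascending order. */
    List<Integer> search(String word) {
        TreeSet<Integer> ids = postings.get(word.toLowerCase(Locale.ROOT));
        if (ids == null) {
            return new ArrayList<>();
        }
        return new ArrayList<>(ids);
    }

    String sourceOf(int id) {
        return source.get(id);
    }

    /** The equivalent sentence in the second language shares the same id. */
    String translationOf(int id) {
        return target.get(id);
    }

    public static void main(String[] args) throws IOException {
        ParallelSentenceIndex index = new ParallelSentenceIndex();
        // Two tiny in-memory "files"; real code would wrap file readers instead.
        index.load(
            new BufferedReader(new StringReader("The cat sleeps.\nI like tea.\n")),
            new BufferedReader(new StringReader("Le chat dort.\nJ'aime le thé.\n")));
        for (int id : index.search("tea")) {
            System.out.println(id + "\t" + index.sourceOf(id) + "\t" + index.translationOf(id));
        }
    }
}
```

Because each huge file is read once, line by line, no intermediate per-sentence files are created. Whether this removes the slowdown described above depends on where the time actually goes, so treat it as something to try rather than a guaranteed fix.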