Hi,

I need to index a Wikipedia dump.  I know there is code in contrib/benchmark 
for indexing *English* Wikipedia for benchmarking purposes.  However, I'd like 
to index a non-English dump, and I don't actually need it for benchmarking; I 
just want to end up with a Lucene index.

Any suggestions on where I should start?  That is, can anything in 
contrib/benchmark already do this, or is there anything there that I should use 
as a starting point, as opposed to writing my own Wikipedia XML dump 
parser+indexer?

Thanks,
Otis


