Indexing Wikipedia dumps

Otis Gospodnetic Tue, 11 Dec 2007 21:35:39 -0800

Hi,

I need to index a Wikipedia dump.  I know there is code in contrib/benchmark 
for indexing *English* Wikipedia for benchmarking purposes.  However, I'd like 
to index a non-English dump, and I don't need it for benchmarking; I just want 
to end up with a Lucene index.


Any suggestions on where I should start?  That is, can anything in 
contrib/benchmark already do this, or is there anything there that I should 
use as a starting point, as opposed to writing my own Wikipedia XML dump 
parser+indexer?

Thanks,
Otis
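
For reference, the "own parser" half of the question could look roughly like
the sketch below. It is not taken from contrib/benchmark, and it is in Python
rather than Java, so treat it only as an illustration of the streaming
approach. The names iter_pages and local_name are hypothetical, the sample
document is hand-made, and the sketch assumes the dump uses the usual
page/title/text element layout. Handing each (title, text) pair to a Lucene
IndexWriter is not shown.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an XML namespace prefix, e.g. '{uri}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(stream):
    """Yield (title, text) for each <page> element, one page at a time."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        title, text = "", ""
        for child in elem.iter():
            name = local_name(child.tag)
            if name == "title":
                title = child.text or ""
            elif name == "text":
                text = child.text or ""
        yield title, text
        # Drop the page's children once the caller is done with it, so a
        # multi-gigabyte dump is never held in memory in full.
        elem.clear()


# Hand-made stand-in for a dump file; the namespace URI is only an example.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.3/">
  <page>
    <title>Beograd</title>
    <revision><text>Glavni grad Srbije.</text></revision>
  </page>
</mediawiki>"""

pages = list(iter_pages(io.BytesIO(SAMPLE)))
```

Because the tag comparison ignores the namespace, the same loop should not
care which language edition or export schema version the dump comes from, as
long as the element names are the same.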



