Map/reduce should be a suitable approach for indexing large document collections, but I'm not sure it is suitable for retrieval. You could look at *Nutch* for distributed searching.

Under the hadoop/contrib directory there is an *index* package. It may be helpful :)
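To show why indexing (as opposed to retrieval) maps so naturally onto map/reduce, here is a minimal, self-contained sketch in plain Python: the map phase emits (term, doc_id) pairs, the shuffle groups them by term, and the reduce phase produces each term's postings list. All names here are illustrative stand-ins, not Hadoop or Lucene APIs.

```python
from collections import defaultdict

def map_phase(doc_id, text):
    # Emit one (term, doc_id) pair per distinct token, like a Mapper would.
    for term in set(text.lower().split()):
        yield term, doc_id

def reduce_phase(term, doc_ids):
    # Collapse the grouped values for one term into a sorted postings list.
    return term, sorted(set(doc_ids))

def build_index(docs):
    grouped = defaultdict(list)  # stands in for the shuffle/sort step
    for doc_id, text in docs.items():
        for term, d in map_phase(doc_id, text):
            grouped[term].append(d)
    return dict(reduce_phase(t, ds) for t, ds in grouped.items())

docs = {1: "gattaca reads", 2: "gattaca assembly", 3: "reads qc"}
index = build_index(docs)
# index["gattaca"] == [1, 2]
```

The batch shuffle in the middle is exactly what a map/reduce job gives you for free at scale, and also exactly why the same model is a poor fit for retrieval: a query needs a low-latency lookup into the finished postings lists, not another batch pass over the data.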

Matt Wood wrote:
Hello all,

I was wondering if someone in the know could tell me about the current state of play with building and searching large indices with hadoop?

Some background: I work on the human genome project, and we're currently setting up a new facility based around the next generation of DNA sequencing. We're currently producing around 50Tb of data a week, some of which we would like to provide fast access to via an index.

Having read up on hadoop, it appears that it could play a central part in our infrastructure, and that others have tried (and succeeded) in building a distributed indexing and retrieval system with hadoop. I'd be interested if anyone could point me in the right direction to more information or examples of such a system. Yahoo! (with webmap) seems to be close to the sort of thing we would need.

Would map/reduce be a suitable approach for indexing _and_ retrieval, or just indexing? Would Solr/Lucene be a good fit? Any help or pointers to more information would be much appreciated!

If you would like any more details, I'd be more than happy to supply them!

Many thanks,

~ Matt


-------------

Matt Wood
Sequencing Informatics // Production Software
www.sanger.ac.uk



