Scott Green wrote:
Hi list,

Firstly, I don't know whether the nutch-dev mailing list is suitable for
this topic or not. If I have posted in the wrong place, please tell me
where I should ask this question. Thanks.

The question is how to index resources in real time in Nutch. This
question was prompted by GMail. I don't know exactly what is behind
GMail, but it is presumably built on GFS. When I receive an email or
send one out, I can press "Search Mail" immediately and it always finds
the message. I'd appreciate it if someone could explain how GMail works.

Also, any advice on hacking Nutch/Hadoop to achieve this? Thanks

Hi,
Most projects at Google use a scalable data structure called Bigtable. Orkut, Google Earth, Finance and Writely are reported to use it, and I suppose Gmail does too. Bigtable is built on GFS and designed to scale to the petabyte level, and they are working to take it to the next level.

As far as I know, you have to rebuild the index every time or merge indexes, so there is no online index building. Consider asking this on the Lucene mailing list.
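
To illustrate the merge approach, here is a toy sketch in plain Python. It is not the Lucene or Nutch API, and every name in it (build_index, merge_indexes, search) is hypothetical. It assumes a simple inverted index mapping each term to a set of document ids: only the new documents are indexed into a small delta index, which is then merged into the main index instead of rebuilding everything.

```python
from collections import defaultdict


def build_index(docs):
    """Build a toy inverted index: term -> set of document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def merge_indexes(main, delta):
    """Merge the postings of a small delta index into the main index, in place."""
    for term, doc_ids in delta.items():
        main.setdefault(term, set()).update(doc_ids)
    return main


def search(index, term):
    """Return the sorted ids of the documents that contain the term."""
    return sorted(index.get(term.lower(), set()))


# Main index, built once over the existing mail (made-up sample data).
main_index = build_index({1: "meeting at noon", 2: "lunch at one"})

# A new mail arrives: index only that mail, then merge instead of rebuilding.
delta_index = build_index({3: "meeting moved to two"})
merge_indexes(main_index, delta_index)

print(search(main_index, "meeting"))
```

The new mail becomes searchable only after the merge step, so how fresh the results are depends on how often the delta is built and merged.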
