: From this URL
: http://www.mail-archive.com/[EMAIL PROTECTED]/msg00998.html
: I see that Hadoop is not suitable for incremental updates if the inverted
: files is based on it, and what's more, Nutch has adopted Hadoop, that means
: the incremental updates ability provided by Lucene will not work in Nutch.

Questions about Nutch's use of Lucene and HDFS are best addressed on a
Nutch-specific list.

: Also, It is suggested to experiment with HBase, which is a BigTable based on
: GFS. Since HBase is based Hadoop, then what is a difference if using HBase
: for incremental indexing? Thanks a lot for attentions.

While I'm not an expert on HBase, from the single message you linked to I
see the comment:

  "...HBase is designed to be a much more scalable, incrementally
  updateable DB than BDB or relational DBs..."

which suggests to me that while HBase may be built on HDFS, its API
abstraction may allow for "Files" that do support incremental updates.

-Hoss
