I am guessing that the idea behind not putting the indexes in HDFS is (1) to maximize performance, and (2) that the indexes are relatively transient - the data they are built from may live in HDFS, but the indexes themselves stay on local disk. To avoid having to recreate them from scratch, a backup copy could be kept in HDFS.
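
Something along these lines could push a snapshot of a local index into HDFS (just a sketch - the paths are made up and I haven't tried it):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class IndexBackup {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    // Made-up locations: the live index on local disk and a backup area in HDFS.
    Path localIndex = new Path("/data/search/index-shard-0");
    Path hdfsBackup = new Path("/backups/index-shard-0");

    // Copy the whole index directory into HDFS; "false" leaves the local copy in place.
    fs.copyFromLocalFile(false, localIndex, hdfsBackup);
    fs.close();
  }
}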

Since one goal is to be able to update the indexes frequently, keeping them local seems like a good approach to me.
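
On the block-size idea Andrzej raises below - one way to try it would be to write each index file into HDFS with the block size set to (roughly) the file's length, so every file ends up as a single block that one datanode holds in full. A rough sketch (the paths, replication factor and buffer size are made up):

import java.io.File;
import java.io.FileInputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class CopyIndexOneBlockPerFile {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    File localDir = new File("/data/search/index-shard-0"); // made-up local index location
    Path hdfsDir = new Path("/indexes/index-shard-0");      // made-up HDFS destination
    fs.mkdirs(hdfsDir);

    for (File f : localDir.listFiles()) {
      if (!f.isFile()) continue;
      // Round the block size up to a multiple of 512 (the checksum chunk size),
      // so each index file is written as a single block.
      long blockSize = Math.max(512L, ((f.length() + 511L) / 512L) * 512L);
      FSDataOutputStream out =
          fs.create(new Path(hdfsDir, f.getName()), true, 64 * 1024, (short) 3, blockSize);
      FileInputStream in = new FileInputStream(f);
      IOUtils.copyBytes(in, out, 64 * 1024, true); // closes both streams
    }
    fs.close();
  }
}

Even with a full local replica, reads still go through the datanode rather than straight off the local filesystem, so I wouldn't expect it to match a plain local index - but it would be worth measuring.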

Tim


Andrzej Bialecki wrote:
Doug Cutting wrote:
My primary difference with your proposal is that I would like to support online indexing. Documents could be inserted and removed directly, and shards would synchronize changes amongst replicas, with an "eventual consistency" model. Indexes would not be stored in HDFS, but directly on the local disk of each node. Hadoop would perhaps not play a role. In many ways this would resemble CouchDB, but with explicit support for sharding and failover from the outset.

It's true that searching over HDFS is slow - but I'd hate to lose all other HDFS benefits and have to start from scratch ... I wonder what would be the performance of FsDirectory over an HDFS index that is "pinned" to a local disk, i.e. a full local replica is available, with block size of each index file equal to the file size.


