I am guessing the idea behind not putting the indexes in HDFS is
(1) to maximize performance, and (2) that they are relatively transient -
meaning the data they are created from could live in HDFS, but the
indexes themselves are just local. To avoid having to recreate them, a
backup copy could be kept in HDFS.
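For the backup step, something along these lines would probably do -
just a rough sketch against the stock Hadoop FileSystem API, with
made-up paths, and assuming the index is quiescent while it is copied:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class IndexBackup {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem hdfs = FileSystem.get(conf);

    // Local Lucene index maintained on this node (hypothetical path).
    Path localIndex = new Path("file:///data/shard-0/index");
    // Where the backup copy lives in HDFS (hypothetical path).
    Path backupDir = new Path("/backups/shard-0/index");

    // Replace the previous backup with the current on-disk index.
    hdfs.delete(backupDir, true);
    hdfs.copyFromLocalFile(false, true, localIndex, backupDir);
  }
}

Restoring on a fresh node would just be the reverse (copyToLocalFile).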
Since a goal is to be able to update them frequently, this seems
like a good approach to me.
Tim
Andrzej Bialecki wrote:
Doug Cutting wrote:
My primary difference with your proposal is that I would like to
support online indexing. Documents could be inserted and removed
directly, and shards would synchronize changes amongst replicas,
with an "eventual consistency" model. Indexes would not be stored
in HDFS, but directly on the local disk of each node. Hadoop would
perhaps not play a role. In many ways this would resemble CouchDB,
but with explicit support for sharding and failover from the outset.
It's true that searching over HDFS is slow - but I'd hate to lose
all the other HDFS benefits and have to start from scratch ... I wonder
how FsDirectory would perform over an HDFS index that is "pinned" to a
local disk, i.e. a full local replica is available, with the block size
of each index file equal to the file size.
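Roughly, I mean something like this for the copy into HDFS - only a
sketch with made-up paths, using nothing beyond the standard
FileSystem/IOUtils calls; the block size of each file is rounded up to
whole megabytes so HDFS will accept it while still keeping one block
per index file:

import java.io.File;
import java.io.FileInputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class PinnedIndexCopy {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem hdfs = FileSystem.get(conf);

    File localIndex = new File("/data/shard-0/index");  // hypothetical local index
    Path hdfsIndex = new Path("/indexes/shard-0");      // hypothetical HDFS location

    final long MB = 1024L * 1024L;
    for (File f : localIndex.listFiles()) {
      // One block spans the whole file: block size >= file length, in whole MBs.
      long blockSize = Math.max(((f.length() + MB - 1) / MB) * MB, MB);
      FSDataOutputStream out = hdfs.create(
          new Path(hdfsIndex, f.getName()),
          true,                                          // overwrite
          conf.getInt("io.file.buffer.size", 4096),
          hdfs.getDefaultReplication(),
          blockSize);
      IOUtils.copyBytes(new FileInputStream(f), out, conf, true);  // closes both streams
    }
  }
}

The idea being that a searcher opening this index through FsDirectory on
a node that holds a full replica would read every block from its local
datanode.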