Re: Multi get/put

stack Wed, 06 Aug 2008 13:43:46 -0700

Ning Li wrote:

Do you have to do a rewrite of the lucene index at compaction time?  Or
just call optimize?  (I suppose it's the former if you need to clean up
'References' as per below where you talk of splits)

What do you mean by "a rewrite of the lucene index"?

In hbase, on split, daughters hold a reference to either the top or bottom half of their parent region. References are undone by compactions; as part of compaction, the part of the parent referenced by the daughter gets written out to store files under the daughter. Daughters try to undo references as promptly as possible because regions with references are not splittable (references to references, and so on, would soon become unmanageable).
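
To make the split/compaction description above concrete, here is a rough toy model. It is only a sketch, not hbase code: the names Region, Reference, split, compact and splittable are made up for illustration, regions are plain dicts of row key to value, and treating "top" as the keys at or above the split key is an assumption of this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class Reference:
    """Hypothetical pointer from a daughter into one half of its parent."""
    parent: "Region"
    split_key: str
    top: bool  # assumption: top = keys >= split_key, bottom = keys < split_key

    def rows(self) -> Dict[str, str]:
        src = self.parent.rows()
        if self.top:
            return {k: v for k, v in src.items() if k >= self.split_key}
        return {k: v for k, v in src.items() if k < self.split_key}


@dataclass
class Region:
    """Toy region: its own store plus an optional reference to a parent half."""
    store: Dict[str, str] = field(default_factory=dict)
    ref: Optional[Reference] = None

    def rows(self) -> Dict[str, str]:
        # Referenced parent data first, then whatever the region wrote itself.
        merged = dict(self.ref.rows()) if self.ref else {}
        merged.update(self.store)
        return merged

    def splittable(self) -> bool:
        # A region that still holds a reference may not split again.
        return self.ref is None

    def split(self, split_key: str) -> Tuple["Region", "Region"]:
        if not self.splittable():
            raise RuntimeError("region still holds a reference; compact first")
        bottom = Region(ref=Reference(self, split_key, top=False))
        top = Region(ref=Reference(self, split_key, top=True))
        return bottom, top

    def compact(self) -> None:
        # Write the referenced half out under the daughter, then drop the reference.
        self.store = self.rows()
        self.ref = None


parent = Region(store={"a": "1", "m": "2", "z": "3"})
bottom, top = parent.split("m")
top.compact()  # top now owns its rows and can be split again
```

In this model a daughter answers reads through its parent until compact() copies the referenced half into its own store; only then does splittable() return True, so references to references never arise.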

In your description, you mentioned that daughter regions reference their parents' index. When I said, 'a rewrite of the lucene index', I was asking, as per hbase regions, if you followed the model and wrote a new lucene index comprised of daughter-only content at compaction time. Or do you just 'optimize' and let the references build up so the daughter of a daughter points all the way up to the parent?


Just wondering.

Regarding your 'on the other hand' above, that's a good point.  Have you
verified that if a regionserver is running on a datanode, the lucene
index is written locally?  Would be interesting to know.


That's HDFS's policy. See HDFS's FSNamesystem.getAdditionalBlock.

Sorry.  Yeah, of course.

So, why do you think it is so slow going via the HDFS FileSystem when the data is local? Is it the block-oriented access, or is there just a high tax on going via the HDFS FS interface?


St.Ack
