Re: Multi get/put

Marcus Herou Sat, 09 Aug 2008 04:45:34 -0700

This is something I would like to implement as well: a connection pool of
some sort to cut the open/close overhead and to hold a connection "open"
for a session, or at least for a transaction (more than one put in a row),
which I guess is supported in trunk?
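
Roughly what I have in mind, as a minimal sketch only. Nothing here is
HBase or Hadoop API: SimplePool, borrow() and release() are hypothetical
names, and T stands for whatever client or socket object is expensive to
open. The idea is just to hand back an idle connection when one exists
instead of opening a new one for every put.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Supplier;

// Hypothetical generic pool; not part of HBase or Hadoop.
class SimplePool<T> {
    private final BlockingQueue<T> idle;   // connections waiting for reuse
    private final Supplier<T> factory;     // opens a new connection on demand

    SimplePool(int capacity, Supplier<T> factory) {
        this.idle = new ArrayBlockingQueue<>(capacity);
        this.factory = factory;
    }

    // Reuse an idle connection if there is one, otherwise open a new one.
    T borrow() {
        T conn = idle.poll();
        return conn != null ? conn : factory.get();
    }

    // Hand the connection back; returns false if the pool is already full,
    // in which case the caller should close the connection itself.
    boolean release(T conn) {
        return idle.offer(conn);
    }

    // Number of connections currently parked in the pool.
    int idleCount() {
        return idle.size();
    }
}
```

A session would borrow() once, do its puts in a row, and release() at the
end, so the open cost is paid once per session rather than once per call.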



//Marcus

On Thu, Aug 7, 2008 at 2:15 AM, Jun Rao <[EMAIL PROTECTED]> wrote:

> In terms of performance, the biggest overhead comes from HBase/Hadoop ipc.
> For simple queries, a search through ipc takes 3-4 times as long as the
> same search run directly on HDFS. I guess a lot of the overhead is because
> of Java reflection in the ipc proxy. Does HBase have plans to make ipc more
> efficient?
>
> HDFS adds another layer of overhead compared with the local file system. A
> search on HDFS (on a node that has a local copy of all data) can take 10
> times as long as the same search on the local file system. We suspect most
> of the overhead comes from reopening sockets in the HDFS client.
>
> Jun
> IBM Almaden Research Center
> K55/B1, 650 Harry Road, San Jose, CA  95120-6099
>
> [EMAIL PROTECTED]
> (408)927-1886 (phone)
> (408)927-3215 (fax)
>
>
> From: stack <[EMAIL PROTECTED]>
> To: [email protected]
> Date: 08/06/2008 01:42 PM
> Subject: Re: Multi get/put
> Please respond to: [EMAIL PROTECTED].apache.org
>
> Ning Li wrote:
> >> Do you have to do a rewrite of the lucene index at compaction time, or
> >> just call optimize?  (I suppose it's the former if you need to clean up
> >> 'References' as per below where you talk of splits)
> >>
> >
> > What do you mean by "a rewrite of the lucene index"?
>
> In hbase, on split, daughters hold a reference to either the top or
> bottom half of their parent region.  References are undone by
> compactions; as part of compaction, the part of the parent referenced by
> the daughter gets written out to store files under the daughter.
> Daughters try to undo references as promptly as possible because regions
> with references are not splittable (references to references, and so on,
> would soon become unmanageable).
>
> In your description, you mentioned that daughter regions reference their
> parents' index.  When I said, 'a rewrite of the lucene index', I was
> asking, as per hbase regions, if you followed the model and wrote a new
> lucene index comprised of daughter-only content at compaction time.  Or
> do you just 'optimize' and let the references build up so the daughter
> of a daughter points all the way up to the parent?
>
> Just wondering.
>
>
> >> Regarding your 'on the other hand' above, that's a good point.  Have you
> >> verified that if a regionserver is running on a datanode, the lucene
> >> index is written locally?  Would be interesting to know.
> >>
> >
> > That's HDFS's policy. See HDFS's FSNamesystem.getAdditionalBlock.
> >
> Sorry.  Yeah, of course.
>
> So, why do you think it is so slow going via HDFS FileSystem when the data
> is local?  Is it the block-oriented access or is there just a high tax
> on going via the HDFS FS interface?
>
> St.Ack
>
>
>


-- 
Marcus Herou CTO and co-founder Tailsweep AB
+46702561312
[EMAIL PROTECTED]
http://www.tailsweep.com/
http://blogg.tailsweep.com/
