RE: endpoint coprocessor performance

Anoop Sam John Tue, 05 Mar 2013 19:15:05 -0800

Yes, I agree with Andrew here. I checked the 0.94 code base yesterday. I also
feel that the efficiency should be on the higher side, and there is no
whole-table scan. The HBase client issues the scan only for those regions that
fall within the start/stop keys the app specified. Yes, it contacts .META. to
find the regions within the start/stop rows, but IMHO that should not be a big
efficiency issue either.
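
To make the start/stop key point concrete, here is a minimal sketch of picking
only the regions that overlap a requested key range. It does not use the HBase
API: RegionRangeSketch, Region and regionsInRange are hypothetical names, and
treating the stop key as exclusive is an assumption made for the illustration,
not a statement about what the 0.94 client does.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RegionRangeSketch {

    /** Hypothetical stand-in for a region covering [startKey, endKey). */
    static final class Region {
        final String name;
        final byte[] startKey; // empty array = start of table
        final byte[] endKey;   // empty array = end of table

        Region(String name, String startKey, String endKey) {
            this.name = name;
            this.startKey = bytes(startKey);
            this.endKey = bytes(endKey);
        }
    }

    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    /** Unsigned lexicographic comparison of two row keys. */
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }

    /**
     * Returns the names of regions overlapping [start, stop).
     * An empty start or stop means "unbounded" on that side (assumption).
     */
    static List<String> regionsInRange(List<Region> regions, String start, String stop) {
        byte[] startRow = bytes(start);
        byte[] stopRow = bytes(stop);
        List<String> selected = new ArrayList<>();
        for (Region r : regions) {
            boolean beginsBeforeStop = stopRow.length == 0 || compare(r.startKey, stopRow) < 0;
            boolean endsAfterStart = r.endKey.length == 0 || compare(r.endKey, startRow) > 0;
            if (beginsBeforeStop && endsAfterStart) {
                selected.add(r.name);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<Region> regions = Arrays.asList(
            new Region("r1", "", "g"),
            new Region("r2", "g", "p"),
            new Region("r3", "p", ""));
        // Only regions overlapping the requested range are selected.
        System.out.println(regionsInRange(regions, "h", "k"));
    }
}
```

With three regions split at "g" and "p", a request for rows "h" to "k" touches
only the middle region, so the rest of the table is never contacted.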

@Kim - Can you do some profiling and let us know which area of code is eating 
up time in your case?

I also see HBASE-6877.

-Anoop-
________________________________________
From: Andrew Purtell [apurt...@apache.org]
Sent: Wednesday, March 06, 2013 7:28 AM
To: user@hbase.apache.org
Subject: Re: endpoint coprocessor performance

> In current logic, HTable#coprocessorExec always scan the whole table, its
> efficiency is low

No, I don't think that is correct.

In its current logic, coprocessorExec always scans the META table for all
regions of the target table, to find the up to date locations, and then
dispatches the exec in parallel to all regions of the target table. The
efficiency of the exec is actually high because invocations happen in
parallel across the cluster, with results reassembled back at the client as
they come in.

The increased setup latency relative to a Scan and the load on META is
because of the initial scan on META to find the up to date locations of all
regions of the target table. For a Scan, the cached locations of regions
are used, and relocations are handled transparently by the client. Exec
could be updated to do this as well.
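
As a rough illustration of the "use cached locations and relocate on failure"
idea, here is a minimal sketch. It is a self-contained model, not HBase code:
MetaTable, Client, StaleLocationException and every method on them are
hypothetical stand-ins, and the single retry is an assumption made to keep the
example short.

```java
import java.util.HashMap;
import java.util.Map;

public class LocationCacheSketch {

    /** Hypothetical stand-in for META; counts how often it is consulted. */
    static final class MetaTable {
        final Map<String, String> regionToServer = new HashMap<>();
        int lookups = 0;

        String lookup(String region) {
            lookups++;
            return regionToServer.get(region);
        }
    }

    /** Raised when a call is sent to a server that no longer hosts the region. */
    static final class StaleLocationException extends RuntimeException {
        private static final long serialVersionUID = 1L;
    }

    /** Hypothetical client that caches region locations between calls. */
    static final class Client {
        final MetaTable meta;
        final Map<String, String> cache = new HashMap<>();

        Client(MetaTable meta) {
            this.meta = meta;
        }

        /** Returns the cached location, consulting META only on a miss or forced refresh. */
        String locate(String region, boolean forceRefresh) {
            if (forceRefresh) {
                cache.remove(region);
            }
            return cache.computeIfAbsent(region, meta::lookup);
        }

        /** Invokes against the cached location; on a stale entry, refreshes once and retries. */
        String exec(String region) {
            try {
                return call(region, locate(region, false));
            } catch (StaleLocationException e) {
                return call(region, locate(region, true));
            }
        }

        /** Simulated remote call: fails if the region has moved off the given server. */
        String call(String region, String server) {
            // Reads the map directly so the simulation does not count as a META lookup.
            if (!server.equals(meta.regionToServer.get(region))) {
                throw new StaleLocationException();
            }
            return region + "@" + server;
        }
    }

    public static void main(String[] args) {
        MetaTable meta = new MetaTable();
        meta.regionToServer.put("r1", "s1");
        Client client = new Client(meta);
        client.exec("r1");
        client.exec("r1");                   // served from the cache
        meta.regionToServer.put("r1", "s2"); // region moves
        System.out.println(client.exec("r1") + " after " + meta.lookups + " META lookups");
    }
}
```

In this model META is consulted once per region and again only after a move,
instead of once per invocation.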

On Wed, Mar 6, 2013 at 5:13 AM, Kim Hamilton <kimdhamil...@gmail.com> wrote:

> Thanks so much! This describes exactly what I'm seeing. I did notice
> extremely heavy load on the region server carrying .META., as described in
> HBASE-6870:
>
> In current logic, HTable#coprocessorExec always scan the whole table,
> its efficiency
> is low and will affect the Regionserver carrying .META. under large
> coprocessorExec requests
>
>
> Thanks again,
> Kim
> On Mon, Mar 4, 2013 at 8:08 PM, Stephen Boesch <java...@gmail.com> wrote:
>
> > great question from Kim and follow-up/answers.
> >
> >
> > 2013/3/4 Gary Helmling <ghelml...@gmail.com>
> >
> > > I see this is HBASE-6870.  I thought that sounded familiar.
> > >
> > >
> > > On Mon, Mar 4, 2013 at 6:23 PM, Gary Helmling <ghelml...@gmail.com> wrote:
> > >
> > > >
> > > > >> Check your logs for whether your end-point coprocessor is hitting
> > > > >> zookeeper on every invocation to figure out the region start key.
> > > > >> Unfortunately (at least last time I checked), the default way of
> > > > >> invoking an end point coprocessor doesn't use the meta cache. You can
> > > > >> go through a combination of the following instead:
> > > > >>     HRegionLocation regionLocation = retried ?
> > > > >>         connection.relocateRegion(tableName, tableKey) :
> > > > >>         connection.locateRegion(tableName, tableKey);
> > > > >>     ...
> > > > >> Then call HConnection.processExecs, passing in the regionKeys from
> > > > >> above.
> > > > >> You can trap the error case of the region being relocated and try
> > > > >> again with retried = true and it'll update the meta data cache when
> > > > >> relocateRegion is called.
> > > >>
> > > >
> > > >
> > > > Any idea if we have an improvement logged in JIRA for this?  This is
> > > > definitely something we should improve on.
> > > >
> > >
> >
>

--
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)
