Also, the CF for the increments has been set to IN_MEMORY, and its bloom filter to ROWCOL.
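
For reference, here is a minimal sketch of how those column-family settings can be applied with the 0.92-era Java admin API; the table name "counters" and family "cf" below are placeholders, not the actual schema:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.regionserver.StoreFile;
import org.apache.hadoop.hbase.util.Bytes;

public class AlterCounterFamily {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);

    byte[] table = Bytes.toBytes("counters");                // placeholder table name
    HColumnDescriptor cf = new HColumnDescriptor(Bytes.toBytes("cf"));  // placeholder family
    cf.setInMemory(true);                                    // favour this family in the block cache
    cf.setBloomFilterType(StoreFile.BloomType.ROWCOL);       // row+column bloom filter

    admin.disableTable(table);                               // schema changes need the table offline on 0.92
    admin.modifyColumn(table, cf);
    admin.enableTable(table);
  }
}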
On Sun, Jan 13, 2013 at 1:17 PM, kiran <[email protected]> wrote:

> The idea was: given a region server, I can get the HRegion and Store files
> in that region. Store has a method incrementColumnValue, hence I thought of
> using it as a low-level implementation.
>
> Yes, gets are proving very costly for me. The other operation running
> alongside this is writing data into HBase on the region server, but that
> goes into a different table, not the one whose values I need to increment.
>
> I did profile using gets and puts across my cluster rather than directly
> using HTable.increment. I am running the daemon on each node, with 1000
> batched get actions and HTableUtil.bucketRSPut for the puts; some nodes
> were able to complete 1000 actions in 10 seconds, while some were taking
> about 3 minutes.
>
> What is surprising to me is that I precomputed the rows hosted on each
> node, started the daemon, and issued gets only for the rows on that node
> so that the data is local; even in this case, a worst case of 3 minutes
> for 1000 actions is huge.
>
>
> On Sun, Jan 13, 2013 at 12:52 PM, Anoop John <[email protected]> wrote:
>
>> > Another alternative is to get store files for each row hosted in that
>> > node operating directly on store files for each increment object ??
>>
>> Sorry, I didn't get the idea. Can you explain, please?
>> Regarding support for Increments in the batch API: sorry, I was checking
>> the 0.94 code base. In 0.92 this support is not there. :(
>>
>> Have you done any profiling of the operation on the RS side? How many
>> HFiles per store, on average, at the time of this op, and how many CFs
>> does the table have? Do gets seem to be the costly part for you? Is this
>> bulk increment op the only thing happening at that time, or are there
>> other concurrent ops? Is the block cache getting used? Have you checked
>> metrics like the cache hit ratio?
>>
>> -Anoop-
>>
>> On Sun, Jan 13, 2013 at 12:20 PM, kiran <[email protected]> wrote:
>>
>>> I am using HBase 0.92.1 and the table is split evenly across 19 nodes,
>>> and I know the region splits on each node. I can construct Increment
>>> objects for each row hosted on that node according to the splits
>>> (30-50k approx in 15 min per node) ...
>>>
>>> There is no batch increment support (the API says batch supports only
>>> get, put and delete)... can I directly use HTable.increment for 30-50k
>>> Increment objects on each node, sequentially or multithreaded, and
>>> finish in 15 min?
>>>
>>> Another alternative is to get the store files for each row hosted on
>>> that node, operating directly on the store files for each Increment
>>> object??
>>>
>>>
>>> On Sun, Jan 13, 2013 at 1:50 AM, Varun Sharma <[email protected]> wrote:
>>>
>>>> IMHO, this seems too low - 1 million operations in 15 minutes
>>>> translates to roughly 1.1K increment operations per second, which
>>>> should be easy to support. Moreover, you are running increments on
>>>> different rows, so contention due to row locks is also not likely to
>>>> be a problem.
>>>>
>>>> On HBase 0.94.0, I have seen up to 1K increments per second on a
>>>> single row (note that this will be significantly slower than
>>>> incrementing individual rows because of contention, and it is also
>>>> limited to one node, the one which hosts the row). So I would assume
>>>> that throughput should be significantly higher for increments spread
>>>> across multiple rows. How many nodes are you using, and is the table
>>>> appropriately split across the nodes?
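
For the per-node HTable.increment approach discussed above, a rough multithreaded sketch follows; the table, family and qualifier names are placeholders, and it assumes the rows local to this region server have already been precomputed:

import java.io.IOException;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Increment;
import org.apache.hadoop.hbase.util.Bytes;

public class LocalIncrementer {
  private static final byte[] FAMILY = Bytes.toBytes("cf");    // placeholder family
  private static final byte[] QUAL = Bytes.toBytes("count");   // placeholder qualifier

  /** rows = row keys precomputed as local to this region server. */
  public static void incrementLocally(final Configuration conf, final List<byte[]> rows,
      int threads) throws InterruptedException {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    final int slice = (rows.size() + threads - 1) / threads;
    for (int t = 0; t < threads; t++) {
      final int start = t * slice;
      final int end = Math.min(rows.size(), start + slice);
      if (start >= end) {
        break;
      }
      pool.submit(new Runnable() {
        public void run() {
          try {
            // HTable is not thread-safe, so each worker opens its own instance.
            HTable table = new HTable(conf, "counters");       // placeholder table name
            try {
              for (int i = start; i < end; i++) {
                Increment inc = new Increment(rows.get(i));
                inc.addColumn(FAMILY, QUAL, 1L);
                table.increment(inc);                          // one RPC per row
              }
            } finally {
              table.close();
            }
          } catch (IOException e) {
            e.printStackTrace();                               // real code should retry/log
          }
        }
      });
    }
    pool.shutdown();
    pool.awaitTermination(15, TimeUnit.MINUTES);               // the stated deadline
  }
}

Since each worker holds its own HTable, the main tuning knob here is the thread count per node; whether it fits inside 15 minutes would still need to be measured.
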
>>>>
>>>> On Sat, Jan 12, 2013 at 10:59 AM, Ted Yu <[email protected]> wrote:
>>>>
>>>>> Can you tell us which version of HBase you are using?
>>>>>
>>>>> Thanks
>>>>>
>>>>> On Sat, Jan 12, 2013 at 10:57 AM, Asaf Mesika <[email protected]> wrote:
>>>>>
>>>>>> Most of the time is spent reading from the store files, not on the
>>>>>> network transfer time of the Increment objects.
>>>>>>
>>>>>> Sent from my iPhone
>>>>>>
>>>>>> On Jan 12, 2013, at 17:40, Anoop John <[email protected]> wrote:
>>>>>>
>>>>>> Hi,
>>>>>> Can you check with the API HTable#batch()? There you can batch a
>>>>>> number of increments for many rows in just one RPC call. It might
>>>>>> help you reduce the net time taken. Good luck.
>>>>>>
>>>>>> -Anoop-
>>>>>>
>>>>>> On Sat, Jan 12, 2013 at 4:07 PM, kiran <[email protected]> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> My use case is that I need to increment 1 million rows within 15
>>>>>> minutes. I tried two approaches, but neither of them yielded results.
>>>>>>
>>>>>> I have used HTable.increment, but it does not complete in the
>>>>>> specified time. I tried multi-threading as well, but it is very
>>>>>> costly. I have also implemented get-and-put as another alternative,
>>>>>> but that approach also does not complete in 15 mins.
>>>>>>
>>>>>> Can I use any low-level implementation, like Store or HRegionServer,
>>>>>> to increment 1 million rows? I know the table splits, the region
>>>>>> servers serving them, and the rows which fall into each split. I
>>>>>> suspect the major concern is network I/O rather than processing with
>>>>>> the above two approaches.
>>>>>>
>>>>>> --
>>>>>> Thank you
>>>>>> Kiran Sarvabhotla
>>>>>>
>>>>>> -----Even a correct decision is wrong when it is taken late
>>>>>
>>>>
>>>
>>> --
>>> Thank you
>>> Kiran Sarvabhotla
>>>
>>> -----Even a correct decision is wrong when it is taken late
>>>
>>
>
> --
> Thank you
> Kiran Sarvabhotla
>
> -----Even a correct decision is wrong when it is taken late
>

--
Thank you
Kiran Sarvabhotla

-----Even a correct decision is wrong when it is taken late
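
For completeness, a rough sketch of the HTable#batch() route suggested in the thread; as noted above it needs 0.94+ (0.92 does not accept Increment in batch()), and the table and column names below are placeholders:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Increment;
import org.apache.hadoop.hbase.client.Row;
import org.apache.hadoop.hbase.util.Bytes;

public class BatchedIncrements {
  /** Sends increments for many rows with far fewer RPC round trips than one increment() per row. */
  public static void batchIncrement(List<byte[]> rows)
      throws IOException, InterruptedException {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "counters");               // placeholder table name
    try {
      List<Row> actions = new ArrayList<Row>();                // Increment is a Row on 0.94+
      for (byte[] row : rows) {
        Increment inc = new Increment(row);
        inc.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("count"), 1L);  // placeholder CF/qualifier
        actions.add(inc);
        if (actions.size() == 1000) {                          // flush every 1000 actions
          table.batch(actions, new Object[actions.size()]);
          actions.clear();
        }
      }
      if (!actions.isEmpty()) {
        table.batch(actions, new Object[actions.size()]);      // flush the remainder
      }
    } finally {
      table.close();
    }
  }
}

The batch size of 1000 simply mirrors the batch size used for the gets earlier in the thread; the right value would need to be measured on the cluster.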
