Also, the CF for the increments has been set to IN_MEMORY, and its bloom filter to ROWCOL.
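
For reference, here is a minimal sketch of how those column-family settings can be applied with the 0.92-era Java admin API; the table name "counters" and family "cf" below are placeholders, not the actual schema:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.regionserver.StoreFile;
import org.apache.hadoop.hbase.util.Bytes;

public class AlterCounterFamily {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);

    byte[] table = Bytes.toBytes("counters");                // placeholder table name
    HColumnDescriptor cf = new HColumnDescriptor(Bytes.toBytes("cf"));  // placeholder family
    cf.setInMemory(true);                                    // favour this family in the block cache
    cf.setBloomFilterType(StoreFile.BloomType.ROWCOL);       // row+column bloom filter

    admin.disableTable(table);                               // schema changes need the table offline on 0.92
    admin.modifyColumn(table, cf);
    admin.enableTable(table);
  }
}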
On Sun, Jan 13, 2013 at 1:17 PM, kiran <[email protected]> wrote:

> The idea was: given a region server, I can get the HRegion and Store files
> in that region. Store has a method incrementColumnValue, hence I thought of
> using it as a low-level implementation.
>
> Yes, gets are proving very costly for me. The other operation running
> alongside this is writing data into HBase on the region server, but that
> goes into a different table, not the one whose values I need to increment.
>
> I did profile using gets and puts across my cluster rather than directly
> using HTable.increment. I am running the daemon on each node, with 1000
> batched get actions and HTableUtil.bucketRSPut for the puts; some nodes
> were able to complete 1000 actions in 10 seconds, while some were taking
> about 3 minutes.
>
> What is surprising to me is that I precomputed the rows hosted on each
> node, started the daemon, and issued gets only for the rows on that node
> so that the data is local; even in this case, a worst case of 3 minutes
> for 1000 actions is huge.
>
>
> On Sun, Jan 13, 2013 at 12:52 PM, Anoop John <[email protected]> wrote:
>
>> > Another alternative is to get store files for each row hosted in that
>> > node operating directly on store files for each increment object ??
>>
>> Sorry, I didn't get the idea. Can you explain, please?
>> Regarding support for Increments in the batch API: sorry, I was checking
>> the 0.94 code base. In 0.92 this support is not there. :(
>>
>> Have you done any profiling of the operation on the RS side? How many
>> HFiles per store, on average, at the time of this op, and how many CFs
>> does the table have? Do gets seem to be the costly part for you? Is this
>> bulk increment op the only thing happening at that time, or are there
>> other concurrent ops? Is the block cache getting used? Have you checked
>> metrics like the cache hit ratio?
>>
>> -Anoop-
>>
>> On Sun, Jan 13, 2013 at 12:20 PM, kiran <[email protected]> wrote:
>>
>>> I am using HBase 0.92.1 and the table is split evenly across 19 nodes,
>>> and I know the region splits on each node. I can construct Increment
>>> objects for each row hosted on that node according to the splits
>>> (30-50k approx in 15 min per node) ...
>>>
>>> There is no batch increment support (the API says batch supports only
>>> get, put and delete)... can I directly use HTable.increment for 30-50k
>>> Increment objects on each node, sequentially or multithreaded, and
>>> finish in 15 min?
>>>
>>> Another alternative is to get the store files for each row hosted on
>>> that node, operating directly on the store files for each Increment
>>> object??
>>>
>>>
>>> On Sun, Jan 13, 2013 at 1:50 AM, Varun Sharma <[email protected]> wrote:
>>>
>>>> IMHO, this seems too low - 1 million operations in 15 minutes
>>>> translates to roughly 1.1K increment operations per second, which
>>>> should be easy to support. Moreover, you are running increments on
>>>> different rows, so contention due to row locks is also not likely to
>>>> be a problem.
>>>>
>>>> On HBase 0.94.0, I have seen up to 1K increments per second on a
>>>> single row (note that this will be significantly slower than
>>>> incrementing individual rows because of contention, and it is also
>>>> limited to one node, the one which hosts the row). So I would assume
>>>> that throughput should be significantly higher for increments spread
>>>> across multiple rows. How many nodes are you using, and is the table
>>>> appropriately split across the nodes?
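
For the per-node HTable.increment approach discussed above, a rough multithreaded sketch follows; the table, family and qualifier names are placeholders, and it assumes the rows local to this region server have already been precomputed:

import java.io.IOException;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Increment;
import org.apache.hadoop.hbase.util.Bytes;

public class LocalIncrementer {
  private static final byte[] FAMILY = Bytes.toBytes("cf");    // placeholder family
  private static final byte[] QUAL = Bytes.toBytes("count");   // placeholder qualifier

  /** rows = row keys precomputed as local to this region server. */
  public static void incrementLocally(final Configuration conf, final List<byte[]> rows,
      int threads) throws InterruptedException {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    final int slice = (rows.size() + threads - 1) / threads;
    for (int t = 0; t < threads; t++) {
      final int start = t * slice;
      final int end = Math.min(rows.size(), start + slice);
      if (start >= end) {
        break;
      }
      pool.submit(new Runnable() {
        public void run() {
          try {
            // HTable is not thread-safe, so each worker opens its own instance.
            HTable table = new HTable(conf, "counters");       // placeholder table name
            try {
              for (int i = start; i < end; i++) {
                Increment inc = new Increment(rows.get(i));
                inc.addColumn(FAMILY, QUAL, 1L);
                table.increment(inc);                          // one RPC per row
              }
            } finally {
              table.close();
            }
          } catch (IOException e) {
            e.printStackTrace();                               // real code should retry/log
          }
        }
      });
    }
    pool.shutdown();
    pool.awaitTermination(15, TimeUnit.MINUTES);               // the stated deadline
  }
}

Since each worker holds its own HTable, the main tuning knob here is the thread count per node; whether it fits inside 15 minutes would still need to be measured.
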
>>>>
>>>> On Sat, Jan 12, 2013 at 10:59 AM, Ted Yu <[email protected]> wrote:
>>>>
>>>>> Can you tell us which version of HBase you are using?
>>>>>
>>>>> Thanks
>>>>>
>>>>> On Sat, Jan 12, 2013 at 10:57 AM, Asaf Mesika <[email protected]> wrote:
>>>>>
>>>>>> Most of the time is spent reading from the store files, not on the
>>>>>> network transfer time of the Increment objects.
>>>>>>
>>>>>> Sent from my iPhone
>>>>>>
>>>>>> On Jan 12, 2013, at 17:40, Anoop John <[email protected]> wrote:
>>>>>>
>>>>>> Hi,
>>>>>> Can you check with the API HTable#batch()? There you can batch a
>>>>>> number of increments for many rows in just one RPC call. It might
>>>>>> help you reduce the net time taken. Good luck.
>>>>>>
>>>>>> -Anoop-
>>>>>>
>>>>>> On Sat, Jan 12, 2013 at 4:07 PM, kiran <[email protected]> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> My use case is that I need to increment 1 million rows within 15
>>>>>> minutes. I tried two approaches, but neither of them yielded results.
>>>>>>
>>>>>> I have used HTable.increment, but it does not complete in the
>>>>>> specified time. I tried multi-threading as well, but it is very
>>>>>> costly. I have also implemented get-and-put as another alternative,
>>>>>> but that approach also does not complete in 15 mins.
>>>>>>
>>>>>> Can I use any low-level implementation, like Store or HRegionServer,
>>>>>> to increment 1 million rows? I know the table splits, the region
>>>>>> servers serving them, and the rows which fall into each split. I
>>>>>> suspect the major concern is network I/O rather than processing with
>>>>>> the above two approaches.
>>>>>>
>>>>>> --
>>>>>> Thank you
>>>>>> Kiran Sarvabhotla
>>>>>>
>>>>>> -----Even a correct decision is wrong when it is taken late
>>>>>
>>>>
>>>
>>> --
>>> Thank you
>>> Kiran Sarvabhotla
>>>
>>> -----Even a correct decision is wrong when it is taken late
>>>
>>
>
> --
> Thank you
> Kiran Sarvabhotla
>
> -----Even a correct decision is wrong when it is taken late
>

--
Thank you
Kiran Sarvabhotla

-----Even a correct decision is wrong when it is taken late
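
For completeness, a rough sketch of the HTable#batch() route suggested in the thread; as noted above it needs 0.94+ (0.92 does not accept Increment in batch()), and the table and column names below are placeholders:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Increment;
import org.apache.hadoop.hbase.client.Row;
import org.apache.hadoop.hbase.util.Bytes;

public class BatchedIncrements {
  /** Sends increments for many rows with far fewer RPC round trips than one increment() per row. */
  public static void batchIncrement(List<byte[]> rows)
      throws IOException, InterruptedException {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "counters");               // placeholder table name
    try {
      List<Row> actions = new ArrayList<Row>();                // Increment is a Row on 0.94+
      for (byte[] row : rows) {
        Increment inc = new Increment(row);
        inc.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("count"), 1L);  // placeholder CF/qualifier
        actions.add(inc);
        if (actions.size() == 1000) {                          // flush every 1000 actions
          table.batch(actions, new Object[actions.size()]);
          actions.clear();
        }
      }
      if (!actions.isEmpty()) {
        table.batch(actions, new Object[actions.size()]);      // flush the remainder
      }
    } finally {
      table.close();
    }
  }
}

The batch size of 1000 simply mirrors the batch size used for the gets earlier in the thread; the right value would need to be measured on the cluster.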
