GET performance degrades over time

2013-05-16 Thread Viral Bajaria
Hi, My setup is as follows: 24 regionservers (7GB RAM, 8-core CPU, 5GB heap space), hbase 0.94.4, 5-7 regions per regionserver. I am doing an avg of 4k-5k random gets per regionserver per second and the performance is acceptable in the beginning. I have also done ~10K gets for a single regionserver

Re: GET performance degrades over time

2013-05-16 Thread ramkrishna vasudevan
This generally happens when the same block is accessed for the HFile. Are you seeing any contention on the HDFS side? Regards Ram On Thu, May 16, 2013 at 4:19 PM, Bing Jiang jiangbinglo...@gmail.com wrote: Have you checked your HBase environment? I think it perhaps comes from: 1) System uses

Re: NullPointerException while loading large amount of new rows into HBase, exception is thrown when trying to obtain lock for RowKey

2013-05-16 Thread ramkrishna vasudevan
Which version of HBase? Regards Ram On Thu, May 16, 2013 at 10:42 PM, Tianying Chang tich...@ebaysf.com wrote: Hi, When our customers(using TSDB) loads large amount of data into HBase, we saw many NullPointerException in the RS logs as below. I checked the source code, it seems when

Re: NullPointerException while loading large amount of new rows into HBase, exception is thrown when trying to obtain lock for RowKey

2013-05-16 Thread ramkrishna vasudevan
Are you trying to get the row lock explicitly ? Using HTable.lockRow? Regards Ram On Thu, May 16, 2013 at 10:46 PM, ramkrishna vasudevan ramkrishna.s.vasude...@gmail.com wrote: Which version of HBase? Regards Ram On Thu, May 16, 2013 at 10:42 PM, Tianying Chang

RE: NullPointerException while loading large amount of new rows into HBase, exception is thrown when trying to obtain lock for RowKey

2013-05-16 Thread Tianying Chang
It is HBase 0.92. The customer is using TSDB and AsyncHBase. I am not sure what their client code is calling exactly, but from the call stack it feels like it uses HTable.lockRow. Is this not recommended? If so, what should they use instead? Thanks Tian-Ying

RE: NullPointerException while loading large amount of new rows into HBase, exception is thrown when trying to obtain lock for RowKey

2013-05-16 Thread Tianying Chang
FYI, below I quoted the customer's response after I explained the NullPointerException is caused by the row lock. So my question is: if this is an allowed situation, where multiple threads/processes compete for the lock, the one who did not get it should be considered normal and not throwing

Re: GET performance degrades over time

2013-05-16 Thread Viral Bajaria
Have you checked your HBase environment? I think it perhaps comes from: 1) System uses more swap frequently when you continue to execute Get operations? I have set swap to 0. AFAIK, that's a recommended practice. Let me know if that should not be followed for nodes running HBase. 2) check

Key Value collision

2013-05-16 Thread Varun Sharma
Hi, I am wondering what happens when we add the following: row, col, timestamp -- v1 A flush happens. Now, we add row, col, timestamp -- v2 A flush happens again. In this case if MAX_VERSIONS == 1, how is the tie broken during reads and during minor compactions, is it arbitrary ? Thanks

Re: GET performance degrades over time

2013-05-16 Thread Viral Bajaria
This generally happens when the same block is accessed for the HFile. Are you seeing any contention on the HDFS side? When you say contention, what should I be looking for? Slow operations to respond to data block requests? Or some specific metric in ganglia? -Viral

Re: GET performance degrades over time

2013-05-16 Thread Jean-Marc Spaggiari
Michael is correct. More information about the swappiness value is available on Wikipedia: http://en.wikipedia.org/wiki/Swappiness 2013/5/16 Michael Segel michael_se...@hotmail.com Going from memory, the swap value setting to 0 is a suggestion. You may still actually swap, but I think it's a 'last resort'

Re: GET performance degrades over time

2013-05-16 Thread Viral Bajaria
Going from memory, the swap value setting to 0 is a suggestion. You may still actually swap, but I think it's a 'last resort' type of thing. When you look at top, at the top of the page, how much swap do you see? When I look at top it says: 0K total, 0K used, 0K free (as expected). I can try

Re: GET performance degrades over time

2013-05-16 Thread Michael Segel
Going from memory, the swap value setting to 0 is a suggestion. You may still actually swap, but I think it's a 'last resort' type of thing. When you look at top, at the top of the page, how much swap do you see? On May 16, 2013, at 1:43 PM, Viral Bajaria viral.baja...@gmail.com wrote: Have

Re: GET performance degrades over time

2013-05-16 Thread Michael Segel
If you're not swapping then don't worry about it. My comment was that even though you set the swap to 0 (and I'm going from memory), it's possible for some swap to occur. (But I could be wrong.) You really don't have a lot of memory, and you have a 5GB heap... MSLABs on? Could you be facing

Re: Key Value collision

2013-05-16 Thread Michael Segel
Last row inserted wins. On May 16, 2013, at 1:49 PM, Varun Sharma va...@pinterest.com wrote: Hi, I am wondering what happens when we add the following: row, col, timestamp -- v1 A flush happens. Now, we add row, col, timestamp -- v2 A flush happens again. In this case if

Re: GET performance degrades over time

2013-05-16 Thread Viral Bajaria
If you're not swapping then don't worry about it. My comment was that even though you set the swap to 0, and I'm going from memory, it's possible for some swap to occur. (But I could be wrong.) Thanks for sharing this info. Will remember this for future debugging too. Checked the vm.swappiness
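For reference, the vm.swappiness tuning discussed in this thread is a Linux kernel setting, not an HBase one. A minimal sketch of inspecting and persisting it (standard Linux paths; the exact meaning of 0 varies by kernel version, so verify against your distribution's docs):

```shell
# Inspect the current value (0 tells the kernel to avoid swapping
# application memory except as a last resort; newer kernels are stricter)
cat /proc/sys/vm/swappiness

# Set it for the running system (requires root)
sysctl -w vm.swappiness=0

# Persist the setting across reboots
echo 'vm.swappiness = 0' >> /etc/sysctl.conf
```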

[ANNOUNCE] Phoenix 1.2 is now available

2013-05-16 Thread James Taylor
We are pleased to announce the immediate availability of Phoenix 1.2 (https://github.com/forcedotcom/phoenix/wiki/Download). Here are some of the release highlights: * Improve performance of multi-point and multi-range queries (20x plus) using new skip scan * Support TopN queries (3-70x

Re: Key Value collision

2013-05-16 Thread Jeff Kolesky
Except in the case of bulk loads: if you import cells with the same timestamp through a bulk load, which cell wins is non-deterministic. Facebook fixed the issue, and the patch has been backported to 0.95. The friendly folks at Cloudera are working on backporting the fix to 0.94 as well. Follow
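Michael's "last row inserted wins" answer relies on HBase breaking timestamp ties with an internal write sequence number, which is why bulk loads (whose files historically lacked one) are the exception. A rough Python model of the tie-break, not the actual KeyValue comparator:

```python
# Illustrative model: a cell is (row, col, timestamp, seq_id, value),
# where seq_id is a monotonically increasing write sequence number.
def latest_cell(cells):
    """With MAX_VERSIONS == 1, the read winner is the cell with the
    highest timestamp; timestamp ties fall back to the highest seq_id,
    i.e. the last insert wins."""
    return max(cells, key=lambda c: (c[2], c[3]))

cells = [
    ("row", "col", 100, 1, "v1"),  # written first, flushed
    ("row", "col", 100, 2, "v2"),  # same timestamp, flushed later
]
assert latest_cell(cells)[4] == "v2"  # last insert wins the tie
```

A bulk-loaded file with no sequence number has no such tie-breaker, which is where the non-determinism in Jeff's caveat comes from.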

Question about HFile seeking

2013-05-16 Thread Varun Sharma
Lets say I have the following in my table: row row1, col col1, value v1 -- HFile entry would be row1,col1,ts1--v1; row row1c, col ol1, value v2 -- HFile entry would be row1c,ol1,ts1--v2. Now I issue a prefix scan asking for row row1c, how do we seek - do we seek

Re: [ANNOUNCE] Phoenix 1.2 is now available

2013-05-16 Thread anil gupta
Hi James, You have mentioned support for TopN queries. Can you provide me the HBase JIRA ticket for that? I am also doing similar stuff in https://issues.apache.org/jira/browse/HBASE-7474. I am interested in knowing the details about that implementation. Thanks, Anil Gupta On Thu, May 16, 2013 at

Re: Question about HFile seeking

2013-05-16 Thread Varun Sharma
Or do we use some kind of demarcator b/w rows and columns and timestamps when building the HFile keys and the indices ? Thanks Varun On Thu, May 16, 2013 at 1:56 PM, Varun Sharma va...@pinterest.com wrote: Lets say I have the following in my table: col1 row1 v1

Re: Question about HFile seeking

2013-05-16 Thread Stack
On Thu, May 16, 2013 at 2:03 PM, Varun Sharma va...@pinterest.com wrote: Or do we use some kind of demarcator b/w rows and columns and timestamps when building the HFile keys and the indices ? No demarcation but in KeyValue, we keep row, column family name, column family qualifier, etc.,
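Stack's point is that KeyValue keys carry explicit length fields for each part, so no demarcator byte is needed: "row1"+"col1" and "row1c"+"ol1" concatenate to the same bytes, yet encode to distinct keys. A simplified sketch of the idea (not the real on-disk KeyValue format, which also encodes family, type, etc.):

```python
import struct

def encode_key(row, qualifier):
    """Simplified KeyValue-style key: a 2-byte row-length field lets the
    comparator slice the row out of the flat byte string, so no
    demarcator byte between row and column is needed."""
    r, q = row.encode(), qualifier.encode()
    return struct.pack(">H", len(r)) + r + q

def decode_key(key):
    rlen = struct.unpack(">H", key[:2])[0]
    return key[2:2 + rlen], key[2 + rlen:]

def key_less(a, b):
    """Compare row part first, then qualifier -- the length fields are
    used to slice, not compared directly."""
    return decode_key(a) < decode_key(b)

k1 = encode_key("row1", "col1")
k2 = encode_key("row1c", "ol1")
assert k1 != k2            # same concatenated bytes, distinct keys
assert key_less(k1, k2)    # and row1 still sorts before row1c
```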

Re: Question about HFile seeking

2013-05-16 Thread Stack
What are you seeing, Varun (or think you are seeing)? St.Ack On Thu, May 16, 2013 at 2:30 PM, Stack st...@duboce.net wrote: On Thu, May 16, 2013 at 2:03 PM, Varun Sharma va...@pinterest.com wrote: Or do we use some kind of demarcator b/w rows and columns and timestamps when building the HFile

Re: Question about HFile seeking

2013-05-16 Thread Varun Sharma
Nothing, I am just curious... So, we will do a bunch of wasteful scanning - let's say row1 has col1 - col10 - basically 100K columns; we will scan all those key values even though we are going to discard them, is that correct? On Thu, May 16, 2013 at 2:30 PM, Stack st...@duboce.net

Re: Question about HFile seeking

2013-05-16 Thread Stack
What is your query? If scanning over rows of 100k columns, yeah, you will go through each row's content unless you specify you are only interested in some subset of the rows. Then a 'skipping' facility will cut in, where we use the index to skip over unwanted content. St.Ack On Thu, May 16, 2013 at

Re: Question about HFile seeking

2013-05-16 Thread Varun Sharma
Sorry, I may have misunderstood what you meant. When you look for row1c in the HFile index - is it going to also match row1,col1 or only match row1c? It all depends on how the index is organized; if it's only on HFile keys, it could also match row1,col1 unless we use some demarcator b/w row1 and

Re: Question about HFile seeking

2013-05-16 Thread Varun Sharma
Referring to your comment above again: If you're doing a prefix scan w/ row1c, we should be starting the scan at row1c, not row1 (or more correctly at the row that starts the block we believe has a row1c row in it...). I am trying to understand how you could seek right across to the block containing
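The seek discussed in this thread can be modeled as a binary search over the block index, which stores the first key of each HFile block: land on the last block whose first key is <= the sought row, then scan within it. A hedged sketch with made-up block keys:

```python
import bisect

# Hypothetical block index: the first row key of each HFile block, sorted.
block_first_keys = ["aaa", "row1", "row2", "zzz"]

def find_block(target):
    """Return the index of the block that may contain `target`: the last
    block whose first key is <= target (bisect_right - 1)."""
    i = bisect.bisect_right(block_first_keys, target) - 1
    return max(i, 0)

# A prefix scan for "row1c" seeks straight to the block starting at "row1",
# skipping all earlier blocks, rather than scanning from the top of the file.
assert find_block("row1c") == 1
assert find_block("row2") == 2
```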

Re: [ANNOUNCE] Phoenix 1.2 is now available

2013-05-16 Thread James Taylor
Hi Anil, No HBase changes were required. We're already leveraging coprocessors in HBase which is a key enabler. The other pieces needed are: - a type system - a means to evaluate an ORDER BY expression on the server - memory tracking/throttling (the topN for each region are held in memory
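The server/client split James describes - each region computing its own topN via a coprocessor, with the results merged afterwards - can be sketched as follows (illustrative only, not Phoenix code):

```python
import heapq

def merge_topn(per_region_results, n):
    """Each region ships back its local top-N; the global top-N is simply
    the top-N of the union of those partial results."""
    return heapq.nlargest(n, (v for region in per_region_results for v in region))

region_a = [90, 70, 50]   # region A's local top-3
region_b = [95, 60, 40]   # region B's local top-3
assert merge_topn([region_a, region_b], 3) == [95, 90, 70]
```

The memory tracking James mentions matters because each region must hold its partial top-N until the merge completes.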

Re: [ANNOUNCE] Phoenix 1.2 is now available

2013-05-16 Thread anil gupta
Hi James, Is this implementation present in the GitHub repo of Phoenix? If yes, can you provide me the package name/classes? I haven't got the opportunity to try out Phoenix yet but I would like to have a look at the implementation. Thanks, Anil Gupta On Thu, May 16, 2013 at 4:15 PM, James

Re: NullPointerException while loading large amount of new rows into HBase, exception is thrown when trying to obtain lock for RowKey

2013-05-16 Thread ramkrishna vasudevan
The lockRow and unlockRow have been replaced by the checkAndXXX and increment() APIs so that any operation on a particular row can be done atomically. I am not sure of the use case that you are addressing here, but I recommend taking a look at these APIs to see if they solve the problem for you. RowLocks are
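The point above is that explicit client-held row locks were replaced by server-side atomic primitives. A rough Python model of that pattern (compare-and-set and increment guarded internally, as the regionserver does); the real calls are Java methods such as HTable.checkAndPut and HTable.incrementColumnValue:

```python
import threading

class ToyTable:
    """Toy model of atomic row operations; HBase performs these server-side."""
    def __init__(self):
        self._cells = {}
        self._lock = threading.Lock()

    def check_and_put(self, row, col, expected, value):
        """Write `value` only if the cell currently holds `expected`;
        returns True on success (cf. HTable.checkAndPut)."""
        with self._lock:
            if self._cells.get((row, col)) == expected:
                self._cells[(row, col)] = value
                return True
            return False

    def increment(self, row, col, amount=1):
        """Atomic read-modify-write (cf. HTable.incrementColumnValue)."""
        with self._lock:
            v = self._cells.get((row, col), 0) + amount
            self._cells[(row, col)] = v
            return v

t = ToyTable()
assert t.check_and_put("r1", "c1", None, "v1")       # empty cell -> write succeeds
assert not t.check_and_put("r1", "c1", None, "v2")   # lost the race -> False, no exception
threads = [threading.Thread(target=t.increment, args=("r1", "ctr")) for _ in range(10)]
for th in threads: th.start()
for th in threads: th.join()
assert t.increment("r1", "ctr", 0) == 10             # all 10 increments applied
```

Note how the losing writer gets a clean False instead of an error, which is the behavior the NullPointerException thread above is after.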