GET performance degrades over time
Hi,

My setup is as follows:
- 24 regionservers (7GB RAM, 8-core CPU, 5GB heap space)
- hbase 0.94.4
- 5-7 regions per regionserver

I am doing an average of 4K-5K random GETs per regionserver per second, and the performance is acceptable in the beginning. I have also done ~10K GETs against a single regionserver and got the results back in 600-800ms. After a while the performance of the GETs starts degrading: the same ~10K random GETs start taking upwards of 9s-10s.

With regards to hbase settings that I have modified: I have disabled major compaction, increased the region size to 100G, and bumped the handler count up to 100.

I monitored ganglia for metrics that vary when the performance shifts from good to bad, and found that fsPreadLatency_avg_time is almost 25x higher on the badly performing regionserver. fsReadLatency_avg_time is also slightly higher, but not by as much (around 2x).

I took a thread dump of the regionserver process and also monitored CPU utilization. The CPU cycles were being spent in org.apache.hadoop.hdfs.BlockReaderLocal.read; the stack trace for a thread running that function is below this email.

Any pointers on why positional reads degrade over time? Or is this just an issue of disk I/O and I should start looking into that?

Thanks,
Viral

Stack trace for one of the handlers doing a block read:

"IPC Server handler 98 on 60020" - Thread t@147
   java.lang.Thread.State: RUNNABLE
        at java.io.FileInputStream.readBytes(Native Method)
        at java.io.FileInputStream.read(FileInputStream.java:220)
        at org.apache.hadoop.hdfs.BlockReaderLocal.read(BlockReaderLocal.java:324)
        - locked <3215ed96> (a org.apache.hadoop.hdfs.BlockReaderLocal)
        at org.apache.hadoop.fs.FSInputChecker.readFully(FSInputChecker.java:384)
        at org.apache.hadoop.hdfs.DFSClient$BlockReader.readAll(DFSClient.java:1763)
        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.fetchBlockByteRange(DFSClient.java:2333)
        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:2400)
        at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:46)
        at org.apache.hadoop.hbase.io.hfile.HFileBlock$AbstractFSReader.readAtOffset(HFileBlock.java:1363)
        at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockDataInternal(HFileBlock.java:1799)
        at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockData(HFileBlock.java:1643)
        at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:338)
        at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:254)
        at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:480)
        at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:501)
        at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:226)
        at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:145)
        at org.apache.hadoop.hbase.regionserver.StoreFileScanner.enforceSeek(StoreFileScanner.java:351)
        at org.apache.hadoop.hbase.regionserver.KeyValueHeap.pollRealKV(KeyValueHeap.java:354)
        at org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:312)
        at org.apache.hadoop.hbase.regionserver.KeyValueHeap.requestSeek(KeyValueHeap.java:277)
        at org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:543)
        - locked <3da12c8a> (a org.apache.hadoop.hbase.regionserver.StoreScanner)
        at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:411)
        - locked <3da12c8a> (a org.apache.hadoop.hbase.regionserver.StoreScanner)
        at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:143)
        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3643)
        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3578)
        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3561)
        - locked <74d81ea7> (a org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl)
        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3599)
        at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4407)
        at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4380)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:2039)
        at sun.reflect.GeneratedMethodAccessor22.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1426)
Re: GET performance degrades over time
This generally happens when the same block is accessed for the HFile. Are you seeing any contention on the HDFS side?

Regards
Ram

On Thu, May 16, 2013 at 4:19 PM, Bing Jiang jiangbinglo...@gmail.com wrote:

Have you checked your HBase environment? I think it perhaps comes from:
1) The system uses swap more frequently as you continue to execute Get operations?
2) Check the setting hfile.block.cache.size in your hbase-site.xml.
Re: NullPointerException while loading large amount of new rows into HBase, exception is thrown when trying to obtain lock for RowKey
Which version of HBase?

Regards
Ram

On Thu, May 16, 2013 at 10:42 PM, Tianying Chang tich...@ebaysf.com wrote:

Hi,

When our customers (using TSDB) load a large amount of data into HBase, we see many NullPointerExceptions in the RS logs, as below. I checked the source code; it seems that when trying to obtain the lock for a row key, if an entry for that row already exists and waitForBlock is false, it won't retry but just returns a NULL value. I can see that in doMiniBatchMutation(), waitForBlock is set to false (in most other places waitForBlock is always set to true).

This exception is thrown from the function lockRow(), which has been deprecated. I am not sure why it was deprecated, or what is used to replace it. Is this normal? That would imply HBase should not write this misleading error message to the log. Or should the client call some other API?

Thanks
Tian-Ying

2013-05-14 12:45:30,911 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: Row lock -3430274391270203797 explicitly acquired by client
2013-05-14 12:45:30,911 WARN org.apache.hadoop.ipc.HBaseServer: (responseTooSlow): {processingtimems:29783,call:lockRow([B@339a6a5c, [B@5ebcd87b), rpc version=1, client version=29, methodsFingerPrint=0,client:10.53.106.37:58892,starttimems:1368560701128,queuetimems:847,class:HRegionServer,responsesize:0,method:lockRow}
2013-05-14 12:46:00,911 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: Error obtaining row lock (fsOk: true)
java.lang.NullPointerException
        at java.util.concurrent.ConcurrentHashMap.put(ConcurrentHashMap.java:881)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.addRowLock(HRegionServer.java:2346)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.lockRow(HRegionServer.java:2332)
        at sun.reflect.GeneratedMethodAccessor156.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:384)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1336)
2013-05-14 12:46:02,514 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server Responder, call get([B@2166c821, {timeRange:[0,9223372036854775807],totalColumns:1,cacheBlocks:true,families:{id:[tagv]},maxVersions:1,row: slcsn-s00314.slc.ebay.com}), rpc version=1, client version=29, methodsFingerPrint=0 from 10.53.106.37:58892: output error
2013-05-14 12:46:02,514 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server handler 3 on 60020 caught: java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcher.write0(Native Method)
        at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:29)
Re: NullPointerException while loading large amount of new rows into HBase, exception is thrown when trying to obtain lock for RowKey
Are you trying to get the row lock explicitly? Using HTable.lockRow?

Regards
Ram

On Thu, May 16, 2013 at 10:46 PM, ramkrishna vasudevan ramkrishna.s.vasude...@gmail.com wrote:

Which version of HBase?

Regards
Ram

On Thu, May 16, 2013 at 10:42 PM, Tianying Chang tich...@ebaysf.com wrote: [...]
RE: NullPointerException while loading large amount of new rows into HBase, exception is thrown when trying to obtain lock for RowKey
It is HBase 0.92. The customer is using TSDB and AsyncHBase. I am not sure what their client code is calling exactly, but from the call stack it looks like it uses HTable.lockRow. Is this not recommended? If so, what should they use instead?

Thanks
Tian-Ying

From: ramkrishna vasudevan [ramkrishna.s.vasude...@gmail.com]
Sent: Thursday, May 16, 2013 10:41 AM
To: user@hbase.apache.org
Subject: Re: NullPointerException while loading large amount of new rows into HBase, exception is thrown when trying to obtain lock for RowKey

Are you trying to get the row lock explicitly? Using HTable.lockRow?

Regards
Ram

On Thu, May 16, 2013 at 10:46 PM, ramkrishna vasudevan ramkrishna.s.vasude...@gmail.com wrote: [...]
RE: NullPointerException while loading large amount of new rows into HBase, exception is thrown when trying to obtain lock for RowKey
FYI, below I quote the customer's response after I explained that the NullPointerException is caused by the row lock. So my question is: if this is an allowed situation, where multiple threads/processes compete for the lock, shouldn't the one that did not get it be considered normal, rather than having a NullPointerException thrown?

Thanks
Tian-Ying

Quote from customer below:

Well yes, not only multiple threads, but multiple processes! That's why we need a lock. Although I do see some practical problems, and am trying NOT to take the same lock from multiple threads within the same process, it is needed to coordinate locks across processes.

From: ramkrishna vasudevan [ramkrishna.s.vasude...@gmail.com]
Sent: Thursday, May 16, 2013 10:16 AM
To: user@hbase.apache.org
Subject: Re: NullPointerException while loading large amount of new rows into HBase, exception is thrown when trying to obtain lock for RowKey

Which version of HBase?

Regards
Ram

On Thu, May 16, 2013 at 10:42 PM, Tianying Chang tich...@ebaysf.com wrote: [...]
Re: GET performance degrades over time
Have you checked your HBase environment? I think it perhaps comes from:
1) The system uses swap more frequently as you continue to execute Get operations?

I have set swap to 0. AFAIK, that's a recommended practice. Let me know if that should not be followed for nodes running HBase.

2) Check the setting hfile.block.cache.size in your hbase-site.xml.

It's the default, i.e. 0.25.
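[For reference: if a bigger block cache turns out to help, the knob lives in hbase-site.xml. A minimal sketch follows; the 0.4 here is purely an illustrative value, not a recommendation for this cluster.]

    <property>
      <name>hfile.block.cache.size</name>
      <!-- fraction of the RegionServer heap given to the HFile block cache; 0.25 is the default -->
      <value>0.4</value>
    </property>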
Key Value collision
Hi,

I am wondering what happens when we add the following:

row, col, timestamp -- v1

A flush happens. Now, we add:

row, col, timestamp -- v2

A flush happens again. In this case, if MAX_VERSIONS == 1, how is the tie broken during reads and during minor compactions? Is it arbitrary?

Thanks
Varun
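[For concreteness, a minimal sketch of the scenario against the 0.94 client API. The table name "t", family "f", qualifier "col", timestamp, and values are made up for illustration; the table is assumed to have MAX_VERSIONS=1.]

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class SameTimestampPuts {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HTable table = new HTable(conf, "t");      // hypothetical table with MAX_VERSIONS=1
            byte[] row = Bytes.toBytes("row");
            byte[] cf = Bytes.toBytes("f");
            byte[] col = Bytes.toBytes("col");
            long ts = 1234567890L;                     // same explicit timestamp for both puts

            Put p1 = new Put(row);
            p1.add(cf, col, ts, Bytes.toBytes("v1"));  // first version
            table.put(p1);
            // ... flush here (e.g. via HBaseAdmin.flush("t")) ...

            Put p2 = new Put(row);
            p2.add(cf, col, ts, Bytes.toBytes("v2"));  // second version, identical row/col/ts
            table.put(p2);
            // ... flush again ...

            Result r = table.get(new Get(row));
            // Per the answers below: the last write wins here, except for bulk loads.
            System.out.println(Bytes.toString(r.getValue(cf, col)));
            table.close();
        }
    }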
Re: GET performance degrades over time
This generally happens when the same block is accessed for the HFile. Are you seeing any contention on the HDFS side?

When you say contention, what should I be looking for? Slow responses to data block requests? Or some specific metric in ganglia?

-Viral
Re: GET performance degrades over time
Michael is correct. More information about the swap value is available on Wikipedia: http://en.wikipedia.org/wiki/Swappiness

2013/5/16 Michael Segel michael_se...@hotmail.com

Going from memory, setting the swap value to 0 is a suggestion. You may still actually swap, but I think it's a 'last resort' type of thing. When you look at top, at the top of the page, how much swap do you see?

On May 16, 2013, at 1:43 PM, Viral Bajaria viral.baja...@gmail.com wrote: [...]
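[For anyone wanting to check this on their own nodes, a quick sketch of inspecting and setting vm.swappiness on Linux. Setting it to 0 mirrors the setup described above; whether 0 is right for HBase nodes is exactly the open question in this thread.]

    # check the current value
    cat /proc/sys/vm/swappiness
    # set it for the running system (requires root)
    sysctl -w vm.swappiness=0
    # persist across reboots
    echo "vm.swappiness = 0" >> /etc/sysctl.conf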
Re: GET performance degrades over time
Going from memory, setting the swap value to 0 is a suggestion. You may still actually swap, but I think it's a 'last resort' type of thing. When you look at top, at the top of the page, how much swap do you see?

When I look at top it says: 0K total, 0K used, 0K free (as expected). I can try and add some swap, but will do it as a last resort, as suggested by you.
Re: GET performance degrades over time
Going from memory, setting the swap value to 0 is a suggestion. You may still actually swap, but I think it's a 'last resort' type of thing. When you look at top, at the top of the page, how much swap do you see?

On May 16, 2013, at 1:43 PM, Viral Bajaria viral.baja...@gmail.com wrote: [...]
Re: GET performance degrades over time
If you're not swapping then don't worry about it. My comment was that even though you set the swap to 0 (and I'm going from memory), it's possible for some swap to occur. (But I could be wrong.)

You really don't have a lot of memory, and you have a 5GB heap... MSLAB on? Could you be facing a GC pause?

On May 16, 2013, at 1:53 PM, Viral Bajaria viral.baja...@gmail.com wrote: [...]
Re: Key Value collision
Last row inserted wins.

On May 16, 2013, at 1:49 PM, Varun Sharma va...@pinterest.com wrote: [...]
Re: GET performance degrades over time
If you're not swapping then don't worry about it. My comment was that even though you set the swap to 0 (and I'm going from memory), it's possible for some swap to occur. (But I could be wrong.)

Thanks for sharing this info. Will remember it for future debugging too. I checked vm.swappiness, as suggested by Jean-Marc, and it is definitely not set to 0. But since we are not swapping, I doubt that's the issue here.

You really don't have a lot of memory, and you have a 5GB heap... MSLAB on? Could you be facing a GC pause?

MSLAB is on - or rather, I have not modified it, and if I recall correctly it should be ON by default in 0.94.x. I have GC logging on and don't see stop-the-world GC pauses. GC logs are filling up quickly, but I have noticed that on my high-RAM instances too.
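[For anyone checking the same two things, a sketch of where these knobs live; the log path below is illustrative. The MSLAB switch is an hbase-site.xml property, and GC logging is typically turned on through HBASE_OPTS in hbase-env.sh.]

    <!-- hbase-site.xml: MSLAB, enabled by default in 0.94.x -->
    <property>
      <name>hbase.hregion.memstore.mslab.enabled</name>
      <value>true</value>
    </property>

    # hbase-env.sh: verbose GC logging (log path is illustrative)
    export HBASE_OPTS="$HBASE_OPTS -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/var/log/hbase/gc-hbase.log"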
[ANNOUNCE] Phoenix 1.2 is now available
We are pleased to announce the immediate availability of Phoenix 1.2 (https://github.com/forcedotcom/phoenix/wiki/Download). Here are some of the release highlights:

* Improve performance of multi-point and multi-range queries (20x plus) using new skip scan
* Support TopN queries (3-70x faster than Hive)
* Control row key order when defining primary key columns
* Salt tables declaratively to prevent hot spotting
* Specify columns dynamically at query time
* Write Phoenix-compliant HFiles from Pig scripts and Map/Reduce jobs
* Support SELECT DISTINCT
* Leverage essential column family feature
* Bundle command line terminal interface
* Specify scale and precision on decimal type
* Support fixed length binary type
* Add TO_CHAR, TO_NUMBER, COALESCE, UPPER, LOWER, and REVERSE built-in functions

HBase 0.94.4 or above is required, with HBase 0.94.7 being recommended. For more detail, please see our announcement: http://phoenix-hbase.blogspot.com/2013/05/announcing-phoenix-12.html

Regards,
James
@JamesPlusPlus
http://phoenix-hbase.blogspot.com/
Re: Key Value collision
Except in the case of bulk loads; if you import cells with the same timestamp through a bulk load, which cell wins is non-deterministic. Facebook fixed the issue, and the patch has been backported to 0.95. The friendly folks at Cloudera are working on backporting the fix to 0.94 as well. Follow https://issues.apache.org/jira/browse/HBASE-8521 for the 0.94 backport progress if it is of interest to you.

Jeff

On Thu, May 16, 2013 at 12:00 PM, Michael Segel michael_se...@hotmail.com wrote:

Last row inserted wins.

On May 16, 2013, at 1:49 PM, Varun Sharma va...@pinterest.com wrote: [...]

--
Jeff Kolesky
Chief Software Architect
Opower
Question about HFile seeking
Let's say I have the following in my table:

col1 row1 v1 -- HFile entry would be row1,col1,ts1 -- v1
ol1 row1c v2 -- HFile entry would be row1c,ol1,ts1 -- v2

Now I issue a prefix scan asking for row row1c. How do we seek - do we seek directly to row1c, or would we seek to row1 first and then to row1c? The reason being that the HFile keys are the same for both keys: I simply absorb one character from the column into the row.

Thanks
Varun
Re: [ANNOUNCE] Phoenix 1.2 is now available
Hi James,

You have mentioned support for TopN queries. Can you provide me the HBase JIRA ticket for that? I am also doing similar stuff in https://issues.apache.org/jira/browse/HBASE-7474. I am interested in knowing the details about that implementation.

Thanks,
Anil Gupta

On Thu, May 16, 2013 at 12:29 PM, James Taylor jtay...@salesforce.com wrote: [...]

--
Thanks Regards,
Anil Gupta
Re: Question about HFile seeking
Or do we use some kind of demarcator b/w rows and columns and timestamps when building the HFile keys and the indices?

Thanks
Varun

On Thu, May 16, 2013 at 1:56 PM, Varun Sharma va...@pinterest.com wrote: [...]
Re: Question about HFile seeking
On Thu, May 16, 2013 at 2:03 PM, Varun Sharma va...@pinterest.com wrote:

Or do we use some kind of demarcator b/w rows and columns and timestamps when building the HFile keys and the indices?

No demarcation, but in KeyValue we keep the row, column family name, column family qualifier, etc., lengths and offsets, so the comparators only compare the pertinent bytes. If you are doing a prefix scan w/ row1c, we should be starting the scan at row1c, not row1 (or, more correctly, at the row that starts the block we believe has a row1c row in it...).

St.Ack
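[A small sketch of what Stack describes, using the 0.94 KeyValue comparator; the family "f", values, and timestamp are made up for illustration. Because the serialized key carries the row length, the comparator never confuses row "row1" + column "col1" with row "row1c" + column "ol1".]

    import org.apache.hadoop.hbase.KeyValue;
    import org.apache.hadoop.hbase.util.Bytes;

    public class KeyValueCompareSketch {
        public static void main(String[] args) {
            long ts = 1L;
            byte[] fam = Bytes.toBytes("f");
            // (row="row1", qualifier="col1") vs (row="row1c", qualifier="ol1"):
            // the flattened bytes look similar, but the encoded row length differs.
            KeyValue a = new KeyValue(Bytes.toBytes("row1"), fam,
                    Bytes.toBytes("col1"), ts, Bytes.toBytes("v1"));
            KeyValue b = new KeyValue(Bytes.toBytes("row1c"), fam,
                    Bytes.toBytes("ol1"), ts, Bytes.toBytes("v2"));
            // Prints a negative number: "row1" sorts before "row1c" on the row
            // component alone; the qualifier bytes never enter the comparison.
            System.out.println(KeyValue.COMPARATOR.compare(a, b));
        }
    }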
Re: Question about HFile seeking
What are you seeing, Varun (or think you are seeing)?

St.Ack

On Thu, May 16, 2013 at 2:30 PM, Stack st...@duboce.net wrote: [...]
Re: Question about HFile seeking
Nothing, I am just curious... So, we will do a bunch of wasteful scanning - let's say row1 has col1 - col10, basically 100K columns; we will scan all those key values even though we are going to discard them. Is that correct?

On Thu, May 16, 2013 at 2:30 PM, Stack st...@duboce.net wrote: [...]
Re: Question about HFile seeking
What is your query? If scanning over rows of 100k columns, yeah, you will go through each row's content unless you specify that you are only interested in some subset of the rows. Then a 'skipping' facility will cut in, where we will use the index to skip over unwanted content.

St.Ack

On Thu, May 16, 2013 at 2:42 PM, Varun Sharma va...@pinterest.com wrote: [...]
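[As an illustration of constraining a scan so the index can be used, a sketch against the 0.94 client API; the table name "t" and the row-key prefix are invented for this example.]

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.filter.PrefixFilter;
    import org.apache.hadoop.hbase.util.Bytes;

    public class PrefixScanSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HTable table = new HTable(conf, "t");       // hypothetical table
            byte[] prefix = Bytes.toBytes("row1c");
            Scan scan = new Scan(prefix);               // start at the prefix, not at "row1"
            scan.setFilter(new PrefixFilter(prefix));   // stop returning rows once past the prefix
            ResultScanner scanner = table.getScanner(scan);
            try {
                for (Result r : scanner) {
                    System.out.println(Bytes.toString(r.getRow()));
                }
            } finally {
                scanner.close();
                table.close();
            }
        }
    }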
Re: Question about HFile seeking
Sorry, I may have misunderstood what you meant. When you look for row1c in the HFile index - is it going to also match row1,col1, or only match row1c? It all depends on how the index is organized: if it's only on HFile keys, it could also match row1,col1, unless we use some demarcator b/w row1 and col1 in our HFile keys. So I am just wondering whether we will totally skip touching row1,col1 in this case and jump straight to row1c, or not.

The other option is that we would actually hit row1,col1, since the prefix matches row1c when looking at the HFile key, and then we look at the length of the row to grab the real portion from the concatenated HFile key and discard all row1 entries.

Does that make my query clearer?

On Thu, May 16, 2013 at 2:42 PM, Varun Sharma va...@pinterest.com wrote: [...]
Re: Question about HFile seeking
Referring to your comment above again:

"If you doing a prefix scan w/ row1c, we should be starting the scan at row1c, not row1 (or more correctly at the row that starts the block we believe has a row1c row in it...)"

I am trying to understand how you could seek right across to the block containing row1c using the HFile index. If the index is just built on HFile keys and there is no demarcation b/w rows and col(s), you would hit the block for row1,col1. After that, you would either need a way to skip right across to row1c after you find that this is not the row you are looking for, or you would have to simply keep scanning and discarding sequentially until you get to row1c. If you have to keep scanning and discarding, that is probably suboptimal. But if there is a way to skip right across from row1,col1 to row1c, then that's great, though I wonder how it would be implemented.

Varun

On Thu, May 16, 2013 at 2:55 PM, Varun Sharma va...@pinterest.com wrote: [...]
Re: [ANNOUNCE] Phoenix 1.2 is now available
Hi Anil,

No HBase changes were required. We're already leveraging coprocessors in HBase, which is a key enabler. The other pieces needed are:
- a type system
- a means to evaluate an ORDER BY expression on the server
- memory tracking/throttling (the topN for each region are held in memory until the client does a merge sort)

Phoenix has all these, so it was just a matter of packaging them up to support this.

Thanks,
James

On 05/16/2013 02:02 PM, anil gupta wrote: [...]
Re: [ANNOUNCE] Phoenix 1.2 is now available
Hi James,

Is this implementation present in the GitHub repo of Phoenix? If yes, can you provide me the package name/classes? I haven't had the opportunity to try out Phoenix yet, but I would like to have a look at the implementation.

Thanks,
Anil Gupta

On Thu, May 16, 2013 at 4:15 PM, James Taylor jtay...@salesforce.com wrote: [...]

--
Thanks Regards,
Anil Gupta
Re: NullPointerException while loading large amount of new rows into HBase, exception is thrown when trying to obtain lock for RowKey
The lockRow and unlockRow APIs have been replaced by the checkAndXXX and increment() APIs, so that any operation on a particular row can be done atomically. I am not sure of the use case that you are addressing here, but I recommend taking a look at these APIs to see if they solve the problem for you. Row locks are prone to more thread contention, and to some deadlock situations when there are lots of threads waiting for the same row lock.

Regards
Ram

On Fri, May 17, 2013 at 12:11 AM, Tianying Chang tich...@ebaysf.com wrote: [...]
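[For reference, a sketch of the check-and-mutate style Ram suggests, using the 0.94 client API; the table "t", family "f", and qualifiers here are placeholders, not anything from the original report.]

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Increment;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class AtomicRowOps {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HTable table = new HTable(conf, "t");      // hypothetical table
            byte[] row = Bytes.toBytes("row1");
            byte[] cf = Bytes.toBytes("f");
            byte[] qual = Bytes.toBytes("q");

            // Atomically write "v1" only if the cell is currently empty;
            // no explicit row lock is taken by the client.
            Put put = new Put(row);
            put.add(cf, qual, Bytes.toBytes("v1"));
            boolean wrote = table.checkAndPut(row, cf, qual, null, put);
            System.out.println("checkAndPut succeeded: " + wrote);

            // Atomic server-side counter increment, also lock-free for the client.
            Increment inc = new Increment(row);
            inc.addColumn(cf, Bytes.toBytes("counter"), 1L);
            table.increment(inc);
            table.close();
        }
    }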