Re: transactional hbase
Hi,

HBase supports row-level atomic operations. If you need to update more than one row in a single transaction you'll have to use a coordination mechanism; you can use ZooKeeper for that.

However, you can also evaluate whether your operation fits into a single table by taking advantage of the columnar model. Basically you can encode an entire star schema in a single table in HBase, where each family can act as a table. So when you have 1-M relations you can write all the values (M) into a single row (1). This way you could use a single put operation to update your child entities.

Cosmin

On Jul 16, 2010, at 12:57 AM, N Kapshoo wrote:

Can someone please point me to some more info? I am concerned about inconsistent data and am evaluating risk... Thanks.

On Thu, Jul 15, 2010 at 3:11 PM, N Kapshoo nkaps...@gmail.com wrote:

Hi, I have a write API call that does 3 puts (2 puts to one table and a 3rd put to a second table). How do I go about making sure that these all happen or none happen at all? In short, an atomic transaction.

I read up a little bit about the TransactionManager and that I need to modify hbase-site.xml to make these TransactionalRegionServers. Can someone point me to some more info about the same? What versions, what performance impacts, etc.? Are there some good URLs that anyone can share?

I am using HBase 0.20.3 at this time. I don't believe transactions are supported in this version. If I were to go live and then decide to use transactions later, how should I plan an upgrade?

Thanks.
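[Editor's illustration] A minimal sketch of the single-row approach Cosmin describes, written against the 0.20-era Java client. The "orders" table, the "lines"/"info" families and the values are invented for the example, not taken from the thread:

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class AtomicChildUpdate {
    public static void main(String[] args) throws Exception {
        HBaseConfiguration conf = new HBaseConfiguration();
        HTable table = new HTable(conf, "orders");   // hypothetical table

        // One row per parent entity; each child (the M side) becomes a column
        // in the "lines" family, so every child rides in the same Put.
        Put put = new Put(Bytes.toBytes("order-12345"));
        put.add(Bytes.toBytes("lines"), Bytes.toBytes("item-1"), Bytes.toBytes("widget,qty=2"));
        put.add(Bytes.toBytes("lines"), Bytes.toBytes("item-2"), Bytes.toBytes("gadget,qty=5"));
        put.add(Bytes.toBytes("info"),  Bytes.toBytes("status"), Bytes.toBytes("NEW"));

        // A put is atomic at the row level, so either all of these cells
        // become visible together or none of them do.
        table.put(put);
        table.close();
    }
}

Because all the cells land in one row, the single put either applies completely or not at all, which is what the row-level atomicity guarantee buys you.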
Row level locking?
Ok,

First, I'm writing this before I've had my first cup of coffee, so I am apologizing in advance if this is a brain-dead question. Going from a relational background, some of these questions may not make sense in the HBase world.

When does HBase acquire a lock on a row and how long does it persist? Does the lock only hit the current row, or does it also lock the adjacent rows too? Does HBase support the concept of 'dirty reads'?

The issue is what happens when you have two jobs trying to hit the same table at the same time and update/read the rows at the same time. A developer came across a problem and the fix was to use the HTable.close() method to release any resources. I am wondering if you explicitly have to clean up or if a lazy developer can let the object just go out of scope and get GC'd.

Thx

-Mike
Re: Row level locking?
Currently a row is part of a region, and there's a single region server serving that region at any particular moment. So when that row is updated, a lock is acquired for that row until the actual data is updated in memory (note that a put will be written to the cache on the region server and also persisted in the write-ahead log - WAL). Subsequent puts to that row will have to wait for that lock. HBase is fully consistent.

That being said, all the locking takes place at row level only, so when you scan you have to take that into account as there's no range locking.

I'm not sure I understand the resource-releasing issue. HTable.close() flushes the current write buffer (you only have a write buffer if autoFlush is set to false).

Cosmin

On Jul 16, 2010, at 1:33 PM, Michael Segel wrote:

Ok, First, I'm writing this before I've had my first cup of coffee, so I am apologizing in advance if this is a brain-dead question. Going from a relational background, some of these questions may not make sense in the HBase world. When does HBase acquire a lock on a row and how long does it persist? Does the lock only hit the current row, or does it also lock the adjacent rows too? Does HBase support the concept of 'dirty reads'? The issue is what happens when you have two jobs trying to hit the same table at the same time and update/read the rows at the same time. A developer came across a problem and the fix was to use the HTable.close() method to release any resources. I am wondering if you explicitly have to clean up or if a lazy developer can let the object just go out of scope and get GC'd. Thx -Mike
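[Editor's illustration] As an aside, the 0.20-era client also exposed the per-row lock directly (HTable.lockRow/unlockRow and the RowLock-taking Put/Get constructors), which lets a client hold one row's lock across a read-modify-write. This is only a hedged sketch under that assumption -- the table and column names are invented, and the explicit RowLock API was removed in later HBase versions:

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.RowLock;
import org.apache.hadoop.hbase.util.Bytes;

public class ReadModifyWrite {
    public static void main(String[] args) throws Exception {
        HTable table = new HTable(new HBaseConfiguration(), "counters");  // hypothetical table
        byte[] row  = Bytes.toBytes("row-1");
        byte[] fam  = Bytes.toBytes("f");
        byte[] qual = Bytes.toBytes("count");

        // Take the same per-row lock a plain put() takes internally, but hold
        // it across a read-modify-write so a concurrent writer cannot slip in.
        RowLock lock = table.lockRow(row);
        try {
            Result r = table.get(new Get(row, lock).addColumn(fam, qual));
            long current = r.isEmpty() ? 0 : Bytes.toLong(r.getValue(fam, qual));
            Put p = new Put(row, lock);
            p.add(fam, qual, Bytes.toBytes(current + 1));
            table.put(p);
        } finally {
            table.unlockRow(lock);   // row locks are leased, so always release promptly
        }
        table.close();
    }
}

Ordinary puts never need this; the server takes and releases the same lock internally for the duration of the memstore/WAL update, exactly as described above.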
RE: Client Connection to Zookeeper times out during mapreduce
Thanks St.Ack! I must have lost that config after my last hadoop upgrade. Much appreciated! Travis -Original Message- From: saint@gmail.com [mailto:saint@gmail.com] On Behalf Of Stack Sent: Thursday, July 15, 2010 11:42 AM To: user@hbase.apache.org Subject: Re: Client Connection to Zookeeper times out during mapreduce Is this intentional? Attempting connection to server localhost/127.0.0.1:2181 I'd guess not. Would seem to indicate then that your conf dir is not on the CLASSPATH when the child task runs? You've seen http://hbase.apache.org/docs/r0.20.5/api/org/apache/hadoop/hbase/mapreduce/package-summary.html#classpath? St.Ack On Thu, Jul 15, 2010 at 7:16 AM, Hegner, Travis theg...@trilliumit.com wrote: Afterthought: hadoop: 0.20.2, r911707 hbase: 0.20.5, r950246 Thanks Again! -Original Message- From: Hegner, Travis [mailto:theg...@trilliumit.com] Sent: Thursday, July 15, 2010 10:15 AM To: user@hbase.apache.org Subject: Client Connection to Zookeeper times out during mapreduce Hi All! I apologize in advance for what is probably a rookie mistake, but it's been a while since I've done a map reduce with Hbase as input or output. I'm assuming that this is a cluster config problem because even the built-in rowcounter mapreduce job is failing. The hbase shell is operating normally, and I can even access Hbase from thrift. Cluster: 1 master: NN, JT, ZK, HM 5 slaves: DN, TT, RS When attempting to run a mapreduce job with Hbase as a source, my map tasks are failing with the following logged exceptions: 2010-07-15 09:16:22,383 INFO org.apache.zookeeper.ZooKeeper: Client environment:user.dir=/tmp/hadoop-hadoop/mapred/local/taskTracker/jobcache/job_201007150924_0003/attempt_201007150924_0003_m_00_0/work 2010-07-15 09:16:22,386 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=6 watcher=org.apache.hadoop.hbase.client.hconnectionmanager$clientzkwatc...@80fa6f 2010-07-15 09:16:22,389 INFO org.apache.zookeeper.ClientCnxn: zookeeper.disableAutoWatchReset is false 2010-07-15 09:16:22,402 INFO org.apache.zookeeper.ClientCnxn: Attempting connection to server localhost/127.0.0.1:2181 2010-07-15 09:16:22,407 WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x0 to sun.nio.ch.selectionkeyi...@1d85f79 java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:933) 2010-07-15 09:16:22,416 WARN org.apache.zookeeper.ClientCnxn: Ignoring exception during shutdown input java.nio.channels.ClosedChannelException at sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:638) at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:360) at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:999) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:970) 2010-07-15 09:16:22,416 WARN org.apache.zookeeper.ClientCnxn: Ignoring exception during shutdown output java.nio.channels.ClosedChannelException at sun.nio.ch.SocketChannelImpl.shutdownOutput(SocketChannelImpl.java:649) at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:368) at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:1004) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:970) 2010-07-15 09:16:22,532 WARN org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Failed to create /hbase -- check quorum servers, 
currently=localhost:2181 org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase at org.apache.zookeeper.KeeperException.create(KeeperException.java:90) at org.apache.zookeeper.KeeperException.create(KeeperException.java:42) at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:780) at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:808) at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.ensureExists(ZooKeeperWrapper.java:405) at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.ensureParentExists(ZooKeeperWrapper.java:432) at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.checkOutOfSafeMode(ZooKeeperWrapper.java:545) at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:974) at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:625) at
Re: Hanging regionservers
I grepped yesterday's logs on all servers for "Blocking updates" and there was no trace. I believe I had encountered the blocking-updates problem earlier in the project but throttled down the import speed, which seemed to fix that. I just double checked and all three region servers were idle.

Something interesting that I noticed, however, was that each regionserver had a particular ResponseProcessor thread running for a specific block, and that thread was stuck in a running state during the entirety of the hang. Also, a DataStreamer thread for the block associated with the ResponseProcessor was in a wait state. This makes me think that each server was stuck operating on a specific block.

ResponseProcessor for block blk_1926230463847049982_2694658 - Thread t...@61160
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
- locked sun.nio.ch.uti...@196fbfd0
- locked java.util.collections$unmodifiable...@7799fdbb
- locked sun.nio.ch.epollselectori...@1ee13d55
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:332)
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
at java.io.DataInputStream.readFully(DataInputStream.java:178)
at java.io.DataInputStream.readLong(DataInputStream.java:399)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2399)
Locked ownable synchronizers:
- None

DataStreamer for file /hbase/.logs/dn01.colo.networkedinsights.com,60020,127993084/hlog.dat.1279228611023 block blk_1926230463847049982_2694658 - Thread t...@61158
java.lang.Thread.State: TIMED_WAITING on java.util.linkedl...@475b455c
at java.lang.Object.wait(Native Method)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2247)
Locked ownable synchronizers:
- None

On 7/15/10 10:17 PM, Ryan Rawson ryano...@gmail.com wrote:

I'm not seeing anything in that logfile; you are seeing compactions for various regions, but I'm not seeing flushes (typical during insert loads) and nothing else. One thing we look for is a log message "Blocking updates" which indicates that a particular region has decided it's holding up to prevent taking too many inserts. Like I said, you could be seeing this on a different regionserver; if all the clients are blocked on 1 regionserver and can't get to the others, then most will look idle and only one will actually show anything interesting in the log. Can you check for this behaviour? Also, if you want to tweak the config with the values I pasted, that should help.

One thing that frequently happens is you overrun 1 regionserver with edits and it isn't flushing fast enough, so it pauses updates and all clients end up stuck on it. What was that config again? I use these settings:

<property>
  <name>hbase.hstore.blockingStoreFiles</name>
  <value>15</value>
</property>
<property>
  <name>dfs.datanode.socket.write.timeout</name>
  <value>0</value>
</property>
<property>
  <name>hbase.hregion.memstore.block.multiplier</name>
  <value>8</value>
</property>

perhaps try these ones?

-ryan

On Thu, Jul 15, 2010 at 6:55 PM, Luke Forehand luke.foreh...@networkedinsights.com wrote:

Yea, I started with 12 tasks max per node, dropped it to 8, and now 4, but I'm still having the issue. Here is the spec for each box: 2x CPU (2.5 GHz Nehalem EP quad core), 24 GB RAM (4 GB / region server), 4x 1 TB hard drives.

On 7/15/10 6:32 PM, Ryan Rawson ryano...@gmail.com wrote:

Are you sure you don't have any GC issues? Also, you are aware that max.tasks.maximum is _per node_, right? How big are these machines? -ryan

On Thu, Jul 15, 2010 at 4:30 PM, Jean-Daniel Cryans jdcry...@apache.org wrote:

Nothing particular in that dump, and I'm not aware of any deadlock in 0.20.3; could we see the region server log? Thx J-D

On Thu, Jul 15, 2010 at 4:17 PM, Luke Forehand luke.foreh...@networkedinsights.com wrote:

First post evar! I have a 3 node
High ingest rate and FIN_WAIT1 problems
Hi,

I am a complete HBase and HDFS newbie, so I apologize in advance for the inevitable bloopers. We are doing feasibility testing on NoSQL data store options, with rather high ingest rate requirements. So far, HBase is looking good, with only one issue identified.

Running at an ingest rate of ~30K rows per second on a 4x 2.2 GHz CPU, 8 GB RAM machine, I am slowly leaking sockets. This is a single-node setup - no replication. The CPU load is only about 50%-60%, with the majority of that in userland; system and iowait are averaging less than 3%. There is no swapping going on.

The problem is that on the datanode there are a large number of sockets in FIN_WAIT1, with corresponding peers on the master in ESTABLISHED. These pairs hang around for quite some time, and at my ingest rate this means that the total number of sockets held by the datanode and master is slowly going up. If my understanding of TCP is correct, this indicates that the master peer has stopped reading incoming data from the datanode - i.e., it is sending a window of zero - and that the datanode has called close(2) on its peer.

There was a thread some time ago: http://www.mail-archive.com/hbase-u...@hadoop.apache.org/msg03329.html There was no real conclusion. I have played with the config params as suggested on that thread, but no luck yet. Also, in that case the problem seemed to be between datanodes for replication operations - not the case with me.

Changing timeouts to avoid the slow increase might not really solve the problem IFF the master peer has in fact ceased to read its socket. The data outstanding in the TCP stack buffer would be lost. Whether that would imply data loss is beyond me.

I am posting this here because, although the only logs with errors/exceptions are the datanode logs, netstat and wireshark seem to indicate that the problem is on the master side. The master, namenode, regionserver and zookeeper logs show no warnings or errors. The datanode log shows this, over and over:

2010-07-16 00:33:09,269 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(127.0.0.1:50010, storageID=DS-1028643313-10.1.1.200-50010-1279026099917, infoPort=50075, ipcPort=50020):Got exception while serving blk_3684861726145519813_22386 to /127.0.0.1:
java.net.SocketTimeoutException: 48 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/127.0.0.1:50010 remote=/127.0.0.1:54774]
at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:313)
at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:400)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:180)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95)
at java.lang.Thread.run(Thread.java:619)
2010-07-16 00:33:09,269 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(127.0.0.1:50010, storageID=DS-1028643313-10.1.1.200-50010-1279026099917, infoPort=50075, ipcPort=50020):DataXceiver
java.net.SocketTimeoutException: 48 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/127.0.0.1:50010 remote=/127.0.0.1:54774]
at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:313)
at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:400)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:180)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95)
at java.lang.Thread.run(Thread.java:619)

If there is any other info that might help, or any steps you would like me to take, just let me know.

Thanks

Thomas Downing
Advice for efficiently scanning for modified-since
We have an existing product sitting on the hbase/hadoop ecosystem. We have laid our object model on HBase: we have one table with a row per object, and a separate table with composite index rows. Works great. We can efficiently find our objects based on their type, relationships, etc. by scanning the index table. We *never* scan the main table (except when rebuilding the index). A new requirement just came in: get a list of all objects that have been modified since timestamp. This has to happen quickly (user time). If we scan the main table with a timestamp restriction, will that be efficient? Or do we have to introduce a new composite index that has the last modified timestamp as part of it and scan that? Thanks, Mark
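[Editor's illustration] For reference, a timestamp-restricted scan of the main table would look roughly like the sketch below, using the 0.20-era client API; the "objects" table name is made up. Note that setTimeRange only filters what comes back -- the scan still has to read through every region, which is why whether it is fast enough in user time depends on how big the main table gets:

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class ModifiedSince {
    public static void main(String[] args) throws Exception {
        long since = Long.parseLong(args[0]);   // modified-since cutoff, ms since epoch

        HTable table = new HTable(new HBaseConfiguration(), "objects");  // hypothetical main table

        // Only cells written at or after 'since' are returned; the server
        // still has to look at every region to decide what qualifies.
        Scan scan = new Scan();
        scan.setTimeRange(since, Long.MAX_VALUE);

        ResultScanner scanner = table.getScanner(scan);
        try {
            for (Result row : scanner) {
                System.out.println(Bytes.toString(row.getRow()));
            }
        } finally {
            scanner.close();
            table.close();
        }
    }
}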
RE: Row level locking?
Thanks for the response. (You don't need to include the cc ...)

With respect to the row-level locking ... I was interested in when the lock is actually acquired, how long the lock persists, and when the lock is released. From your response, the lock is only held while the row is being updated, i.e. while the data is being written to the memory cache, which is then written to disk. (Note: this row-level locking is different from transactional row-level locking.)

Now that I've had some caffeine I think I can clarify... :-)

Some of my developers complained that they were having trouble with two different processes trying to update the same table. Not sure why they were having the problem, so I wanted to have a good fix. The simple fix was to have them close() the HTable connection, which forces any resources that they acquired to be released. In looking at the problem... it's possible that they didn't have autoFlush set to true, so the write was still in the buffer and hadn't gotten flushed. If the lock only persists for the duration of the write to memory and is then released, then the issue could have been that the record written was in the buffer and not yet flushed to disk. I'm also assuming that when you run a scan() against a region, any information written to the buffer but not yet written to disk will be missed.

So I guess the question isn't so much an issue of a lock, but that we need to make sure that data written to the buffer is flushed ASAP unless we know that we're going to be writing a lot of data in the m/r job.

Thx

-Mike

From: cleh...@adobe.com To: user@hbase.apache.org CC: hbase-u...@hadoop.apache.org Date: Fri, 16 Jul 2010 12:34:36 +0100 Subject: Re: Row level locking?

Currently a row is part of a region, and there's a single region server serving that region at any particular moment. So when that row is updated, a lock is acquired for that row until the actual data is updated in memory (note that a put will be written to the cache on the region server and also persisted in the write-ahead log - WAL). Subsequent puts to that row will have to wait for that lock. HBase is fully consistent. That being said, all the locking takes place at row level only, so when you scan you have to take that into account as there's no range locking. I'm not sure I understand the resource-releasing issue. HTable.close() flushes the current write buffer (you only have a write buffer if autoFlush is set to false). Cosmin

On Jul 16, 2010, at 1:33 PM, Michael Segel wrote:

Ok, First, I'm writing this before I've had my first cup of coffee, so I am apologizing in advance if this is a brain-dead question. Going from a relational background, some of these questions may not make sense in the HBase world. When does HBase acquire a lock on a row and how long does it persist? Does the lock only hit the current row, or does it also lock the adjacent rows too? Does HBase support the concept of 'dirty reads'? The issue is what happens when you have two jobs trying to hit the same table at the same time and update/read the rows at the same time. A developer came across a problem and the fix was to use the HTable.close() method to release any resources. I am wondering if you explicitly have to clean up or if a lazy developer can let the object just go out of scope and get GC'd. Thx -Mike
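[Editor's illustration] To make the buffering behaviour concrete, here is a small hedged sketch against the 0.20-era client (table, family and qualifier names are invented): with autoFlush off, puts sit in a client-side buffer until flushCommits() or close() pushes them to the region server, which is exactly the window in which another process's scan would miss them.

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class BufferedWriter {
    public static void main(String[] args) throws Exception {
        HTable table = new HTable(new HBaseConfiguration(), "mytable");  // hypothetical table

        // With autoFlush off, puts accumulate in a client-side buffer and other
        // readers cannot see them until the buffer is flushed to the region server.
        table.setAutoFlush(false);
        table.setWriteBufferSize(2 * 1024 * 1024);   // 2 MB buffer; tune for bulk loads

        for (int i = 0; i < 10000; i++) {
            Put p = new Put(Bytes.toBytes("row-" + i));
            p.add(Bytes.toBytes("f"), Bytes.toBytes("q"), Bytes.toBytes(i));
            table.put(p);    // may stay buffered on the client
        }

        // Either flush explicitly at the point where readers need to see the data...
        table.flushCommits();
        // ...or rely on close(), which flushes the buffer before releasing resources.
        table.close();
    }
}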
RE: High ingest rate and FIN_WAIT1 problems
Hi Thomas,

I ran into a very similar issue when running slony-I on postgresql to replicate 15-20 databases. Adjusting the TCP_FIN_TIMEOUT parameter for the kernel may help to slow (or hopefully stop) the leaking sockets. I found some notes about adjusting TCP parameters here: http://www.hikaro.com/linux/tweaking-tcpip-syctl-conf.html with the specific excerpt regarding TCP_FIN_TIMEOUT:

TCP_FIN_TIMEOUT - This setting determines the time that must elapse before TCP/IP can release a closed connection and reuse its resources. During this TIME_WAIT state, reopening the connection to the client costs less than establishing a new connection. By reducing the value of this entry, TCP/IP can release closed connections faster, making more resources available for new connections. Adjust this in the presence of many connections sitting in the TIME_WAIT state:

# echo 30 > /proc/sys/net/ipv4/tcp_fin_timeout

Try setting this lower on your master. You may also consider these on the same link:

TCP_TW_RECYCLE - It enables fast recycling of TIME_WAIT sockets. The default value is 0 (disabled). The sysctl documentation incorrectly states the default as enabled. It can be changed to 1 (enabled) in many cases. Known to cause some issues with hoststated (load balancing and failover) if enabled; should be used with caution.

echo 1 > /proc/sys/net/ipv4/tcp_tw_recycle

TCP_TW_REUSE - This allows reusing sockets in TIME_WAIT state for new connections when it is safe from the protocol viewpoint. Default value is 0 (disabled). It is generally a safer alternative to tcp_tw_recycle.

echo 1 > /proc/sys/net/ipv4/tcp_tw_reuse

The above commands will not persist across reboots, but the link explains how to make them permanent. The list experts may be able to give more insight on which, if any, of these settings are safe to manipulate, and what risks or issues you may encounter specifically with HBase while adjusting these settings.

Hope This Helps,

Travis Hegner

-Original Message-
From: Thomas Downing [mailto:tdown...@proteus-technologies.com]
Sent: Friday, July 16, 2010 10:33 AM
To: user@hbase.apache.org
Subject: High ingest rate and FIN_WAIT1 problems

Hi, I am a complete HBase and HDFS newbie, so I apologize in advance for the inevitable bloopers. We are doing feasibility testing on NoSql data store options, with rather high ingest rate requirements. So far, HBase is looking good, with only one issue identified. Running at an ingest rate of ~30K rows per second on a 4 2.2Mhz CPU 8G RAM machine I am slowly leaking sockets. This is a single node setup - no replication. The CPU load is only about 50%-60%, with the majority of that in userland; system and iowait are averaging less than 3%. There is no swapping going on. The problem is that on the datanode there are a large number of sockets in FIN_WAIT1, with corresponding peers on master in ESTABLISHED. These pairs hang around for quite some time, and at my ingest rate this means that the total sockets held by datanode and master is slowly going up. If my understanding of TCP is correct, then this indicates that the master peer has stopped reading incoming data from the datanode - i.e., it is sending a window of zero; and that the datanode has called close(2) on its peer. There was a thread some time ago: http://www.mail-archive.com/hbase-u...@hadoop.apache.org/msg03329.html There was no real conclusion. I have played with the config params as suggested on that thread, but no luck yet. Also, in that case the problem seemed to be between datanodes for replication operations - not the case with me.
Changing timeouts to avoid the slow increase might not really solve the problem IFF the master peer has in fact ceased to read it's socket. The data outstanding in the TCP stack buffer would be lost. Whether that would imply data loss is beyond me. I am posting this here as although the only logs with errors/exceptions are the datanode logs, netstat and wireshark seem to indicate that the problem is on the master side. The master, namenode, regionserver and zookeeper and logs shows no warning or errors. The datanode log shows this, over and over: 2010-07-16 00:33:09,269 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(127.0.0.1:50010, storageID=DS-1028643313-10.1.1.200-50010-1279026099917, infoPort=50075, ipcPort=50020):Got exception while serving blk_3684861726145519813_22386 to /127.0.0.1: java.net.SocketTimeoutException: 48 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/127.0.0.1:50010 remote=/127.0.0.1:54774] at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246) at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159) at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198) at
Re: High ingest rate and FIN_WAIT1 problems
Thanks for the response. My understanding is that TCP_FIN_TIMEOUT affects only FIN_WAIT2, my problem is with FIN_WAIT1. While I do see some sockets in TIME_WAIT, they are only a few, and the number is not growing. On 7/16/2010 12:07 PM, Hegner, Travis wrote: Hi Thomas, I ran into a very similar issue when running slony-I on postgresql to replicate 15-20 databases. Adjusting the TCP_FIN_TIMEOUT parameters for the kernel may help to slow (or hopefully stop), the leaking sockets. I found some notes about adjusting TCP parameters here: http://www.hikaro.com/linux/tweaking-tcpip-syctl-conf.html with the specific excerpt regarding the TCP_FIN_TIMEOUT: TCP_FIN_TIMEOUT This setting determines the time that must elapse before TCP/IP can release a closed connection and reuse its resources. During this TIME_WAIT state, reopening the connection to the client costs less than establishing a new connection. By reducing the value of this entry, TCP/IP can release closed connections faster, making more resources available for new connections. Addjust this in the presense of many connections sitting in the TIME_WAIT state: # echo 30 /proc/sys/net/ipv4/tcp_fin_timeout Try setting this lower on your master. You may also consider these on the same link: TCP_TW_RECYCLE It enables fast recycling of TIME_WAIT sockets. The default value is 0 (disabled). The sysctl documentation incorrectly states the default as enabled. It can be changed to 1 (enabled) in many cases. Known to cause some issues with hoststated (load balancing and fail over) if enabled, should be used with caution. echo 1 /proc/sys/net/ipv4/tcp_tw_recycle TCP_TW_REUSE This allows reusing sockets in TIME_WAIT state for new connections when it is safe from protocol viewpoint. Default value is 0 (disabled). It is generally a safer alternative to tcp_tw_recycle echo 1 /proc/sys/net/ipv4/tcp_tw_reuse The above commands will not persist reboots, but the link explains how. The list experts may be able to give more insight on which, if any, of these settings are safe to manipulate, and what risks or issues you may encounter specifically with Hbase while adjusting these settings. Hope This Helps, Travis Hegner -Original Message- From: Thomas Downing [mailto:tdown...@proteus-technologies.com] Sent: Friday, July 16, 2010 10:33 AM To: user@hbase.apache.org Subject: High ingest rate and FIN_WAIT1 problems Hi, I am a complete HBase and HDFS newbie, so I apologize in advance for the inevitable bloopers. We are doing feasibility testing on NoSql data store options, with rather high ingest rate requirements. So far, HBase is looking good, with only one issue identified. Running at an ingest rate of ~30K rows per second on a 4 2.2Mhz CPU 8G RAM machine I am slowly leaking sockets. This is a single node setup - no replication. The CPU load is only about 50%-60%, with the majority of that in userland, system and iowait are averaging less than 3%. There is no swapping going on. The problem is that on the datanode there are a large number of sockets in FIN_WAIT1, with corresponding peers on master in ESTABLISHED. These pairs hang around for quite some time, at at my ingest rate this means that the total sockets held by datanode and master is slowly going up. If my understanding of TCP is correct, then this indicates that the master peer has stopped reading incoming data from the datanode - i.e, it is sending a window of zero; and that the datanode has called close(2) on it's peer. 
There was a thread some time ago: http://www.mail-archive.com/hbase-u...@hadoop.apache.org/msg03329.html There was no real conclusion. I have played with the config params as suggested on that thread, but no luck yet. Also, in that case the problem seemed to be between datanodes for replication operations - not the case with me. Changing timeouts to avoid the slow increase might not really solve the problem IFF the master peer has in fact ceased to read it's socket. The data outstanding in the TCP stack buffer would be lost. Whether that would imply data loss is beyond me. I am posting this here as although the only logs with errors/exceptions are the datanode logs, netstat and wireshark seem to indicate that the problem is on the master side. The master, namenode, regionserver and zookeeper and logs shows no warning or errors. The datanode log shows this, over and over: 2010-07-16 00:33:09,269 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(127.0.0.1:50010, storageID=DS-1028643313-10.1.1.200-50010-1279026099917, infoPort=50075, ipcPort=50020):Got exception while serving blk_3684861726145519813_22386 to /127.0.0.1: java.net.SocketTimeoutException: 48 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/127.0.0.1:50010 remote=/127.0.0.1:54774] at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
Re: Advice for efficiently scanning for modified-since
This request has come up a few times now. We should dev a soln (I've made an issue to track it -- HBASE-2839).

You could try scanning the whole table, but my guess is this will prove too slow -- if not now, then later when your table grows. If the results returned will be few, a scan that runs in parallel rather than in series, tripping over each table region, might make sense. https://issues.apache.org/jira/browse/HBASE-1935 discusses this and even has a patch, though I'm sure it is well stale at this point.

St.Ack

On Fri, Jul 16, 2010 at 8:16 AM, Mark Laffoon mlaff...@semanticresearch.com wrote:

We have an existing product sitting on the hbase/hadoop ecosystem. We have laid our object model on HBase: we have one table with a row per object, and a separate table with composite index rows. Works great. We can efficiently find our objects based on their type, relationships, etc. by scanning the index table. We *never* scan the main table (except when rebuilding the index). A new requirement just came in: get a list of all objects that have been modified since timestamp. This has to happen quickly (user time). If we scan the main table with a timestamp restriction, will that be efficient? Or do we have to introduce a new composite index that has the last modified timestamp as part of it and scan that? Thanks, Mark
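[Editor's illustration] A client-side approximation of the parallel idea St.Ack mentions (not the HBASE-1935 patch itself) would be to fetch the table's region boundaries and run one bounded, time-restricted scan per region on a thread pool. This is only a hedged sketch against the 0.20-era API, with made-up table names and an arbitrary pool size:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.hbase.util.Pair;

public class ParallelModifiedSinceScan {
    public static void main(String[] args) throws Exception {
        final long since = Long.parseLong(args[0]);
        final HBaseConfiguration conf = new HBaseConfiguration();
        HTable meta = new HTable(conf, "objects");            // hypothetical main table

        // One scan task per region, bounded by the region's start/end keys.
        Pair<byte[][], byte[][]> keys = meta.getStartEndKeys();
        ExecutorService pool = Executors.newFixedThreadPool(8);
        List<Future<List<String>>> futures = new ArrayList<Future<List<String>>>();

        for (int i = 0; i < keys.getFirst().length; i++) {
            final byte[] start = keys.getFirst()[i];
            final byte[] stop  = keys.getSecond()[i];
            futures.add(pool.submit(new Callable<List<String>>() {
                public List<String> call() throws Exception {
                    HTable table = new HTable(conf, "objects");   // one HTable per thread
                    Scan scan = new Scan(start, stop);             // restrict to this region
                    scan.setTimeRange(since, Long.MAX_VALUE);      // modified-since filter
                    List<String> rows = new ArrayList<String>();
                    ResultScanner scanner = table.getScanner(scan);
                    try {
                        for (Result r : scanner) {
                            rows.add(Bytes.toString(r.getRow()));
                        }
                    } finally {
                        scanner.close();
                        table.close();
                    }
                    return rows;
                }
            }));
        }

        for (Future<List<String>> f : futures) {
            System.out.println(f.get().size() + " modified rows in one region");
        }
        pool.shutdown();
        meta.close();
    }
}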
Re: Hanging regionservers
Would you mind pastebinning the stacktrace? It doesn't looks like https://issues.apache.org/jira/browse/HDFS-88 (HBASE-667) going by the below, an issue that HADOOP-5859 purportedly fixes -- I see you commented on it -- but our Todd thinks otherwise (He has a 'real' fix up in another issue that I currently can't put my finger on). St.Ack On Fri, Jul 16, 2010 at 7:19 AM, Luke Forehand luke.foreh...@networkedinsights.com wrote: I grepped yesterday's logs on all servers for Blocking updates and there was no trace. I believe I had encountered the blocking updates problem earlier in the project but throttled down the import speed which seemed to fix that. I just double checked and all three region servers were idle. Something interesting that I noticed however, was that each regionserver had a particular ResponseProcessor thread running for a specific block, and that thread was stuck in a running state during the entirety of the hang. Also a DataStreamer thread for the block associated with the ResponseProcessor was in a wait state. This makes me think that each server was stuck operating on a specific block. ResponseProcessor for block blk_1926230463847049982_2694658 - Thread t...@61160 java.lang.Thread.State: RUNNABLE at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method) at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215) at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65) at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69) - locked sun.nio.ch.uti...@196fbfd0 - locked java.util.collections$unmodifiable...@7799fdbb - locked sun.nio.ch.epollselectori...@1ee13d55 at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80) at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:332) at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157) at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155) at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128) at java.io.DataInputStream.readFully(DataInputStream.java:178) at java.io.DataInputStream.readLong(DataInputStream.java:399) at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2399) Locked ownable synchronizers: - None DataStreamer for file /hbase/.logs/dn01.colo.networkedinsights.com,60020,127993084/hlog.dat.1279228611023 block blk_1926230463847049982_2694658 - Thread t...@61158 java.lang.Thread.State: TIMED_WAITING on java.util.linkedl...@475b455c at java.lang.Object.wait(Native Method) at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2247) Locked ownable synchronizers: - None On 7/15/10 10:17 PM, Ryan Rawson ryano...@gmail.com wrote: I'm not seeing anything in that logfile, you are seeing compactions for various regions, but im not seeing flushes (typical during insert loads) and nothing else. One thing we look to see is a log message Blocking updates which indicates that a particular region has decided it's holding up to prevent taking too many inserts. Like I said, you could be seeing this on a different regionserver, if all the clients are blocked on 1 regionserver and can't get to the others then most will look idle and only one will actually show anything interesting in the log. Can you check for this behaviour? Also if you want to tweak the config with the values I pasted that should help. 
On Thu, Jul 15, 2010 at 7:25 PM, Luke Forehand luke.foreh...@networkedinsights.com wrote: It looks like we are going straight from the default config, no expicit setting of anything. On 7/15/10 9:03 PM, Ryan Rawson ryano...@gmail.com wrote: In this case the regionserver isn't actually doing anything - all the IPC thread handlers are waiting in their queue handoff thingy (how they get socket/work to do). Something elsewhere perhaps? Check the logs of your jobs, there might be something interesting there. One thing that frequently happens is you overrun 1 regionserver with edits and it isnt flushing fast enough, so it pauses updates and all clients end up stuck on it. What was that config again? I use these settings: property namehbase.hstore.blockingStoreFiles/name value15/value /property property namedfs.datanode.socket.write.timeout/name value0/value /property property namehbase.hregion.memstore.block.multiplier/name value8/value /property perhaps try these ones? -ryan On Thu, Jul 15, 2010 at 6:55 PM, Luke Forehand luke.foreh...@networkedinsights.com wrote: Yea I started with 12 task max per node, dropped it to 8, and now 4, but still having the issue. Here is the spec for each box: 2x CPU (2.5 GHZ nehalem ep quad core) 24 GB RAM (4gb / region server ) 4x 1tb hard drives On 7/15/10 6:32 PM, Ryan Rawson ryano...@gmail.com wrote: Are you sure you
Re: Hanging regionservers
Each time you threaddump, its stuck in same way? I've not seen this dfsclient hangup before, not that I remember. Let me ask some hdfs-heads. Will be back to you. Any chance of your upping to CHD3b2, for your hadoop at least? HDFS has a few dfsclient/ipc fixes -- though looking at them none seem to explicitly address your issue. Whats that jvm that you are running? Can you do a java -version? Whats your OS? Thanks, St.Ack On Fri, Jul 16, 2010 at 10:10 AM, Luke Forehand luke.foreh...@networkedinsights.com wrote: Line 58 and line 79 are the threads that I found suspicious. http://pastebin.com/W1E2nCZq The other stack traces from the other two region servers look identical to this one. BTW - I have made the config changes per Ryan Rawson's suggestion (thanks!) and I've processed ~7 GB of the 15 GB without hangup thus far so I'm crossing my fingers. -Luke On 7/16/10 11:48 AM, Stack st...@duboce.net wrote: Would you mind pastebinning the stacktrace? It doesn't looks like https://issues.apache.org/jira/browse/HDFS-88 (HBASE-667) going by the below, an issue that HADOOP-5859 purportedly fixes -- I see you commented on it -- but our Todd thinks otherwise (He has a 'real' fix up in another issue that I currently can't put my finger on). St.Ack On Fri, Jul 16, 2010 at 7:19 AM, Luke Forehand luke.foreh...@networkedinsights.com wrote: I grepped yesterday's logs on all servers for Blocking updates and there was no trace. I believe I had encountered the blocking updates problem earlier in the project but throttled down the import speed which seemed to fix that. I just double checked and all three region servers were idle. Something interesting that I noticed however, was that each regionserver had a particular ResponseProcessor thread running for a specific block, and that thread was stuck in a running state during the entirety of the hang. Also a DataStreamer thread for the block associated with the ResponseProcessor was in a wait state. This makes me think that each server was stuck operating on a specific block. 
ResponseProcessor for block blk_1926230463847049982_2694658 - Thread t...@61160 java.lang.Thread.State: RUNNABLE at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method) at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215) at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65) at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69) - locked sun.nio.ch.uti...@196fbfd0 - locked java.util.collections$unmodifiable...@7799fdbb - locked sun.nio.ch.epollselectori...@1ee13d55 at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80) at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:332) at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157) at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155) at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128) at java.io.DataInputStream.readFully(DataInputStream.java:178) at java.io.DataInputStream.readLong(DataInputStream.java:399) at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2399) Locked ownable synchronizers: - None DataStreamer for file /hbase/.logs/dn01.colo.networkedinsights.com,60020,127993084/hlog.dat.1279228611023 block blk_1926230463847049982_2694658 - Thread t...@61158 java.lang.Thread.State: TIMED_WAITING on java.util.linkedl...@475b455c at java.lang.Object.wait(Native Method) at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2247) Locked ownable synchronizers: - None
YouAreDeadException with hbase
Hi, There:

I got some YouAreDeadException with hbase. what can cause it? I do notice between 5:49 to 5:53, for 4 minutes, there is no log. This doesn't look like GC issue as I checked the GC log, the longest GC is only 9.6 seconds.

Jimmy.

2010-07-16 05:49:26,805 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes: Total=3.355194MB (3518176), Free=405.4198MB (425113472), Max=408.775MB (428631648), Counts: Blocks=1, Access=2178914, Hit=1034, Miss=2177880, Evictions=0, Evicted=0, Ratios: Hit Ratio=0.04745483165606856%, Miss Ratio=99.95254278182983%, Evicted/Run=NaN
2010-07-16 05:53:23,476 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes: Total=3.355194MB (3518176), Free=405.4198MB (425113472), Max=408.775MB (428631648), Counts: Blocks=1, Access=2178915, Hit=1035, Miss=2177880, Evictions=0, Evicted=0, Ratios: Hit Ratio=0.04750070511363447%, Miss Ratio=99.95250105857849%, Evicted/Run=NaN
2010-07-16 05:53:26,171 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 240540ms for sessionid 0x329c88039b0006c, closing socket connection and attempting reconnect
2010-07-16 05:53:27,333 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server t-zookeeper2.cloud.ppops.net/10.110.24.57:2181
2010-07-16 05:53:27,334 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to t-zookeeper2.cloud.ppops.net/10.110.24.57:2181, initiating session
2010-07-16 05:53:27,335 INFO org.apache.zookeeper.ClientCnxn: Unable to reconnect to ZooKeeper service, session 0x329c88039b0006c has expired, closing socket connection
2010-07-16 05:53:27,896 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 240520ms for sessionid 0x129c87a7f98007a, closing socket connection and attempting reconnect
2010-07-16 05:53:39,090 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: Aborting region server serverName=m0002028.ppops.net,60020,1279237223465, load=(requests=952, regions=21, usedHeap=575, maxHeap=2043): Unhandled exception
org.apache.hadoop.hbase.YouAreDeadException: org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected; currently processing m0002028.ppops.net,60020,1279237223465 as dead server
at org.apache.hadoop.hbase.master.ServerManager.checkIsDead(ServerManager.java:217)
at org.apache.hadoop.hbase.master.ServerManager.regionServerReport(ServerManager.java:271)
at org.apache.hadoop.hbase.master.HMaster.regionServerReport(HMaster.java:684)
at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:576)
at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:919)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
Re: Hanging regionservers
I was about to migrate to CHD3b2 but thought I would wait for a few replies before doing so. I'll likely have the migration done over the weekend. Java: java version 1.6.0_16 Java(TM) SE Runtime Environment (build 1.6.0_16-b01) Java HotSpot(TM) 64-Bit Server VM (build 14.2-b01, mixed mode) OS: CentOS release 5.5 (Final) 2.6.18-194.3.1.el5 #1 SMP Thu May 13 13:08:30 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux Thanks Luke On 7/16/10 12:45 PM, Stack st...@duboce.net wrote: Yeah, go to cdh3b2 if you can. If you can repro there, there's a few fellas (other than us hbasers) who'd be real interested in your problem. St.Ack On Fri, Jul 16, 2010 at 10:40 AM, Stack st...@duboce.net wrote: Each time you threaddump, its stuck in same way? I've not seen this dfsclient hangup before, not that I remember. Let me ask some hdfs-heads. Will be back to you. Any chance of your upping to CHD3b2, for your hadoop at least? HDFS has a few dfsclient/ipc fixes -- though looking at them none seem to explicitly address your issue. Whats that jvm that you are running? Can you do a java -version? Whats your OS? Thanks, St.Ack On Fri, Jul 16, 2010 at 10:10 AM, Luke Forehand luke.foreh...@networkedinsights.com wrote: Line 58 and line 79 are the threads that I found suspicious. http://pastebin.com/W1E2nCZq The other stack traces from the other two region servers look identical to this one. BTW - I have made the config changes per Ryan Rawson's suggestion (thanks!) and I've processed ~7 GB of the 15 GB without hangup thus far so I'm crossing my fingers. -Luke On 7/16/10 11:48 AM, Stack st...@duboce.net wrote: Would you mind pastebinning the stacktrace? It doesn't looks like https://issues.apache.org/jira/browse/HDFS-88 (HBASE-667) going by the below, an issue that HADOOP-5859 purportedly fixes -- I see you commented on it -- but our Todd thinks otherwise (He has a 'real' fix up in another issue that I currently can't put my finger on). St.Ack On Fri, Jul 16, 2010 at 7:19 AM, Luke Forehand luke.foreh...@networkedinsights.com wrote: I grepped yesterday's logs on all servers for Blocking updates and there was no trace. I believe I had encountered the blocking updates problem earlier in the project but throttled down the import speed which seemed to fix that. I just double checked and all three region servers were idle. Something interesting that I noticed however, was that each regionserver had a particular ResponseProcessor thread running for a specific block, and that thread was stuck in a running state during the entirety of the hang. Also a DataStreamer thread for the block associated with the ResponseProcessor was in a wait state. This makes me think that each server was stuck operating on a specific block. 
ResponseProcessor for block blk_1926230463847049982_2694658 - Thread t...@61160 java.lang.Thread.State: RUNNABLE at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method) at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215) at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65) at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69) - locked sun.nio.ch.uti...@196fbfd0 - locked java.util.collections$unmodifiable...@7799fdbb - locked sun.nio.ch.epollselectori...@1ee13d55 at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80) at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:332) at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157) at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155) at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128) at java.io.DataInputStream.readFully(DataInputStream.java:178) at java.io.DataInputStream.readLong(DataInputStream.java:399) at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2399) Locked ownable synchronizers: - None DataStreamer for file /hbase/.logs/dn01.colo.networkedinsights.com,60020,127993084/hlog.dat.1279228611023 block blk_1926230463847049982_2694658 - Thread t...@61158 java.lang.Thread.State: TIMED_WAITING on java.util.linkedl...@475b455c at java.lang.Object.wait(Native Method) at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2247) Locked ownable synchronizers: - None
Re: YouAreDeadException with hbase
You'll see this if the server reports to the master after the master has ruled it 'dead'. Here is the code that produces the exception:

if (!isDead(serverName)) return;
String message = "Server " + what + " rejected; currently processing " +
  serverName + " as dead server";
LOG.debug(message);
throw new YouAreDeadException(message);

Servers are on the 'dead' list if zk reports their session has expired. The master then moves to clean up after the dead server and process its logs. If during this cleanup time the server reports in, the master will return the YouAreDead exception. Usually the RS has lost its zk session but has yet to realize it.

St.Ack

On Thu, Jul 15, 2010 at 11:52 PM, Jinsong Hu jinsong...@hotmail.com wrote:

Hi, There: I got some YouAreDeadException with hbase. what can cause it? I do notice between 5:49 to 5:53, for 4 minutes, there is no log. This doesn't look like GC issue as I checked the GC log, the longest GC is only 9.6 seconds. Jimmy.

2010-07-16 05:49:26,805 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes: Total=3.355194MB (3518176), Free=405.4198MB (425113472), Max=408.775MB (428631648), Counts: Blocks=1, Access=2178914, Hit=1034, Miss=2177880, Evictions=0, Evicted=0, Ratios: Hit Ratio=0.04745483165606856%, Miss Ratio=99.95254278182983%, Evicted/Run=NaN
2010-07-16 05:53:23,476 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes: Total=3.355194MB (3518176), Free=405.4198MB (425113472), Max=408.775MB (428631648), Counts: Blocks=1, Access=2178915, Hit=1035, Miss=2177880, Evictions=0, Evicted=0, Ratios: Hit Ratio=0.04750070511363447%, Miss Ratio=99.95250105857849%, Evicted/Run=NaN
2010-07-16 05:53:26,171 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 240540ms for sessionid 0x329c88039b0006c, closing socket connection and attempting reconnect
2010-07-16 05:53:27,333 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server t-zookeeper2.cloud.ppops.net/10.110.24.57:2181
2010-07-16 05:53:27,334 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to t-zookeeper2.cloud.ppops.net/10.110.24.57:2181, initiating session
2010-07-16 05:53:27,335 INFO org.apache.zookeeper.ClientCnxn: Unable to reconnect to ZooKeeper service, session 0x329c88039b0006c has expired, closing socket connection
2010-07-16 05:53:27,896 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 240520ms for sessionid 0x129c87a7f98007a, closing socket connection and attempting reconnect
2010-07-16 05:53:39,090 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: Aborting region server serverName=m0002028.ppops.net,60020,1279237223465, load=(requests=952, regions=21, usedHeap=575, maxHeap=2043): Unhandled exception
org.apache.hadoop.hbase.YouAreDeadException: org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected; currently processing m0002028.ppops.net,60020,1279237223465 as dead server
at org.apache.hadoop.hbase.master.ServerManager.checkIsDead(ServerManager.java:217)
at org.apache.hadoop.hbase.master.ServerManager.regionServerReport(ServerManager.java:271)
at org.apache.hadoop.hbase.master.HMaster.regionServerReport(HMaster.java:684)
at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:576)
at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:919)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
Re: YouAreDeadException with hbase
Yes, the root cause seems to be the gap of 4 minutes between 2010-07-16 05:49:26,805 and 2010-07-16 05:53:23,476 . but I checked GC gc-hbase.log and don't see 4 minute gap in gc. I just wonder what could cause this large gap. I also wonder if there is a configuration that I can do to avoid this long pause, or get around the problem cause by this long pause. Jimmy -- From: Stack st...@duboce.net Sent: Friday, July 16, 2010 11:44 AM To: user@hbase.apache.org Subject: Re: YouAreDeadException with hbase You'll see this if the server reports to the master after the master has ruled it 'dead'. Here is the code that produces the exception: if (!isDead(serverName)) return; String message = Server + what + rejected; currently processing + serverName + as dead server; LOG.debug(message); throw new YouAreDeadException(message); Servers are on the 'dead' list if zk reports their session has expired. The master moves then to cleanup after the dead server and process its logs. If during this cleanup time the server reports in, master will return the youaredead exception. Usually the RS has lost its zk session but has yet to realize it. St.Ack On Thu, Jul 15, 2010 at 11:52 PM, Jinsong Hu jinsong...@hotmail.com wrote: Hi, There: I got some YouAreDeadException with hbase. what can cause it ? I do notice between 5:49 to 5:53 , for 4 minutes, there is no log. This doesn't look like GC issue as I checked the GC log, the longest GC is only 9.6 seconds. Jimmy. 2010-07-16 05:49:26,805 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Ca che Stats: Sizes: Total=3.355194MB (3518176), Free=405.4198MB (425113472), Max=4 08.775MB (428631648), Counts: Blocks=1, Access=2178914, Hit=1034, Miss=2177880, Evictions=0, Evicted=0, Ratios: Hit Ratio=0.04745483165606856%, Miss Ratio=99.95 254278182983%, Evicted/Run=NaN 2010-07-16 05:53:23,476 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Ca che Stats: Sizes: Total=3.355194MB (3518176), Free=405.4198MB (425113472), Max=4 08.775MB (428631648), Counts: Blocks=1, Access=2178915, Hit=1035, Miss=2177880, Evictions=0, Evicted=0, Ratios: Hit Ratio=0.04750070511363447%, Miss Ratio=99.95 250105857849%, Evicted/Run=NaN 2010-07-16 05:53:26,171 INFO org.apache.zookeeper.ClientCnxn: Client session tim ed out, have not heard from server in 240540ms for sessionid 0x329c88039b0006c, closing socket connection and attempting reconnect 2010-07-16 05:53:27,333 INFO org.apache.zookeeper.ClientCnxn: Opening socket con nection to server t-zookeeper2.cloud.ppops.net/10.110.24.57:2181 2010-07-16 05:53:27,334 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to t-zookeeper2.cloud.ppops.net/10.110.24.57:2181, initiating sessio n 2010-07-16 05:53:27,335 INFO org.apache.zookeeper.ClientCnxn: Unable to reconnec t to ZooKeeper service, session 0x329c88039b0006c has expired, closing socket co nnection 2010-07-16 05:53:27,896 INFO org.apache.zookeeper.ClientCnxn: Client session tim ed out, have not heard from server in 240520ms for sessionid 0x129c87a7f98007a, closing socket connection and attempting reconnect 2010-07-16 05:53:39,090 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer : Aborting region server serverName=m0002028.ppops.net,60020,1279237223465, load =(requests=952, regions=21, usedHeap=575, maxHeap=2043): Unhandled exception org.apache.hadoop.hbase.YouAreDeadException: org.apache.hadoop.hbase.YouAreDeadE xception: Server REPORT rejected; currently processing m0002028.ppops.net,60020, 1279237223465 as dead server at 
org.apache.hadoop.hbase.master.ServerManager.checkIsDead(ServerManage r.java:217) at org.apache.hadoop.hbase.master.ServerManager.regionServerReport(Serve rManager.java:271) at org.apache.hadoop.hbase.master.HMaster.regionServerReport(HMaster.jav a:684) at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAcces sorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:576) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java: 919) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
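On Jimmy's question about a configuration that can ride out a pause like this: the ZooKeeper session expiry in the log above is what turns a long stall into a YouAreDeadException, and the length of that session is governed by zookeeper.session.timeout in hbase-site.xml (subject to the ZooKeeper ensemble's own tickTime/maxSessionTimeout limits). This is only a sketch of the knob, not a recommendation of a particular value - raising it buys headroom against pauses at the cost of slower detection of genuinely dead region servers:

<property>
  <name>zookeeper.session.timeout</name>
  <!-- milliseconds; a region server that cannot check in within this window is declared dead -->
  <value>300000</value>
</property>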
Re: High ingest rate and FIN_WAIT1 problems
I've been running with this setting on both the HDFS side and the HBase side for over a year now, it's a bit of voodoo but you might be running into well-known suckage of HDFS. Try this one and restart your HBase and HDFS. The FIN_WAIT2/TIME_WAIT happens more on large concurrent gets, not so much for inserts.

<property>
  <name>dfs.datanode.socket.write.timeout</name>
  <value>0</value>
</property>

-ryan

On Fri, Jul 16, 2010 at 9:33 AM, Thomas Downing tdown...@proteus-technologies.com wrote: Thanks for the response. My understanding is that TCP_FIN_TIMEOUT affects only FIN_WAIT2, my problem is with FIN_WAIT1. While I do see some sockets in TIME_WAIT, they are only a few, and the number is not growing.

On 7/16/2010 12:07 PM, Hegner, Travis wrote: Hi Thomas, I ran into a very similar issue when running Slony-I on PostgreSQL to replicate 15-20 databases. Adjusting the TCP_FIN_TIMEOUT parameters for the kernel may help to slow (or hopefully stop) the leaking sockets. I found some notes about adjusting TCP parameters here: http://www.hikaro.com/linux/tweaking-tcpip-syctl-conf.html with the specific excerpt regarding TCP_FIN_TIMEOUT:

TCP_FIN_TIMEOUT: This setting determines the time that must elapse before TCP/IP can release a closed connection and reuse its resources. During this TIME_WAIT state, reopening the connection to the client costs less than establishing a new connection. By reducing the value of this entry, TCP/IP can release closed connections faster, making more resources available for new connections. Adjust this in the presence of many connections sitting in the TIME_WAIT state:
# echo 30 > /proc/sys/net/ipv4/tcp_fin_timeout

Try setting this lower on your master. You may also consider these on the same link:

TCP_TW_RECYCLE: It enables fast recycling of TIME_WAIT sockets. The default value is 0 (disabled). The sysctl documentation incorrectly states the default as enabled. It can be changed to 1 (enabled) in many cases. Known to cause some issues with hoststated (load balancing and failover) if enabled, should be used with caution.
# echo 1 > /proc/sys/net/ipv4/tcp_tw_recycle

TCP_TW_REUSE: This allows reusing sockets in TIME_WAIT state for new connections when it is safe from the protocol viewpoint. Default value is 0 (disabled). It is generally a safer alternative to tcp_tw_recycle.
# echo 1 > /proc/sys/net/ipv4/tcp_tw_reuse

The above commands will not persist across reboots, but the link explains how to make them permanent. The list experts may be able to give more insight on which, if any, of these settings are safe to manipulate, and what risks or issues you may encounter specifically with HBase while adjusting these settings. Hope This Helps, Travis Hegner

-Original Message- From: Thomas Downing [mailto:tdown...@proteus-technologies.com] Sent: Friday, July 16, 2010 10:33 AM To: user@hbase.apache.org Subject: High ingest rate and FIN_WAIT1 problems

Hi, I am a complete HBase and HDFS newbie, so I apologize in advance for the inevitable bloopers. We are doing feasibility testing on NoSQL data store options, with rather high ingest rate requirements. So far, HBase is looking good, with only one issue identified. Running at an ingest rate of ~30K rows per second on a 4-CPU 2.2GHz, 8GB RAM machine I am slowly leaking sockets. This is a single node setup - no replication. The CPU load is only about 50%-60%, with the majority of that in userland; system and iowait are averaging less than 3%. There is no swapping going on.
The problem is that on the datanode there are a large number of sockets in FIN_WAIT1, with corresponding peers on master in ESTABLISHED. These pairs hang around for quite some time, at at my ingest rate this means that the total sockets held by datanode and master is slowly going up. If my understanding of TCP is correct, then this indicates that the master peer has stopped reading incoming data from the datanode - i.e, it is sending a window of zero; and that the datanode has called close(2) on it's peer. There was a thread some time ago: http://www.mail-archive.com/hbase-u...@hadoop.apache.org/msg03329.html There was no real conclusion. I have played with the config params as suggested on that thread, but no luck yet. Also, in that case the problem seemed to be between datanodes for replication operations - not the case with me. Changing timeouts to avoid the slow increase might not really solve the problem IFF the master peer has in fact ceased to read it's socket. The data outstanding in the TCP stack buffer would be lost. Whether that would imply data loss is beyond me. I am posting this here as although the only logs with errors/exceptions are the datanode logs, netstat and wireshark seem to indicate that the problem is on the master side. The master, namenode, regionserver and zookeeper and logs shows no warning or errors. The datanode log shows this, over and over: 2010-07-16 00:33:09,269 WARN
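As an aside on Travis's sysctl suggestions in the thread above: the echo commands take effect immediately but are lost on reboot. One way to make them permanent is to add the equivalent keys to /etc/sysctl.conf and reload with sysctl -p; the values below are simply the ones from his mail, with tcp_tw_recycle left commented out given the caveats he mentions.

# /etc/sysctl.conf
net.ipv4.tcp_fin_timeout = 30
net.ipv4.tcp_tw_reuse = 1
# net.ipv4.tcp_tw_recycle = 1   # use with caution, see the notes above

# then apply without a reboot:
sysctl -p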
Re: YouAreDeadException with hbase
Sometimes the GC can chain multiple medium pauses into one large pause. I've seen this before where there are 2 long pauses back to back and the result was a 50second+ pause. This article talks a lot about GC performance and tuning, check it out: http://wiki.apache.org/hadoop/PerformanceTuning -ryan On Fri, Jul 16, 2010 at 11:55 AM, Jinsong Hu jinsong...@hotmail.com wrote: Yes, the root cause seems to be the gap of 4 minutes between 2010-07-16 05:49:26,805 and 2010-07-16 05:53:23,476 . but I checked GC gc-hbase.log and don't see 4 minute gap in gc. I just wonder what could cause this large gap. I also wonder if there is a configuration that I can do to avoid this long pause, or get around the problem cause by this long pause. Jimmy -- From: Stack st...@duboce.net Sent: Friday, July 16, 2010 11:44 AM To: user@hbase.apache.org Subject: Re: YouAreDeadException with hbase You'll see this if the server reports to the master after the master has ruled it 'dead'. Here is the code that produces the exception: if (!isDead(serverName)) return; String message = Server + what + rejected; currently processing + serverName + as dead server; LOG.debug(message); throw new YouAreDeadException(message); Servers are on the 'dead' list if zk reports their session has expired. The master moves then to cleanup after the dead server and process its logs. If during this cleanup time the server reports in, master will return the youaredead exception. Usually the RS has lost its zk session but has yet to realize it. St.Ack On Thu, Jul 15, 2010 at 11:52 PM, Jinsong Hu jinsong...@hotmail.com wrote: Hi, There: I got some YouAreDeadException with hbase. what can cause it ? I do notice between 5:49 to 5:53 , for 4 minutes, there is no log. This doesn't look like GC issue as I checked the GC log, the longest GC is only 9.6 seconds. Jimmy. 
2010-07-16 05:49:26,805 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Ca che Stats: Sizes: Total=3.355194MB (3518176), Free=405.4198MB (425113472), Max=4 08.775MB (428631648), Counts: Blocks=1, Access=2178914, Hit=1034, Miss=2177880, Evictions=0, Evicted=0, Ratios: Hit Ratio=0.04745483165606856%, Miss Ratio=99.95 254278182983%, Evicted/Run=NaN 2010-07-16 05:53:23,476 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Ca che Stats: Sizes: Total=3.355194MB (3518176), Free=405.4198MB (425113472), Max=4 08.775MB (428631648), Counts: Blocks=1, Access=2178915, Hit=1035, Miss=2177880, Evictions=0, Evicted=0, Ratios: Hit Ratio=0.04750070511363447%, Miss Ratio=99.95 250105857849%, Evicted/Run=NaN 2010-07-16 05:53:26,171 INFO org.apache.zookeeper.ClientCnxn: Client session tim ed out, have not heard from server in 240540ms for sessionid 0x329c88039b0006c, closing socket connection and attempting reconnect 2010-07-16 05:53:27,333 INFO org.apache.zookeeper.ClientCnxn: Opening socket con nection to server t-zookeeper2.cloud.ppops.net/10.110.24.57:2181 2010-07-16 05:53:27,334 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to t-zookeeper2.cloud.ppops.net/10.110.24.57:2181, initiating sessio n 2010-07-16 05:53:27,335 INFO org.apache.zookeeper.ClientCnxn: Unable to reconnec t to ZooKeeper service, session 0x329c88039b0006c has expired, closing socket co nnection 2010-07-16 05:53:27,896 INFO org.apache.zookeeper.ClientCnxn: Client session tim ed out, have not heard from server in 240520ms for sessionid 0x129c87a7f98007a, closing socket connection and attempting reconnect 2010-07-16 05:53:39,090 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer : Aborting region server serverName=m0002028.ppops.net,60020,1279237223465, load =(requests=952, regions=21, usedHeap=575, maxHeap=2043): Unhandled exception org.apache.hadoop.hbase.YouAreDeadException: org.apache.hadoop.hbase.YouAreDeadE xception: Server REPORT rejected; currently processing m0002028.ppops.net,60020, 1279237223465 as dead server at org.apache.hadoop.hbase.master.ServerManager.checkIsDead(ServerManage r.java:217) at org.apache.hadoop.hbase.master.ServerManager.regionServerReport(Serve rManager.java:271) at org.apache.hadoop.hbase.master.HMaster.regionServerReport(HMaster.jav a:684) at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAcces sorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:576) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java: 919) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
Re: Row level locking?
HTable.close does very little: public void close() throws IOException{ flushCommits(); } None of which involves row locks. One thing to watch out for is to remember to close your scanners - they continue to use server-side resources until you close them or 60 seconds passes and they get timed out. Also be very wary of using any of the explicit row locking calls, they are generally trouble for more or less everyone. There was a proposal to remove them, but I don't think that went through. On Fri, Jul 16, 2010 at 9:16 AM, Cosmin Lehene cleh...@adobe.com wrote: On Jul 16, 2010, at 6:41 PM, Michael Segel wrote: Thanks for the response. (You don't need to include the cc ...) With respect to the row level locking ... I was interested in when the lock is actually acquired, how long the lock persists and when is the lock released. From your response, the lock is only held on updating the row, and while the data is being written to the memory cache which is then written to disk. (Note: This row level locking different than transactional row level locking.) Now that I've had some caffeine I think I can clarify... :-) Some of my developers complained that they were having trouble with two different processes trying to update the same table. Not sure why they were having the problem, so I wanted to have a good fix. The simple fix was to have them issue the close() the HTable connection which forces any resources that they acquired to be released. It would help to know what the exact problem was. Normally I wouldn't see any problems. In looking at the problem... its possible that they didn't have AutoFlush set to true so the write was still in the buffer and hadn't gotten flushed. If the lock only persists for the duration of the write to memory and is then released, then the issue could have been that the record written was in the buffer and not yet flushed to disk. At the region server level HBase will use the cache for both reads and writes. This happens transparently for the user. Once something is written in the cache, all other clients will read from the same cache. No need to worry if the cache has been flushed. Lars George has a good article about the hbase storage architecture http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html I'm also assuming that when you run a scan() against a region that any information written to buffer but not yet written to disk will be missed. When you do puts into hbase you'll use HTable. The HTable instance is on the client. HTable keeps a buffer as well and if autoFlush is false it only flushes when you do flushCommits() or when it reaches the buffer limit, or when you close the table. With autoFlush set to true it will flush for every put. This buffer is on the client. So when data is actually flushed it gets on the region server where it will get in the region server cache and WAL. Unless a client flushes the put no other client can see the data because it still resides on the client only. Depending on what you need to do you can use autoFlush true if you are doing many small writes that need to be seen immediately by others. You can use autoFlush false and issue flushCommits() yourself, or you can rely on the buffer limit for that. So I guess the question isn't so much the issue of a lock, but that we need to make sure that data written to the buffer should be flushed ASAP unless we know that we're going to be writing a lot of data in the m/r job. 
Usually when you write from the reducer (heavy) is better to use a buffer and not autoFlush to have a good performance. Cosmin Thx -Mike From: cleh...@adobe.commailto:cleh...@adobe.com To: user@hbase.apache.orgmailto:user@hbase.apache.org CC: hbase-u...@hadoop.apache.orgmailto:hbase-u...@hadoop.apache.org Date: Fri, 16 Jul 2010 12:34:36 +0100 Subject: Re: Row level locking? Currently a row is part of a region and there's a single region server serving that region at a particular moment. So when that row is updated a lock is acquired for that row until the actual data is updated in memory (note that a put will be written to cache on the region server and also persisted in the write-ahead log - WAL). Subsequent puts to that row will have to wait for that lock. HBase is fully consistent. This being said all the locking takes place at row level only, so when you scan you have to take that into account as there's no range locking. I'm not sure I understand the resource releasing issue. HTable.close() flushes the current write buffer (you can have write buffer if you use autoFlush set to false). Cosmin On Jul 16, 2010, at 1:33 PM, Michael Segel wrote: Ok, First, I'm writing this before I've had my first cup of coffee so I am apologizing in advance if the question is a brain dead question Going from a relational background, some of these questions may not
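To make the autoFlush / flushCommits discussion concrete, here is a minimal client-side sketch against the 0.20-era API. The table and column names are invented and error handling is omitted; it only illustrates where the client-side write buffer sits in the flow Cosmin describes.

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class WriteBufferExample {
  public static void main(String[] args) throws Exception {
    HBaseConfiguration conf = new HBaseConfiguration();
    HTable table = new HTable(conf, "mytable");        // hypothetical table name

    // Buffer puts on the client; nothing is visible to other clients
    // until the buffer is flushed to the region server.
    table.setAutoFlush(false);
    table.setWriteBufferSize(2 * 1024 * 1024);         // 2MB client-side buffer

    for (int i = 0; i < 10000; i++) {
      Put p = new Put(Bytes.toBytes("row-" + i));
      p.add(Bytes.toBytes("data"), Bytes.toBytes("col"), Bytes.toBytes("value-" + i));
      table.put(p);                                    // may only fill the local buffer
    }

    table.flushCommits();                              // push buffered puts to the region servers
    table.close();                                     // also flushes, per the notes above
  }
}

With setAutoFlush(true) (the default) every put() is sent immediately, which is the behaviour to prefer when other processes must see the writes right away.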
Re: Row level locking?
What about implementing explicit row locks using the zookeeper? I'm planning to do this sometime in the near future. Does anyone have any comments against this approach? (or maybe it was already implemented by someone :-) On Fri, Jul 16, 2010 at 5:02 PM, Ryan Rawson ryano...@gmail.com wrote: HTable.close does very little: public void close() throws IOException{ flushCommits(); } None of which involves row locks. One thing to watch out for is to remember to close your scanners - they continue to use server-side resources until you close them or 60 seconds passes and they get timed out. Also be very wary of using any of the explicit row locking calls, they are generally trouble for more or less everyone. There was a proposal to remove them, but I don't think that went through. On Fri, Jul 16, 2010 at 9:16 AM, Cosmin Lehene cleh...@adobe.com wrote: On Jul 16, 2010, at 6:41 PM, Michael Segel wrote: Thanks for the response. (You don't need to include the cc ...) With respect to the row level locking ... I was interested in when the lock is actually acquired, how long the lock persists and when is the lock released. From your response, the lock is only held on updating the row, and while the data is being written to the memory cache which is then written to disk. (Note: This row level locking different than transactional row level locking.) Now that I've had some caffeine I think I can clarify... :-) Some of my developers complained that they were having trouble with two different processes trying to update the same table. Not sure why they were having the problem, so I wanted to have a good fix. The simple fix was to have them issue the close() the HTable connection which forces any resources that they acquired to be released. It would help to know what the exact problem was. Normally I wouldn't see any problems. In looking at the problem... its possible that they didn't have AutoFlush set to true so the write was still in the buffer and hadn't gotten flushed. If the lock only persists for the duration of the write to memory and is then released, then the issue could have been that the record written was in the buffer and not yet flushed to disk. At the region server level HBase will use the cache for both reads and writes. This happens transparently for the user. Once something is written in the cache, all other clients will read from the same cache. No need to worry if the cache has been flushed. Lars George has a good article about the hbase storage architecture http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html I'm also assuming that when you run a scan() against a region that any information written to buffer but not yet written to disk will be missed. When you do puts into hbase you'll use HTable. The HTable instance is on the client. HTable keeps a buffer as well and if autoFlush is false it only flushes when you do flushCommits() or when it reaches the buffer limit, or when you close the table. With autoFlush set to true it will flush for every put. This buffer is on the client. So when data is actually flushed it gets on the region server where it will get in the region server cache and WAL. Unless a client flushes the put no other client can see the data because it still resides on the client only. Depending on what you need to do you can use autoFlush true if you are doing many small writes that need to be seen immediately by others. You can use autoFlush false and issue flushCommits() yourself, or you can rely on the buffer limit for that. 
So I guess the question isn't so much the issue of a lock, but that we need to make sure that data written to the buffer should be flushed ASAP unless we know that we're going to be writing a lot of data in the m/r job. Usually when you write from the reducer (heavy) is better to use a buffer and not autoFlush to have a good performance. Cosmin Thx -Mike From: cleh...@adobe.commailto:cleh...@adobe.com To: user@hbase.apache.orgmailto:user@hbase.apache.org CC: hbase-u...@hadoop.apache.orgmailto:hbase-u...@hadoop.apache.org Date: Fri, 16 Jul 2010 12:34:36 +0100 Subject: Re: Row level locking? Currently a row is part of a region and there's a single region server serving that region at a particular moment. So when that row is updated a lock is acquired for that row until the actual data is updated in memory (note that a put will be written to cache on the region server and also persisted in the write-ahead log - WAL). Subsequent puts to that row will have to wait for that lock. HBase is fully consistent. This being said all the locking takes place at row level only, so when you scan you have to take that into account as there's no range locking. I'm not sure I understand the resource releasing issue. HTable.close() flushes the current write buffer (you can have
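For context on what a ZooKeeper-based row lock would involve, below is a heavily simplified sketch of the usual ephemeral-sequential-node recipe (it polls instead of setting a watch on the predecessor node, and the znode layout is invented). Note the replies that follow: every acquire and release here is extra network round trips on top of the HBase operations themselves.

import java.util.Collections;
import java.util.List;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class NaiveZkRowLock {
  private final ZooKeeper zk;
  private final String lockDir;   // e.g. "/locks/mytable/row1" (parent must already exist)
  private String myNode;

  public NaiveZkRowLock(ZooKeeper zk, String lockDir) {
    this.zk = zk;
    this.lockDir = lockDir;
  }

  /** Acquire by creating an ephemeral-sequential child and waiting to be the lowest one. */
  public void lock() throws Exception {
    myNode = zk.create(lockDir + "/lock-", new byte[0],
        ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
    while (true) {
      List<String> children = zk.getChildren(lockDir, false);
      Collections.sort(children);
      if (myNode.endsWith(children.get(0))) {
        return;                   // our node is the lowest: we hold the lock
      }
      Thread.sleep(50);           // a real recipe would watch the predecessor instead of polling
    }
  }

  public void unlock() throws Exception {
    zk.delete(myNode, -1);        // ephemeral node also disappears if the client dies
  }
}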
Re: Row level locking?
Explicit locks with zookeeper would be (a) slow and (b) completely out of band and ultimately up to you. I wouldn't exactly be eager to do our row locking in zookeeper (since the minimum operation time is between 2-10ms). You could do application advisory locks, but that is true no matter what datastore you use... On Fri, Jul 16, 2010 at 1:13 PM, Guilherme Germoglio germog...@gmail.com wrote: What about implementing explicit row locks using the zookeeper? I'm planning to do this sometime in the near future. Does anyone have any comments against this approach? (or maybe it was already implemented by someone :-) On Fri, Jul 16, 2010 at 5:02 PM, Ryan Rawson ryano...@gmail.com wrote: HTable.close does very little: public void close() throws IOException{ flushCommits(); } None of which involves row locks. One thing to watch out for is to remember to close your scanners - they continue to use server-side resources until you close them or 60 seconds passes and they get timed out. Also be very wary of using any of the explicit row locking calls, they are generally trouble for more or less everyone. There was a proposal to remove them, but I don't think that went through. On Fri, Jul 16, 2010 at 9:16 AM, Cosmin Lehene cleh...@adobe.com wrote: On Jul 16, 2010, at 6:41 PM, Michael Segel wrote: Thanks for the response. (You don't need to include the cc ...) With respect to the row level locking ... I was interested in when the lock is actually acquired, how long the lock persists and when is the lock released. From your response, the lock is only held on updating the row, and while the data is being written to the memory cache which is then written to disk. (Note: This row level locking different than transactional row level locking.) Now that I've had some caffeine I think I can clarify... :-) Some of my developers complained that they were having trouble with two different processes trying to update the same table. Not sure why they were having the problem, so I wanted to have a good fix. The simple fix was to have them issue the close() the HTable connection which forces any resources that they acquired to be released. It would help to know what the exact problem was. Normally I wouldn't see any problems. In looking at the problem... its possible that they didn't have AutoFlush set to true so the write was still in the buffer and hadn't gotten flushed. If the lock only persists for the duration of the write to memory and is then released, then the issue could have been that the record written was in the buffer and not yet flushed to disk. At the region server level HBase will use the cache for both reads and writes. This happens transparently for the user. Once something is written in the cache, all other clients will read from the same cache. No need to worry if the cache has been flushed. Lars George has a good article about the hbase storage architecture http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html I'm also assuming that when you run a scan() against a region that any information written to buffer but not yet written to disk will be missed. When you do puts into hbase you'll use HTable. The HTable instance is on the client. HTable keeps a buffer as well and if autoFlush is false it only flushes when you do flushCommits() or when it reaches the buffer limit, or when you close the table. With autoFlush set to true it will flush for every put. This buffer is on the client. 
So when data is actually flushed it gets on the region server where it will get in the region server cache and WAL. Unless a client flushes the put no other client can see the data because it still resides on the client only. Depending on what you need to do you can use autoFlush true if you are doing many small writes that need to be seen immediately by others. You can use autoFlush false and issue flushCommits() yourself, or you can rely on the buffer limit for that. So I guess the question isn't so much the issue of a lock, but that we need to make sure that data written to the buffer should be flushed ASAP unless we know that we're going to be writing a lot of data in the m/r job. Usually when you write from the reducer (heavy) is better to use a buffer and not autoFlush to have a good performance. Cosmin Thx -Mike From: cleh...@adobe.commailto:cleh...@adobe.com To: user@hbase.apache.orgmailto:user@hbase.apache.org CC: hbase-u...@hadoop.apache.orgmailto:hbase-u...@hadoop.apache.org Date: Fri, 16 Jul 2010 12:34:36 +0100 Subject: Re: Row level locking? Currently a row is part of a region and there's a single region server serving that region at a particular moment. So when that row is updated a lock is acquired for that row until the actual data is updated in memory (note that a put will be written to cache on the region server and also
Re: YouAreDeadException with hbase
These 2 lines are different GC collections: 5517.355: [GC 5517.355: [ParNew (promotion failed): 18113K-18113K(19136K), 0.77 00840 secs] 5518.125: [CMS5649.813: [CMS-concurrent-mark: 171.151/310.961 secs] [Times: user=95.87 sys=3.06, real=310.97 secs] (concurrent mode failure): 2009649K-572240K(2054592K), 280.2155930 secs] 20233 25K-572240K(2073728K), [CMS Perm : 18029K-17976K(30064K)] icms_dc=100 , 281.03 57280 secs] [Times: user=4.55 sys=4.07, real=281.05 secs] It's a little hard to read that, it looks like the CMS concurrent mark took 310 seconds then failed, then we got a 281 second real time pause, but interestingly enough the user and system time is fairly low. How loaded are these machines? You need to be giving enough uncontended CPU time to hbase. On Fri, Jul 16, 2010 at 1:30 PM, Jinsong Hu jinsong...@hotmail.com wrote: I already implemented all these configs before my test. I checked gc-hbase.log, I see a GC failure which looks very suspecious: 5515.974: [GC 5515.974: [ParNew: 19120K-2112K(19136K), 0.8344240 secs] 2016283K -2007324K(2069308K) icms_dc=100 , 0.8345620 secs] [Times: user=0.08 sys=0.00, r eal=0.83 secs] 5517.355: [GC 5517.355: [ParNew (promotion failed): 18113K-18113K(19136K), 0.77 00840 secs]5518.125: [CMS5649.813: [CMS-concurrent-mark: 171.151/310.961 secs] [ Times: user=95.87 sys=3.06, real=310.97 secs] (concurrent mode failure): 2009649K-572240K(2054592K), 280.2155930 secs] 20233 25K-572240K(2073728K), [CMS Perm : 18029K-17976K(30064K)] icms_dc=100 , 281.03 57280 secs] [Times: user=4.55 sys=4.07, real=281.05 secs] 5798.909: [GC [1 CMS-initial-mark: 572240K(2054592K)] 573896K(2092928K), 0.05792 20 secs] [Times: user=0.01 sys=0.00, real=0.08 secs] the concurrent mode failure, and 281.05 seconds of GC time, looks like the culprit for the problem. I just wonder how to resolve this issue. Jimmy. -- From: Ryan Rawson ryano...@gmail.com Sent: Friday, July 16, 2010 12:57 PM To: user@hbase.apache.org Subject: Re: YouAreDeadException with hbase Sometimes the GC can chain multiple medium pauses into one large pause. I've seen this before where there are 2 long pauses back to back and the result was a 50second+ pause. This article talks a lot about GC performance and tuning, check it out: http://wiki.apache.org/hadoop/PerformanceTuning -ryan On Fri, Jul 16, 2010 at 11:55 AM, Jinsong Hu jinsong...@hotmail.com wrote: Yes, the root cause seems to be the gap of 4 minutes between 2010-07-16 05:49:26,805 and 2010-07-16 05:53:23,476 . but I checked GC gc-hbase.log and don't see 4 minute gap in gc. I just wonder what could cause this large gap. I also wonder if there is a configuration that I can do to avoid this long pause, or get around the problem cause by this long pause. Jimmy -- From: Stack st...@duboce.net Sent: Friday, July 16, 2010 11:44 AM To: user@hbase.apache.org Subject: Re: YouAreDeadException with hbase You'll see this if the server reports to the master after the master has ruled it 'dead'. Here is the code that produces the exception: if (!isDead(serverName)) return; String message = Server + what + rejected; currently processing + serverName + as dead server; LOG.debug(message); throw new YouAreDeadException(message); Servers are on the 'dead' list if zk reports their session has expired. The master moves then to cleanup after the dead server and process its logs. If during this cleanup time the server reports in, master will return the youaredead exception. Usually the RS has lost its zk session but has yet to realize it. 
St.Ack On Thu, Jul 15, 2010 at 11:52 PM, Jinsong Hu jinsong...@hotmail.com wrote: Hi, There: I got some YouAreDeadException with hbase. what can cause it ? I do notice between 5:49 to 5:53 , for 4 minutes, there is no log. This doesn't look like GC issue as I checked the GC log, the longest GC is only 9.6 seconds. Jimmy. 2010-07-16 05:49:26,805 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Ca che Stats: Sizes: Total=3.355194MB (3518176), Free=405.4198MB (425113472), Max=4 08.775MB (428631648), Counts: Blocks=1, Access=2178914, Hit=1034, Miss=2177880, Evictions=0, Evicted=0, Ratios: Hit Ratio=0.04745483165606856%, Miss Ratio=99.95 254278182983%, Evicted/Run=NaN 2010-07-16 05:53:23,476 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Ca che Stats: Sizes: Total=3.355194MB (3518176), Free=405.4198MB (425113472), Max=4 08.775MB (428631648), Counts: Blocks=1, Access=2178915, Hit=1035, Miss=2177880, Evictions=0, Evicted=0, Ratios: Hit Ratio=0.04750070511363447%, Miss Ratio=99.95 250105857849%, Evicted/Run=NaN 2010-07-16 05:53:26,171 INFO org.apache.zookeeper.ClientCnxn: Client session tim ed out, have not heard from server in 240540ms for sessionid 0x329c88039b0006c, closing socket connection and
Re: YouAreDeadException with hbase
One bit of advice that is easy to overlook but is important is thus: This can be due to the size of the ParNew, which should be relatively small. If your ParNew is very large after running HBase for a while, in one example a ParNew was about 150MB, then you might have to constrain the size of ParNew. Add this to HBASE_OPTS: export HBASE_OPTS=-XX:NewSize=6m -XX:MaxNewSize=6m cms options from above gc logging options from above From our wiki about performance tuning. On Fri, Jul 16, 2010 at 1:50 PM, Jinsong Hu jinsong...@hotmail.com wrote: I was doing stress testing, so the load is not small. But I purposely limited the data rate on client side so load is not big either. using iostat -x 5 and I can see there are lots of situations that the CPU goes to very high level and stay there for long time. but then it ultimately go down. It is highly possible that during a certain period the CPU was too busy and the GC process was starved for CPU. I researched this failure and found an excellent thread talking about GC: http://forums.sun.com/thread.jspa?threadID=698490 that is more detailed than http://wiki.apache.org/hadoop/PerformanceTuning . I will do some tunning and see if it helps following the posted config there. Jimmy. -- From: Ryan Rawson ryano...@gmail.com Sent: Friday, July 16, 2010 1:35 PM To: user@hbase.apache.org Subject: Re: YouAreDeadException with hbase These 2 lines are different GC collections: 5517.355: [GC 5517.355: [ParNew (promotion failed): 18113K-18113K(19136K), 0.77 00840 secs] 5518.125: [CMS5649.813: [CMS-concurrent-mark: 171.151/310.961 secs] [Times: user=95.87 sys=3.06, real=310.97 secs] (concurrent mode failure): 2009649K-572240K(2054592K), 280.2155930 secs] 20233 25K-572240K(2073728K), [CMS Perm : 18029K-17976K(30064K)] icms_dc=100 , 281.03 57280 secs] [Times: user=4.55 sys=4.07, real=281.05 secs] It's a little hard to read that, it looks like the CMS concurrent mark took 310 seconds then failed, then we got a 281 second real time pause, but interestingly enough the user and system time is fairly low. How loaded are these machines? You need to be giving enough uncontended CPU time to hbase. On Fri, Jul 16, 2010 at 1:30 PM, Jinsong Hu jinsong...@hotmail.com wrote: I already implemented all these configs before my test. I checked gc-hbase.log, I see a GC failure which looks very suspecious: 5515.974: [GC 5515.974: [ParNew: 19120K-2112K(19136K), 0.8344240 secs] 2016283K -2007324K(2069308K) icms_dc=100 , 0.8345620 secs] [Times: user=0.08 sys=0.00, r eal=0.83 secs] 5517.355: [GC 5517.355: [ParNew (promotion failed): 18113K-18113K(19136K), 0.77 00840 secs]5518.125: [CMS5649.813: [CMS-concurrent-mark: 171.151/310.961 secs] [ Times: user=95.87 sys=3.06, real=310.97 secs] (concurrent mode failure): 2009649K-572240K(2054592K), 280.2155930 secs] 20233 25K-572240K(2073728K), [CMS Perm : 18029K-17976K(30064K)] icms_dc=100 , 281.03 57280 secs] [Times: user=4.55 sys=4.07, real=281.05 secs] 5798.909: [GC [1 CMS-initial-mark: 572240K(2054592K)] 573896K(2092928K), 0.05792 20 secs] [Times: user=0.01 sys=0.00, real=0.08 secs] the concurrent mode failure, and 281.05 seconds of GC time, looks like the culprit for the problem. I just wonder how to resolve this issue. Jimmy. -- From: Ryan Rawson ryano...@gmail.com Sent: Friday, July 16, 2010 12:57 PM To: user@hbase.apache.org Subject: Re: YouAreDeadException with hbase Sometimes the GC can chain multiple medium pauses into one large pause. I've seen this before where there are 2 long pauses back to back and the result was a 50second+ pause. 
This article talks a lot about GC performance and tuning, check it out: http://wiki.apache.org/hadoop/PerformanceTuning -ryan On Fri, Jul 16, 2010 at 11:55 AM, Jinsong Hu jinsong...@hotmail.com wrote: Yes, the root cause seems to be the gap of 4 minutes between 2010-07-16 05:49:26,805 and 2010-07-16 05:53:23,476 . but I checked GC gc-hbase.log and don't see 4 minute gap in gc. I just wonder what could cause this large gap. I also wonder if there is a configuration that I can do to avoid this long pause, or get around the problem cause by this long pause. Jimmy -- From: Stack st...@duboce.net Sent: Friday, July 16, 2010 11:44 AM To: user@hbase.apache.org Subject: Re: YouAreDeadException with hbase You'll see this if the server reports to the master after the master has ruled it 'dead'. Here is the code that produces the exception: if (!isDead(serverName)) return; String message = Server + what + rejected; currently processing + serverName + as dead server; LOG.debug(message); throw new YouAreDeadException(message); Servers are on the 'dead' list if zk reports their session has expired. The master moves then to cleanup
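Spelled out, the wiki excerpt Ryan quotes ("cms options from above", "gc logging options from above") ends up as a single HBASE_OPTS line in hbase-env.sh along the lines of the sketch below. The flags are standard HotSpot options, but the exact combination and the 6m new size are the wiki's suggestion for this particular symptom, not a general prescription, and the log path is just an example:

# hbase-env.sh (sketch)
export HBASE_OPTS="-XX:NewSize=6m -XX:MaxNewSize=6m \
  -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode \
  -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
  -Xloggc:$HBASE_HOME/logs/gc-hbase.log"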
Re: Row level locking?
thanks Ryan! (I was about to look for performance numbers) Just another question -- slightly related to locks. Will HBase 0.90 include HTable.checkAndPut receiving more than one value to check? I'm eager to help, if possible. On Fri, Jul 16, 2010 at 5:58 PM, Guilherme Germoglio germog...@gmail.com wrote: thanks Ryan! (I was about to look for performance numbers) Just another question -- slightly related to locks. Will HBase 0.90 include HTable.checkAndPut receiving more than one value to check? I'm eager to help, if possible. On Fri, Jul 16, 2010 at 5:24 PM, Ryan Rawson ryano...@gmail.com wrote: Explicit locks with zookeeper would be (a) slow and (b) completely out of band and ultimately up to you. I wouldn't exactly be eager to do our row locking in zookeeper (since the minimum operation time is between 2-10ms). You could do application advisory locks, but that is true no matter what datastore you use... On Fri, Jul 16, 2010 at 1:13 PM, Guilherme Germoglio germog...@gmail.com wrote: What about implementing explicit row locks using the zookeeper? I'm planning to do this sometime in the near future. Does anyone have any comments against this approach? (or maybe it was already implemented by someone :-) On Fri, Jul 16, 2010 at 5:02 PM, Ryan Rawson ryano...@gmail.com wrote: HTable.close does very little: public void close() throws IOException{ flushCommits(); } None of which involves row locks. One thing to watch out for is to remember to close your scanners - they continue to use server-side resources until you close them or 60 seconds passes and they get timed out. Also be very wary of using any of the explicit row locking calls, they are generally trouble for more or less everyone. There was a proposal to remove them, but I don't think that went through. On Fri, Jul 16, 2010 at 9:16 AM, Cosmin Lehene cleh...@adobe.com wrote: On Jul 16, 2010, at 6:41 PM, Michael Segel wrote: Thanks for the response. (You don't need to include the cc ...) With respect to the row level locking ... I was interested in when the lock is actually acquired, how long the lock persists and when is the lock released. From your response, the lock is only held on updating the row, and while the data is being written to the memory cache which is then written to disk. (Note: This row level locking different than transactional row level locking.) Now that I've had some caffeine I think I can clarify... :-) Some of my developers complained that they were having trouble with two different processes trying to update the same table. Not sure why they were having the problem, so I wanted to have a good fix. The simple fix was to have them issue the close() the HTable connection which forces any resources that they acquired to be released. It would help to know what the exact problem was. Normally I wouldn't see any problems. In looking at the problem... its possible that they didn't have AutoFlush set to true so the write was still in the buffer and hadn't gotten flushed. If the lock only persists for the duration of the write to memory and is then released, then the issue could have been that the record written was in the buffer and not yet flushed to disk. At the region server level HBase will use the cache for both reads and writes. This happens transparently for the user. Once something is written in the cache, all other clients will read from the same cache. No need to worry if the cache has been flushed. 
Lars George has a good article about the hbase storage architecture http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html I'm also assuming that when you run a scan() against a region that any information written to buffer but not yet written to disk will be missed. When you do puts into hbase you'll use HTable. The HTable instance is on the client. HTable keeps a buffer as well and if autoFlush is false it only flushes when you do flushCommits() or when it reaches the buffer limit, or when you close the table. With autoFlush set to true it will flush for every put. This buffer is on the client. So when data is actually flushed it gets on the region server where it will get in the region server cache and WAL. Unless a client flushes the put no other client can see the data because it still resides on the client only. Depending on what you need to do you can use autoFlush true if you are doing many small writes that need to be seen immediately by others. You can use autoFlush false and issue flushCommits() yourself, or you can rely on the buffer limit for that. So I guess the question isn't so much the issue of a lock, but that we need to make sure that data written to the buffer should be flushed ASAP unless we know that we're going to be
Re: YouAreDeadException with hbase
Yes, I noticed that too. I was wondering if I should put it there because I see the ParNew was 13M after the regionserver started. so 6M is really small. if I set it to 6M, will that cause problem ? I noticed that when the regionserver crashed, the ParNew is 18M. compared to the beginning 13M, it hasn't really grown that much. Jimmy -- From: Ryan Rawson ryano...@gmail.com Sent: Friday, July 16, 2010 1:54 PM To: user@hbase.apache.org Subject: Re: YouAreDeadException with hbase One bit of advice that is easy to overlook but is important is thus: This can be due to the size of the ParNew, which should be relatively small. If your ParNew is very large after running HBase for a while, in one example a ParNew was about 150MB, then you might have to constrain the size of ParNew. Add this to HBASE_OPTS: export HBASE_OPTS=-XX:NewSize=6m -XX:MaxNewSize=6m cms options from above gc logging options from above From our wiki about performance tuning. On Fri, Jul 16, 2010 at 1:50 PM, Jinsong Hu jinsong...@hotmail.com wrote: I was doing stress testing, so the load is not small. But I purposely limited the data rate on client side so load is not big either. using iostat -x 5 and I can see there are lots of situations that the CPU goes to very high level and stay there for long time. but then it ultimately go down. It is highly possible that during a certain period the CPU was too busy and the GC process was starved for CPU. I researched this failure and found an excellent thread talking about GC: http://forums.sun.com/thread.jspa?threadID=698490 that is more detailed than http://wiki.apache.org/hadoop/PerformanceTuning . I will do some tunning and see if it helps following the posted config there. Jimmy. -- From: Ryan Rawson ryano...@gmail.com Sent: Friday, July 16, 2010 1:35 PM To: user@hbase.apache.org Subject: Re: YouAreDeadException with hbase These 2 lines are different GC collections: 5517.355: [GC 5517.355: [ParNew (promotion failed): 18113K-18113K(19136K), 0.77 00840 secs] 5518.125: [CMS5649.813: [CMS-concurrent-mark: 171.151/310.961 secs] [Times: user=95.87 sys=3.06, real=310.97 secs] (concurrent mode failure): 2009649K-572240K(2054592K), 280.2155930 secs] 20233 25K-572240K(2073728K), [CMS Perm : 18029K-17976K(30064K)] icms_dc=100 , 281.03 57280 secs] [Times: user=4.55 sys=4.07, real=281.05 secs] It's a little hard to read that, it looks like the CMS concurrent mark took 310 seconds then failed, then we got a 281 second real time pause, but interestingly enough the user and system time is fairly low. How loaded are these machines? You need to be giving enough uncontended CPU time to hbase. On Fri, Jul 16, 2010 at 1:30 PM, Jinsong Hu jinsong...@hotmail.com wrote: I already implemented all these configs before my test. 
I checked gc-hbase.log, I see a GC failure which looks very suspecious: 5515.974: [GC 5515.974: [ParNew: 19120K-2112K(19136K), 0.8344240 secs] 2016283K -2007324K(2069308K) icms_dc=100 , 0.8345620 secs] [Times: user=0.08 sys=0.00, r eal=0.83 secs] 5517.355: [GC 5517.355: [ParNew (promotion failed): 18113K-18113K(19136K), 0.77 00840 secs]5518.125: [CMS5649.813: [CMS-concurrent-mark: 171.151/310.961 secs] [ Times: user=95.87 sys=3.06, real=310.97 secs] (concurrent mode failure): 2009649K-572240K(2054592K), 280.2155930 secs] 20233 25K-572240K(2073728K), [CMS Perm : 18029K-17976K(30064K)] icms_dc=100 , 281.03 57280 secs] [Times: user=4.55 sys=4.07, real=281.05 secs] 5798.909: [GC [1 CMS-initial-mark: 572240K(2054592K)] 573896K(2092928K), 0.05792 20 secs] [Times: user=0.01 sys=0.00, real=0.08 secs] the concurrent mode failure, and 281.05 seconds of GC time, looks like the culprit for the problem. I just wonder how to resolve this issue. Jimmy. -- From: Ryan Rawson ryano...@gmail.com Sent: Friday, July 16, 2010 12:57 PM To: user@hbase.apache.org Subject: Re: YouAreDeadException with hbase Sometimes the GC can chain multiple medium pauses into one large pause. I've seen this before where there are 2 long pauses back to back and the result was a 50second+ pause. This article talks a lot about GC performance and tuning, check it out: http://wiki.apache.org/hadoop/PerformanceTuning -ryan On Fri, Jul 16, 2010 at 11:55 AM, Jinsong Hu jinsong...@hotmail.com wrote: Yes, the root cause seems to be the gap of 4 minutes between 2010-07-16 05:49:26,805 and 2010-07-16 05:53:23,476 . but I checked GC gc-hbase.log and don't see 4 minute gap in gc. I just wonder what could cause this large gap. I also wonder if there is a configuration that I can do to avoid this long pause, or get around the problem cause by this long pause. Jimmy -- From: Stack st...@duboce.net Sent: Friday, July 16, 2010 11:44 AM To: user@hbase.apache.org Subject: Re:
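One quick way to see how big ParNew actually gets during the stress test, rather than only at startup and at crash time, is to sample the running JVM, e.g. with jstat (the pid is a placeholder; the interval is in milliseconds):

# young/old generation occupancy and GC counts, sampled every second
jstat -gcutil <regionserver-pid> 1000

# or capacities and usage in KB
jstat -gc <regionserver-pid> 1000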
Re: YouAreDeadException with hbase
Are you swapping? The article you cite is from 2006, 5 years ago. As is noted at the end of that thread, the JVMs change over time; many of the cited configs. may no longer exist or function. St.Ack On Fri, Jul 16, 2010 at 1:50 PM, Jinsong Hu jinsong...@hotmail.com wrote: I was doing stress testing, so the load is not small. But I purposely limited the data rate on client side so load is not big either. using iostat -x 5 and I can see there are lots of situations that the CPU goes to very high level and stay there for long time. but then it ultimately go down. It is highly possible that during a certain period the CPU was too busy and the GC process was starved for CPU. I researched this failure and found an excellent thread talking about GC: http://forums.sun.com/thread.jspa?threadID=698490 that is more detailed than http://wiki.apache.org/hadoop/PerformanceTuning . I will do some tunning and see if it helps following the posted config there. Jimmy. -- From: Ryan Rawson ryano...@gmail.com Sent: Friday, July 16, 2010 1:35 PM To: user@hbase.apache.org Subject: Re: YouAreDeadException with hbase These 2 lines are different GC collections: 5517.355: [GC 5517.355: [ParNew (promotion failed): 18113K-18113K(19136K), 0.77 00840 secs] 5518.125: [CMS5649.813: [CMS-concurrent-mark: 171.151/310.961 secs] [Times: user=95.87 sys=3.06, real=310.97 secs] (concurrent mode failure): 2009649K-572240K(2054592K), 280.2155930 secs] 20233 25K-572240K(2073728K), [CMS Perm : 18029K-17976K(30064K)] icms_dc=100 , 281.03 57280 secs] [Times: user=4.55 sys=4.07, real=281.05 secs] It's a little hard to read that, it looks like the CMS concurrent mark took 310 seconds then failed, then we got a 281 second real time pause, but interestingly enough the user and system time is fairly low. How loaded are these machines? You need to be giving enough uncontended CPU time to hbase. On Fri, Jul 16, 2010 at 1:30 PM, Jinsong Hu jinsong...@hotmail.com wrote: I already implemented all these configs before my test. I checked gc-hbase.log, I see a GC failure which looks very suspecious: 5515.974: [GC 5515.974: [ParNew: 19120K-2112K(19136K), 0.8344240 secs] 2016283K -2007324K(2069308K) icms_dc=100 , 0.8345620 secs] [Times: user=0.08 sys=0.00, r eal=0.83 secs] 5517.355: [GC 5517.355: [ParNew (promotion failed): 18113K-18113K(19136K), 0.77 00840 secs]5518.125: [CMS5649.813: [CMS-concurrent-mark: 171.151/310.961 secs] [ Times: user=95.87 sys=3.06, real=310.97 secs] (concurrent mode failure): 2009649K-572240K(2054592K), 280.2155930 secs] 20233 25K-572240K(2073728K), [CMS Perm : 18029K-17976K(30064K)] icms_dc=100 , 281.03 57280 secs] [Times: user=4.55 sys=4.07, real=281.05 secs] 5798.909: [GC [1 CMS-initial-mark: 572240K(2054592K)] 573896K(2092928K), 0.05792 20 secs] [Times: user=0.01 sys=0.00, real=0.08 secs] the concurrent mode failure, and 281.05 seconds of GC time, looks like the culprit for the problem. I just wonder how to resolve this issue. Jimmy. -- From: Ryan Rawson ryano...@gmail.com Sent: Friday, July 16, 2010 12:57 PM To: user@hbase.apache.org Subject: Re: YouAreDeadException with hbase Sometimes the GC can chain multiple medium pauses into one large pause. I've seen this before where there are 2 long pauses back to back and the result was a 50second+ pause. 
This article talks a lot about GC performance and tuning, check it out: http://wiki.apache.org/hadoop/PerformanceTuning -ryan On Fri, Jul 16, 2010 at 11:55 AM, Jinsong Hu jinsong...@hotmail.com wrote: Yes, the root cause seems to be the gap of 4 minutes between 2010-07-16 05:49:26,805 and 2010-07-16 05:53:23,476 . but I checked GC gc-hbase.log and don't see 4 minute gap in gc. I just wonder what could cause this large gap. I also wonder if there is a configuration that I can do to avoid this long pause, or get around the problem cause by this long pause. Jimmy -- From: Stack st...@duboce.net Sent: Friday, July 16, 2010 11:44 AM To: user@hbase.apache.org Subject: Re: YouAreDeadException with hbase You'll see this if the server reports to the master after the master has ruled it 'dead'. Here is the code that produces the exception: if (!isDead(serverName)) return; String message = Server + what + rejected; currently processing + serverName + as dead server; LOG.debug(message); throw new YouAreDeadException(message); Servers are on the 'dead' list if zk reports their session has expired. The master moves then to cleanup after the dead server and process its logs. If during this cleanup time the server reports in, master will return the youaredead exception. Usually the RS has lost its zk session but has yet to realize it. St.Ack On Thu, Jul 15, 2010 at 11:52 PM, Jinsong Hu
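To answer the swapping question empirically on the region server host (standard Linux tools, nothing HBase-specific):

free -m      # non-trivial swap "used" is a warning sign for GC pauses
vmstat 5     # sustained non-zero si/so columns mean the box is actively swapping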
Re: Row level locking?
On Fri, Jul 16, 2010 at 2:01 PM, Guilherme Germoglio germog...@gmail.com wrote: Just another question -- slightly related to locks. Will HBase 0.90 include HTable.checkAndPut receiving more than one value to check? I'm eager to help, if possible. I don't think there is even an issue to add that facility Guilherme. Make one, stick up a patch and we'll add it. Good on you, St.Ack
Re: Row level locking?
In the uncontended case this is fine, although you are doing 4 RPCs to accomplish what could be done in 1 (with CAS). But in the contended case, all the people waiting on that lock consume RPC handler threads eventually causing a temporary deadlock since the original lockholder will not be able to progress to release the lock. The 60 second release will kick in and things might flow again for a bit. On Fri, Jul 16, 2010 at 2:07 PM, Justin Cohen justin.co...@teamaol.com wrote: What kind of trouble? I do quite a bit of: l = lock(row); val = get(row); /* modify val */ put(row, val); unlock(l); Is there an alternative? -justin On 7/16/10 4:02 PM, Ryan Rawson wrote: Also be very wary of using any of the explicit row locking calls, they are generally trouble for more or less everyone.
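For reference, Justin's pseudocode maps onto the explicit-lock calls on HTable roughly as sketched below (0.20-era API; the family and qualifier are invented). As Ryan points out, while this lock is held, competing writers to the same row sit in the region server's RPC handlers until the lock is released or times out, which is where the trouble starts under contention.

import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.RowLock;
import org.apache.hadoop.hbase.util.Bytes;

public class ExplicitRowLockUpdate {
  static void update(HTable table, byte[] row) throws Exception {
    RowLock lock = table.lockRow(row);          // server-side lock, released after ~60s if leaked
    try {
      Result r = table.get(new Get(row, lock));
      byte[] val = r.getValue(Bytes.toBytes("data"), Bytes.toBytes("col"));
      byte[] newVal = Bytes.toBytes("modified"); // stand-in for whatever transformation of val is needed
      Put p = new Put(row, lock);
      p.add(Bytes.toBytes("data"), Bytes.toBytes("col"), newVal);
      table.put(p);
    } finally {
      table.unlockRow(lock);                    // always release, or other writers block until timeout
    }
  }
}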
Re: Row level locking?
In that case it would be 2 RPC, right? do { get, update, checkAndPut } while (ret = false)? Plus 2 for each contention? Thanks, -justin On 7/16/10 5:09 PM, Ryan Rawson wrote: In the uncontended case this is fine, although you are doing 4 RPCs to accomplish what could be done in 1 (with CAS). But in the contended case, all the people waiting on that lock consume RPC handler threads eventually causing a temporary deadlock since the original lockholder will not be able to progress to release the lock. The 60 second release will kick in and things might flow again for a bit. On Fri, Jul 16, 2010 at 2:07 PM, Justin Cohenjustin.co...@teamaol.com wrote: What kind of trouble? I do quite a bit of: l = lock(row); val = get(row); /* modify val */ put(row, val); unlock(l); Is there an alternative? -justin On 7/16/10 4:02 PM, Ryan Rawson wrote: Also be very wary of using any of the explicit row locking calls, they are generally trouble for more or less everyone.
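For concreteness, the get / checkAndPut retry loop being discussed looks roughly like this against the 0.20-era API (family and qualifier names invented). A failed compare-and-swap simply retries with the freshly read value instead of queueing on a server-side lock; for plain counters, HTable.incrementColumnValue already does an equivalent read-modify-write server-side in one call.

import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class CasUpdate {
  static final byte[] FAM = Bytes.toBytes("data");
  static final byte[] QUAL = Bytes.toBytes("counter");

  /** Read-modify-write without explicit row locks. */
  static void increment(HTable table, byte[] row) throws Exception {
    while (true) {
      Result r = table.get(new Get(row));
      byte[] old = r.getValue(FAM, QUAL);
      long next = (old == null ? 0 : Bytes.toLong(old)) + 1;

      Put p = new Put(row);
      p.add(FAM, QUAL, Bytes.toBytes(next));

      // Succeeds only if the cell still holds the value we read (or is still absent); otherwise retry.
      if (table.checkAndPut(row, FAM, QUAL, old, p)) {
        return;
      }
    }
  }
}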
Re: Hanging regionservers
Using Ryan Rawson's suggested config tweaks, we have just completed a successful job run with a 15GB sequence file, no hang. I'm setting up to have multiple files processed this weekend with the new settings. :-) I believe the dfs socket write timeout being indefinite was the trick. I'll post my results on Monday. Thanks for the support thus far! -Luke

On 7/15/10 10:17 PM, Ryan Rawson ryano...@gmail.com wrote: I'm not seeing anything in that logfile, you are seeing compactions for various regions, but I'm not seeing flushes (typical during insert loads) and nothing else. One thing we look to see is a log message "Blocking updates" which indicates that a particular region has decided it's holding up to prevent taking too many inserts. Like I said, you could be seeing this on a different regionserver, if all the clients are blocked on 1 regionserver and can't get to the others then most will look idle and only one will actually show anything interesting in the log. Can you check for this behaviour? Also if you want to tweak the config with the values I pasted that should help.

On Thu, Jul 15, 2010 at 7:25 PM, Luke Forehand luke.foreh...@networkedinsights.com wrote: It looks like we are going straight from the default config, no explicit setting of anything.

On 7/15/10 9:03 PM, Ryan Rawson ryano...@gmail.com wrote: In this case the regionserver isn't actually doing anything - all the IPC thread handlers are waiting in their queue handoff thingy (how they get socket/work to do). Something elsewhere perhaps? Check the logs of your jobs, there might be something interesting there. One thing that frequently happens is you overrun 1 regionserver with edits and it isn't flushing fast enough, so it pauses updates and all clients end up stuck on it. What was that config again? I use these settings:

<property>
  <name>hbase.hstore.blockingStoreFiles</name>
  <value>15</value>
</property>
<property>
  <name>dfs.datanode.socket.write.timeout</name>
  <value>0</value>
</property>
<property>
  <name>hbase.hregion.memstore.block.multiplier</name>
  <value>8</value>
</property>

perhaps try these ones? -ryan
Stargate returns single value cell query result in encoded format
I am investigating Stargate to provide REST access and ran into an interesting question. I have a simple test table and when I scan it within the hbase shell, I got the following result.

hbase(main):028:0> scan 'test'
ROW      COLUMN+CELL
 row1    column=data:1, timestamp=1278703763205, value=value1
 row1    column=data:a, timestamp=1278969106700, value=a
 row2    column=data:2, timestamp=1278703772510, value=value2
 row2    column=data:3, timestamp=1278703778936, value=value3
 row3    column=data:1, timestamp=1279318319309, value=row3
3 row(s) in 0.0610 seconds

Issuing http://localhost:58080/test/row* returns

<CellSet>
  <Row key="cm93MQ==">
    <Cell timestamp="1278703763205" column="ZGF0YTox">dmFsdWUx</Cell>
    <Cell timestamp="1278969106700" column="ZGF0YTph">YQ==</Cell>
  </Row>
  <Row key="cm93Mg==">
    <Cell timestamp="1278703772510" column="ZGF0YToy">dmFsdWUy</Cell>
    <Cell timestamp="1278703778936" column="ZGF0YToz">dmFsdWUz</Cell>
  </Row>
  <Row key="cm93Mw==">
    <Cell timestamp="1279318319309" column="ZGF0YTox">cm93Mw==</Cell>
  </Row>
</CellSet>

Why are the row key, column and value data encoded? How can they be returned as text? Thanks in advance.
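Regarding the encoding: row keys, column names and values in HBase are arbitrary byte arrays, so Stargate's XML (and JSON) cell-set representations Base64-encode them; cm93MQ== is "row1", ZGF0YTox is "data:1", dmFsdWUx is "value1", and so on. If memory serves, a single-cell GET with Accept: application/octet-stream returns the raw, unencoded value, but the cell-set XML above is always Base64, so a client has to decode. A minimal decoding sketch using Apache commons-codec, which should already be on the HBase classpath:

import org.apache.commons.codec.binary.Base64;

public class DecodeStargateValues {
  public static void main(String[] args) {
    for (String b64 : new String[] { "cm93MQ==", "ZGF0YTox", "dmFsdWUx" }) {
      byte[] raw = Base64.decodeBase64(b64.getBytes());
      System.out.println(b64 + " -> " + new String(raw));   // row1, data:1, value1
    }
  }
}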
Re: Hanging regionservers
So, it seems like you are by-passing issue by having no time out on the socket. Would be for sure interested though if you have the issue still on cdh3b2. Most folks will not be running with no socket timeout. Thanks Luke. St.Ack On Fri, Jul 16, 2010 at 3:01 PM, Luke Forehand luke.foreh...@networkedinsights.com wrote: Using Ryan Rawson's suggested config tweaks, we have just completed a successful job run with a 15GB sequence file, no hang. I'm setting up to have multiple files process this weekend with the new settings. :-) I believe the dfs socket write timeout being indefinite was the trick. I'll post my results on Monday. Thanks for the support thus far! -Luke On 7/15/10 10:17 PM, Ryan Rawson ryano...@gmail.com wrote: I'm not seeing anything in that logfile, you are seeing compactions for various regions, but im not seeing flushes (typical during insert loads) and nothing else. One thing we look to see is a log message Blocking updates which indicates that a particular region has decided it's holding up to prevent taking too many inserts. Like I said, you could be seeing this on a different regionserver, if all the clients are blocked on 1 regionserver and can't get to the others then most will look idle and only one will actually show anything interesting in the log. Can you check for this behaviour? Also if you want to tweak the config with the values I pasted that should help. On Thu, Jul 15, 2010 at 7:25 PM, Luke Forehand luke.foreh...@networkedinsights.com wrote: It looks like we are going straight from the default config, no expicit setting of anything. On 7/15/10 9:03 PM, Ryan Rawson ryano...@gmail.com wrote: In this case the regionserver isn't actually doing anything - all the IPC thread handlers are waiting in their queue handoff thingy (how they get socket/work to do). Something elsewhere perhaps? Check the logs of your jobs, there might be something interesting there. One thing that frequently happens is you overrun 1 regionserver with edits and it isnt flushing fast enough, so it pauses updates and all clients end up stuck on it. What was that config again? I use these settings: property namehbase.hstore.blockingStoreFiles/name value15/value /property property namedfs.datanode.socket.write.timeout/name value0/value /property property namehbase.hregion.memstore.block.multiplier/name value8/value /property perhaps try these ones? -ryan
Re: Hanging regionservers
According to Todd, there is some kind of weird Thread coordination issue which is worked around by setting the timeout to 0, even though we actually arent hitting any timeouts in the failure case. And it might have been fixed in cdh3. I haven't had chance to run it yet so I can't say. -ryan On Fri, Jul 16, 2010 at 3:32 PM, Stack st...@duboce.net wrote: So, it seems like you are by-passing issue by having no time out on the socket. Would be for sure interested though if you have the issue still on cdh3b2. Most folks will not be running with no socket timeout. Thanks Luke. St.Ack On Fri, Jul 16, 2010 at 3:01 PM, Luke Forehand luke.foreh...@networkedinsights.com wrote: Using Ryan Rawson's suggested config tweaks, we have just completed a successful job run with a 15GB sequence file, no hang. I'm setting up to have multiple files process this weekend with the new settings. :-) I believe the dfs socket write timeout being indefinite was the trick. I'll post my results on Monday. Thanks for the support thus far! -Luke On 7/15/10 10:17 PM, Ryan Rawson ryano...@gmail.com wrote: I'm not seeing anything in that logfile, you are seeing compactions for various regions, but im not seeing flushes (typical during insert loads) and nothing else. One thing we look to see is a log message Blocking updates which indicates that a particular region has decided it's holding up to prevent taking too many inserts. Like I said, you could be seeing this on a different regionserver, if all the clients are blocked on 1 regionserver and can't get to the others then most will look idle and only one will actually show anything interesting in the log. Can you check for this behaviour? Also if you want to tweak the config with the values I pasted that should help. On Thu, Jul 15, 2010 at 7:25 PM, Luke Forehand luke.foreh...@networkedinsights.com wrote: It looks like we are going straight from the default config, no expicit setting of anything. On 7/15/10 9:03 PM, Ryan Rawson ryano...@gmail.com wrote: In this case the regionserver isn't actually doing anything - all the IPC thread handlers are waiting in their queue handoff thingy (how they get socket/work to do). Something elsewhere perhaps? Check the logs of your jobs, there might be something interesting there. One thing that frequently happens is you overrun 1 regionserver with edits and it isnt flushing fast enough, so it pauses updates and all clients end up stuck on it. What was that config again? I use these settings: property namehbase.hstore.blockingStoreFiles/name value15/value /property property namedfs.datanode.socket.write.timeout/name value0/value /property property namehbase.hregion.memstore.block.multiplier/name value8/value /property perhaps try these ones? -ryan
Re: Row level locking?
Fine-grained locking is not a good use case for ZooKeeper given its quorum-based architecture. Patrick

On 07/16/2010 01:24 PM, Ryan Rawson wrote: Explicit locks with ZooKeeper would be (a) slow and (b) completely out of band and ultimately up to you. I wouldn't exactly be eager to do our row locking in ZooKeeper (since the minimum operation time is between 2-10 ms). You could do application advisory locks, but that is true no matter what datastore you use...

On Fri, Jul 16, 2010 at 1:13 PM, Guilherme Germoglio germog...@gmail.com wrote: What about implementing explicit row locks using ZooKeeper? I'm planning to do this sometime in the near future. Does anyone have any comments against this approach? (or maybe it was already implemented by someone :-)

On Fri, Jul 16, 2010 at 5:02 PM, Ryan Rawson ryano...@gmail.com wrote: HTable.close() does very little:

  public void close() throws IOException {
    flushCommits();
  }

None of which involves row locks. One thing to watch out for is to remember to close your scanners - they continue to use server-side resources until you close them or until 60 seconds pass and they get timed out. Also be very wary of using any of the explicit row locking calls; they are generally trouble for more or less everyone. There was a proposal to remove them, but I don't think that went through.

On Fri, Jul 16, 2010 at 9:16 AM, Cosmin Lehene cleh...@adobe.com wrote:

On Jul 16, 2010, at 6:41 PM, Michael Segel wrote: Thanks for the response. (You don't need to include the cc ...) With respect to the row-level locking ... I was interested in when the lock is actually acquired, how long the lock persists, and when the lock is released. From your response, the lock is only held while the row is being updated, i.e. while the data is being written to the memory cache, which is later written to disk. (Note: this row-level locking is different from transactional row-level locking.) Now that I've had some caffeine I think I can clarify... :-) Some of my developers complained that they were having trouble with two different processes trying to update the same table. Not sure why they were having the problem, so I wanted to have a good fix. The simple fix was to have them call close() on the HTable connection, which forces any resources they acquired to be released.

It would help to know what the exact problem was. Normally I wouldn't expect any problems.

In looking at the problem... it's possible that they didn't have autoFlush set to true, so the write was still in the buffer and hadn't been flushed. If the lock only persists for the duration of the write to memory and is then released, then the issue could have been that the record written was still in the buffer and not yet flushed to disk.

At the region server level HBase uses the cache for both reads and writes. This happens transparently for the user. Once something is written in the cache, all other clients will read from that same cache; no need to worry about whether the cache has been flushed. Lars George has a good article about the HBase storage architecture: http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html

I'm also assuming that when you run a scan() against a region, any information written to the buffer but not yet written to disk will be missed.

When you do puts into HBase you'll use HTable. The HTable instance is on the client. HTable keeps a buffer as well, and if autoFlush is false it only flushes when you call flushCommits(), when it reaches the buffer limit, or when you close the table.
With autoFlush set to true it will flush for every put. This buffer is on the client, so when the data is actually flushed it goes to the region server, where it is written to the region server cache and the WAL. Until a client flushes its puts, no other client can see that data, because it still resides only on the client. Depending on what you need to do, you can use autoFlush true if you are doing many small writes that need to be seen immediately by others, or you can use autoFlush false and call flushCommits() yourself (or rely on the buffer limit for that).

So I guess the question isn't so much the issue of a lock, but that we need to make sure data written to the buffer is flushed ASAP, unless we know that we're going to be writing a lot of data in the m/r job.

Usually when you write from the reducer (heavy writes) it is better to use the buffer and not autoFlush, to get good performance. Cosmin

Thx -Mike
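To make the buffering behaviour Cosmin describes concrete, here is a minimal client-side sketch against the 0.20-era HTable API discussed in this thread. The table name, family, qualifier, and row keys are made-up placeholders; it simply shows buffered puts with autoFlush off followed by an explicit flushCommits(), and a scanner that is closed explicitly as Ryan suggests.

  import java.io.IOException;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.client.Result;
  import org.apache.hadoop.hbase.client.ResultScanner;
  import org.apache.hadoop.hbase.client.Scan;
  import org.apache.hadoop.hbase.util.Bytes;

  public class BufferedWriteExample {
    public static void main(String[] args) throws IOException {
      HBaseConfiguration conf = new HBaseConfiguration();
      // "mytable", "f", and "q" are hypothetical names for illustration only.
      HTable table = new HTable(conf, "mytable");
      try {
        // Buffer puts on the client; nothing is visible to other clients
        // until the buffer is flushed to the region server.
        table.setAutoFlush(false);
        table.setWriteBufferSize(2 * 1024 * 1024); // 2 MB client-side buffer

        for (int i = 0; i < 1000; i++) {
          Put put = new Put(Bytes.toBytes("row-" + i));
          put.add(Bytes.toBytes("f"), Bytes.toBytes("q"), Bytes.toBytes("value-" + i));
          table.put(put); // goes into the client-side write buffer
        }
        table.flushCommits(); // puts now reach the region server (memstore + WAL)

        // Scanners hold server-side resources; close them explicitly rather
        // than waiting for the 60 second lease timeout.
        ResultScanner scanner = table.getScanner(new Scan());
        try {
          for (Result r : scanner) {
            System.out.println(Bytes.toString(r.getRow()));
          }
        } finally {
          scanner.close();
        }
      } finally {
        table.close(); // flushes any remaining buffered puts
      }
    }
  }

With autoFlush left at true, each table.put() would be sent to the region server immediately, which is simpler but slower for bulk loads such as writes from a reducer.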