Re: transactional hbase

2010-07-16 Thread Cosmin Lehene
Hi, 

HBase supports row-level atomic operations. If you need to update more than one 
row in a single transaction you'll have to use a coordination mechanism. You 
can use ZooKeeper for that. 

However, you can also evaluate whether a single table would work for your 
operation by taking advantage of the columnar model. Basically, you can encode 
an entire star schema in a single table in HBase: each column family can act as a 
table. So when you have 1-M relations you can just write all the values (M) in 
a single row (1). This way you could use a single put operation to update your 
child entities.
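
For what it's worth, a minimal sketch of that idea against the 0.20 client API
(the table, family and qualifier names below are invented for illustration):

import java.io.IOException;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class SingleRowPut {
  public static void main(String[] args) throws IOException {
    HBaseConfiguration conf = new HBaseConfiguration();   // 0.20.x style config
    HTable table = new HTable(conf, "orders");            // hypothetical table

    // One Put == one row == one atomic operation, even across column families.
    Put put = new Put(Bytes.toBytes("order-12345"));
    put.add(Bytes.toBytes("header"), Bytes.toBytes("customer"), Bytes.toBytes("acme"));
    // Each child entity (the "M" side of the 1-M relation) is a qualifier in its family.
    put.add(Bytes.toBytes("items"), Bytes.toBytes("item-1"), Bytes.toBytes("widget"));
    put.add(Bytes.toBytes("items"), Bytes.toBytes("item-2"), Bytes.toBytes("gadget"));

    table.put(put);   // parent and children are applied together under the row lock
    table.close();
  }
}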

Cosmin 

On Jul 16, 2010, at 12:57 AM, N Kapshoo wrote:

 Can someone please point me to some more info? I am concerned about
 inconsistent data and am evaluating risk...
 Thanks.
 
 On Thu, Jul 15, 2010 at 3:11 PM, N Kapshoo nkaps...@gmail.com wrote:
 Hi,
 
 I have a write API call that does 3 puts (2 puts to one table and a
 3rd put to a second table). How do I go about making sure that these
 all happen or none happen at all?  In short, an atomic transaction.
 
 I read up a little bit about the TransactionManager and that I need to
 modify hbase-site.xml to make these TransactionalRegionServers. Can
 someone point me to some more info about the same? What versions, what
 performance impacts, etc.? Are there some good URLs that anyone can
 share?
 
 I am using HBase 0.20.3 at this time. I don't believe transactions are
 supported in this version. If I were to go live and then decide to use
 transactions later, how should I plan an upgrade?
 
 Thanks.
 



Row level locking?

2010-07-16 Thread Michael Segel

Ok,

First, I'm writing this before I've had my first cup of coffee so I am 
apologizing in advance if the question is a brain dead question

Going from a relational background, some of these questions may not make sense 
in the HBase world.


When does HBase acquire a lock on a row and how long does it persist? Does the 
lock only hit the current row, or does it also lock the adjacent rows too?
Does HBase support the concept of 'dirty reads'? 

The issue is what happens when you have two jobs trying to hit the same table 
at the same time and update/read the rows at the same time.

A developer came across a problem and the fix was to use the HTable.close() 
method to release any resources.

I am wondering if you explicitly have to clean up or can a lazy developer let 
the object just go out of scope and get GC'd.

Thx

-Mike

  

Re: Row level locking?

2010-07-16 Thread Cosmin Lehene
Currently a row is part of a region and there's a single region server serving 
that region at a particular moment. 
So when that row is updated a lock is acquired for that row until the actual 
data is updated in memory (note that a put will be written to cache on the 
region server and also persisted in the write-ahead log - WAL). Subsequent puts 
to that row will have to wait for that lock.

HBase is fully consistent. That said, all the locking takes place at the row 
level only, so when you scan you have to take that into account, as there's no 
range locking. 

I'm not sure I understand the resource-releasing issue. HTable.close() flushes 
the current write buffer (you only have a write buffer if autoFlush is set to 
false). 
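
A quick sketch of that buffering behaviour (table, family and row names below are
invented for illustration):

import java.io.IOException;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class WriteBufferSketch {
  public static void main(String[] args) throws IOException {
    HTable table = new HTable(new HBaseConfiguration(), "mytable"); // hypothetical table
    table.setAutoFlush(false);                  // puts accumulate in the client write buffer
    table.setWriteBufferSize(2 * 1024 * 1024);  // auto-flush roughly every 2MB of edits

    for (int i = 0; i < 1000; i++) {
      Put put = new Put(Bytes.toBytes("row-" + i));
      put.add(Bytes.toBytes("f"), Bytes.toBytes("q"), Bytes.toBytes(i));
      table.put(put);                           // may only be buffered locally at this point
    }

    table.flushCommits();                       // push buffered edits to the region servers
    table.close();                              // flushes any remainder, releases resources
  }
}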

Cosmin


On Jul 16, 2010, at 1:33 PM, Michael Segel wrote:

 
 Ok,
 
 First, I'm writing this before I've had my first cup of coffee so I am 
 apologizing in advance if the question is a brain dead question
 
 Going from a relational background, some of these questions may not make 
 sense in the HBase world.
 
 
 When does HBase acquire a lock on a row and how long does it persist? Does 
 the lock only hit the current row, or does it also lock the adjacent rows too?
 Does HBase support the concept of 'dirty reads'? 
 
 The issue is what happens when you have two jobs trying to hit the same table 
 at the same time and update/read the rows at the same time.
 
 A developer came across a problem and the fix was to use the HTable.close() 
 method to release any resources.
 
 I am wondering if you explicitly have to clean up or can a lazy developer let 
 the object just go out of scope and get GC'd.
 
 Thx
 
 -Mike
 
 



RE: Client Connection to Zookeeper times out during mapreduce

2010-07-16 Thread Hegner, Travis
Thanks St.Ack!

I must have lost that config after my last hadoop upgrade. Much appreciated!

Travis

-Original Message-
From: saint@gmail.com [mailto:saint@gmail.com] On Behalf Of Stack
Sent: Thursday, July 15, 2010 11:42 AM
To: user@hbase.apache.org
Subject: Re: Client Connection to Zookeeper times out during mapreduce

Is this intentional?

Attempting connection to server localhost/127.0.0.1:2181

I'd guess not.  Would seem to indicate then that your conf dir is not
on the CLASSPATH when the child task runs?  You've seen
http://hbase.apache.org/docs/r0.20.5/api/org/apache/hadoop/hbase/mapreduce/package-summary.html#classpath?
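
If getting the conf dir onto the task classpath is awkward, you can also set the
quorum explicitly on the job configuration in your driver. A rough sketch only
(the quorum host, table name and no-op mapper here are placeholders, not your
actual setup):

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class QuorumCheckDriver {

  // Does nothing; only shows that the child task can reach the real quorum.
  static class NoopMapper extends TableMapper<ImmutableBytesWritable, IntWritable> {
    protected void map(ImmutableBytesWritable key, Result value, Context context) {
      // no-op
    }
  }

  public static void main(String[] args) throws Exception {
    HBaseConfiguration conf = new HBaseConfiguration();
    // Make sure the child tasks see the real quorum, not the localhost default.
    conf.set("hbase.zookeeper.quorum", "master.example.com");       // placeholder host
    conf.set("hbase.zookeeper.property.clientPort", "2181");

    Job job = new Job(conf, "quorum-check");
    job.setJarByClass(QuorumCheckDriver.class);
    TableMapReduceUtil.initTableMapperJob("mytable", new Scan(),    // placeholder table
        NoopMapper.class, ImmutableBytesWritable.class, IntWritable.class, job);
    job.setNumReduceTasks(0);
    job.setOutputFormatClass(NullOutputFormat.class);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}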

St.Ack

On Thu, Jul 15, 2010 at 7:16 AM, Hegner, Travis theg...@trilliumit.com wrote:
 Afterthought:

 hadoop: 0.20.2, r911707
 hbase: 0.20.5, r950246

 Thanks Again!



 -Original Message-
 From: Hegner, Travis [mailto:theg...@trilliumit.com]
 Sent: Thursday, July 15, 2010 10:15 AM
 To: user@hbase.apache.org
 Subject: Client Connection to Zookeeper times out during mapreduce

 Hi All!

 I apologize in advance for what is probably a rookie mistake, but it's been a 
 while since I've done a map reduce with Hbase as input or output. I'm 
 assuming that this is a cluster config problem because even the built-in 
 rowcounter mapreduce job is failing. The hbase shell is operating normally, 
 and I can even access Hbase from thrift.

 Cluster:
 1 master: NN, JT, ZK, HM
 5 slaves: DN, TT, RS

 When attempting to run a mapreduce job with Hbase as a source, my map tasks 
 are failing with the following logged exceptions:

 2010-07-15 09:16:22,383 INFO org.apache.zookeeper.ZooKeeper: Client 
 environment:user.dir=/tmp/hadoop-hadoop/mapred/local/taskTracker/jobcache/job_201007150924_0003/attempt_201007150924_0003_m_00_0/work
 2010-07-15 09:16:22,386 INFO org.apache.zookeeper.ZooKeeper: Initiating 
 client connection, connectString=localhost:2181 sessionTimeout=6 
 watcher=org.apache.hadoop.hbase.client.hconnectionmanager$clientzkwatc...@80fa6f
 2010-07-15 09:16:22,389 INFO org.apache.zookeeper.ClientCnxn: 
 zookeeper.disableAutoWatchReset is false
 2010-07-15 09:16:22,402 INFO org.apache.zookeeper.ClientCnxn: Attempting 
 connection to server localhost/127.0.0.1:2181
 2010-07-15 09:16:22,407 WARN org.apache.zookeeper.ClientCnxn: Exception 
 closing session 0x0 to sun.nio.ch.selectionkeyi...@1d85f79
 java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at 
 sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
at 
 org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:933)
 2010-07-15 09:16:22,416 WARN org.apache.zookeeper.ClientCnxn: Ignoring 
 exception during shutdown input
 java.nio.channels.ClosedChannelException
at 
 sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:638)
at 
 sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:360)
at 
 org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:999)
at 
 org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:970)
 2010-07-15 09:16:22,416 WARN org.apache.zookeeper.ClientCnxn: Ignoring 
 exception during shutdown output
 java.nio.channels.ClosedChannelException
at 
 sun.nio.ch.SocketChannelImpl.shutdownOutput(SocketChannelImpl.java:649)
at 
 sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:368)
at 
 org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:1004)
at 
 org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:970)
 2010-07-15 09:16:22,532 WARN 
 org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Failed to create /hbase 
 -- check quorum servers, currently=localhost:2181
 org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode 
 = ConnectionLoss for /hbase
at 
 org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
at 
 org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:780)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:808)
at 
 org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.ensureExists(ZooKeeperWrapper.java:405)
at 
 org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.ensureParentExists(ZooKeeperWrapper.java:432)
at 
 org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.checkOutOfSafeMode(ZooKeeperWrapper.java:545)
at 
 org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:974)
at 
 org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:625)
at 
 

Re: Hanging regionservers

2010-07-16 Thread Luke Forehand

I grepped yesterday's logs on all servers for "Blocking updates" and there was 
no trace.  I believe I had encountered the blocking-updates problem earlier in 
the project but throttled down the import speed, which seemed to fix that.

I just double checked and all three region servers were idle.  Something 
interesting that I noticed however, was that each regionserver had a particular 
ResponseProcessor thread running for a specific block, and that thread was 
stuck in a running state during the entirety of the hang.  Also a DataStreamer 
thread for the block associated with the ResponseProcessor was in a wait state. 
 This makes me think that each server was stuck operating on a specific block.

ResponseProcessor for block blk_1926230463847049982_2694658 - Thread 
t...@61160
   java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
- locked sun.nio.ch.uti...@196fbfd0
- locked java.util.collections$unmodifiable...@7799fdbb
- locked sun.nio.ch.epollselectori...@1ee13d55
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
at 
org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:332)
at 
org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
at java.io.DataInputStream.readFully(DataInputStream.java:178)
at java.io.DataInputStream.readLong(DataInputStream.java:399)
at 
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2399)

   Locked ownable synchronizers:
- None

DataStreamer for file 
/hbase/.logs/dn01.colo.networkedinsights.com,60020,127993084/hlog.dat.1279228611023
 block blk_1926230463847049982_2694658 - Thread t...@61158
   java.lang.Thread.State: TIMED_WAITING on java.util.linkedl...@475b455c
at java.lang.Object.wait(Native Method)
at 
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2247)

   Locked ownable synchronizers:
- None

On 7/15/10 10:17 PM, Ryan Rawson ryano...@gmail.com wrote:

I'm not seeing anything in that logfile: you are seeing compactions
for various regions, but I'm not seeing flushes (typical during insert
loads) and nothing else. One thing we look for is a log message
"Blocking updates", which indicates that a particular region has
decided to hold up writes to prevent taking too many inserts.

Like I said, you could be seeing this on a different regionserver, if
all the clients are blocked on 1 regionserver and can't get to the
others then most will look idle and only one will actually show
anything interesting in the log.

Can you check for this behaviour?

Also if you want to tweak the config with the values I pasted that should help.

On Thu, Jul 15, 2010 at 7:25 PM, Luke Forehand
luke.foreh...@networkedinsights.com wrote:
 It looks like we are going straight from the default config, no explicit 
 setting of anything.

 On 7/15/10 9:03 PM, Ryan Rawson ryano...@gmail.com wrote:

 In this case the regionserver isn't actually doing anything - all the
 IPC thread handlers are waiting in their queue handoff thingy (how
 they get socket/work to do).

 Something elsewhere perhaps?  Check the logs of your jobs, there might
 be something interesting there.

 One thing that frequently happens is you overrun 1 regionserver with
 edits and it isnt flushing fast enough, so it pauses updates and all
 clients end up stuck on it.

 What was that config again?  I use these settings:

 <property>
   <name>hbase.hstore.blockingStoreFiles</name>
   <value>15</value>
 </property>

 <property>
   <name>dfs.datanode.socket.write.timeout</name>
   <value>0</value>
 </property>

 <property>
   <name>hbase.hregion.memstore.block.multiplier</name>
   <value>8</value>
 </property>

 perhaps try these ones?

 -ryan

 On Thu, Jul 15, 2010 at 6:55 PM, Luke Forehand
 luke.foreh...@networkedinsights.com wrote:

 Yea I started with 12 task max per node, dropped it to 8, and now 4, but 
 still having the issue.  Here is the spec for each box:

 2x CPU (2.5 GHZ nehalem ep quad core)
 24 GB RAM (4gb / region server )
 4x 1tb hard drives


 On 7/15/10 6:32 PM, Ryan Rawson ryano...@gmail.com wrote:

 Are you sure you dont have any GC issues?

 Also you are aware that max.tasks.maximum is _per node_ right?  How
 big are these machines?

 -ryan

 On Thu, Jul 15, 2010 at 4:30 PM, Jean-Daniel Cryans jdcry...@apache.org 
 wrote:
 Nothing particular in that dump, and I'm not aware of any deadlock in
 0.20.3, could we see the region server log?

 Thx

 J-D

 On Thu, Jul 15, 2010 at 4:17 PM, Luke Forehand
 luke.foreh...@networkedinsights.com wrote:
 First post evar!

 I have a 3 node 

High ingest rate and FIN_WAIT1 problems

2010-07-16 Thread Thomas Downing

Hi,

I am a complete HBase and HDFS newbie, so I apologize in advance for
the inevitable bloopers.

We are doing feasibility testing on NoSql data store options, with rather
high ingest rate requirements.  So far, HBase is looking good, with only
one issue identified. Running at an ingest rate of ~30K rows per second
on a 4-CPU 2.2GHz, 8GB RAM machine, I am slowly leaking sockets.

This is a single-node setup - no replication.  The CPU load is only about
50%-60%, with the majority of that in userland; system and iowait are
averaging less than 3%.  There is no swapping going on.

The problem is that on the datanode there are a large number of sockets
in FIN_WAIT1, with corresponding peers on master in ESTABLISHED.
These pairs hang around for quite some time, and at my ingest rate this
means that the total number of sockets held by the datanode and master is
slowly going up.

If my understanding of TCP is correct, then this indicates that the master
peer has stopped reading incoming data from the datanode - i.e., it is
sending a window of zero - and that the datanode has called close(2) on
its end.

There was a thread some time ago:

http://www.mail-archive.com/hbase-u...@hadoop.apache.org/msg03329.html

There was no real conclusion.  I have played with the config params as
suggested on that thread, but no luck yet.  Also, in that case the problem
seemed to be between datanodes for replication operations - not the case
with me.  Changing timeouts to avoid the slow increase might not really
solve the problem IFF the master peer has in fact ceased to read its
socket.  The data outstanding in the TCP stack buffer would be lost.
Whether that would imply data loss is beyond me.

I am posting this here because, although the only logs with errors/exceptions
are the datanode logs, netstat and wireshark seem to indicate that the
problem is on the master side.

The master, namenode, regionserver and zookeeper logs show no
warnings or errors.  The datanode log shows this, over and over:

2010-07-16 00:33:09,269 WARN 
org.apache.hadoop.hdfs.server.datanode.DataNode: 
DatanodeRegistration(127.0.0.1:50010, 
storageID=DS-1028643313-10.1.1.200-50010-1279026099917, infoPort=50075, 
ipcPort=50020):Got exception while serving blk_3684861726145519813_22386 
to /127.0.0.1:
java.net.SocketTimeoutException: 48 millis timeout while waiting for 
channel to be ready for write. ch : 
java.nio.channels.SocketChannel[connected local=/127.0.0.1:50010 
remote=/127.0.0.1:54774]
at 
org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
at 
org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
at 
org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
at 
org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:313)
at 
org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:400)
at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:180)
at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95)

at java.lang.Thread.run(Thread.java:619)

2010-07-16 00:33:09,269 ERROR 
org.apache.hadoop.hdfs.server.datanode.DataNode: 
DatanodeRegistration(127.0.0.1:50010, 
storageID=DS-1028643313-10.1.1.200-50010-1279026099917, infoPort=50075, 
ipcPort=50020):DataXceiver
java.net.SocketTimeoutException: 48 millis timeout while waiting for 
channel to be ready for write. ch : 
java.nio.channels.SocketChannel[connected local=/127.0.0.1:50010 
remote=/127.0.0.1:54774]
at 
org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
at 
org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
at 
org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
at 
org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:313)
at 
org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:400)
at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:180)
at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95)

at java.lang.Thread.run(Thread.java:619)

If there is any other info that might help, or any steps you would like
me to take, just let me know.

Thanks

Thomas Downing


Advice for efficiently scanning for modified-since

2010-07-16 Thread Mark Laffoon
We have an existing product sitting on the hbase/hadoop ecosystem. We have
laid our object model on HBase: we have one table with a row per object,
and a separate table with composite index rows. Works great. We can
efficiently find our objects based on their type, relationships, etc. by
scanning the index table. We *never* scan the main table (except when
rebuilding the index).

 

A new requirement just came in: get a list of all objects that have been
modified since a given timestamp. This has to happen quickly (user time).

 

If we scan the main table with a timestamp restriction, will that be
efficient? Or do we have to introduce a new composite index that has the
last modified timestamp as part of it and scan that? 
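
For reference, a time-range scan over the main table would look roughly like the
sketch below (the table name is a placeholder). Note that the time range only
restricts which cells come back; the scan still has to touch every region, which
is why it may not stay fast as the table grows.

import java.io.IOException;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class ModifiedSinceScan {
  public static void main(String[] args) throws IOException {
    long since = Long.parseLong(args[0]);       // "modified since" timestamp, in ms

    HTable table = new HTable(new HBaseConfiguration(), "objects");  // placeholder table
    Scan scan = new Scan();
    scan.setTimeRange(since, Long.MAX_VALUE);   // only cells written at or after 'since'

    ResultScanner scanner = table.getScanner(scan);
    try {
      for (Result r : scanner) {
        System.out.println(Bytes.toString(r.getRow()));  // each row with a recent edit
      }
    } finally {
      scanner.close();
      table.close();
    }
  }
}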

 

Thanks,

Mark

 



RE: Row level locking?

2010-07-16 Thread Michael Segel


Thanks for the response.
(You don't need to include the cc ...)

With respect to the row level locking ... 
I was interested in when the lock is actually acquired, how long the lock 
persists and when the lock is released.
From your response, the lock is only held while updating the row, i.e. while the 
data is being written to the memory cache, which is later written to disk. 
(Note: this row-level locking is different from transactional row-level locking.)

Now that I've had some caffeine I think I can clarify... :-)

Some of my developers complained that they were having trouble with two 
different processes trying to update the same table.
Not sure why they were having the problem, so I wanted to have a good fix. The 
simple fix was to have them call close() on the HTable connection, which 
forces any resources they acquired to be released. 

In looking at the problem... it's possible that they didn't have autoFlush set 
to true, so the write was still in the buffer and hadn't been flushed. 

If the lock only persists for the duration of the write to memory and is then 
released, then the issue could have been that the record written was in the 
buffer and not yet flushed to disk.

I'm also assuming that when you run a scan() against a region, any 
information written to the buffer but not yet flushed will be missed.

So I guess the question isn't so much the issue of a lock, but that we need to 
make sure data written to the buffer is flushed ASAP unless we know 
we're going to be writing a lot of data in the m/r job.
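
In code, the two modes look roughly like this (a sketch only; it assumes you
already have an open HTable):

import java.io.IOException;
import java.util.List;

import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;

public class FlushModes {

  // Immediate visibility: every put() is its own RPC to the region server.
  static void writeImmediately(HTable table, Put put) throws IOException {
    table.setAutoFlush(true);     // this is the default
    table.put(put);
  }

  // Bulk / map-reduce loads: buffer client-side for throughput, flush deliberately.
  static void writeBuffered(HTable table, List<Put> puts) throws IOException {
    table.setAutoFlush(false);
    for (Put p : puts) {
      table.put(p);               // may only land in the client-side write buffer
    }
    table.flushCommits();         // edits are now in the region server's memstore and WAL
  }
}

Once flushCommits() returns, the edits live in the region server's memstore (and
the WAL), so scans will see them; they don't have to reach disk first. Only edits
still sitting in the client-side write buffer are invisible to other readers.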

Thx

-Mike



 From: cleh...@adobe.com
 To: user@hbase.apache.org
 CC: hbase-u...@hadoop.apache.org
 Date: Fri, 16 Jul 2010 12:34:36 +0100
 Subject: Re: Row level locking?
 
 Currently a row is part of a region and there's a single region server 
 serving that region at a particular moment. 
 So when that row is updated a lock is acquired for that row until the actual 
 data is updated in memory (note that a put will be written to cache on the 
 region server and also persisted in the write-ahead log - WAL). Subsequent 
 puts to that row will have to wait for that lock.
 
 HBase is fully consistent. This being said all the locking takes place at row 
 level only, so when you scan you have to take that into account as there's no 
 range locking. 
 
 I'm not sure I understand the resource releasing issue. HTable.close() 
 flushes the current write buffer (you can have write buffer if you use 
 autoFlush set to false). 
 
 Cosmin
 
 
 On Jul 16, 2010, at 1:33 PM, Michael Segel wrote:
 
  
  Ok,
  
  First, I'm writing this before I've had my first cup of coffee so I am 
  apologizing in advance if the question is a brain dead question
  
  Going from a relational background, some of these questions may not make 
  sense in the HBase world.
  
  
  When does HBase acquire a lock on a row and how long does it persist? Does 
  the lock only hit the current row, or does it also lock the adjacent rows 
  too?
  Does HBase support the concept of 'dirty reads'? 
  
  The issue is what happens when you have two jobs trying to hit the same 
  table at the same time and update/read the rows at the same time.
  
  A developer came across a problem and the fix was to use the HTable.close() 
  method to release any resources.
  
  I am wondering if you explicitly have to clean up or can a lazy developer 
  let the object just go out of scope and get GC'd.
  
  Thx
  
  -Mike
  

 
  

RE: High ingest rate and FIN_WAIT1 problems

2010-07-16 Thread Hegner, Travis
Hi Thomas,

I ran into a very similar issue when running Slony-I on PostgreSQL to replicate 
15-20 databases.

Adjusting the TCP_FIN_TIMEOUT parameter for the kernel may help to slow (or 
hopefully stop) the leaking sockets. I found some notes about adjusting TCP 
parameters here: http://www.hikaro.com/linux/tweaking-tcpip-syctl-conf.html

with the specific excerpt regarding the TCP_FIN_TIMEOUT:

TCP_FIN_TIMEOUT
This setting determines the time that must elapse before TCP/IP can release a 
closed connection and reuse its resources. During this TIME_WAIT state, 
reopening the connection to the client costs less than establishing a new 
connection. By reducing the value of this entry, TCP/IP can release closed 
connections faster, making more resources available for new connections. 
Adjust this in the presence of many connections sitting in the TIME_WAIT state:


# echo 30 > /proc/sys/net/ipv4/tcp_fin_timeout

Try setting this lower on your master. You may also consider these on the same 
link:

TCP_TW_RECYCLE
It enables fast recycling of TIME_WAIT sockets. The default value is 0 
(disabled). The sysctl documentation incorrectly states the default as enabled. 
It can be changed to 1 (enabled) in many cases. It is known to cause some issues with 
hoststated (load balancing and failover) if enabled, so it should be used with 
caution.


echo 1 > /proc/sys/net/ipv4/tcp_tw_recycle

TCP_TW_REUSE
This allows reusing sockets in TIME_WAIT state for new connections when it is 
safe from a protocol viewpoint. The default value is 0 (disabled). It is generally a 
safer alternative to tcp_tw_recycle.


echo 1 > /proc/sys/net/ipv4/tcp_tw_reuse


The above commands will not persist across reboots, but the link explains how to 
make them permanent. The list experts may be able to give more insight on which, 
if any, of these settings are safe to manipulate, and what risks or issues you may 
encounter specifically with HBase while adjusting these settings.

Hope This Helps,

Travis Hegner


-Original Message-
From: Thomas Downing [mailto:tdown...@proteus-technologies.com]
Sent: Friday, July 16, 2010 10:33 AM
To: user@hbase.apache.org
Subject: High ingest rate and FIN_WAIT1 problems

Hi,

I am a complete HBase and HDFS newbie, so I apologize in advance for
the inevitable bloopers.

We are doing feasibility testing on NoSql data store options, with rather
high ingest rate requirements.  So far, HBase is looking good, with only
one issue identified. Running at an ingest rate of ~30K rows per second
on a 4 2.2Mhz CPU 8G RAM machine I am slowly leaking sockets.

This is a single node setup - no replication.  The CPU load is only about
50%-60%, with the majority of that in userland, system and iowait are
averaging less than 3%.  There is no swapping going on.

The problem is that on the datanode there are a large number of sockets
in FIN_WAIT1, with corresponding peers on master in ESTABLISHED.
These pairs hang around for quite some time, at at my ingest rate this
means that the total sockets held by datanode and master is slowly going
up.

If my understanding of TCP is correct, then this indicates that the master
peer has stopped reading incoming data from the datanode - i.e, it is
sending a window of zero; and that the datanode has called close(2) on
it's peer.

There was a thread some time ago:

http://www.mail-archive.com/hbase-u...@hadoop.apache.org/msg03329.html

There was no real conclusion.  I have played with the config params as
suggested on that thread, but no luck yet.  Also, in that case the problem
seemed to be between datanodes for replication operations - not the case
with me.  Changing timeouts to avoid the slow increase might not really
solve the problem IFF the master peer has in fact ceased to read it's
socket.  The data outstanding in the TCP stack buffer would be lost.
Whether that would imply data loss is beyond me.

I am posting this here as although the only logs with errors/exceptions
are the datanode logs, netstat and wireshark seem to indicate that the
problem is on the master side.

The master, namenode, regionserver and zookeeper and logs shows no
warning or errors.  The datanode log shows this, over and over:

2010-07-16 00:33:09,269 WARN
org.apache.hadoop.hdfs.server.datanode.DataNode:
DatanodeRegistration(127.0.0.1:50010,
storageID=DS-1028643313-10.1.1.200-50010-1279026099917, infoPort=50075,
ipcPort=50020):Got exception while serving blk_3684861726145519813_22386
to /127.0.0.1:
java.net.SocketTimeoutException: 48 millis timeout while waiting for
channel to be ready for write. ch :
java.nio.channels.SocketChannel[connected local=/127.0.0.1:50010
remote=/127.0.0.1:54774]
 at
org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
 at
org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
 at
org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
 at

Re: High ingest rate and FIN_WAIT1 problems

2010-07-16 Thread Thomas Downing

Thanks for the response.

My understanding is that TCP_FIN_TIMEOUT affects only FIN_WAIT2;
my problem is with FIN_WAIT1.

While I do see some sockets in TIME_WAIT, they are only a few, and the
number is not growing.

On 7/16/2010 12:07 PM, Hegner, Travis wrote:

Hi Thomas,

I ran into a very similar issue when running slony-I on postgresql to replicate 
15-20 databases.

Adjusting the TCP_FIN_TIMEOUT parameters for the kernel may help to slow (or 
hopefully stop), the leaking sockets. I found some notes about adjusting TCP 
parameters here: http://www.hikaro.com/linux/tweaking-tcpip-syctl-conf.html

with the specific excerpt regarding the TCP_FIN_TIMEOUT:

TCP_FIN_TIMEOUT
This setting determines the time that must elapse before TCP/IP can release a 
closed connection and reuse its resources. During this TIME_WAIT state, 
reopening the connection to the client costs less than establishing a new 
connection. By reducing the value of this entry, TCP/IP can release closed 
connections faster, making more resources available for new connections. 
Addjust this in the presense of many connections sitting in the TIME_WAIT state:


# echo 30 > /proc/sys/net/ipv4/tcp_fin_timeout

Try setting this lower on your master. You may also consider these on the same 
link:

TCP_TW_RECYCLE
It enables fast recycling of TIME_WAIT sockets. The default value is 0 
(disabled). The sysctl documentation incorrectly states the default as enabled. 
It can be changed to 1 (enabled) in many cases. Known to cause some issues with 
hoststated (load balancing and fail over) if enabled, should be used with 
caution.


echo 1 > /proc/sys/net/ipv4/tcp_tw_recycle

TCP_TW_REUSE
This allows reusing sockets in TIME_WAIT state for new connections when it is 
safe from protocol viewpoint. Default value is 0 (disabled). It is generally a 
safer alternative to tcp_tw_recycle


echo 1 > /proc/sys/net/ipv4/tcp_tw_reuse


The above commands will not persist reboots, but the link explains how. The 
list experts may be able to give more insight on which, if any, of these 
settings are safe to manipulate, and what risks or issues you may encounter 
specifically with Hbase while adjusting these settings.

Hope This Helps,

Travis Hegner


-Original Message-
From: Thomas Downing [mailto:tdown...@proteus-technologies.com]
Sent: Friday, July 16, 2010 10:33 AM
To: user@hbase.apache.org
Subject: High ingest rate and FIN_WAIT1 problems

Hi,

I am a complete HBase and HDFS newbie, so I apologize in advance for
the inevitable bloopers.

We are doing feasibility testing on NoSql data store options, with rather
high ingest rate requirements.  So far, HBase is looking good, with only
one issue identified. Running at an ingest rate of ~30K rows per second
on a 4 2.2Mhz CPU 8G RAM machine I am slowly leaking sockets.

This is a single node setup - no replication.  The CPU load is only about
50%-60%, with the majority of that in userland, system and iowait are
averaging less than 3%.  There is no swapping going on.

The problem is that on the datanode there are a large number of sockets
in FIN_WAIT1, with corresponding peers on master in ESTABLISHED.
These pairs hang around for quite some time, at at my ingest rate this
means that the total sockets held by datanode and master is slowly going
up.

If my understanding of TCP is correct, then this indicates that the master
peer has stopped reading incoming data from the datanode - i.e, it is
sending a window of zero; and that the datanode has called close(2) on
it's peer.

There was a thread some time ago:

http://www.mail-archive.com/hbase-u...@hadoop.apache.org/msg03329.html

There was no real conclusion.  I have played with the config params as
suggested on that thread, but no luck yet.  Also, in that case the problem
seemed to be between datanodes for replication operations - not the case
with me.  Changing timeouts to avoid the slow increase might not really
solve the problem IFF the master peer has in fact ceased to read it's
socket.  The data outstanding in the TCP stack buffer would be lost.
Whether that would imply data loss is beyond me.

I am posting this here as although the only logs with errors/exceptions
are the datanode logs, netstat and wireshark seem to indicate that the
problem is on the master side.

The master, namenode, regionserver and zookeeper and logs shows no
warning or errors.  The datanode log shows this, over and over:

2010-07-16 00:33:09,269 WARN
org.apache.hadoop.hdfs.server.datanode.DataNode:
DatanodeRegistration(127.0.0.1:50010,
storageID=DS-1028643313-10.1.1.200-50010-1279026099917, infoPort=50075,
ipcPort=50020):Got exception while serving blk_3684861726145519813_22386
to /127.0.0.1:
java.net.SocketTimeoutException: 48 millis timeout while waiting for
channel to be ready for write. ch :
java.nio.channels.SocketChannel[connected local=/127.0.0.1:50010
remote=/127.0.0.1:54774]
  at
org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
 

Re: Advice for efficiently scanning for modified-since

2010-07-16 Thread Stack
This request has come up a few times now.  We should dev a soln (I've
made an issue to track it -- HBASE-2839).

You could try scanning the whole table but my guess is this will prove too
slow -- if not now, then later when your table grows.

If the results returned will be few, a scan that runs in parallel rather
than in series, tripping over each table region, might make sense.
https://issues.apache.org/jira/browse/HBASE-1935 discusses this and
even has a patch, though I'm sure it's well stale at this point.
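
In the meantime, a rough client-side approximation (just a sketch against the
0.20 API, with a placeholder table name and no error handling) would be to fan
one time-range scan per region out to a thread pool:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Pair;

public class ParallelTimeRangeScan {
  public static void main(String[] args) throws Exception {
    final HBaseConfiguration conf = new HBaseConfiguration();
    final long since = Long.parseLong(args[0]);     // "modified since" timestamp, in ms

    // One entry per region: parallel arrays of region start and end keys.
    HTable keysTable = new HTable(conf, "objects"); // placeholder table name
    Pair<byte[][], byte[][]> keys = keysTable.getStartEndKeys();
    keysTable.close();

    ExecutorService pool = Executors.newFixedThreadPool(8);
    List<Future<Integer>> futures = new ArrayList<Future<Integer>>();
    for (int i = 0; i < keys.getFirst().length; i++) {
      final byte[] start = keys.getFirst()[i];
      final byte[] stop = keys.getSecond()[i];
      futures.add(pool.submit(new Callable<Integer>() {
        public Integer call() throws Exception {
          // Each task scans exactly one region's key range.
          HTable table = new HTable(conf, "objects");
          Scan scan = new Scan();
          scan.setStartRow(start);
          scan.setStopRow(stop);                    // empty stop row == scan to table end
          scan.setTimeRange(since, Long.MAX_VALUE);
          ResultScanner scanner = table.getScanner(scan);
          int count = 0;
          for (Result r : scanner) {
            count++;                                // collect/emit matching rows here
          }
          scanner.close();
          table.close();
          return count;
        }
      }));
    }

    int total = 0;
    for (Future<Integer> f : futures) {
      total += f.get();
    }
    pool.shutdown();
    System.out.println("rows modified since " + since + ": " + total);
  }
}

Regions can split or move while this runs, so it's best-effort, and it still reads
every region; the composite-index approach is the one that actually avoids that.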

St.Ack




On Fri, Jul 16, 2010 at 8:16 AM, Mark Laffoon
mlaff...@semanticresearch.com wrote:
 We have an existing product sitting on the hbase/hadoop ecosystem. We have
 laid our object model on HBase: we have one table with a row per object,
 and a separate table with composite index rows. Works great. We can
 efficiently find our objects based on their type, relationships, etc. by
 scanning the index table. We *never* scan the main table (except when
 rebuilding the index).



 A new requirement just came in: get a list of all objects that have been
 modified since timestamp. This has to happen quickly (user time).



 If we scan the main table with a timestamp restriction, will that be
 efficient? Or do we have to introduce a new composite index that has the
 last modified timestamp as part of it and scan that?



 Thanks,

 Mark






Re: Hanging regionservers

2010-07-16 Thread Stack
Would you mind pastebinning the stack trace?  It doesn't look like
https://issues.apache.org/jira/browse/HDFS-88 (HBASE-667) going by the
below, an issue that HADOOP-5859 purportedly fixes -- I see you
commented on it -- but our Todd thinks otherwise (He has a 'real' fix
up in another issue that I currently can't put my finger on).
St.Ack

On Fri, Jul 16, 2010 at 7:19 AM, Luke Forehand
luke.foreh...@networkedinsights.com wrote:

 I grepped yesterday's logs on all servers for Blocking updates and there 
 was no trace.  I believe I had encountered the blocking updates problem 
 earlier in the project but throttled down the import speed which seemed to 
 fix that.

 I just double checked and all three region servers were idle.  Something 
 interesting that I noticed however, was that each regionserver had a 
 particular ResponseProcessor thread running for a specific block, and that 
 thread was stuck in a running state during the entirety of the hang.  Also a 
 DataStreamer thread for the block associated with the ResponseProcessor was 
 in a wait state.  This makes me think that each server was stuck operating on 
 a specific block.

 ResponseProcessor for block blk_1926230463847049982_2694658 - Thread 
 t...@61160
   java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
    - locked sun.nio.ch.uti...@196fbfd0
    - locked java.util.collections$unmodifiable...@7799fdbb
    - locked sun.nio.ch.epollselectori...@1ee13d55
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
    at 
 org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:332)
    at 
 org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
    at java.io.DataInputStream.readFully(DataInputStream.java:178)
    at java.io.DataInputStream.readLong(DataInputStream.java:399)
    at 
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2399)

   Locked ownable synchronizers:
    - None

 DataStreamer for file 
 /hbase/.logs/dn01.colo.networkedinsights.com,60020,127993084/hlog.dat.1279228611023
  block blk_1926230463847049982_2694658 - Thread t...@61158
   java.lang.Thread.State: TIMED_WAITING on java.util.linkedl...@475b455c
    at java.lang.Object.wait(Native Method)
    at 
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2247)

   Locked ownable synchronizers:
    - None

 On 7/15/10 10:17 PM, Ryan Rawson ryano...@gmail.com wrote:

 I'm not seeing anything in that logfile, you are seeing compactions
 for various regions, but im not seeing flushes (typical during insert
 loads) and nothing else. One thing we look to see is a log message
 Blocking updates which indicates that a particular region has
 decided it's holding up to prevent taking too many inserts.

 Like I said, you could be seeing this on a different regionserver, if
 all the clients are blocked on 1 regionserver and can't get to the
 others then most will look idle and only one will actually show
 anything interesting in the log.

 Can you check for this behaviour?

 Also if you want to tweak the config with the values I pasted that should 
 help.

 On Thu, Jul 15, 2010 at 7:25 PM, Luke Forehand
 luke.foreh...@networkedinsights.com wrote:
 It looks like we are going straight from the default config, no expicit 
 setting of anything.

 On 7/15/10 9:03 PM, Ryan Rawson ryano...@gmail.com wrote:

 In this case the regionserver isn't actually doing anything - all the
 IPC thread handlers are waiting in their queue handoff thingy (how
 they get socket/work to do).

 Something elsewhere perhaps?  Check the logs of your jobs, there might
 be something interesting there.

 One thing that frequently happens is you overrun 1 regionserver with
 edits and it isnt flushing fast enough, so it pauses updates and all
 clients end up stuck on it.

 What was that config again?  I use these settings:

 <property>
   <name>hbase.hstore.blockingStoreFiles</name>
   <value>15</value>
 </property>

 <property>
   <name>dfs.datanode.socket.write.timeout</name>
   <value>0</value>
 </property>

 <property>
   <name>hbase.hregion.memstore.block.multiplier</name>
   <value>8</value>
 </property>

 perhaps try these ones?

 -ryan

 On Thu, Jul 15, 2010 at 6:55 PM, Luke Forehand
 luke.foreh...@networkedinsights.com wrote:

 Yea I started with 12 task max per node, dropped it to 8, and now 4, but 
 still having the issue.  Here is the spec for each box:

 2x CPU (2.5 GHZ nehalem ep quad core)
 24 GB RAM (4gb / region server )
 4x 1tb hard drives


 On 7/15/10 6:32 PM, Ryan Rawson ryano...@gmail.com wrote:

 Are you sure you 

Re: Hanging regionservers

2010-07-16 Thread Stack
Each time you threaddump, is it stuck in the same way?

I've not seen this dfsclient hangup before, not that I remember.  Let
me ask some hdfs-heads.  Will be back to you.

Any chance of you upping to CDH3b2, for your Hadoop at least?  HDFS
has a few dfsclient/ipc fixes -- though looking at them none seem to
explicitly address your issue.

What's the JVM that you are running?  Can you do a java -version?
What's your OS?

Thanks,
St.Ack



On Fri, Jul 16, 2010 at 10:10 AM, Luke Forehand
luke.foreh...@networkedinsights.com wrote:
 Line 58 and line 79 are the threads that I found suspicious.

 http://pastebin.com/W1E2nCZq

 The other stack traces from the other two region servers look identical to 
 this one.  BTW - I have made the config changes per Ryan Rawson's suggestion 
 (thanks!) and I've processed ~7 GB of the 15 GB without hangup thus far so 
 I'm crossing my fingers.

 -Luke

 On 7/16/10 11:48 AM, Stack st...@duboce.net wrote:

 Would you mind pastebinning the stacktrace?  It doesn't looks like
 https://issues.apache.org/jira/browse/HDFS-88 (HBASE-667) going by the
 below, an issue that HADOOP-5859 purportedly fixes -- I see you
 commented on it -- but our Todd thinks otherwise (He has a 'real' fix
 up in another issue that I currently can't put my finger on).
 St.Ack

 On Fri, Jul 16, 2010 at 7:19 AM, Luke Forehand
 luke.foreh...@networkedinsights.com wrote:

 I grepped yesterday's logs on all servers for Blocking updates and there 
 was no trace.  I believe I had encountered the blocking updates problem 
 earlier in the project but throttled down the import speed which seemed to 
 fix that.

 I just double checked and all three region servers were idle.  Something 
 interesting that I noticed however, was that each regionserver had a 
 particular ResponseProcessor thread running for a specific block, and that 
 thread was stuck in a running state during the entirety of the hang.  Also a 
 DataStreamer thread for the block associated with the ResponseProcessor was 
 in a wait state.  This makes me think that each server was stuck operating 
 on a specific block.

 ResponseProcessor for block blk_1926230463847049982_2694658 - Thread 
 t...@61160
   java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
    - locked sun.nio.ch.uti...@196fbfd0
    - locked java.util.collections$unmodifiable...@7799fdbb
    - locked sun.nio.ch.epollselectori...@1ee13d55
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
    at 
 org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:332)
    at 
 org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
    at 
 org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
    at 
 org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
    at java.io.DataInputStream.readFully(DataInputStream.java:178)
    at java.io.DataInputStream.readLong(DataInputStream.java:399)
    at 
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2399)

   Locked ownable synchronizers:
    - None

 DataStreamer for file 
 /hbase/.logs/dn01.colo.networkedinsights.com,60020,127993084/hlog.dat.1279228611023
  block blk_1926230463847049982_2694658 - Thread t...@61158
   java.lang.Thread.State: TIMED_WAITING on java.util.linkedl...@475b455c
    at java.lang.Object.wait(Native Method)
    at 
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2247)

   Locked ownable synchronizers:
    - None





YouAreDeadException with hbase

2010-07-16 Thread Jinsong Hu

Hi, There:
 I got some YouAreDeadExceptions with HBase. What can cause them? I do notice
that between 5:49 and 5:53, for 4 minutes, there is no log output. This doesn't
look like a GC issue, as I checked the GC log and the longest GC

is only 9.6 seconds.

Jimmy.


2010-07-16 05:49:26,805 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes: Total=3.355194MB (3518176), Free=405.4198MB (425113472), Max=408.775MB (428631648), Counts: Blocks=1, Access=2178914, Hit=1034, Miss=2177880, Evictions=0, Evicted=0, Ratios: Hit Ratio=0.04745483165606856%, Miss Ratio=99.95254278182983%, Evicted/Run=NaN
2010-07-16 05:53:23,476 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes: Total=3.355194MB (3518176), Free=405.4198MB (425113472), Max=408.775MB (428631648), Counts: Blocks=1, Access=2178915, Hit=1035, Miss=2177880, Evictions=0, Evicted=0, Ratios: Hit Ratio=0.04750070511363447%, Miss Ratio=99.95250105857849%, Evicted/Run=NaN


2010-07-16 05:53:26,171 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 240540ms for sessionid 0x329c88039b0006c, closing socket connection and attempting reconnect
2010-07-16 05:53:27,333 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server t-zookeeper2.cloud.ppops.net/10.110.24.57:2181
2010-07-16 05:53:27,334 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to t-zookeeper2.cloud.ppops.net/10.110.24.57:2181, initiating session
2010-07-16 05:53:27,335 INFO org.apache.zookeeper.ClientCnxn: Unable to reconnect to ZooKeeper service, session 0x329c88039b0006c has expired, closing socket connection
2010-07-16 05:53:27,896 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 240520ms for sessionid 0x129c87a7f98007a, closing socket connection and attempting reconnect


2010-07-16 05:53:39,090 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: Aborting region server serverName=m0002028.ppops.net,60020,1279237223465, load=(requests=952, regions=21, usedHeap=575, maxHeap=2043): Unhandled exception
org.apache.hadoop.hbase.YouAreDeadException: org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected; currently processing m0002028.ppops.net,60020,1279237223465 as dead server
    at org.apache.hadoop.hbase.master.ServerManager.checkIsDead(ServerManager.java:217)
    at org.apache.hadoop.hbase.master.ServerManager.regionServerReport(ServerManager.java:271)
    at org.apache.hadoop.hbase.master.HMaster.regionServerReport(HMaster.java:684)
    at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:576)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:919)

    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)




Re: Hanging regionservers

2010-07-16 Thread Luke Forehand
I was about to migrate to CDH3b2 but thought I would wait for a few replies 
before doing so.  I'll likely have the migration done over the weekend.

Java:
java version 1.6.0_16
Java(TM) SE Runtime Environment (build 1.6.0_16-b01)
Java HotSpot(TM) 64-Bit Server VM (build 14.2-b01, mixed mode)

OS:
CentOS release 5.5 (Final)
2.6.18-194.3.1.el5 #1 SMP Thu May 13 13:08:30 EDT 2010 x86_64 x86_64 x86_64 
GNU/Linux

Thanks
Luke

On 7/16/10 12:45 PM, Stack st...@duboce.net wrote:

Yeah, go to cdh3b2 if you can.  If you can repro there, there's a few
fellas (other than us hbasers) who'd be real interested in your
problem.
St.Ack

On Fri, Jul 16, 2010 at 10:40 AM, Stack st...@duboce.net wrote:
 Each time you threaddump, its stuck in same way?

 I've not seen this dfsclient hangup before, not that I remember.  Let
 me ask some hdfs-heads.  Will be back to you.

 Any chance of your upping to CHD3b2, for your hadoop at least?  HDFS
 has a few dfsclient/ipc fixes -- though looking at them none seem to
 explicitly address your issue.

 Whats that jvm that you are running?  Can you do a java -version?
 Whats your OS?

 Thanks,
 St.Ack



 On Fri, Jul 16, 2010 at 10:10 AM, Luke Forehand
 luke.foreh...@networkedinsights.com wrote:
 Line 58 and line 79 are the threads that I found suspicious.

 http://pastebin.com/W1E2nCZq

 The other stack traces from the other two region servers look identical to 
 this one.  BTW - I have made the config changes per Ryan Rawson's suggestion 
 (thanks!) and I've processed ~7 GB of the 15 GB without hangup thus far so 
 I'm crossing my fingers.

 -Luke

 On 7/16/10 11:48 AM, Stack st...@duboce.net wrote:

 Would you mind pastebinning the stacktrace?  It doesn't looks like
 https://issues.apache.org/jira/browse/HDFS-88 (HBASE-667) going by the
 below, an issue that HADOOP-5859 purportedly fixes -- I see you
 commented on it -- but our Todd thinks otherwise (He has a 'real' fix
 up in another issue that I currently can't put my finger on).
 St.Ack

 On Fri, Jul 16, 2010 at 7:19 AM, Luke Forehand
 luke.foreh...@networkedinsights.com wrote:

 I grepped yesterday's logs on all servers for Blocking updates and there 
 was no trace.  I believe I had encountered the blocking updates problem 
 earlier in the project but throttled down the import speed which seemed to 
 fix that.

 I just double checked and all three region servers were idle.  Something 
 interesting that I noticed however, was that each regionserver had a 
 particular ResponseProcessor thread running for a specific block, and that 
 thread was stuck in a running state during the entirety of the hang.  Also 
 a DataStreamer thread for the block associated with the ResponseProcessor 
 was in a wait state.  This makes me think that each server was stuck 
 operating on a specific block.

 ResponseProcessor for block blk_1926230463847049982_2694658 - Thread 
 t...@61160
   java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
- locked sun.nio.ch.uti...@196fbfd0
- locked java.util.collections$unmodifiable...@7799fdbb
- locked sun.nio.ch.epollselectori...@1ee13d55
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
at 
 org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:332)
at 
 org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
at 
 org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
at 
 org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
at java.io.DataInputStream.readFully(DataInputStream.java:178)
at java.io.DataInputStream.readLong(DataInputStream.java:399)
at 
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2399)

   Locked ownable synchronizers:
- None

 DataStreamer for file 
 /hbase/.logs/dn01.colo.networkedinsights.com,60020,127993084/hlog.dat.1279228611023
  block blk_1926230463847049982_2694658 - Thread t...@61158
   java.lang.Thread.State: TIMED_WAITING on java.util.linkedl...@475b455c
at java.lang.Object.wait(Native Method)
at 
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2247)

   Locked ownable synchronizers:
- None






Re: YouAreDeadException with hbase

2010-07-16 Thread Stack
You'll see this if the server reports to the master after the master
has ruled it 'dead'.

Here is the code that produces the exception:

if (!isDead(serverName)) return;
String message = "Server " + what + " rejected; currently processing " +
  serverName + " as dead server";
LOG.debug(message);
throw new YouAreDeadException(message);

Servers are on the 'dead' list if zk reports their session has
expired.  The master then moves to clean up after the dead server and
process its logs.  If during this cleanup time the server reports in, the
master will return the YouAreDeadException.

Usually the RS has lost its zk session but has yet to realize it.

St.Ack

On Thu, Jul 15, 2010 at 11:52 PM, Jinsong Hu jinsong...@hotmail.com wrote:
 Hi, There:
  I got some YouAreDeadException  with hbase. what can cause it ? I do notice
 between 5:49 to 5:53 ,
 for 4 minutes, there is no log. This doesn't look like GC issue as I checked
 the GC log, the longest GC
 is only 9.6 seconds.

 Jimmy.


 2010-07-16 05:49:26,805 DEBUG
 org.apache.hadoop.hbase.io.hfile.LruBlockCache: Ca
 che Stats: Sizes: Total=3.355194MB (3518176), Free=405.4198MB (425113472),
 Max=4
 08.775MB (428631648), Counts: Blocks=1, Access=2178914, Hit=1034,
 Miss=2177880,
 Evictions=0, Evicted=0, Ratios: Hit Ratio=0.04745483165606856%, Miss
 Ratio=99.95
 254278182983%, Evicted/Run=NaN
 2010-07-16 05:53:23,476 DEBUG
 org.apache.hadoop.hbase.io.hfile.LruBlockCache: Ca
 che Stats: Sizes: Total=3.355194MB (3518176), Free=405.4198MB (425113472),
 Max=4
 08.775MB (428631648), Counts: Blocks=1, Access=2178915, Hit=1035,
 Miss=2177880,
 Evictions=0, Evicted=0, Ratios: Hit Ratio=0.04750070511363447%, Miss
 Ratio=99.95
 250105857849%, Evicted/Run=NaN

 
 2010-07-16 05:53:26,171 INFO org.apache.zookeeper.ClientCnxn: Client session
 tim
 ed out, have not heard from server in 240540ms for sessionid
 0x329c88039b0006c,
 closing socket connection and attempting reconnect
 2010-07-16 05:53:27,333 INFO org.apache.zookeeper.ClientCnxn: Opening socket
 con
 nection to server t-zookeeper2.cloud.ppops.net/10.110.24.57:2181
 2010-07-16 05:53:27,334 INFO org.apache.zookeeper.ClientCnxn: Socket
 connection
 established to t-zookeeper2.cloud.ppops.net/10.110.24.57:2181, initiating
 sessio
 n
 2010-07-16 05:53:27,335 INFO org.apache.zookeeper.ClientCnxn: Unable to
 reconnec
 t to ZooKeeper service, session 0x329c88039b0006c has expired, closing
 socket co
 nnection
 2010-07-16 05:53:27,896 INFO org.apache.zookeeper.ClientCnxn: Client session
 tim
 ed out, have not heard from server in 240520ms for sessionid
 0x129c87a7f98007a,
 closing socket connection and attempting reconnect


 2010-07-16 05:53:39,090 FATAL
 org.apache.hadoop.hbase.regionserver.HRegionServer
 : Aborting region server serverName=m0002028.ppops.net,60020,1279237223465,
 load
 =(requests=952, regions=21, usedHeap=575, maxHeap=2043): Unhandled exception
 org.apache.hadoop.hbase.YouAreDeadException:
 org.apache.hadoop.hbase.YouAreDeadE
 xception: Server REPORT rejected; currently processing
 m0002028.ppops.net,60020,
 1279237223465 as dead server
       at
 org.apache.hadoop.hbase.master.ServerManager.checkIsDead(ServerManage
 r.java:217)
       at
 org.apache.hadoop.hbase.master.ServerManager.regionServerReport(Serve
 rManager.java:271)
       at
 org.apache.hadoop.hbase.master.HMaster.regionServerReport(HMaster.jav
 a:684)
       at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
       at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAcces
 sorImpl.java:25)
       at java.lang.reflect.Method.invoke(Method.java:597)
       at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:576)
       at
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:
 919)

       at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
 Method)



Re: YouAreDeadException with hbase

2010-07-16 Thread Jinsong Hu

Yes, the root cause seems to be the gap of 4 minutes between
2010-07-16 05:49:26,805 and 2010-07-16 05:53:23,476, but I checked the GC log
(gc-hbase.log) and don't see a 4-minute gap in GC. I just wonder what could
cause this large gap.

I also wonder if there is a configuration change I can make to avoid this long
pause, or get around the problems caused by this long pause.

Jimmy

--
From: Stack st...@duboce.net
Sent: Friday, July 16, 2010 11:44 AM
To: user@hbase.apache.org
Subject: Re: YouAreDeadException with hbase


You'll see this if the server reports to the master after the master
has ruled it 'dead'.

Here is the code that produces the exception:

   if (!isDead(serverName)) return;
   String message = "Server " + what + " rejected; currently processing " +
     serverName + " as dead server";
   LOG.debug(message);
   throw new YouAreDeadException(message);

Servers are on the 'dead' list if zk reports their session has
expired.  The master moves then to cleanup after the dead server and
process its logs.  If during this cleanup time the server reports in,
master will return the youaredead exception.

Usually the RS has lost its zk session but has yet to realize it.

St.Ack

On Thu, Jul 15, 2010 at 11:52 PM, Jinsong Hu jinsong...@hotmail.com 
wrote:

Hi, There:
 I got some YouAreDeadException  with hbase. what can cause it ? I do 
notice

between 5:49 to 5:53 ,
for 4 minutes, there is no log. This doesn't look like GC issue as I 
checked

the GC log, the longest GC
is only 9.6 seconds.

Jimmy.


2010-07-16 05:49:26,805 DEBUG
org.apache.hadoop.hbase.io.hfile.LruBlockCache: Ca
che Stats: Sizes: Total=3.355194MB (3518176), Free=405.4198MB 
(425113472),

Max=4
08.775MB (428631648), Counts: Blocks=1, Access=2178914, Hit=1034,
Miss=2177880,
Evictions=0, Evicted=0, Ratios: Hit Ratio=0.04745483165606856%, Miss
Ratio=99.95
254278182983%, Evicted/Run=NaN
2010-07-16 05:53:23,476 DEBUG
org.apache.hadoop.hbase.io.hfile.LruBlockCache: Ca
che Stats: Sizes: Total=3.355194MB (3518176), Free=405.4198MB 
(425113472),

Max=4
08.775MB (428631648), Counts: Blocks=1, Access=2178915, Hit=1035,
Miss=2177880,
Evictions=0, Evicted=0, Ratios: Hit Ratio=0.04750070511363447%, Miss
Ratio=99.95
250105857849%, Evicted/Run=NaN


2010-07-16 05:53:26,171 INFO org.apache.zookeeper.ClientCnxn: Client 
session

tim
ed out, have not heard from server in 240540ms for sessionid
0x329c88039b0006c,
closing socket connection and attempting reconnect
2010-07-16 05:53:27,333 INFO org.apache.zookeeper.ClientCnxn: Opening 
socket

con
nection to server t-zookeeper2.cloud.ppops.net/10.110.24.57:2181
2010-07-16 05:53:27,334 INFO org.apache.zookeeper.ClientCnxn: Socket
connection
established to t-zookeeper2.cloud.ppops.net/10.110.24.57:2181, initiating
sessio
n
2010-07-16 05:53:27,335 INFO org.apache.zookeeper.ClientCnxn: Unable to
reconnec
t to ZooKeeper service, session 0x329c88039b0006c has expired, closing
socket co
nnection
2010-07-16 05:53:27,896 INFO org.apache.zookeeper.ClientCnxn: Client 
session

tim
ed out, have not heard from server in 240520ms for sessionid
0x129c87a7f98007a,
closing socket connection and attempting reconnect


2010-07-16 05:53:39,090 FATAL
org.apache.hadoop.hbase.regionserver.HRegionServer
: Aborting region server 
serverName=m0002028.ppops.net,60020,1279237223465,

load
=(requests=952, regions=21, usedHeap=575, maxHeap=2043): Unhandled 
exception

org.apache.hadoop.hbase.YouAreDeadException:
org.apache.hadoop.hbase.YouAreDeadE
xception: Server REPORT rejected; currently processing
m0002028.ppops.net,60020,
1279237223465 as dead server
  at
org.apache.hadoop.hbase.master.ServerManager.checkIsDead(ServerManage
r.java:217)
  at
org.apache.hadoop.hbase.master.ServerManager.regionServerReport(Serve
rManager.java:271)
  at
org.apache.hadoop.hbase.master.HMaster.regionServerReport(HMaster.jav
a:684)
  at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
  at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAcces
sorImpl.java:25)
  at java.lang.reflect.Method.invoke(Method.java:597)
  at 
org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:576)

  at
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:
919)

  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
Method)





Re: High ingest rate and FIN_WAIT1 problems

2010-07-16 Thread Ryan Rawson
I've been running with this setting on both the HDFS side and the
HBase side for over a year now, it's a bit of voodoo but you might be
running into well known suckage of HDFS.  Try this one and restart
your hbase  hdfs.

The FIN_WAIT2/TIME_WAIT happens more on large concurrent gets, not so
much for inserts.

<property>
  <name>dfs.datanode.socket.write.timeout</name>
  <value>0</value>
</property>

-ryan


On Fri, Jul 16, 2010 at 9:33 AM, Thomas Downing
tdown...@proteus-technologies.com wrote:
 Thanks for the response.

 My understanding is that TCP_FIN_TIMEOUT affects only FIN_WAIT2,
 my problem is with FIN_WAIT1.

 While I do see some sockets in TIME_WAIT, they are only a few, and the
 number is not growing.

 On 7/16/2010 12:07 PM, Hegner, Travis wrote:

 Hi Thomas,

 I ran into a very similar issue when running slony-I on postgresql to
 replicate 15-20 databases.

 Adjusting the TCP_FIN_TIMEOUT parameters for the kernel may help to slow
 (or hopefully stop), the leaking sockets. I found some notes about adjusting
 TCP parameters here:
 http://www.hikaro.com/linux/tweaking-tcpip-syctl-conf.html

 with the specific excerpt regarding the TCP_FIN_TIMEOUT:

 TCP_FIN_TIMEOUT
 This setting determines the time that must elapse before TCP/IP can
 release a closed connection and reuse its resources. During this TIME_WAIT
 state, reopening the connection to the client costs less than establishing a
 new connection. By reducing the value of this entry, TCP/IP can release
 closed connections faster, making more resources available for new
 connections. Adjust this in the presence of many connections sitting in the
 TIME_WAIT state:


 # echo 30 > /proc/sys/net/ipv4/tcp_fin_timeout

 Try setting this lower on your master. You may also consider these on the
 same link:

 TCP_TW_RECYCLE
 It enables fast recycling of TIME_WAIT sockets. The default value is 0
 (disabled). The sysctl documentation incorrectly states the default as
 enabled. It can be changed to 1 (enabled) in many cases. Known to cause some
 issues with hoststated (load balancing and fail over) if enabled, should be
 used with caution.


 echo 1 > /proc/sys/net/ipv4/tcp_tw_recycle

 TCP_TW_REUSE
 This allows reusing sockets in TIME_WAIT state for new connections when it
 is safe from protocol viewpoint. Default value is 0 (disabled). It is
 generally a safer alternative to tcp_tw_recycle


 echo 1 > /proc/sys/net/ipv4/tcp_tw_reuse


 The above commands will not persist reboots, but the link explains how.
 The list experts may be able to give more insight on which, if any, of these
 settings are safe to manipulate, and what risks or issues you may encounter
 specifically with Hbase while adjusting these settings.

 Hope This Helps,

 Travis Hegner


 -Original Message-
 From: Thomas Downing [mailto:tdown...@proteus-technologies.com]
 Sent: Friday, July 16, 2010 10:33 AM
 To: user@hbase.apache.org
 Subject: High ingest rate and FIN_WAIT1 problems

 Hi,

 I am a complete HBase and HDFS newbie, so I apologize in advance for
 the inevitable bloopers.

 We are doing feasibility testing on NoSql data store options, with rather
 high ingest rate requirements.  So far, HBase is looking good, with only
 one issue identified. Running at an ingest rate of ~30K rows per second
 on a 4 2.2Mhz CPU 8G RAM machine I am slowly leaking sockets.

 This is a single node setup - no replication.  The CPU load is only about
 50%-60%, with the majority of that in userland, system and iowait are
 averaging less than 3%.  There is no swapping going on.

 The problem is that on the datanode there are a large number of sockets
 in FIN_WAIT1, with corresponding peers on master in ESTABLISHED.
 These pairs hang around for quite some time, at at my ingest rate this
 means that the total sockets held by datanode and master is slowly going
 up.

 If my understanding of TCP is correct, then this indicates that the master
 peer has stopped reading incoming data from the datanode - i.e., it is
 sending a window of zero; and that the datanode has called close(2) on
 its peer.

 There was a thread some time ago:

 http://www.mail-archive.com/hbase-u...@hadoop.apache.org/msg03329.html

 There was no real conclusion.  I have played with the config params as
 suggested on that thread, but no luck yet.  Also, in that case the problem
 seemed to be between datanodes for replication operations - not the case
 with me.  Changing timeouts to avoid the slow increase might not really
 solve the problem IFF the master peer has in fact ceased to read its
 socket.  The data outstanding in the TCP stack buffer would be lost.
 Whether that would imply data loss is beyond me.

 I am posting this here as although the only logs with errors/exceptions
 are the datanode logs, netstat and wireshark seem to indicate that the
 problem is on the master side.

 The master, namenode, regionserver and zookeeper and logs shows no
 warning or errors.  The datanode log shows this, over and over:

 2010-07-16 00:33:09,269 WARN
 

Re: YouAreDeadException with hbase

2010-07-16 Thread Ryan Rawson
Sometimes the GC can chain multiple medium pauses into one large
pause.  I've seen this before where there are 2 long pauses back to
back and the result was a 50second+ pause.

This article talks a lot about GC performance and tuning, check it out:
http://wiki.apache.org/hadoop/PerformanceTuning

-ryan

On Fri, Jul 16, 2010 at 11:55 AM, Jinsong Hu jinsong...@hotmail.com wrote:
 Yes, the root cause seems to be the gap of 4 minutes between
 2010-07-16 05:49:26,805   and 2010-07-16 05:53:23,476 .  but I checked GC
 gc-hbase.log and don't see
 4 minute gap in gc. I just wonder what could cause this large gap.

 I also wonder if there is a configuration that I can do to avoid this long
 pause, or get around the problem
 cause by this long pause.

 Jimmy

 --
 From: Stack st...@duboce.net
 Sent: Friday, July 16, 2010 11:44 AM
 To: user@hbase.apache.org
 Subject: Re: YouAreDeadException with hbase

 You'll see this if the server reports to the master after the master
 has ruled it 'dead'.

 Here is the code that produces the exception:

   if (!isDead(serverName)) return;
   String message = "Server " + what + " rejected; currently processing " +
     serverName + " as dead server";
   LOG.debug(message);
   throw new YouAreDeadException(message);

 Servers are on the 'dead' list if zk reports their session has
 expired.  The master moves then to cleanup after the dead server and
 process its logs.  If during this cleanup time the server reports in,
 master will return the youaredead exception.

 Usually the RS has lost its zk session but has yet to realize it.

 St.Ack

 On Thu, Jul 15, 2010 at 11:52 PM, Jinsong Hu jinsong...@hotmail.com
 wrote:

 Hi, There:
  I got some YouAreDeadException  with hbase. what can cause it ? I do
 notice
 between 5:49 to 5:53 ,
 for 4 minutes, there is no log. This doesn't look like GC issue as I
 checked
 the GC log, the longest GC
 is only 9.6 seconds.

 Jimmy.


 2010-07-16 05:49:26,805 DEBUG
 org.apache.hadoop.hbase.io.hfile.LruBlockCache: Ca
 che Stats: Sizes: Total=3.355194MB (3518176), Free=405.4198MB
 (425113472),
 Max=4
 08.775MB (428631648), Counts: Blocks=1, Access=2178914, Hit=1034,
 Miss=2177880,
 Evictions=0, Evicted=0, Ratios: Hit Ratio=0.04745483165606856%, Miss
 Ratio=99.95
 254278182983%, Evicted/Run=NaN
 2010-07-16 05:53:23,476 DEBUG
 org.apache.hadoop.hbase.io.hfile.LruBlockCache: Ca
 che Stats: Sizes: Total=3.355194MB (3518176), Free=405.4198MB
 (425113472),
 Max=4
 08.775MB (428631648), Counts: Blocks=1, Access=2178915, Hit=1035,
 Miss=2177880,
 Evictions=0, Evicted=0, Ratios: Hit Ratio=0.04750070511363447%, Miss
 Ratio=99.95
 250105857849%, Evicted/Run=NaN

 
 2010-07-16 05:53:26,171 INFO org.apache.zookeeper.ClientCnxn: Client
 session
 tim
 ed out, have not heard from server in 240540ms for sessionid
 0x329c88039b0006c,
 closing socket connection and attempting reconnect
 2010-07-16 05:53:27,333 INFO org.apache.zookeeper.ClientCnxn: Opening
 socket
 con
 nection to server t-zookeeper2.cloud.ppops.net/10.110.24.57:2181
 2010-07-16 05:53:27,334 INFO org.apache.zookeeper.ClientCnxn: Socket
 connection
 established to t-zookeeper2.cloud.ppops.net/10.110.24.57:2181, initiating
 sessio
 n
 2010-07-16 05:53:27,335 INFO org.apache.zookeeper.ClientCnxn: Unable to
 reconnec
 t to ZooKeeper service, session 0x329c88039b0006c has expired, closing
 socket co
 nnection
 2010-07-16 05:53:27,896 INFO org.apache.zookeeper.ClientCnxn: Client
 session
 tim
 ed out, have not heard from server in 240520ms for sessionid
 0x129c87a7f98007a,
 closing socket connection and attempting reconnect


 2010-07-16 05:53:39,090 FATAL
 org.apache.hadoop.hbase.regionserver.HRegionServer
 : Aborting region server
 serverName=m0002028.ppops.net,60020,1279237223465,
 load
 =(requests=952, regions=21, usedHeap=575, maxHeap=2043): Unhandled
 exception
 org.apache.hadoop.hbase.YouAreDeadException:
 org.apache.hadoop.hbase.YouAreDeadE
 xception: Server REPORT rejected; currently processing
 m0002028.ppops.net,60020,
 1279237223465 as dead server
      at
 org.apache.hadoop.hbase.master.ServerManager.checkIsDead(ServerManage
 r.java:217)
      at
 org.apache.hadoop.hbase.master.ServerManager.regionServerReport(Serve
 rManager.java:271)
      at
 org.apache.hadoop.hbase.master.HMaster.regionServerReport(HMaster.jav
 a:684)
      at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
      at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAcces
 sorImpl.java:25)
      at java.lang.reflect.Method.invoke(Method.java:597)
      at
 org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:576)
      at
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:
 919)

      at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
 Method)





Re: Row level locking?

2010-07-16 Thread Ryan Rawson
HTable.close does very little:

  public void close() throws IOException {
    flushCommits();
  }


None of which involves row locks.

One thing to watch out for is to remember to close your scanners -
they continue to use server-side resources until you close them or 60
seconds passes and they get timed out.  Also be very wary of using any
of the explicit row locking calls, they are generally trouble for more
or less everyone.  There was a proposal to remove them, but I don't
think that went through.
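
As a quick illustration of the points above -- this is only a sketch against the
0.20-era client API, and the table/family names are made up -- closing a scanner
in a finally block looks like:

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class ScanExample {
  public static void main(String[] args) throws Exception {
    HTable table = new HTable(new HBaseConfiguration(), "mytable"); // hypothetical table
    Scan scan = new Scan();
    scan.addFamily(Bytes.toBytes("data"));                          // hypothetical family
    ResultScanner scanner = table.getScanner(scan);
    try {
      for (Result row : scanner) {
        System.out.println(Bytes.toString(row.getRow()));
      }
    } finally {
      scanner.close(); // frees the server-side scanner now instead of waiting for the 60s timeout
    }
    table.close();     // as noted above, this only calls flushCommits(); no row locks involved
  }
}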


On Fri, Jul 16, 2010 at 9:16 AM, Cosmin Lehene cleh...@adobe.com wrote:

 On Jul 16, 2010, at 6:41 PM, Michael Segel wrote:



 Thanks for the response.
 (You don't need to include the cc ...)

 With respect to the row level locking ...
 I was interested in when the lock is actually acquired, how long the lock 
 persists and when is the lock released.
 From your response, the lock is only held on updating the row, and while the 
 data is being written to the memory cache which is then written to disk. 
 (Note: This row level locking different than transactional row level locking.)

 Now that I've had some caffeine I think I can clarify... :-)

 Some of my developers complained that they were having trouble with two 
 different processes trying to update the same table.
 Not sure why they were having the problem, so I wanted to have a good fix. 
 The simple fix was to have them issue the close() the HTable connection which 
 forces any resources that they acquired to be released.


 It would help to know what the exact problem was. Normally I wouldn't see any 
 problems.


 In looking at the problem... its possible that they didn't have AutoFlush set 
 to true so the write was still in the buffer and hadn't gotten flushed.

 If the lock only persists for the duration of the write to memory and is then 
 released, then the issue could have been that the record written was in the 
 buffer and not yet flushed to disk.


 At the region server level HBase will use the cache for both reads and 
 writes. This happens transparently for the user. Once something is written in 
 the cache, all other clients will read from the same cache. No need to worry 
 if the cache has been flushed.
 Lars George has a good article about the hbase storage architecture 
 http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html

 I'm also assuming that when you run a scan() against a region that any 
 information written to buffer but not yet written to disk will be missed.


 When you do puts into hbase you'll use HTable. The HTable instance is on the 
 client.  HTable keeps a buffer as well and if autoFlush is false it only 
 flushes when you do flushCommits() or when it reaches the buffer limit, or 
 when you close the table. With autoFlush set to true it will flush for every 
 put.
 This buffer is on the client. So when data is actually flushed it gets on the 
 region server where it will get in the region server cache and WAL.
 Unless a client flushes the put no other client can see the data because it 
 still resides on the client only. Depending on what you need to do you can 
 use autoFlush true if you are doing many small writes that need to be seen 
 immediately by others. You can use autoFlush false and issue flushCommits() 
 yourself, or you can rely on the buffer limit for that.
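
 To make the buffering concrete, a minimal sketch against the 0.20-era HTable
 API (the table, family and row names here are made up for illustration):

 import org.apache.hadoop.hbase.HBaseConfiguration;
 import org.apache.hadoop.hbase.client.HTable;
 import org.apache.hadoop.hbase.client.Put;
 import org.apache.hadoop.hbase.util.Bytes;

 public class BufferedWrites {
   public static void main(String[] args) throws Exception {
     HTable table = new HTable(new HBaseConfiguration(), "mytable"); // hypothetical table
     table.setAutoFlush(false);                 // puts accumulate in the client-side write buffer
     table.setWriteBufferSize(2 * 1024 * 1024); // auto-flush once roughly 2MB is buffered
     for (int i = 0; i < 10000; i++) {
       Put put = new Put(Bytes.toBytes("row-" + i));
       put.add(Bytes.toBytes("data"), Bytes.toBytes("q"), Bytes.toBytes("value-" + i));
       table.put(put);                          // may return before anything reaches the region server
     }
     table.flushCommits();                      // only now are the buffered puts visible to other clients
     table.close();
   }
 }

 With autoFlush left at true, every put() is sent immediately and the explicit
 flushCommits() call is unnecessary.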

 So I guess the question isn't so much the issue of a lock, but that we need 
 to make sure that data written to the buffer should be flushed ASAP unless we 
 know that we're going to be writing a lot of data in the m/r job.


 Usually when you write from the reducer (heavy) is better to use a buffer and 
 not autoFlush to have a good performance.

 Cosmin


 Thx

 -Mike



 From: cleh...@adobe.commailto:cleh...@adobe.com
 To: user@hbase.apache.orgmailto:user@hbase.apache.org
 CC: hbase-u...@hadoop.apache.orgmailto:hbase-u...@hadoop.apache.org
 Date: Fri, 16 Jul 2010 12:34:36 +0100
 Subject: Re: Row level locking?

 Currently a row is part of a region and there's a single region server 
 serving that region at a particular moment.
 So when that row is updated a lock is acquired for that row until the actual 
 data is updated in memory (note that a put will be written to cache on the 
 region server and also persisted in the write-ahead log - WAL). Subsequent 
 puts to that row will have to wait for that lock.

 HBase is fully consistent. This being said all the locking takes place at row 
 level only, so when you scan you have to take that into account as there's no 
 range locking.

 I'm not sure I understand the resource releasing issue. HTable.close() 
 flushes the current write buffer (you can have write buffer if you use 
 autoFlush set to false).

 Cosmin


 On Jul 16, 2010, at 1:33 PM, Michael Segel wrote:


 Ok,

 First, I'm writing this before I've had my first cup of coffee so I am 
 apologizing in advance if the question is a brain dead question

 Going from a relational background, some of these questions may not 

Re: Row level locking?

2010-07-16 Thread Guilherme Germoglio
What about implementing explicit row locks using the zookeeper? I'm planning
to do this sometime in the near future. Does anyone have any comments
against this approach?

(or maybe it was already implemented by someone :-)

On Fri, Jul 16, 2010 at 5:02 PM, Ryan Rawson ryano...@gmail.com wrote:

 HTable.close does very little:

  public void close() throws IOException{
flushCommits();
  }


 None of which involves row locks.

 One thing to watch out for is to remember to close your scanners -
 they continue to use server-side resources until you close them or 60
 seconds passes and they get timed out.  Also be very wary of using any
 of the explicit row locking calls, they are generally trouble for more
 or less everyone.  There was a proposal to remove them, but I don't
 think that went through.


 On Fri, Jul 16, 2010 at 9:16 AM, Cosmin Lehene cleh...@adobe.com wrote:
 
  On Jul 16, 2010, at 6:41 PM, Michael Segel wrote:
 
 
 
  Thanks for the response.
  (You don't need to include the cc ...)
 
  With respect to the row level locking ...
  I was interested in when the lock is actually acquired, how long the lock
 persists and when is the lock released.
  From your response, the lock is only held on updating the row, and while
 the data is being written to the memory cache which is then written to disk.
 (Note: This row level locking different than transactional row level
 locking.)
 
  Now that I've had some caffeine I think I can clarify... :-)
 
  Some of my developers complained that they were having trouble with two
 different processes trying to update the same table.
  Not sure why they were having the problem, so I wanted to have a good
 fix. The simple fix was to have them issue the close() the HTable connection
 which forces any resources that they acquired to be released.
 
 
  It would help to know what the exact problem was. Normally I wouldn't see
 any problems.
 
 
  In looking at the problem... its possible that they didn't have AutoFlush
 set to true so the write was still in the buffer and hadn't gotten flushed.
 
  If the lock only persists for the duration of the write to memory and is
 then released, then the issue could have been that the record written was in
 the buffer and not yet flushed to disk.
 
 
  At the region server level HBase will use the cache for both reads and
 writes. This happens transparently for the user. Once something is written
 in the cache, all other clients will read from the same cache. No need to
 worry if the cache has been flushed.
  Lars George has a good article about the hbase storage architecture
 http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html
 
  I'm also assuming that when you run a scan() against a region that any
 information written to buffer but not yet written to disk will be missed.
 
 
  When you do puts into hbase you'll use HTable. The HTable instance is on
 the client.  HTable keeps a buffer as well and if autoFlush is false it only
 flushes when you do flushCommits() or when it reaches the buffer limit, or
 when you close the table. With autoFlush set to true it will flush for every
 put.
  This buffer is on the client. So when data is actually flushed it gets on
 the region server where it will get in the region server cache and WAL.
  Unless a client flushes the put no other client can see the data because
 it still resides on the client only. Depending on what you need to do you
 can use autoFlush true if you are doing many small writes that need to be
 seen immediately by others. You can use autoFlush false and issue
 flushCommits() yourself, or you can rely on the buffer limit for that.
 
  So I guess the question isn't so much the issue of a lock, but that we
 need to make sure that data written to the buffer should be flushed ASAP
 unless we know that we're going to be writing a lot of data in the m/r job.
 
 
  Usually when you write from the reducer (heavy) is better to use a buffer
 and not autoFlush to have a good performance.
 
  Cosmin
 
 
  Thx
 
  -Mike
 
 
 
  From: cleh...@adobe.commailto:cleh...@adobe.com
  To: user@hbase.apache.orgmailto:user@hbase.apache.org
  CC: hbase-u...@hadoop.apache.orgmailto:hbase-u...@hadoop.apache.org
  Date: Fri, 16 Jul 2010 12:34:36 +0100
  Subject: Re: Row level locking?
 
  Currently a row is part of a region and there's a single region server
 serving that region at a particular moment.
  So when that row is updated a lock is acquired for that row until the
 actual data is updated in memory (note that a put will be written to cache
 on the region server and also persisted in the write-ahead log - WAL).
 Subsequent puts to that row will have to wait for that lock.
 
  HBase is fully consistent. This being said all the locking takes place at
 row level only, so when you scan you have to take that into account as
 there's no range locking.
 
  I'm not sure I understand the resource releasing issue. HTable.close()
 flushes the current write buffer (you can have 

Re: Row level locking?

2010-07-16 Thread Ryan Rawson
Explicit locks with zookeeper would be (a) slow and (b) completely out
of band and ultimately up to you.  I wouldn't exactly be eager to do
our row locking in zookeeper (since the minimum operation time is
between 2-10ms).

You could do application advisory locks, but that is true no matter
what datastore you use...
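
For anyone who does want application-level advisory locks, a bare-bones sketch
with the plain ZooKeeper API could look like the following. This is not a full
lock recipe (no waiting, no watches, and the parent znode -- the /locks path here
is a made-up layout -- must already exist), and per the latency numbers above
every call costs a few milliseconds:

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class ZkAdvisoryLock {
  private final ZooKeeper zk;

  public ZkAdvisoryLock(ZooKeeper zk) {
    this.zk = zk;
  }

  // Try to take the lock: true if we created the ephemeral node, false if it is already held.
  public boolean tryLock(String lockPath) throws KeeperException, InterruptedException {
    try {
      zk.create(lockPath, new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
      return true;
    } catch (KeeperException.NodeExistsException e) {
      return false; // another client holds it; the node goes away if that client's session dies
    }
  }

  public void unlock(String lockPath) throws KeeperException, InterruptedException {
    zk.delete(lockPath, -1); // -1 skips the version check
  }
}

Callers would use something like tryLock("/locks/mytable/row1") and retry or back
off on false; HBase itself knows nothing about such locks.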

On Fri, Jul 16, 2010 at 1:13 PM, Guilherme Germoglio
germog...@gmail.com wrote:
 What about implementing explicit row locks using the zookeeper? I'm planning
 to do this sometime in the near future. Does anyone have any comments
 against this approach?

 (or maybe it was already implemented by someone :-)

 On Fri, Jul 16, 2010 at 5:02 PM, Ryan Rawson ryano...@gmail.com wrote:

 HTable.close does very little:

  public void close() throws IOException{
    flushCommits();
  }


 None of which involves row locks.

 One thing to watch out for is to remember to close your scanners -
 they continue to use server-side resources until you close them or 60
 seconds passes and they get timed out.  Also be very wary of using any
 of the explicit row locking calls, they are generally trouble for more
 or less everyone.  There was a proposal to remove them, but I don't
 think that went through.


 On Fri, Jul 16, 2010 at 9:16 AM, Cosmin Lehene cleh...@adobe.com wrote:
 
  On Jul 16, 2010, at 6:41 PM, Michael Segel wrote:
 
 
 
  Thanks for the response.
  (You don't need to include the cc ...)
 
  With respect to the row level locking ...
  I was interested in when the lock is actually acquired, how long the lock
 persists and when is the lock released.
  From your response, the lock is only held on updating the row, and while
 the data is being written to the memory cache which is then written to disk.
 (Note: This row level locking different than transactional row level
 locking.)
 
  Now that I've had some caffeine I think I can clarify... :-)
 
  Some of my developers complained that they were having trouble with two
 different processes trying to update the same table.
  Not sure why they were having the problem, so I wanted to have a good
 fix. The simple fix was to have them issue the close() the HTable connection
 which forces any resources that they acquired to be released.
 
 
  It would help to know what the exact problem was. Normally I wouldn't see
 any problems.
 
 
  In looking at the problem... its possible that they didn't have AutoFlush
 set to true so the write was still in the buffer and hadn't gotten flushed.
 
  If the lock only persists for the duration of the write to memory and is
 then released, then the issue could have been that the record written was in
 the buffer and not yet flushed to disk.
 
 
  At the region server level HBase will use the cache for both reads and
 writes. This happens transparently for the user. Once something is written
 in the cache, all other clients will read from the same cache. No need to
 worry if the cache has been flushed.
  Lars George has a good article about the hbase storage architecture
 http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html
 
  I'm also assuming that when you run a scan() against a region that any
 information written to buffer but not yet written to disk will be missed.
 
 
  When you do puts into hbase you'll use HTable. The HTable instance is on
 the client.  HTable keeps a buffer as well and if autoFlush is false it only
 flushes when you do flushCommits() or when it reaches the buffer limit, or
 when you close the table. With autoFlush set to true it will flush for every
 put.
  This buffer is on the client. So when data is actually flushed it gets on
 the region server where it will get in the region server cache and WAL.
  Unless a client flushes the put no other client can see the data because
 it still resides on the client only. Depending on what you need to do you
 can use autoFlush true if you are doing many small writes that need to be
 seen immediately by others. You can use autoFlush false and issue
 flushCommits() yourself, or you can rely on the buffer limit for that.
 
  So I guess the question isn't so much the issue of a lock, but that we
 need to make sure that data written to the buffer should be flushed ASAP
 unless we know that we're going to be writing a lot of data in the m/r job.
 
 
  Usually when you write from the reducer (heavy) is better to use a buffer
 and not autoFlush to have a good performance.
 
  Cosmin
 
 
  Thx
 
  -Mike
 
 
 
  From: cleh...@adobe.commailto:cleh...@adobe.com
  To: user@hbase.apache.orgmailto:user@hbase.apache.org
  CC: hbase-u...@hadoop.apache.orgmailto:hbase-u...@hadoop.apache.org
  Date: Fri, 16 Jul 2010 12:34:36 +0100
  Subject: Re: Row level locking?
 
  Currently a row is part of a region and there's a single region server
 serving that region at a particular moment.
  So when that row is updated a lock is acquired for that row until the
 actual data is updated in memory (note that a put will be written to cache
 on the region server and also 

Re: YouAreDeadException with hbase

2010-07-16 Thread Ryan Rawson
These 2 lines are different GC collections:

5517.355: [GC 5517.355: [ParNew (promotion failed): 18113K->18113K(19136K), 0.7700840 secs]
5518.125: [CMS5649.813: [CMS-concurrent-mark: 171.151/310.961 secs]
[Times: user=95.87 sys=3.06, real=310.97 secs]
(concurrent mode failure): 2009649K->572240K(2054592K), 280.2155930 secs] 2023325K->572240K(2073728K), [CMS Perm : 18029K->17976K(30064K)] icms_dc=100 , 281.0357280 secs] [Times: user=4.55 sys=4.07, real=281.05 secs]

It's a little hard to read that, it looks like the CMS concurrent mark
took 310 seconds then failed, then we got a 281 second real time
pause, but interestingly enough the user and system time is fairly
low.

How loaded are these machines?  You need to be giving enough
uncontended CPU time to hbase.


On Fri, Jul 16, 2010 at 1:30 PM, Jinsong Hu jinsong...@hotmail.com wrote:
 I already implemented all these configs before my test. I checked
 gc-hbase.log,

 I see a GC failure which looks very suspicious:

 5515.974: [GC 5515.974: [ParNew: 19120K->2112K(19136K), 0.8344240 secs] 2016283K->2007324K(2069308K) icms_dc=100 , 0.8345620 secs] [Times: user=0.08 sys=0.00, real=0.83 secs]

 5517.355: [GC 5517.355: [ParNew (promotion failed): 18113K->18113K(19136K), 0.7700840 secs]5518.125: [CMS5649.813: [CMS-concurrent-mark: 171.151/310.961 secs] [Times: user=95.87 sys=3.06, real=310.97 secs]
 (concurrent mode failure): 2009649K->572240K(2054592K), 280.2155930 secs] 2023325K->572240K(2073728K), [CMS Perm : 18029K->17976K(30064K)] icms_dc=100 , 281.0357280 secs] [Times: user=4.55 sys=4.07, real=281.05 secs]

 5798.909: [GC [1 CMS-initial-mark: 572240K(2054592K)] 573896K(2092928K), 0.0579220 secs] [Times: user=0.01 sys=0.00, real=0.08 secs]


 the concurrent mode failure, and  281.05 seconds of GC time, looks like the
 culprit for the problem.
 I just wonder how to resolve this issue.

 Jimmy.


 --
 From: Ryan Rawson ryano...@gmail.com
 Sent: Friday, July 16, 2010 12:57 PM
 To: user@hbase.apache.org
 Subject: Re: YouAreDeadException with hbase

 Sometimes the GC can chain multiple medium pauses into one large
 pause.  I've seen this before where there are 2 long pauses back to
 back and the result was a 50second+ pause.

 This article talks a lot about GC performance and tuning, check it out:
 http://wiki.apache.org/hadoop/PerformanceTuning

 -ryan

 On Fri, Jul 16, 2010 at 11:55 AM, Jinsong Hu jinsong...@hotmail.com
 wrote:

 Yes, the root cause seems to be the gap of 4 minutes between
 2010-07-16 05:49:26,805   and 2010-07-16 05:53:23,476 .  but I checked GC
 gc-hbase.log and don't see
 4 minute gap in gc. I just wonder what could cause this large gap.

 I also wonder if there is a configuration that I can do to avoid this
 long
 pause, or get around the problem
 cause by this long pause.

 Jimmy

 --
 From: Stack st...@duboce.net
 Sent: Friday, July 16, 2010 11:44 AM
 To: user@hbase.apache.org
 Subject: Re: YouAreDeadException with hbase

 You'll see this if the server reports to the master after the master
 has ruled it 'dead'.

 Here is the code that produces the exception:

  if (!isDead(serverName)) return;
  String message = "Server " + what + " rejected; currently processing " +
    serverName + " as dead server";
  LOG.debug(message);
  throw new YouAreDeadException(message);

 Servers are on the 'dead' list if zk reports their session has
 expired.  The master moves then to cleanup after the dead server and
 process its logs.  If during this cleanup time the server reports in,
 master will return the youaredead exception.

 Usually the RS has lost its zk session but has yet to realize it.

 St.Ack

 On Thu, Jul 15, 2010 at 11:52 PM, Jinsong Hu jinsong...@hotmail.com
 wrote:

 Hi, There:
  I got some YouAreDeadException  with hbase. what can cause it ? I do
 notice
 between 5:49 to 5:53 ,
 for 4 minutes, there is no log. This doesn't look like GC issue as I
 checked
 the GC log, the longest GC
 is only 9.6 seconds.

 Jimmy.


 2010-07-16 05:49:26,805 DEBUG
 org.apache.hadoop.hbase.io.hfile.LruBlockCache: Ca
 che Stats: Sizes: Total=3.355194MB (3518176), Free=405.4198MB
 (425113472),
 Max=4
 08.775MB (428631648), Counts: Blocks=1, Access=2178914, Hit=1034,
 Miss=2177880,
 Evictions=0, Evicted=0, Ratios: Hit Ratio=0.04745483165606856%, Miss
 Ratio=99.95
 254278182983%, Evicted/Run=NaN
 2010-07-16 05:53:23,476 DEBUG
 org.apache.hadoop.hbase.io.hfile.LruBlockCache: Ca
 che Stats: Sizes: Total=3.355194MB (3518176), Free=405.4198MB
 (425113472),
 Max=4
 08.775MB (428631648), Counts: Blocks=1, Access=2178915, Hit=1035,
 Miss=2177880,
 Evictions=0, Evicted=0, Ratios: Hit Ratio=0.04750070511363447%, Miss
 Ratio=99.95
 250105857849%, Evicted/Run=NaN

 
 2010-07-16 05:53:26,171 INFO org.apache.zookeeper.ClientCnxn: Client
 session
 tim
 ed out, have not heard from server in 240540ms for sessionid
 0x329c88039b0006c,
 closing socket connection and 

Re: YouAreDeadException with hbase

2010-07-16 Thread Ryan Rawson
One bit of advice that is easy to overlook but is important is thus:

This can be due to the size of the ParNew, which should be relatively
small. If your ParNew is very large after running HBase for a while,
in one example a ParNew was about 150MB, then you might have to
constrain the size of ParNew.

Add this to HBASE_OPTS:

export HBASE_OPTS="-XX:NewSize=6m -XX:MaxNewSize=6m <cms options from above> <gc logging options from above>"

From our wiki about performance tuning.

On Fri, Jul 16, 2010 at 1:50 PM, Jinsong Hu jinsong...@hotmail.com wrote:
 I was doing stress testing, so the load is not small. But I purposely
 limited the data rate on client
 side so load is not big either. using iostat -x 5 and I can see there are
 lots of situations that
 the CPU goes to very high level and stay there for long time. but then it
 ultimately go down.
 It is highly possible that during a certain period the CPU was too busy and
 the GC process
 was starved for CPU.

 I researched this failure and found

 an excellent thread talking about GC:

 http://forums.sun.com/thread.jspa?threadID=698490

 that is more detailed than http://wiki.apache.org/hadoop/PerformanceTuning .
 I will do some tunning and see if it helps following the posted config
 there.

 Jimmy.

 --
 From: Ryan Rawson ryano...@gmail.com
 Sent: Friday, July 16, 2010 1:35 PM
 To: user@hbase.apache.org
 Subject: Re: YouAreDeadException with hbase

 These 2 lines are different GC collections:

 5517.355: [GC 5517.355: [ParNew (promotion failed): 18113K->18113K(19136K), 0.7700840 secs]
 5518.125: [CMS5649.813: [CMS-concurrent-mark: 171.151/310.961 secs]
 [Times: user=95.87 sys=3.06, real=310.97 secs]
 (concurrent mode failure): 2009649K->572240K(2054592K), 280.2155930 secs] 2023325K->572240K(2073728K), [CMS Perm : 18029K->17976K(30064K)] icms_dc=100 , 281.0357280 secs] [Times: user=4.55 sys=4.07, real=281.05 secs]

 It's a little hard to read that, it looks like the CMS concurrent mark
 took 310 seconds then failed, then we got a 281 second real time
 pause, but interestingly enough the user and system time is fairly
 low.

 How loaded are these machines?  You need to be giving enough
 uncontended CPU time to hbase.


 On Fri, Jul 16, 2010 at 1:30 PM, Jinsong Hu jinsong...@hotmail.com
 wrote:

 I already implemented all these configs before my test. I checked
 gc-hbase.log,

 I see a GC failure which looks very suspecious:

 5515.974: [GC 5515.974: [ParNew: 19120K->2112K(19136K), 0.8344240 secs] 2016283K->2007324K(2069308K) icms_dc=100 , 0.8345620 secs] [Times: user=0.08 sys=0.00, real=0.83 secs]

 5517.355: [GC 5517.355: [ParNew (promotion failed): 18113K->18113K(19136K), 0.7700840 secs]5518.125: [CMS5649.813: [CMS-concurrent-mark: 171.151/310.961 secs] [Times: user=95.87 sys=3.06, real=310.97 secs]
 (concurrent mode failure): 2009649K->572240K(2054592K), 280.2155930 secs] 2023325K->572240K(2073728K), [CMS Perm : 18029K->17976K(30064K)] icms_dc=100 , 281.0357280 secs] [Times: user=4.55 sys=4.07, real=281.05 secs]

 5798.909: [GC [1 CMS-initial-mark: 572240K(2054592K)] 573896K(2092928K), 0.0579220 secs] [Times: user=0.01 sys=0.00, real=0.08 secs]


 the concurrent mode failure, and  281.05 seconds of GC time, looks like
 the
 culprit for the problem.
 I just wonder how to resolve this issue.

 Jimmy.


 --
 From: Ryan Rawson ryano...@gmail.com
 Sent: Friday, July 16, 2010 12:57 PM
 To: user@hbase.apache.org
 Subject: Re: YouAreDeadException with hbase

 Sometimes the GC can chain multiple medium pauses into one large
 pause.  I've seen this before where there are 2 long pauses back to
 back and the result was a 50second+ pause.

 This article talks a lot about GC performance and tuning, check it out:
 http://wiki.apache.org/hadoop/PerformanceTuning

 -ryan

 On Fri, Jul 16, 2010 at 11:55 AM, Jinsong Hu jinsong...@hotmail.com
 wrote:

 Yes, the root cause seems to be the gap of 4 minutes between
 2010-07-16 05:49:26,805   and 2010-07-16 05:53:23,476 .  but I checked
 GC
 gc-hbase.log and don't see
 4 minute gap in gc. I just wonder what could cause this large gap.

 I also wonder if there is a configuration that I can do to avoid this
 long
 pause, or get around the problem
 cause by this long pause.

 Jimmy

 --
 From: Stack st...@duboce.net
 Sent: Friday, July 16, 2010 11:44 AM
 To: user@hbase.apache.org
 Subject: Re: YouAreDeadException with hbase

 You'll see this if the server reports to the master after the master
 has ruled it 'dead'.

 Here is the code that produces the exception:

  if (!isDead(serverName)) return;
  String message = Server  + what +  rejected; currently processing
 
 +
   serverName +  as dead server;
  LOG.debug(message);
  throw new YouAreDeadException(message);

 Servers are on the 'dead' list if zk reports their session has
 expired.  The master moves then to cleanup 

Re: Row level locking?

2010-07-16 Thread Guilherme Germoglio
thanks Ryan! (I was about to look for performance numbers)

Just another question -- slightly related to locks. Will HBase 0.90
include HTable.checkAndPut receiving more than one value to check? I'm
eager to help, if possible.

 On Fri, Jul 16, 2010 at 5:24 PM, Ryan Rawson ryano...@gmail.com wrote:

 Explicit locks with zookeeper would be (a) slow and (b) completely out
 of band and ultimately up to you.  I wouldn't exactly be eager to do
 our row locking in zookeeper (since the minimum operation time is
 between 2-10ms).

 You could do application advisory locks, but that is true no matter
 what datastore you use...

 On Fri, Jul 16, 2010 at 1:13 PM, Guilherme Germoglio
 germog...@gmail.com wrote:
  What about implementing explicit row locks using the zookeeper? I'm 
  planning
  to do this sometime in the near future. Does anyone have any comments
  against this approach?
 
  (or maybe it was already implemented by someone :-)
 
  On Fri, Jul 16, 2010 at 5:02 PM, Ryan Rawson ryano...@gmail.com wrote:
 
  HTable.close does very little:
 
   public void close() throws IOException{
     flushCommits();
   }
 
 
  None of which involves row locks.
 
  One thing to watch out for is to remember to close your scanners -
  they continue to use server-side resources until you close them or 60
  seconds passes and they get timed out.  Also be very wary of using any
  of the explicit row locking calls, they are generally trouble for more
  or less everyone.  There was a proposal to remove them, but I don't
  think that went through.
 
 
  On Fri, Jul 16, 2010 at 9:16 AM, Cosmin Lehene cleh...@adobe.com wrote:
  
   On Jul 16, 2010, at 6:41 PM, Michael Segel wrote:
  
  
  
   Thanks for the response.
   (You don't need to include the cc ...)
  
   With respect to the row level locking ...
   I was interested in when the lock is actually acquired, how long the 
   lock
  persists and when is the lock released.
   From your response, the lock is only held on updating the row, and while
  the data is being written to the memory cache which is then written to 
  disk.
  (Note: This row level locking different than transactional row level
  locking.)
  
   Now that I've had some caffeine I think I can clarify... :-)
  
   Some of my developers complained that they were having trouble with two
  different processes trying to update the same table.
   Not sure why they were having the problem, so I wanted to have a good
  fix. The simple fix was to have them issue the close() the HTable 
  connection
  which forces any resources that they acquired to be released.
  
  
   It would help to know what the exact problem was. Normally I wouldn't 
   see
  any problems.
  
  
   In looking at the problem... its possible that they didn't have 
   AutoFlush
  set to true so the write was still in the buffer and hadn't gotten 
  flushed.
  
   If the lock only persists for the duration of the write to memory and is
  then released, then the issue could have been that the record written was 
  in
  the buffer and not yet flushed to disk.
  
  
   At the region server level HBase will use the cache for both reads and
  writes. This happens transparently for the user. Once something is written
  in the cache, all other clients will read from the same cache. No need to
  worry if the cache has been flushed.
   Lars George has a good article about the hbase storage architecture
  http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html
  
   I'm also assuming that when you run a scan() against a region that any
  information written to buffer but not yet written to disk will be missed.
  
  
   When you do puts into hbase you'll use HTable. The HTable instance is on
  the client.  HTable keeps a buffer as well and if autoFlush is false it 
  only
  flushes when you do flushCommits() or when it reaches the buffer limit, or
  when you close the table. With autoFlush set to true it will flush for 
  every
  put.
   This buffer is on the client. So when data is actually flushed it gets 
   on
  the region server where it will get in the region server cache and WAL.
   Unless a client flushes the put no other client can see the data because
  it still resides on the client only. Depending on what you need to do you
  can use autoFlush true if you are doing many small writes that need to be
  seen immediately by others. You can use autoFlush false and issue
  flushCommits() yourself, or you can rely on the buffer limit for that.
  
   So I guess the question isn't so much the issue of a lock, but that we
  need to make sure that data written to the buffer should be flushed ASAP
  unless we know that we're going to be 

Re: YouAreDeadException with hbase

2010-07-16 Thread Jinsong Hu


Yes, I noticed that too. I was wondering if I should put it there, because I 
see the ParNew was 13M after the regionserver started.

So 6M is really small. If I set it to 6M, will that cause a problem?

I noticed that when the regionserver crashed, the ParNew was 18M. Compared to 
the beginning 13M, it hasn't really grown that much.



Jimmy
--
From: Ryan Rawson ryano...@gmail.com
Sent: Friday, July 16, 2010 1:54 PM
To: user@hbase.apache.org
Subject: Re: YouAreDeadException with hbase


One bit of advice that is easy to overlook but is important is thus:

This can be due to the size of the ParNew, which should be relatively
small. If your ParNew is very large after running HBase for a while,
in one example a ParNew was about 150MB, then you might have to
constrain the size of ParNew.

Add this to HBASE_OPTS:

export HBASE_OPTS="-XX:NewSize=6m -XX:MaxNewSize=6m <cms options from above> <gc logging options from above>"

From our wiki about performance tuning.

On Fri, Jul 16, 2010 at 1:50 PM, Jinsong Hu jinsong...@hotmail.com 
wrote:

I was doing stress testing, so the load is not small. But I purposely
limited the data rate on client
side so load is not big either. using iostat -x 5 and I can see there 
are

lots of situations that
the CPU goes to very high level and stay there for long time. but then it
ultimately go down.
It is highly possible that during a certain period the CPU was too busy 
and

the GC process
was starved for CPU.

I researched this failure and found

an excellent thread talking about GC:

http://forums.sun.com/thread.jspa?threadID=698490

that is more detailed than 
http://wiki.apache.org/hadoop/PerformanceTuning .

I will do some tunning and see if it helps following the posted config
there.

Jimmy.

--
From: Ryan Rawson ryano...@gmail.com
Sent: Friday, July 16, 2010 1:35 PM
To: user@hbase.apache.org
Subject: Re: YouAreDeadException with hbase


These 2 lines are different GC collections:

5517.355: [GC 5517.355: [ParNew (promotion failed):
18113K-18113K(19136K), 0.77
00840 secs]
5518.125: [CMS5649.813: [CMS-concurrent-mark: 171.151/310.961 secs]
[Times: user=95.87 sys=3.06, real=310.97 secs]
(concurrent mode failure): 2009649K-572240K(2054592K), 280.2155930 
secs]

20233
25K-572240K(2073728K), [CMS Perm : 18029K-17976K(30064K)] icms_dc=100 
,

281.03
57280 secs] [Times: user=4.55 sys=4.07, real=281.05 secs]

It's a little hard to read that, it looks like the CMS concurrent mark
took 310 seconds then failed, then we got a 281 second real time
pause, but interestingly enough the user and system time is fairly
low.

How loaded are these machines?  You need to be giving enough
uncontended CPU time to hbase.


On Fri, Jul 16, 2010 at 1:30 PM, Jinsong Hu jinsong...@hotmail.com
wrote:


I already implemented all these configs before my test. I checked
gc-hbase.log,

I see a GC failure which looks very suspecious:

5515.974: [GC 5515.974: [ParNew: 19120K-2112K(19136K), 0.8344240 secs]
2016283K
-2007324K(2069308K) icms_dc=100 , 0.8345620 secs] [Times: user=0.08
sys=0.00, r
eal=0.83 secs]

5517.355: [GC 5517.355: [ParNew (promotion failed):
18113K-18113K(19136K),
0.77
00840 secs]5518.125: [CMS5649.813: [CMS-concurrent-mark: 
171.151/310.961

secs] [
Times: user=95.87 sys=3.06, real=310.97 secs]
(concurrent mode failure): 2009649K-572240K(2054592K), 280.2155930 
secs]

20233
25K-572240K(2073728K), [CMS Perm : 18029K-17976K(30064K)] icms_dc=100 
,

281.03
57280 secs] [Times: user=4.55 sys=4.07, real=281.05 secs]

5798.909: [GC [1 CMS-initial-mark: 572240K(2054592K)] 
573896K(2092928K),

0.05792
20 secs] [Times: user=0.01 sys=0.00, real=0.08 secs]


the concurrent mode failure, and  281.05 seconds of GC time, looks like
the
culprit for the problem.
I just wonder how to resolve this issue.

Jimmy.


--
From: Ryan Rawson ryano...@gmail.com
Sent: Friday, July 16, 2010 12:57 PM
To: user@hbase.apache.org
Subject: Re: YouAreDeadException with hbase


Sometimes the GC can chain multiple medium pauses into one large
pause.  I've seen this before where there are 2 long pauses back to
back and the result was a 50second+ pause.

This article talks a lot about GC performance and tuning, check it 
out:

http://wiki.apache.org/hadoop/PerformanceTuning

-ryan

On Fri, Jul 16, 2010 at 11:55 AM, Jinsong Hu jinsong...@hotmail.com
wrote:


Yes, the root cause seems to be the gap of 4 minutes between
2010-07-16 05:49:26,805   and 2010-07-16 05:53:23,476 .  but I 
checked

GC
gc-hbase.log and don't see
4 minute gap in gc. I just wonder what could cause this large gap.

I also wonder if there is a configuration that I can do to avoid this
long
pause, or get around the problem
cause by this long pause.

Jimmy

--
From: Stack st...@duboce.net
Sent: Friday, July 16, 2010 11:44 AM
To: user@hbase.apache.org
Subject: Re: 

Re: YouAreDeadException with hbase

2010-07-16 Thread Stack
Are you swapping?

The article you cite is from 2006, 5 years ago.  As is noted at the
end of that thread, the JVMs change over time; many of the cited
configs. may no longer exist or function.

St.Ack


On Fri, Jul 16, 2010 at 1:50 PM, Jinsong Hu jinsong...@hotmail.com wrote:
 I was doing stress testing, so the load is not small. But I purposely
 limited the data rate on client
 side so load is not big either. using iostat -x 5 and I can see there are
 lots of situations that
 the CPU goes to very high level and stay there for long time. but then it
 ultimately go down.
 It is highly possible that during a certain period the CPU was too busy and
 the GC process
 was starved for CPU.

 I researched this failure and found

 an excellent thread talking about GC:

 http://forums.sun.com/thread.jspa?threadID=698490

 that is more detailed than http://wiki.apache.org/hadoop/PerformanceTuning .
 I will do some tunning and see if it helps following the posted config
 there.

 Jimmy.

 --
 From: Ryan Rawson ryano...@gmail.com
 Sent: Friday, July 16, 2010 1:35 PM
 To: user@hbase.apache.org
 Subject: Re: YouAreDeadException with hbase

 These 2 lines are different GC collections:

 5517.355: [GC 5517.355: [ParNew (promotion failed):
 18113K-18113K(19136K), 0.77
 00840 secs]
 5518.125: [CMS5649.813: [CMS-concurrent-mark: 171.151/310.961 secs]
 [Times: user=95.87 sys=3.06, real=310.97 secs]
 (concurrent mode failure): 2009649K-572240K(2054592K), 280.2155930 secs]
 20233
 25K-572240K(2073728K), [CMS Perm : 18029K-17976K(30064K)] icms_dc=100 ,
 281.03
 57280 secs] [Times: user=4.55 sys=4.07, real=281.05 secs]

 It's a little hard to read that, it looks like the CMS concurrent mark
 took 310 seconds then failed, then we got a 281 second real time
 pause, but interestingly enough the user and system time is fairly
 low.

 How loaded are these machines?  You need to be giving enough
 uncontended CPU time to hbase.


 On Fri, Jul 16, 2010 at 1:30 PM, Jinsong Hu jinsong...@hotmail.com
 wrote:

 I already implemented all these configs before my test. I checked
 gc-hbase.log,

 I see a GC failure which looks very suspecious:

 5515.974: [GC 5515.974: [ParNew: 19120K-2112K(19136K), 0.8344240 secs]
 2016283K
 -2007324K(2069308K) icms_dc=100 , 0.8345620 secs] [Times: user=0.08
 sys=0.00, r
 eal=0.83 secs]

 5517.355: [GC 5517.355: [ParNew (promotion failed):
 18113K-18113K(19136K),
 0.77
 00840 secs]5518.125: [CMS5649.813: [CMS-concurrent-mark: 171.151/310.961
 secs] [
 Times: user=95.87 sys=3.06, real=310.97 secs]
 (concurrent mode failure): 2009649K-572240K(2054592K), 280.2155930 secs]
 20233
 25K-572240K(2073728K), [CMS Perm : 18029K-17976K(30064K)] icms_dc=100 ,
 281.03
 57280 secs] [Times: user=4.55 sys=4.07, real=281.05 secs]

 5798.909: [GC [1 CMS-initial-mark: 572240K(2054592K)] 573896K(2092928K),
 0.05792
 20 secs] [Times: user=0.01 sys=0.00, real=0.08 secs]


 the concurrent mode failure, and  281.05 seconds of GC time, looks like
 the
 culprit for the problem.
 I just wonder how to resolve this issue.

 Jimmy.


 --
 From: Ryan Rawson ryano...@gmail.com
 Sent: Friday, July 16, 2010 12:57 PM
 To: user@hbase.apache.org
 Subject: Re: YouAreDeadException with hbase

 Sometimes the GC can chain multiple medium pauses into one large
 pause.  I've seen this before where there are 2 long pauses back to
 back and the result was a 50second+ pause.

 This article talks a lot about GC performance and tuning, check it out:
 http://wiki.apache.org/hadoop/PerformanceTuning

 -ryan

 On Fri, Jul 16, 2010 at 11:55 AM, Jinsong Hu jinsong...@hotmail.com
 wrote:

 Yes, the root cause seems to be the gap of 4 minutes between
 2010-07-16 05:49:26,805   and 2010-07-16 05:53:23,476 .  but I checked
 GC
 gc-hbase.log and don't see
 4 minute gap in gc. I just wonder what could cause this large gap.

 I also wonder if there is a configuration that I can do to avoid this
 long
 pause, or get around the problem
 cause by this long pause.

 Jimmy

 --
 From: Stack st...@duboce.net
 Sent: Friday, July 16, 2010 11:44 AM
 To: user@hbase.apache.org
 Subject: Re: YouAreDeadException with hbase

 You'll see this if the server reports to the master after the master
 has ruled it 'dead'.

 Here is the code that produces the exception:

  if (!isDead(serverName)) return;
  String message = Server  + what +  rejected; currently processing
 
 +
   serverName +  as dead server;
  LOG.debug(message);
  throw new YouAreDeadException(message);

 Servers are on the 'dead' list if zk reports their session has
 expired.  The master moves then to cleanup after the dead server and
 process its logs.  If during this cleanup time the server reports in,
 master will return the youaredead exception.

 Usually the RS has lost its zk session but has yet to realize it.

 St.Ack

 On Thu, Jul 15, 2010 at 11:52 PM, Jinsong Hu 

Re: Row level locking?

2010-07-16 Thread Stack
On Fri, Jul 16, 2010 at 2:01 PM, Guilherme Germoglio
germog...@gmail.com wrote:
 Just another question -- slightly related to locks. Will HBase 0.90
 include HTable.checkAndPut receiving more than one value to check? I'm
 eager to help, if possible.


I don't think there is even an issue to add that facility Guilherme.
Make one, stick up a patch and we'll add it.

Good on you,
St.Ack


Re: Row level locking?

2010-07-16 Thread Ryan Rawson
In the uncontended case this is fine, although you are doing 4 RPCs to
accomplish what could be done in 1 (with CAS).

But in the contended case, all the people waiting on that lock consume
RPC handler threads eventually causing a temporary deadlock since the
original lockholder will not be able to progress to release the lock.
The 60 second release will kick in and things might flow again for a
bit.
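
For comparison, a rough sketch of the lock-free read-modify-write that
checkAndPut allows, against the 0.20-era client API (the table and column names
are made up; for a plain counter, incrementColumnValue would do this in a single
RPC, but the loop below shows the general pattern):

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class CheckAndPutLoop {
  public static void main(String[] args) throws Exception {
    HTable table = new HTable(new HBaseConfiguration(), "mytable"); // hypothetical table
    byte[] row = Bytes.toBytes("row1");
    byte[] fam = Bytes.toBytes("data");
    byte[] qual = Bytes.toBytes("counter");

    boolean done = false;
    while (!done) {
      Result current = table.get(new Get(row));                 // read
      byte[] old = current.getValue(fam, qual);
      long next = (old == null ? 0L : Bytes.toLong(old)) + 1;   // modify
      Put put = new Put(row);
      put.add(fam, qual, Bytes.toBytes(next));
      // write, but only if the cell still holds the value we read (null means "must not exist yet")
      done = table.checkAndPut(row, fam, qual, old, put);
    }
    table.close();
  }
}

No server-side lock is held between the get and the checkAndPut, so contention
just means another retry rather than blocked RPC handler threads.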



On Fri, Jul 16, 2010 at 2:07 PM, Justin Cohen justin.co...@teamaol.com wrote:
 What kind of trouble?  I do quite a bit of:

    l = lock(row);
    val = get(row);
    /* modify val */
    put(row, val);
    unlock(l);

 Is there an alternative?

 -justin

 On 7/16/10 4:02 PM, Ryan Rawson wrote:

 Also be very wary of using any
 of the explicit row locking calls, they are generally trouble for more
 or less everyone.



Re: Row level locking?

2010-07-16 Thread Justin Cohen
In that case it would be 2 RPCs, right?  do { get, update, checkAndPut } while 
(ret == false)?  Plus 2 for each contention?


Thanks,
-justin

On 7/16/10 5:09 PM, Ryan Rawson wrote:

In the uncontended case this is fine, although you are doing 4 RPCs to
accomplish what could be done in 1 (with CAS).

But in the contended case, all the people waiting on that lock consume
RPC handler threads eventually causing a temporary deadlock since the
original lockholder will not be able to progress to release the lock.
The 60 second release will kick in and things might flow again for a
bit.



On Fri, Jul 16, 2010 at 2:07 PM, Justin Cohen justin.co...@teamaol.com wrote:
   

What kind of trouble?  I do quite a bit of:

l = lock(row);
val = get(row);
/* modify val */
put(row, val);
unlock(l);

Is there an alternative?

-justin

On 7/16/10 4:02 PM, Ryan Rawson wrote:
 

Also be very wary of using any
of the explicit row locking calls, they are generally trouble for more
or less everyone.
   
 


Re: Hanging regionservers

2010-07-16 Thread Luke Forehand
Using Ryan Rawson's suggested config tweaks, we have just completed a 
successful job run with a 15GB sequence file, no hang.  I'm setting up to have 
multiple files process this weekend with the new settings.  :-)  I believe the 
dfs socket write timeout being indefinite was the trick.

I'll post my results on Monday.  Thanks for the support thus far!

-Luke

On 7/15/10 10:17 PM, Ryan Rawson ryano...@gmail.com wrote:

I'm not seeing anything in that logfile, you are seeing compactions
for various regions, but I'm not seeing flushes (typical during insert
loads) and nothing else. One thing we look to see is a log message
"Blocking updates" which indicates that a particular region has
decided it's holding up to prevent taking too many inserts.

Like I said, you could be seeing this on a different regionserver, if
all the clients are blocked on 1 regionserver and can't get to the
others then most will look idle and only one will actually show
anything interesting in the log.

Can you check for this behaviour?

Also if you want to tweak the config with the values I pasted that should help.

On Thu, Jul 15, 2010 at 7:25 PM, Luke Forehand
luke.foreh...@networkedinsights.com wrote:
 It looks like we are going straight from the default config, no explicit 
 setting of anything.

 On 7/15/10 9:03 PM, Ryan Rawson ryano...@gmail.com wrote:

 In this case the regionserver isn't actually doing anything - all the
 IPC thread handlers are waiting in their queue handoff thingy (how
 they get socket/work to do).

 Something elsewhere perhaps?  Check the logs of your jobs, there might
 be something interesting there.

 One thing that frequently happens is you overrun 1 regionserver with
 edits and it isn't flushing fast enough, so it pauses updates and all
 clients end up stuck on it.

 What was that config again?  I use these settings:

 <property>
   <name>hbase.hstore.blockingStoreFiles</name>
   <value>15</value>
 </property>

 <property>
   <name>dfs.datanode.socket.write.timeout</name>
   <value>0</value>
 </property>

 <property>
   <name>hbase.hregion.memstore.block.multiplier</name>
   <value>8</value>
 </property>

 perhaps try these ones?

 -ryan


Stargate returns single value cell query result in encode format

2010-07-16 Thread Steve Kuo
I am investigating Stargate to provide REST access and ran into an
interesting question.  I have a simple test table and when I scan it within
the hbase shell, I get the following result.

hbase(main):028:0> scan 'test'
ROW                      COLUMN+CELL
 row1                    column=data:1, timestamp=1278703763205, value=value1
 row1                    column=data:a, timestamp=1278969106700, value=a
 row2                    column=data:2, timestamp=1278703772510, value=value2
 row2                    column=data:3, timestamp=1278703778936, value=value3
 row3                    column=data:1, timestamp=1279318319309, value=row3
3 row(s) in 0.0610 seconds

Issuing http://localhost:58080/test/row* returns

<CellSet>
  <Row key="cm93MQ==">
    <Cell timestamp="1278703763205" column="ZGF0YTox">dmFsdWUx</Cell>
    <Cell timestamp="1278969106700" column="ZGF0YTph">YQ==</Cell>
  </Row>
  <Row key="cm93Mg==">
    <Cell timestamp="1278703772510" column="ZGF0YToy">dmFsdWUy</Cell>
    <Cell timestamp="1278703778936" column="ZGF0YToz">dmFsdWUz</Cell>
  </Row>
  <Row key="cm93Mw==">
    <Cell timestamp="1279318319309" column="ZGF0YTox">cm93Mw==</Cell>
  </Row>
</CellSet>

Why are the row key, column and value data encoded?  How can they be returned
as text?

Thanks in advance.
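
The row keys, column names and cell values in the XML representation are
Base64-encoded byte strings, so they can be decoded client-side. A minimal
sketch using Apache Commons Codec (the class name is made up; the encoded
strings are taken from the response above):

  import org.apache.commons.codec.binary.Base64;

  public class DecodeStargateValues {
    public static void main(String[] args) {
      // cm93MQ== -> row1, ZGF0YTox -> data:1, dmFsdWUx -> value1
      String[] encoded = { "cm93MQ==", "ZGF0YTox", "dmFsdWUx" };
      for (String e : encoded) {
        System.out.println(e + " -> " + new String(Base64.decodeBase64(e.getBytes())));
      }
    }
  }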


Re: Hanging regionservers

2010-07-16 Thread Stack
So, it seems like you are bypassing the issue by having no timeout on
the socket.  Would be for sure interested though if you have the issue
still on cdh3b2.  Most folks will not be running with no socket
timeout.

Thanks Luke.
St.Ack


On Fri, Jul 16, 2010 at 3:01 PM, Luke Forehand
luke.foreh...@networkedinsights.com wrote:
 Using Ryan Rawson's suggested config tweaks, we have just completed a 
 successful job run with a 15GB sequence file, no hang.  I'm setting up to 
 have multiple files process this weekend with the new settings.  :-)  I 
 believe the dfs socket write timeout being indefinite was the trick.

 I'll post my results on Monday.  Thanks for the support thus far!

 -Luke

 On 7/15/10 10:17 PM, Ryan Rawson ryano...@gmail.com wrote:

 I'm not seeing anything in that logfile, you are seeing compactions
 for various regions, but I'm not seeing flushes (typical during insert
 loads) and nothing else. One thing we look to see is a log message
 "Blocking updates" which indicates that a particular region has
 decided it's holding up to prevent taking too many inserts.

 Like I said, you could be seeing this on a different regionserver, if
 all the clients are blocked on 1 regionserver and can't get to the
 others then most will look idle and only one will actually show
 anything interesting in the log.

 Can you check for this behaviour?

 Also if you want to tweak the config with the values I pasted that should 
 help.

 On Thu, Jul 15, 2010 at 7:25 PM, Luke Forehand
 luke.foreh...@networkedinsights.com wrote:
  It looks like we are going straight from the default config, no explicit 
 setting of anything.

 On 7/15/10 9:03 PM, Ryan Rawson ryano...@gmail.com wrote:

 In this case the regionserver isn't actually doing anything - all the
 IPC thread handlers are waiting in their queue handoff thingy (how
 they get socket/work to do).

 Something elsewhere perhaps?  Check the logs of your jobs, there might
 be something interesting there.

 One thing that frequently happens is you overrun 1 regionserver with
  edits and it isn't flushing fast enough, so it pauses updates and all
 clients end up stuck on it.

 What was that config again?  I use these settings:

  <property>
    <name>hbase.hstore.blockingStoreFiles</name>
    <value>15</value>
  </property>

  <property>
    <name>dfs.datanode.socket.write.timeout</name>
    <value>0</value>
  </property>

  <property>
    <name>hbase.hregion.memstore.block.multiplier</name>
    <value>8</value>
  </property>

 perhaps try these ones?

 -ryan



Re: Hanging regionservers

2010-07-16 Thread Ryan Rawson
According to Todd, there is some kind of weird Thread coordination
issue which is worked around by setting the timeout to 0, even though
we actually aren't hitting any timeouts in the failure case.

And it might have been fixed in cdh3.  I haven't had a chance to run it
yet so I can't say.

-ryan

On Fri, Jul 16, 2010 at 3:32 PM, Stack st...@duboce.net wrote:
 So, it seems like you are bypassing the issue by having no timeout on
 the socket.  Would be for sure interested though if you have the issue
 still on cdh3b2.  Most folks will not be running with no socket
 timeout.

 Thanks Luke.
 St.Ack


 On Fri, Jul 16, 2010 at 3:01 PM, Luke Forehand
 luke.foreh...@networkedinsights.com wrote:
 Using Ryan Rawson's suggested config tweaks, we have just completed a 
 successful job run with a 15GB sequence file, no hang.  I'm setting up to 
 have multiple files process this weekend with the new settings.  :-)  I 
 believe the dfs socket write timeout being indefinite was the trick.

 I'll post my results on Monday.  Thanks for the support thus far!

 -Luke

 On 7/15/10 10:17 PM, Ryan Rawson ryano...@gmail.com wrote:

 I'm not seeing anything in that logfile, you are seeing compactions
  for various regions, but I'm not seeing flushes (typical during insert
  loads) and nothing else. One thing we look to see is a log message
  "Blocking updates" which indicates that a particular region has
 decided it's holding up to prevent taking too many inserts.

 Like I said, you could be seeing this on a different regionserver, if
 all the clients are blocked on 1 regionserver and can't get to the
 others then most will look idle and only one will actually show
 anything interesting in the log.

 Can you check for this behaviour?

 Also if you want to tweak the config with the values I pasted that should 
 help.

 On Thu, Jul 15, 2010 at 7:25 PM, Luke Forehand
 luke.foreh...@networkedinsights.com wrote:
  It looks like we are going straight from the default config, no explicit 
 setting of anything.

 On 7/15/10 9:03 PM, Ryan Rawson ryano...@gmail.com wrote:

 In this case the regionserver isn't actually doing anything - all the
 IPC thread handlers are waiting in their queue handoff thingy (how
 they get socket/work to do).

 Something elsewhere perhaps?  Check the logs of your jobs, there might
 be something interesting there.

 One thing that frequently happens is you overrun 1 regionserver with
  edits and it isn't flushing fast enough, so it pauses updates and all
 clients end up stuck on it.

 What was that config again?  I use these settings:

  <property>
    <name>hbase.hstore.blockingStoreFiles</name>
    <value>15</value>
  </property>

  <property>
    <name>dfs.datanode.socket.write.timeout</name>
    <value>0</value>
  </property>

  <property>
    <name>hbase.hregion.memstore.block.multiplier</name>
    <value>8</value>
  </property>

 perhaps try these ones?

 -ryan




Re: Row level locking?

2010-07-16 Thread Patrick Hunt
Fine-grained locking is not a good use case for ZooKeeper given its 
quorum-based architecture.


Patrick

On 07/16/2010 01:24 PM, Ryan Rawson wrote:

Explicit locks with zookeeper would be (a) slow and (b) completely out
of band and ultimately up to you.  I wouldn't exactly be eager to do
our row locking in zookeeper (since the minimum operation time is
between 2-10ms).

You could do application advisory locks, but that is true no matter
what datastore you use...

On Fri, Jul 16, 2010 at 1:13 PM, Guilherme Germoglio
germog...@gmail.com  wrote:

What about implementing explicit row locks using the zookeeper? I'm planning
to do this sometime in the near future. Does anyone have any comments
against this approach?

(or maybe it was already implemented by someone :-)

On Fri, Jul 16, 2010 at 5:02 PM, Ryan Rawson ryano...@gmail.com wrote:


HTable.close does very little:

  public void close() throws IOException {
    flushCommits();
  }


None of which involves row locks.

One thing to watch out for is to remember to close your scanners -
they continue to use server-side resources until you close them or 60
seconds passes and they get timed out.  Also be very wary of using any
of the explicit row locking calls, they are generally trouble for more
or less everyone.  There was a proposal to remove them, but I don't
think that went through.
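
A minimal sketch of that scanner cleanup pattern (the surrounding class,
method name and empty Scan are hypothetical):

  import java.io.IOException;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Result;
  import org.apache.hadoop.hbase.client.ResultScanner;
  import org.apache.hadoop.hbase.client.Scan;

  public class ScannerCleanup {
    public static void scanAll(HTable table) throws IOException {
      ResultScanner scanner = table.getScanner(new Scan());
      try {
        for (Result r : scanner) {
          // process r ...
        }
      } finally {
        // Release the server-side scanner resources right away instead of
        // waiting for the ~60 second lease timeout mentioned above.
        scanner.close();
      }
    }
  }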


On Fri, Jul 16, 2010 at 9:16 AM, Cosmin Lehene cleh...@adobe.com wrote:


On Jul 16, 2010, at 6:41 PM, Michael Segel wrote:



Thanks for the response.
(You don't need to include the cc ...)

With respect to the row level locking ...
I was interested in when the lock is actually acquired, how long the lock
persists, and when the lock is released.

From your response, the lock is only held while updating the row, i.e. while
the data is being written to the memory cache which is then written to disk.
(Note: This row-level locking is different from transactional row-level
locking.)


Now that I've had some caffeine I think I can clarify... :-)

Some of my developers complained that they were having trouble with two
different processes trying to update the same table.

Not sure why they were having the problem, so I wanted to have a good
fix. The simple fix was to have them call close() on the HTable connection,
which forces any resources that they acquired to be released.



It would help to know what the exact problem was. Normally I wouldn't see

any problems.



In looking at the problem... it's possible that they didn't have autoFlush
set to true, so the write was still in the buffer and hadn't gotten flushed.


If the lock only persists for the duration of the write to memory and is

then released, then the issue could have been that the record written was in
the buffer and not yet flushed to disk.



At the region server level HBase will use the cache for both reads and

writes. This happens transparently for the user. Once something is written
in the cache, all other clients will read from the same cache. No need to
worry if the cache has been flushed.

Lars George has a good article about the hbase storage architecture

http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html


I'm also assuming that when you run a scan() against a region that any

information written to buffer but not yet written to disk will be missed.



When you do puts into hbase you'll use HTable. The HTable instance is on

the client.  HTable keeps a buffer as well and if autoFlush is false it only
flushes when you do flushCommits() or when it reaches the buffer limit, or
when you close the table. With autoFlush set to true it will flush for every
put.

This buffer is on the client. So when data is actually flushed it gets to
the region server, where it goes into the region server cache and the WAL.

Unless a client flushes the put no other client can see the data because

it still resides on the client only. Depending on what you need to do you
can use autoFlush true if you are doing many small writes that need to be
seen immediately by others. You can use autoFlush false and issue
flushCommits() yourself, or you can rely on the buffer limit for that.


So I guess the question isn't so much the issue of a lock, but that we
need to make sure that data written to the buffer is flushed ASAP,
unless we know that we're going to be writing a lot of data in the m/r job.



Usually when you write from the reducer (heavy writes) it's better to use a
buffer and not autoFlush, to get good performance.


Cosmin
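
A minimal sketch of the client-side buffering described above, using the
0.20-era client API (the table name, family, buffer size and row count are
made up for illustration):

  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.util.Bytes;

  public class AutoFlushExample {
    public static void main(String[] args) throws Exception {
      HTable table = new HTable(new HBaseConfiguration(), "mytable");

      // Heavy writers (e.g. a reducer): buffer on the client, flush in batches.
      table.setAutoFlush(false);
      table.setWriteBufferSize(2 * 1024 * 1024); // 2MB client-side buffer
      for (int i = 0; i < 10000; i++) {
        Put put = new Put(Bytes.toBytes("row-" + i));
        put.add(Bytes.toBytes("data"), Bytes.toBytes("col"), Bytes.toBytes("value-" + i));
        table.put(put); // stays in the client buffer until it fills up
      }
      table.flushCommits(); // push anything still buffered to the region servers
      table.close();        // also flushes, then releases the client instance
    }
  }

With autoFlush left at true (the default), every put() above would instead
go to the region server immediately.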


Thx

-Mike



From: cleh...@adobe.com
To: user@hbase.apache.org
CC: hbase-u...@hadoop.apache.org
Date: Fri, 16 Jul 2010 12:34:36 +0100
Subject: Re: Row level locking?

Currently a row is part of a region and there's a single region server

serving that region at a particular moment.

So when that row is updated a lock is acquired for that row until the

actual data is updated in memory (note that a put