Re: High ingest rate and FIN_WAIT1 problems

2010-07-19 Thread Thomas Downing

Thanks for the response, but my problem is not with FIN_WAIT2; it
is with FIN_WAIT1.

If it were FIN_WAIT2, the only concern would be socket leakage,
and if setting the timeout solved the issue, that would be great.

The problem with FIN_WAIT1 is twofold. First, it is incumbent on
the application to notice and handle this problem; from the TCP stack's
point of view, there is nothing wrong - it is just a special case of a slow
consumer.  Second, it implies that something will be lost if the socket is
abandoned: there is data in the send queue of a socket in FIN_WAIT1 that
has not yet been delivered to the peer.
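If the application does decide to abandon such a socket, the cleanest way I
know of to make that explicit is an abortive close.  A minimal sketch (my own
illustration, nothing HBase or HDFS specific): SO_LINGER with a zero timeout
makes close() reset the connection instead of sending FIN, so the socket never
parks in FIN_WAIT1 - at the cost of deliberately discarding the queued data.

import java.io.IOException;
import java.net.Socket;

public final class SlowPeerAbort {
    // Abortive close: give up on the unsent bytes and reset the connection,
    // rather than leaving the socket stuck in FIN_WAIT1 behind a slow peer.
    static void abortiveClose(Socket socket) throws IOException {
        socket.setSoLinger(true, 0); // close() now sends RST and drops the send queue
        socket.close();
    }
}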

On 7/16/2010 3:56 PM, Ryan Rawson wrote:

I've been running with this setting on both the HDFS side and the
HBase side for over a year now; it's a bit of voodoo, but you might be
running into well-known suckage of HDFS.  Try this one and restart
your HBase and HDFS.

The FIN_WAIT2/TIME_WAIT happens more on large concurrent gets, not so
much for inserts.

<property>
  <name>dfs.datanode.socket.write.timeout</name>
  <value>0</value>
</property>
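
For what it's worth, the same thing can be done programmatically.  A minimal
sketch, assuming you build the client Configuration in code instead of (or in
addition to) dropping the property into your site XML:

import org.apache.hadoop.conf.Configuration;

public class DisableDatanodeWriteTimeout {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // 0 disables the datanode socket write timeout, same as the XML above
        conf.setInt("dfs.datanode.socket.write.timeout", 0);
        // ...then hand conf to whatever creates your HBase / HDFS clients
    }
}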

-ryan


On Fri, Jul 16, 2010 at 9:33 AM, Thomas Downing
tdown...@proteus-technologies.com  wrote:
   

Thanks for the response.

My understanding is that TCP_FIN_TIMEOUT affects only FIN_WAIT2;
my problem is with FIN_WAIT1.

While I do see some sockets in TIME_WAIT, there are only a few, and the
number is not growing.

On 7/16/2010 12:07 PM, Hegner, Travis wrote:
 

Hi Thomas,

I ran into a very similar issue when running Slony-I on PostgreSQL to
replicate 15-20 databases.

Adjusting the TCP_FIN_TIMEOUT parameter for the kernel may help to slow
(or hopefully stop) the leaking sockets. I found some notes about adjusting
TCP parameters here:
http://www.hikaro.com/linux/tweaking-tcpip-syctl-conf.html

   

[snip]


Run MR job when my data stays in hbase?

2010-07-19 Thread elton sky
I am new to HBase.
I am going to apply HBase in our application. We are trying to store
customer info in HBase, and periodically we need to run some background
processing, e.g. a MapReduce job, on the customer data and write the result
back to HBase.

My question is: if I want to run the background process as a MR job, can I
get the data from HBase, rather than HDFS, with Hadoop? How do I do that?
I'd appreciate it if anyone can provide some simple example code.

cheers
Elton


Re: Run MR job when my data stays in hbase?

2010-07-19 Thread Andrey Stepachev
2010/7/19 elton sky eltonsky9...@gmail.com:

 My question is: if I want to run the background process as a MR job, can I get
 the data from HBase, rather than HDFS, with Hadoop? How do I do that?
 I'd appreciate it if anyone can provide some simple example code.

Look at the org.apache.hadoop.hbase.mapreduce package in the hbase sources
and, as a real example, org.apache.hadoop.hbase.mapreduce.RowCounter


Re: Hanging regionservers

2010-07-19 Thread Ted Yu
https://issues.apache.org/jira/browse/HBASE-2248 is fixed in hbase 0.20.4
and beyond.
Upgrading to cdh3b2 should fix that issue.

On Mon, Jul 19, 2010 at 8:55 AM, Luke Forehand 
luke.foreh...@networkedinsights.com wrote:

 After looking at the stacktrace on regionserver2 this morning, I seem to be
 experiencing this issue:

 https://issues.apache.org/jira/browse/HBASE-2322

 Two questions:  Would this bug cause the primary problem of all my region
 servers appearing to hang, and will migrating to cdh3b2 fix it?

 Thanks
 Luke

 On 7/19/10 12:24 AM, Luke Forehand luke.foreh...@networkedinsights.com
 wrote:

 Here are pastebins of my stack traces and logs.  Note my comment below
 these links.

 regionserver 1 stack trace: http://pastebin.com/0n9cDeYh
 regionserver 2 stack trace: http://pastebin.com/8Sppp68h
 regionserver 3 stack trace: http://pastebin.com/qzLEjBN0

 regionserver 1 log ~5MB: http://pastebin.com/g3aB5L81
 regionserver 2 log ~5MB: http://pastebin.com/NDEaUbJv
 regionserver 3 log ~5MB: http://pastebin.com/SAVPnr7S

 zookeeper 1,2,3 log: http://pastebin.com/33RPTHKX

 So...

 Am I seeing a deadlock occurring in the regionserver 2 stacktrace?

 IPC Server handler 18 on 60020 - Thread t...@65
   java.lang.Thread.State: WAITING on
 java.util.concurrent.locks.reentrantreadwritelock$nonfairs...@99de7de owned
 by: IPC Server handler 17 on 60020
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
at
 java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:747)
at
 java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:778)
at
 java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1114)
at
 java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:807)
at
 org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:953)
at
 org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:846)
at
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:241)
at
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushSomeRegions(MemStoreFlusher.java:352)
- locked
 org.apache.hadoop.hbase.regionserver.memstoreflus...@4c2fe6bf
at
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.reclaimMemStoreMemory(MemStoreFlusher.java:321)
- locked
 org.apache.hadoop.hbase.regionserver.memstoreflus...@4c2fe6bf
at
 org.apache.hadoop.hbase.regionserver.HRegionServer.put(HRegionServer.java:1775)
at sun.reflect.GeneratedMethodAccessor61.invoke(Unknown Source)
at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at
 org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:657)
at
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)

   Locked ownable synchronizers:
- locked
 java.util.concurrent.locks.reentrantlock$nonfairs...@5cd62cac

- locked
 java.util.concurrent.locks.reentrantlock$nonfairs...@3cf93af4


 IPC Server handler 17 on 60020 - Thread t...@64
   java.lang.Thread.State: BLOCKED on java.util.hash...@1e1b300f owned by:
 regionserver/192.168.200.32:60020.cacheFlusher
at
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.request(MemStoreFlusher.java:172)
at
 org.apache.hadoop.hbase.regionserver.HRegion.requestFlush(HRegion.java:1524)
at
 org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1509)
at
 org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1292)
at
 org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1255)
at
 org.apache.hadoop.hbase.regionserver.HRegionServer.put(HRegionServer.java:1781)
at sun.reflect.GeneratedMethodAccessor61.invoke(Unknown Source)
at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at
 org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:657)
at
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)

   Locked ownable synchronizers:
- locked
 java.util.concurrent.locks.reentrantreadwritelock$nonfairs...@99de7de

 regionserver/192.168.200.32:60020.cacheFlusher - Thread t...@18
   java.lang.Thread.State: WAITING on
 java.util.concurrent.locks.reentrantlock$nonfairs...@5cd62cac owned by:
 IPC Server handler 18 on 60020
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
at
 java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:747)
at
 

RE: Run MR job when my data stays in hbase?

2010-07-19 Thread Stuart Smith
Hello,

  You can ignore this if you're already rock solid on writing M/R jobs, but 
just in case you're as new to this as I am: 

Be careful you have all your dependencies lined up in the jar you're creating 
your M/R job in.  If you're using Eclipse this means selecting "Extract 
required libraries into generated jar". 

Without this you get strange "Map class not found" errors, similar to when you 
forget to make your map class static or forget to call setJarByClass() on your 
job. 

All the examples I saw that used the *new api* were a little more complicated 
than needed. A stripped down example with the new api:

public static class Mapper extends TableMapper<Text, IntWritable>
{
    @Override
    public void map( ImmutableBytesWritable key, Result value, Context context )
        throws IOException, InterruptedException
    {
        //Don't forget to make sure to load this as UTF-8
        String sha256 = new String( key.get(), "UTF-8" );
        //just calling value.value() will NOT give you what you want
        byte[] valueBuffer = value.getValue( Bytes.toBytes(/*family*/), Bytes.toBytes(/*qualifier*/) );
        /**Do stuff**/
        context.write( [some text], [some int] );
    }
}

public static class Reduce extends TableReducer<Text, IntWritable, Text>
{
    @Override
    public void reduce( Text key, Iterable<IntWritable> values, Context context )
        throws IOException, InterruptedException
    {
        /**output of a reduce job needs to be a [something],Put object pair*/
        Put outputRow = new Put( Bytes.toBytes("row key") );
        outputRow.add( Bytes.toBytes(/*output family*/), Bytes.toBytes(/*output qualifier*/), Bytes.toBytes(count) );
        context.write( /*some string*/, outputRow );
    }
}

public static void main(String[] argv) throws Exception
{
    Job validateJob = new Job( configuration, /*job name*/ );
    //don't forget this!
    validateJob.setJarByClass( /*main class*/.class );

    //don't add anything, and it will scan everything (according to docs)
    Scan scan = new Scan();
    scan.addColumn( Bytes.toBytes(/*input family*/), Bytes.toBytes(/*input qualifier*/) );

    TableMapReduceUtil.initTableMapperJob( /*input tablename*/, scan,
        Mapper.class, Text.class, IntWritable.class, validateJob );
    TableMapReduceUtil.initTableReducerJob( /*output table name*/,
        Reduce.class, validateJob );

    validateJob.waitForCompletion(true);
}

But look at the examples! I just thought some simple highlights might help. 
Don't forget that you can issue Put()s from your map() tasks, if you already 
have the data you need assembled (just open a connection in the mapper's 
constructor):

super();
this.hbaseConfiguration = new HBaseConfiguration();
this.hbaseConfiguration.set( "hbase.master", "ubuntu-namenode:6" );
this.fileMetadataTable = new HTable( hbaseConfiguration, /*tableName*/ );

and issue the Put() in your map() method. This can take the load off your 
reduce() tasks, which may speed things up a bit.
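
Something like this inside map() (a sketch only; fileMetadataTable and the
family/qualifier placeholders are the same hypothetical names as in the
constructor snippet above, not real ones):

public void map( ImmutableBytesWritable key, Result value, Context context )
    throws IOException, InterruptedException
{
    byte[] valueBuffer = value.getValue( Bytes.toBytes(/*family*/), Bytes.toBytes(/*qualifier*/) );
    /* build whatever you want to store */
    Put put = new Put( key.get() );
    put.add( Bytes.toBytes(/*output family*/), Bytes.toBytes(/*output qualifier*/), valueBuffer );
    //write straight to the table opened in the constructor, bypassing reduce()
    fileMetadataTable.put( put );
}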

Caveat emptor:
I just started on all this stuff. ;)

Hope it helps.

Take care,
  -stu



--- On Mon, 7/19/10, Hegner, Travis theg...@trilliumit.com wrote:

 From: Hegner, Travis theg...@trilliumit.com
 Subject: RE: Run MR job when my data stays in hbase?
 To: user@hbase.apache.org user@hbase.apache.org
 Date: Monday, July 19, 2010, 11:55 AM
 Also make sure that the
 $HBASE_HOME/hbase-<version>.jar,
 $HBASE_HOME/lib/zookeeper-<version>.jar, and the
 $HBASE_HOME/conf/ are all on the classpath in your
 $HADOOP_HOME/conf/hadoop-env.sh file. That configuration
 must be cluster-wide.
 
 With that, your map and reduce tasks can access zookeeper
 and hbase objects. You can then use the TableInputFormat
 with TableOutputFormat, or you can use TableInputFormat
 alone and have your reduce tasks write data directly back
 into HBase. Your problem, and your dataset, will dictate
 which of those methods is more efficient.
 
 Travis Hegner
 http://www.travishegner.com/

 
 -Original Message-
 From: Andrey Stepachev [mailto:oct...@gmail.com]
 Sent: Monday, July 19, 2010 9:28 AM
 To: user@hbase.apache.org
 Subject: Re: Run MR job when my data stays in hbase?
 
 2010/7/19 elton sky eltonsky9...@gmail.com:
 
  My question is if I want to run the background process as
 a MR job, can I get
  data from hbase, rather than hdfs, with hadoop? How do
 I do that?
  I appreciate if anyone can provide some simple example
 code.
 
 Look at org.apache.hadoop.hbase.mapreduce package in hbase
 sources
 and as real example:
 org.apache.hadoop.hbase.mapreduce.RowCounter
 