Re: how to pre split a table whose row key is MD5(url)?
On Tue, May 13, 2014 at 9:58 AM, Liam Slusser lslus...@gmail.com wrote: You can also create a table via the hbase shell with pre-split tables like this... Here is a 32-byte split into 16 different regions, using base16 (i.e. an MD5 hash) for the key-type. create 't1', {NAME => 'f1'}, {SPLITS => ['1000', '2000', '3000', '4000', '5000', '6000', '7000', '8000', '9000', 'a000', 'b000', 'c000', 'd000', 'e000', 'f000']} To make this easier to type, you don't even need the 0 padding. Just '1', '2', '3', ... 'f' is enough :) thanks, liam On Tue, May 13, 2014 at 6:49 AM, sudhakara st sudhakara...@gmail.com wrote: you can pre-split a table using hex character strings for the start key and end key, and the number of regions to split into: HTableDescriptor tableDes = new HTableDescriptor(tableName); tableDes.setValue(HTableDescriptor.SPLIT_POLICY, KeyPrefixRegionSplitPolicy.class.getName()); byte[][] splits = getHexSplits(SPLIT_START_KEY, SPLIT_END_KEY, NUM_OF_REGION_SPLIT); admin.createTable(tableDes, splits); private byte[][] getHexSplits(String startKey, String endKey, int numRegions) { byte[][] splits = new byte[numRegions - 1][]; BigInteger lowestKey = new BigInteger(startKey, 16); // radix 16: the keys are hex strings (first 8 bytes used for splitting) BigInteger highestKey = new BigInteger(endKey, 16); BigInteger range = highestKey.subtract(lowestKey); BigInteger regionIncrement = range.divide(BigInteger.valueOf(numRegions)); lowestKey = lowestKey.add(regionIncrement); for (int i = 0; i < numRegions - 1; i++) { BigInteger key = lowestKey.add(regionIncrement.multiply(BigInteger.valueOf(i))); byte[] b = String.format("%016x", key).getBytes(); splits[i] = b; } return splits; } On Mon, May 12, 2014 at 7:07 AM, Li Li fancye...@gmail.com wrote: thanks. I will try this. by the way, byte range is -128 to 127 On Mon, May 12, 2014 at 6:13 AM, Michael Segel michael_se...@hotmail.com wrote: Simple answer… you really can't.
The best thing you can do is to pre-split the table into 4 regions based on splitting the first byte into 4 equal ranges (0-63, 64-127, 128-191, 192-255). And hope that you'll have an even split. In theory, over time you will. On May 8, 2014, at 1:58 PM, Li Li fancye...@gmail.com wrote: say I have 4 region servers. How to pre-split a table using MD5 as row key? -- Regards, ...sudhakara
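The first-byte split that Michael describes can be sketched in plain Java. This is an illustration only (class and method names are mine, not an HBase API); with the HBase client the resulting byte[][] would be handed to HBaseAdmin.createTable(desc, splits). Note that Java bytes are signed (-128 to 127, as Li Li points out) while HBase compares keys as unsigned bytes, hence the & 0xff when inspecting values.

```java
// Hypothetical helper: compute evenly spaced single-byte split points for a
// table whose row key is a raw (binary) MD5 hash, as suggested in the thread.
public class FirstByteSplits {
    // Returns numRegions - 1 split keys dividing the unsigned byte range
    // 0..255 into numRegions equal slices, e.g. 64, 128, 192 for 4 regions.
    public static byte[][] firstByteSplits(int numRegions) {
        byte[][] splits = new byte[numRegions - 1][];
        for (int i = 1; i < numRegions; i++) {
            // The cast truncates to a signed byte; HBase still orders it as unsigned.
            splits[i - 1] = new byte[] { (byte) (i * 256 / numRegions) };
        }
        return splits;
    }
}
```

With 4 region servers this yields the ranges (0-63, 64-127, 128-191, 192-255) from the reply above; since MD5 output is uniform, each region should receive roughly a quarter of the writes.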
Re: Error loading SHA-1 keys with load bulk
Are you using HFileOutputFormat.configureIncrementalLoad() to set up the partitioner and the reducers? That will take care of ordering your keys. J-D On Thu, May 1, 2014 at 5:38 AM, Guillermo Ortiz konstt2...@gmail.com wrote: I have been looking at the code in HBase, but I don't really understand why this error happens. Why can't I put those keys in HBase? 2014-04-30 17:57 GMT+02:00 Guillermo Ortiz konstt2...@gmail.com : I'm using HBase with MapReduce to load a lot of data, so I have decided to do it with bulk load. I hash my keys with SHA-1, but when I try to load them, I get this exception. java.io.IOException: Added a key not lexically larger than previous key=\x00(6e9e59f36a7ec2ac54635b2d353e53e677839046\x01l\x00\x00\x01E\xB3\xC9\xC7\x0E, lastkey=\x00(b313a9f1f57c8a07c81dc3221c6151cf3637506a\x01l\x00\x00\x01E\xAE\x18k\x87\x0E at org.apache.hadoop.hbase.io.hfile.AbstractHFileWriter.checkKey(AbstractHFileWriter.java:207) at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.append(HFileWriterV2.java:324) at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.append(HFileWriterV2.java:289) at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:1206) at org.apache.hadoop.hbase.mapreduce.HFileOutputFormat$1.write(HFileOutputFormat.java:168) at org.apache.hadoop.hbase.mapreduce.HFileOutputFormat$1.write(HFileOutputFormat.java:124) at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTask.java:551) at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:85) I work with HBase 0.94.6. I have been looking into whether I need to define a reducer, since I have defined none. I have read something about KeyValueSortReducer, but I don't know if there's something that extends TableReducer or if I'm looking at this the wrong way.
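The exception above comes from the HFile writer's invariant: HFiles are append-only and every key must compare strictly greater than the previous one under unsigned byte-wise comparison, which is why configureIncrementalLoad() installs a total-order partitioner and sort. A self-contained sketch of that comparison (illustrative names; HBase's own version is Bytes.compareTo):

```java
// Demonstrates the ordering rule an HFile writer enforces: keys must be
// appended in strictly increasing unsigned lexicographic order.
public class KeyOrderCheck {
    // Unsigned lexicographic comparison of two byte arrays.
    public static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) return d;
        }
        return a.length - b.length;
    }

    // True if the keys arrive in the order an HFile writer would accept.
    public static boolean isSorted(byte[][] keys) {
        for (int i = 1; i < keys.length; i++) {
            if (compareUnsigned(keys[i - 1], keys[i]) >= 0) return false;
        }
        return true;
    }
}
```

In the error above, a key starting with 6e9e... arrived after lastkey b313..., which is out of order ('6' < 'b'), so the writer refused it; a sorting reducer fixes exactly this.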
Re: Re: replication verifyrep
On Tue, Apr 15, 2014 at 12:17 AM, Hansi Klose hansi.kl...@web.de wrote: Hi Jean-Daniel, thank you for your answer and for bringing some light into the darkness. You're welcome! You can see the bad rows listed in the user logs for your MR job. What log do you mean? The output from the command line? I only see the count of GOOD or BAD rows. Are the bad rows which are not replicated listed in that log? You started VerifyReplication via hadoop jar, so it's a MapReduce job. Go to your JobTracker's web UI, you should see your jobs there, then check out one of them, click on one of the completed maps and look for the log. The bad rows are listed in that output. J-D
Re: replication verifyrep
Yeah you should use endtime, it was fixed as part of https://issues.apache.org/jira/browse/HBASE-10395. You can see the bad rows listed in the user logs for your MR job. J-D On Mon, Apr 14, 2014 at 3:06 AM, Hansi Klose hansi.kl...@web.de wrote: Hi, I wrote a little script which monitors the running replication. The script is triggered by cron and executes the following command with the current time stamp as endtime and a time stamp = endtime - 10800000 milliseconds, so the time frame is 3 hours. hadoop jar /usr/lib/hbase/hbase.jar verifyrep --starttime=1397217601927 --endtime=1397228401927 --families=t 1 tablename 2>&1 After some runs the script found some BADROWS. 14/04/11 17:04:05 INFO mapred.JobClient: BADROWS=176 14/04/11 17:04:05 INFO mapred.JobClient: GOODROWS=2 I executed the same command 20 minutes later in the shell and got: hadoop jar /usr/lib/hbase/hbase.jar verifyrep --starttime=1397217601927 --endtime=1397228401927 --families=t 1 tablename 2>&1 14/04/11 17:21:03 INFO mapred.JobClient: BADROWS=178 After that I ran the command with the same start time and the current timestamp as end time, so the time frame is larger but with the same start time. And now I got: hadoop jar /usr/lib/hbase/hbase.jar verifyrep --starttime=1397217601927 --endtime=1397230074876 --families=t 1 tablename 2>&1 14/04/11 17:28:28 INFO mapred.JobClient: GOODROWS=184 Is there something wrong with the command? In our metrics I could not see that there is an issue at that time. We are a little bit confused about the endtime. In all documents they talk about stoptime. But we found that in the job configuration there is no parameter called stoptime. We found verifyrep.startTime, which holds the value of the starttime in our command, and verifyrep.endTime, which is always 0 when we use stoptime in the command.
So we decided to use endtime. Even in the code http://hbase.apache.org/xref/org/apache/hadoop/hbase/mapreduce/replication/VerifyReplication.html they use: static long endTime = Long.MAX_VALUE; Which name is the right one? endtime or stoptime? We use cdh 4.2.0. Regards Hansi
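The window arithmetic the cron script intends can be sketched as below (illustrative class and method names; the real tool only consumes the two millisecond values on its command line). The timestamps in the thread differ by exactly 10,800,000 ms, i.e. 3 hours:

```java
// Hypothetical helper: compute the --starttime/--endtime pair for a sliding
// 3 hour verifyrep window, as the cron script in this thread intends.
public class VerifyRepWindow {
    static final long WINDOW_MS = 3L * 60 * 60 * 1000; // 3 hours = 10,800,000 ms

    // Returns {starttime, endtime} in epoch milliseconds.
    public static long[] window(long nowMillis) {
        return new long[] { nowMillis - WINDOW_MS, nowMillis };
    }

    public static void main(String[] args) {
        long[] w = window(System.currentTimeMillis());
        System.out.println("--starttime=" + w[0] + " --endtime=" + w[1]);
    }
}
```

Feeding the thread's own endtime of 1397228401927 back through this yields the starttime 1397217601927 used in the commands above.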
Re: How to decide the next HMaster?
It's a simple leader election via ZooKeeper. J-D On Tue, Apr 8, 2014 at 7:18 AM, gortiz gor...@pragsis.com wrote: Could someone explain me which it's the process to select the next HMaster when the current one is gone down?? I've been looking for information about it in the documentation, but, I haven't found anything.
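The election can be illustrated with a toy model of the standard ZooKeeper recipe: each candidate registers under a monotonically increasing sequence number (standing in for an ephemeral sequential znode), the lowest live sequence is the active master, and when it disappears the next lowest takes over. This is a sketch of the recipe only; HBase's actual implementation races to create a single ephemeral /hbase/master znode and the losers watch it.

```java
import java.util.TreeMap;

// Toy in-memory model of ZooKeeper-style leader election for HMaster failover.
public class MasterElection {
    private final TreeMap<Long, String> registered = new TreeMap<>();
    private long nextSeq = 0;

    // "Create an ephemeral sequential znode" for this master; returns its sequence.
    public long register(String master) {
        long seq = nextSeq++;
        registered.put(seq, master);
        return seq;
    }

    // "Session expired": the master's znode vanishes.
    public void fail(long seq) {
        registered.remove(seq);
    }

    // The lowest live sequence number is the active master.
    public String activeMaster() {
        return registered.isEmpty() ? null : registered.firstEntry().getValue();
    }
}
```

Failover falls out of the data structure: when the active master's entry is removed, firstEntry() simply points at the next candidate, with no coordination beyond ZooKeeper's ordering guarantee.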
Re: block cache size questions
On Mon, Mar 17, 2014 at 6:01 AM, Linlin Du linlindu2...@hotmail.com wrote: Hi all, First question: According to documentation, hfile.block.cache.size is by default 40 percentage of maximum heap (-Xmx setting). If -Xmx is not used and only -Xms is used, what will it be in this case? Second question: This 40% heap space is shared by all entities (stores). How much block cache is used by each store (entity)? Is it allocated on demand? If a region has never been read after the region server is up, will block cache be allocated for it? The block cache stores blocks, which are chunks of an HFile, and are by default around 64KB in size. Allocation is on-demand. Learn more here http://hbase.apache.org/book.html#block.cache Third question: Is it possible to tell from .META when a region was last read? If so, how? Many thanks, Linlin
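The sizing described above is simple arithmetic, sketched here with illustrative names: the cache budget is hfile.block.cache.size (default 0.40) of the maximum heap, and it fills on demand in HFile-block units (~64 KB by default) rather than being reserved per region or per store.

```java
// Back-of-envelope helper for the block cache defaults discussed above.
public class BlockCacheMath {
    // Cache budget = fraction of -Xmx (default fraction is 0.40).
    public static long blockCacheBytes(long maxHeapBytes, double cacheFraction) {
        return (long) (maxHeapBytes * cacheFraction);
    }

    // How many HFile blocks of the given size fit in that budget.
    public static long blocksThatFit(long cacheBytes, long blockSizeBytes) {
        return cacheBytes / blockSizeBytes;
    }
}
```

So a region that is never read contributes nothing to the cache; blocks are only inserted when a read misses and fetches them from HDFS.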
Re: latest stable hbase-0.94.13 cannot start master: java.lang.RuntimeException: Failed suppression of fs shutdown hook
Resurrecting this old thread. The following error: java.lang.RuntimeException: Failed suppression of fs shutdown hook Is caused when HBase is compiled against Hadoop 1 and has Hadoop 2 jars on its classpath. Someone on IRC just had the same issue and I was able to repro after seeing the classpath. J-D On Wed, Nov 13, 2013 at 7:00 AM, Ted Yu yuzhih...@gmail.com wrote: Your hbase.rootdir config parameter points to file: instead of hdfs: Where is hadoop-2.2.0 running ? You also need to build the tar ball using the hadoop 2 profile. See the following in pom.xml: <!-- profile for building against Hadoop 2.0.0-alpha. Activate using: mvn -Dhadoop.profile=2.0 --> <profile> <id>hadoop-2.0</id> On Wed, Nov 13, 2013 at 6:13 AM, jason_vasd...@mcafee.com wrote: Good day - I'm an hadoop hbase newbie, so please excuse me if this is a known issue - hoping someone might send me a simple fix ! I installed the latest stable tarball : hbase-0.94.13.tar.gz , and followed the instructions at docs/book/quickstart.html . (After installing hadoop-2.2.0, and running the resourcemanager and nodemanager, which are both running and presenting web-pages at the configured ports OK). My hbase-site.xml now looks like: <configuration> <property> <name>hbase.rootdir</name> <value>file:///home/jason/3P/hbase/data</value> </property> <property> <name>hbase.zookeeper.property.dataDir</name> <value>/home/jason/3P/hbase/zookeeper-data</value> </property> </configuration> I try to start hbase as instructed in the QuickStart guide: $ bin/hbase-start.sh starting master, logging to /home/jason/3P/hbase-0.94.13/logs/hbase-jason-master-jvds.out But the master does NOT start . I think it is a bug that the hbase-start.sh script does not complain that hbase failed to start. Shall I raise a JIRA issue on this ?
Anyway, when I look in the logs/hbase-jason-master-jvds.log file, I see that a Java exception occurred : 2013-11-13 13:52:06,316 INFO org.apache.hadoop.hbase.master.ActiveMasterManager: Deleting ZNode for /hbase/backup-masters/jvds,52926,1384350725521 from backup master directory 2013-11-13 13:52:06,318 INFO org.apache.zookeeper.server.PrepRequestProcessor: Got user-level KeeperException when processing sessionid:0x14251bbb3d4 type:delete cxid:0x13 zxid:0xb txntype:-1 reqpath:n/a Error Path:/hbase/backup-masters/jvds,52926,1384350725521 Error:KeeperErrorCode = NoNode for /hbase/backup-masters/jvds,52926,1384350725521 2013-11-13 13:52:06,320 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Node /hbase/backup-masters/jvds,52926,1384350725521 already deleted, and this is not a retry 2013-11-13 13:52:06,320 INFO org.apache.hadoop.hbase.master.ActiveMasterManager: Master=jvds,52926,1384350725521 2013-11-13 13:52:06,348 INFO org.apache.hadoop.hbase.master.SplitLogManager: timeout = 30 2013-11-13 13:52:06,348 INFO org.apache.hadoop.hbase.master.SplitLogManager: unassigned timeout = 18 2013-11-13 13:52:06,348 INFO org.apache.hadoop.hbase.master.SplitLogManager: resubmit threshold = 3 2013-11-13 13:52:06,352 INFO org.apache.hadoop.hbase.master.SplitLogManager: found 0 orphan tasks and 0 rescan nodes 2013-11-13 13:52:06,385 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop library 2013-11-13 13:52:06,385 ERROR org.apache.hadoop.hbase.master.HMasterCommandLine: Failed to start master java.lang.RuntimeException: Failed suppression of fs shutdown hook: Thread[Thread-27,5,main] at org.apache.hadoop.hbase.regionserver.ShutdownHook.suppressHdfsShutdownHook(ShutdownHook.java:196) at org.apache.hadoop.hbase.regionserver.ShutdownHook.install(ShutdownHook.java:83) at org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:191) at org.apache.hadoop.hbase.LocalHBaseCluster.startup(LocalHBaseCluster.java:420) at 
org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:149) at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:104) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65) at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:76) at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:2120) 2013-11-13 13:52:06,386 ERROR org.apache.hadoop.io.nativeio.NativeIO: Unable to initialize NativeIO libraries java.lang.NoSuchFieldError: workaroundNonThreadSafePasswdCalls at org.apache.hadoop.io.nativeio.NativeIO.initNative(Native Method) at org.apache.hadoop.io.nativeio.NativeIO.<clinit>(NativeIO.java:58) at org.apache.hadoop.fs.FileUtil.setPermission(FileUtil.java:653) at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:509) at
Re: Who creates hbase root.dir ?
IIRC it used to be an issue if the folder already existed, even if empty. It's not the case anymore. J-D On Fri, Feb 7, 2014 at 3:38 PM, Jay Vyas jayunit...@gmail.com wrote: Hi hbase. In normal installations, I'm wondering who should create hbase.rootdir. 1) I have seen pseudo-distributed mode docs implying that HBase is smart enough to do it by itself: Let HBase create the hbase.rootdir directory. If you don't, you'll get a warning saying HBase needs a migration run because the directory is missing files expected by HBase (it'll create them if you let it). 2) But in bigtop, I see mkdir in the init-hdfs.sh : su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -mkdir /hbase' su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -chown hbase:hbase /hbase' So what's the right way to maintain hbase-root ? -- Jay Vyas http://jayunit100.blogspot.com
Re: MultiMaster HBase: --backup really needed ?
The problem with having a bunch of masters racing is that it's not evident for the operator which one won, so specifying --backup to all but one master ensures that you always easily know where the master is. Relevant code from HMaster.java: // If we're a backup master, stall until a primary to writes his address if (!c.getBoolean(HConstants.MASTER_TYPE_BACKUP, HConstants.DEFAULT_MASTER_TYPE_BACKUP)) { return; } J-D On Mon, Dec 9, 2013 at 9:37 AM, Bryan Beaudreault bbeaudrea...@hubspot.com wrote: I've run HBase from version 0.90.2 to our current 0.94.6 (CDH 4.3) and have never specified a --backup option on any of my commands with regard to the master. You're correct that they race to be active, and failover is completely automatic in the case of one master going down. TBH I've never even heard of a --backup argument, so I'm wondering if it is something extremely old or extremely new :) On Mon, Dec 9, 2013 at 6:24 AM, Manuel de Ferran manuel.defer...@gmail.com wrote: Greetings, I'm playing with MultiMaster, and I was wondering if --backup is really needed. As far as I have observed, masters race to be the active one. Is there any drawback in not mentioning --backup on additional nodes ? Regards, -- Manuel DE FERRAN
Re: Question about the HBase thrift server
That's right, round robin should only be applied when you start answering some client request, and you stick to that server until you're done. J-D On Fri, Dec 6, 2013 at 9:17 PM, Varun Sharma va...@pinterest.com wrote: Hi everyone, I have a question about the hbase thrift server and running scans in particular. The thrift server maintains a map of int -> ResultScanner(s). These integers are passed back to the client. Now in a typical setting people run many thrift servers and round-robin rpc(s) to them. It seems that for scans, such a technique of just round-robining is simply not going to work. If a scanner integer ID has been retrieved from a certain thrift server A, all the next() calls and close calls should fall on that server. I just wanted to make sure I got this thinking right and there isn't really a way around this, because scans, unlike gets, have associated state. Thanks ! Varun
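The stickiness constraint can be sketched as client-side bookkeeping (all names here are illustrative, not part of the Thrift gateway API): round-robin only the openScanner call, then pin every subsequent next()/close() for that scanner id to the server that created it, because the id indexes that server's private int-to-ResultScanner map.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical client-side router: scanner ids are only meaningful on the
// Thrift server that issued them, so calls for an id must stay on that server.
public class StickyScannerRouter {
    private final String[] servers;
    private int rr = 0;                                   // round-robin cursor for new scans
    private final Map<Integer, String> scannerHome = new HashMap<>();
    private int nextScannerId = 0;

    public StickyScannerRouter(String[] servers) { this.servers = servers; }

    // openScanner: pick the next server round-robin and remember the mapping.
    public int openScanner() {
        String server = servers[rr++ % servers.length];
        int id = nextScannerId++;
        scannerHome.put(id, server);
        return id;
    }

    // next()/close() for a scanner id must go back to its home server.
    public String serverFor(int scannerId) {
        return scannerHome.get(scannerId);
    }
}
```

Gets, by contrast, carry no server-side state, which is why plain round-robin works for them.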
Re: You Are Dead Exception due to promotion failure
It reads that it spent 89 seconds doing a CMS concurrent mark, but really just spent 14 seconds of user CPU and 4 seconds of system CPU doing it. Where are the other 70 seconds? It's often just swapping, and less likely it can also be CPU starvation. J-D On Fri, Nov 1, 2013 at 1:40 AM, Asaf Mesika asaf.mes...@gmail.com wrote: Can you please explain why is this suspicious? On Monday, October 7, 2013, Jean-Daniel Cryans wrote: This line: [CMS-concurrent-mark: 12.929/88.767 secs] [Times: user=14.30 sys=3.74, real=88.77 secs] Is suspicious. Are you swapping? J-D On Mon, Oct 7, 2013 at 8:34 AM, prakash kadel prakash.ka...@gmail.com wrote: Also, why is the CMS not kicking in early, i have set -XX:+UseCMSInitiatingOccupancyOnly??? Sincerely, Prakash On Tue, Oct 8, 2013 at 12:32 AM, prakash kadel prakash.ka...@gmail.com wrote: Hello, I am getting this YADE all the time. HBASE_HEAPSIZE=8000 Settings: -ea -XX:+UseConcMarkSweepGC -XX:MaxGCPauseMillis=200 -XX:+HeapDumpOnOutOfMemoryError -XX:+CMSIncrementalMode -XX:+UseParNewGC -XX:CMSInitiatingOccupancyFraction=50 -XX:+UseCMSInitiatingOccupancyOnly -XX:NewSize=256m -XX:MaxNewSize=256m It seems there is a promotion failure and the CMS takes too long:
2013-10-07T01:22:55.784+0900: [GC [ParNew: 235968K->26176K(235968K), 0.3219980 secs] 7709485K->7538063K(8165824K) icms_dc=0 , 0.3221100 secs] [Times: user=0.27 sys=0.01, real=0.33 secs]
2013-10-07T01:23:07.361+0900: [GC [ParNew: 235842K->26176K(235968K), 0.1899680 secs] 7747729K->7578713K(8165824K) icms_dc=0 , 0.1900700 secs] [Times: user=0.26 sys=0.02, real=0.19 secs]
2013-10-07T01:23:20.154+0900: [GC [ParNew: 235803K->26176K(235968K), 0.2428200 secs] 7788341K->7615284K(8165824K) icms_dc=0 , 0.2429570 secs] [Times: user=0.25 sys=0.02, real=0.24 secs]
2013-10-07T01:23:34.594+0900: [GC [ParNew: 235889K->26176K(235968K), 0.2440980 secs] 7824998K->7651179K(8165824K) icms_dc=0 , 0.2442130 secs] [Times: user=0.20 sys=0.03, real=0.25 secs]
2013-10-07T01:23:47.666+0900: [GC [ParNew: 235906K->26176K(235968K), 0.2998100 secs] 7860909K->7686832K(8165824K) icms_dc=3 , 0.3020280 secs] [Times: user=0.23 sys=0.04, real=0.30 secs]
2013-10-07T01:23:57.216+0900: [GC [1 CMS-initial-mark: 7660656K(7929856K)] 7788778K(8165824K), 3.7665320 secs] [Times: user=0.07 sys=0.06, real=3.77 secs]
2013-10-07T01:24:05.508+0900: [GC [ParNew: 235811K->26176K(235968K), 0.4632860 secs] 7896468K->7721167K(8165824K) icms_dc=3 , 0.4634100 secs] [Times: user=0.21 sys=0.03, real=0.46 secs]
2013-10-07T01:24:19.889+0900: [GC [ParNew: 235812K->26176K(235968K), 0.3531980 secs] 7930804K->7755633K(8165824K) icms_dc=3 , 0.3533230 secs] [Times: user=0.24 sys=0.06, real=0.35 secs]
2013-10-07T01:24:32.832+0900: [GC [ParNew: 235968K->26176K(235968K), 0.6298370 secs] 7965425K->7790643K(8165824K) icms_dc=3 , 0.6299530 secs] [Times: user=0.23 sys=0.03, real=0.63 secs]
2013-10-07T01:24:43.629+0900: [GC [ParNew: 235800K->26176K(235968K), 0.3190580 secs] 8000268K->782K(8165824K) icms_dc=3 , 0.3191840 secs] [Times: user=0.24 sys=0.02, real=0.32 secs]
2013-10-07T01:24:56.005+0900: [GC [ParNew: 235848K->26176K(235968K), 0.4839400 secs] 8035228K->7860300K(8165824K) icms_dc=3 , 0.4840480 secs] [Times: user=0.31 sys=0.03, real=0.49 secs]
2013-10-07T01:25:07.282+0900: [GC [ParNew: 235750K->26176K(235968K), 0.3423250 secs] 8069875K->7895852K(8165824K) icms_dc=9 , 0.3424380 secs] [Times: user=0.21 sys=0.06, real=0.34 secs]
2013-10-07T01:25:19.853+0900: [GC [ParNew (promotion failed): 235745K->235745K(235968K), 0.3339710 secs][CMS2013-10-07T01:25:29.750+0900: [CMS-concurrent-mark: 12.929/88.767 secs] [Times: user=14.30 sys=3.74, real=88.77 secs] (concurrent mode failure): 7899125K->2882954K(7929856K), 42.8279810 secs] 8105422K->2882954K(8165824K), [CMS Perm : 31956K->31861K(53340K)] icms_dc=9 , 43.1621090 secs] [Times: user=10.40 sys=1.89, real=43.16 secs]
2013-10-07T01:26:08.288+0900: [GC [1 CMS-initial-mark: 2882954K(7929856K)] 2978434K(8165824K), 0.0965830 secs] [Times: user=0.04 sys=0.00, real=0.09 secs]
Heap par new generation total 235968K, used 197697K [0x000606e0, 0x000616e0, 0x000616e0) eden space 209792K, 94% used [0x000606e0, 0x000612f10718, 0x000613ae) from space 26176K, 0% used [0x00061547, 0x00061547,
Re: HBase ShutdownHook problem
(LocalHBaseCluster.java:420) at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:149) at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:104) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65) at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:76) at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:2100) at HBase.HMasterThread.run(HMasterThread.java:19) Salih Kardan On Fri, Oct 25, 2013 at 9:00 PM, Jean-Daniel Cryans jdcry...@apache.org wrote: What's happening before this stack trace in the log? J-D On Fri, Oct 25, 2013 at 6:10 AM, Salih Kardan karda...@gmail.com wrote: Hi all I am getting the error below while starting hbase (hbase 0.94.11). I guess since hbase cannot connect to hadoop, I get this error. java.lang.RuntimeException: Failed suppression of fs shutdown hook: Thread[Thread-8,5,main] at org.apache.hadoop.hbase.regionserver.ShutdownHook.suppressHdfsShutdownHook(ShutdownHook.java:196) at org.apache.hadoop.hbase.regionserver.ShutdownHook.install(ShutdownHook.java:83) at org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:191) at org.apache.hadoop.hbase.LocalHBaseCluster.startup(LocalHBaseCluster.java:420) at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:149) at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:104) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65) at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:76) at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:2100) my /etc/hosts file only contains (127.0.0.1 - machine name).
Re: HBase ShutdownHook problem
What's happening before this stack trace in the log? J-D On Fri, Oct 25, 2013 at 6:10 AM, Salih Kardan karda...@gmail.com wrote: Hi all I am getting the error below while starting hbase (hbase 0.94.11). I guess since hbase cannot connect to hadoop, I get this error. java.lang.RuntimeException: Failed suppression of fs shutdown hook: Thread[Thread-8,5,main] at org.apache.hadoop.hbase.regionserver.ShutdownHook.suppressHdfsShutdownHook(ShutdownHook.java:196) at org.apache.hadoop.hbase.regionserver.ShutdownHook.install(ShutdownHook.java:83) at org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:191) at org.apache.hadoop.hbase.LocalHBaseCluster.startup(LocalHBaseCluster.java:420) at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:149) at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:104) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65) at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:76) at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:2100) my /etc/hosts file only contains (127.0.0.1 - machine name).
Re: HBase Random Read latency > 100ms
On Wed, Oct 9, 2013 at 10:59 AM, Vladimir Rodionov vrodio...@carrieriq.com wrote: I can't say for SCR. There is a possibility that the feature is broken, of course. But the fact that hbase.regionserver.checksum.verify does not affect performance means that the OS effectively caches HDFS checksum files. See OS cache + SCR VS HBase CRC over OS cache+SCR in this document I shared some time ago: https://docs.google.com/spreadsheet/pub?key=0Ao87IrzZJSaydENaem5USWg4TlRKcHl0dEtTS2NBOUE&output=html In an all-in-memory test it shows a pretty big difference. J-D Best regards, Vladimir Rodionov Principal Platform Engineer Carrier IQ, www.carrieriq.com e-mail: vrodio...@carrieriq.com From: Ramu M S [ramu.ma...@gmail.com] Sent: Wednesday, October 09, 2013 12:11 AM To: user@hbase.apache.org; lars hofhansl Subject: Re: HBase Random Read latency > 100ms Hi All, Sorry. There was some mistake in the tests (clients were not reduced; I forgot to change the parameter before running the tests). With 8 clients: SCR enabled: average latency is 25 ms, IO wait % is around 8. SCR disabled: average latency is 10 ms, IO wait % is around 2. Still, SCR disabled gives better results, which confuses me. Can anyone clarify? Also, I tried setting the parameter Lars suggested (hbase.regionserver.checksum.verify = true) with SCR disabled. Average latency is around 9.8 ms, a fraction less. Thanks Ramu On Wed, Oct 9, 2013 at 3:32 PM, Ramu M S ramu.ma...@gmail.com wrote: Hi All, I just ran only 8 parallel clients. With SCR enabled: average latency is 80 ms, IO wait % is around 8. With SCR disabled: average latency is 40 ms, IO wait % is around 2. I always thought SCR enabled allows a client co-located with the DataNode to read HDFS file blocks directly. This gives a performance boost to distributed clients that are aware of locality. Is my understanding wrong, or does it not apply to my scenario? Meanwhile I will try setting the parameter suggested by Lars and post the results.
Thanks, Ramu On Wed, Oct 9, 2013 at 2:29 PM, lars hofhansl la...@apache.org wrote: Good call. You could try to enable hbase.regionserver.checksum.verify, which will cause HBase to do its own checksums rather than relying on HDFS (and which saves 1 IO per block get). I do think you can expect the index blocks to be cached at all times. -- Lars From: Vladimir Rodionov vrodio...@carrieriq.com To: user@hbase.apache.org Sent: Tuesday, October 8, 2013 8:44 PM Subject: RE: HBase Random Read latency > 100ms Upd. Each HBase Get = 2 HDFS read IOs (index block + data block) = 4 file IOs (data + .crc) in the worst case. I think if Bloom filters are enabled then it is going to be 6 file IOs in the worst case (large data set); therefore you will have not 5 IO requests in the queue but up to 20-30. This definitely explains the 100 ms avg latency. Best regards, Vladimir Rodionov Principal Platform Engineer Carrier IQ, www.carrieriq.com e-mail: vrodio...@carrieriq.com From: Vladimir Rodionov Sent: Tuesday, October 08, 2013 7:24 PM To: user@hbase.apache.org Subject: RE: HBase Random Read latency > 100ms Ramu, You have 8 server boxes and 10 client boxes. You have 40 requests in parallel - 5 per RS/DN? You have 5 random-read requests in the IO queue of your single RAID1. With an avg read latency of 10 ms, 5 requests in the queue will give us 30 ms. Add some overhead of HDFS + HBase and you will probably have your issue explained? Your bottleneck is your disk system, I think. When you serve most requests from disk, as in your large data set scenario, make sure you have an adequate disk sub-system and that it is configured properly. Block cache and OS page cache cannot help you in this case, as the working data set is larger than both caches. Good performance numbers in the small data set scenario are explained by the fact that the data fits into the OS page cache and block cache - you do not read data from disk even if you disable the block cache.
Best regards, Vladimir Rodionov Principal Platform Engineer Carrier IQ, www.carrieriq.com e-mail: vrodio...@carrieriq.com From: Ramu M S [ramu.ma...@gmail.com] Sent: Tuesday, October 08, 2013 6:00 PM To: user@hbase.apache.org Subject: Re: HBase Random Read latency > 100ms Hi All, After a few suggestions from the earlier mails I changed the following: 1. Heap size to 16 GB 2. Block size to 16 KB 3. HFile size to 8 GB (the table now has 256 regions, 32 per server) 4. Data locality index is 100 in all RS I have clients running on 10 machines, each with 4 threads, so 40 total. This is the same in all tests. Result: 1. Average latency is still 100 ms.
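Vladimir's queueing estimate above is simple arithmetic and can be sketched as follows (illustrative names): with q outstanding requests on one spindle and s ms per read, the i-th request in line completes after i*s ms, so the mean completion time is s*(q+1)/2.

```java
// Back-of-envelope model of the single-RAID1 queue discussed above.
public class DiskQueueEstimate {
    // Average completion time when queueDepth requests each take serviceTimeMs:
    // the i-th request finishes after i * serviceTimeMs, mean = s * (q + 1) / 2.
    public static double avgLatencyMs(int queueDepth, double serviceTimeMs) {
        return serviceTimeMs * (queueDepth + 1) / 2.0;
    }
}
```

Plugging in the thread's numbers (5 queued requests at 10 ms each) gives the 30 ms figure Vladimir cites; with Bloom filters and .crc files multiplying the per-Get IOs, queue depths of 20-30 push the average past 100 ms.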
Re: You Are Dead Exception due to promotion failure
This line: [CMS-concurrent-mark: 12.929/88.767 secs] [Times: user=14.30 sys=3.74, real=88.77 secs] Is suspicious. Are you swapping? J-D On Mon, Oct 7, 2013 at 8:34 AM, prakash kadel prakash.ka...@gmail.com wrote: Also, why is the CMS not kicking in early, i have set -XX:+UseCMSInitiatingOccupancyOnly??? Sincerely, Prakash On Tue, Oct 8, 2013 at 12:32 AM, prakash kadel prakash.ka...@gmail.com wrote: Hello, I am getting this YADE all the time. HBASE_HEAPSIZE=8000 Settings: -ea -XX:+UseConcMarkSweepGC -XX:MaxGCPauseMillis=200 -XX:+HeapDumpOnOutOfMemoryError -XX:+CMSIncrementalMode -XX:+UseParNewGC -XX:CMSInitiatingOccupancyFraction=50 -XX:+UseCMSInitiatingOccupancyOnly -XX:NewSize=256m -XX:MaxNewSize=256m It seems there is a promotion failure and the CMS takes too long:
2013-10-07T01:22:55.784+0900: [GC [ParNew: 235968K->26176K(235968K), 0.3219980 secs] 7709485K->7538063K(8165824K) icms_dc=0 , 0.3221100 secs] [Times: user=0.27 sys=0.01, real=0.33 secs]
2013-10-07T01:23:07.361+0900: [GC [ParNew: 235842K->26176K(235968K), 0.1899680 secs] 7747729K->7578713K(8165824K) icms_dc=0 , 0.1900700 secs] [Times: user=0.26 sys=0.02, real=0.19 secs]
2013-10-07T01:23:20.154+0900: [GC [ParNew: 235803K->26176K(235968K), 0.2428200 secs] 7788341K->7615284K(8165824K) icms_dc=0 , 0.2429570 secs] [Times: user=0.25 sys=0.02, real=0.24 secs]
2013-10-07T01:23:34.594+0900: [GC [ParNew: 235889K->26176K(235968K), 0.2440980 secs] 7824998K->7651179K(8165824K) icms_dc=0 , 0.2442130 secs] [Times: user=0.20 sys=0.03, real=0.25 secs]
2013-10-07T01:23:47.666+0900: [GC [ParNew: 235906K->26176K(235968K), 0.2998100 secs] 7860909K->7686832K(8165824K) icms_dc=3 , 0.3020280 secs] [Times: user=0.23 sys=0.04, real=0.30 secs]
2013-10-07T01:23:57.216+0900: [GC [1 CMS-initial-mark: 7660656K(7929856K)] 7788778K(8165824K), 3.7665320 secs] [Times: user=0.07 sys=0.06, real=3.77 secs]
2013-10-07T01:24:05.508+0900: [GC [ParNew: 235811K->26176K(235968K), 0.4632860 secs] 7896468K->7721167K(8165824K) icms_dc=3 , 0.4634100 secs] [Times: user=0.21 sys=0.03, real=0.46 secs]
2013-10-07T01:24:19.889+0900: [GC [ParNew: 235812K->26176K(235968K), 0.3531980 secs] 7930804K->7755633K(8165824K) icms_dc=3 , 0.3533230 secs] [Times: user=0.24 sys=0.06, real=0.35 secs]
2013-10-07T01:24:32.832+0900: [GC [ParNew: 235968K->26176K(235968K), 0.6298370 secs] 7965425K->7790643K(8165824K) icms_dc=3 , 0.6299530 secs] [Times: user=0.23 sys=0.03, real=0.63 secs]
2013-10-07T01:24:43.629+0900: [GC [ParNew: 235800K->26176K(235968K), 0.3190580 secs] 8000268K->782K(8165824K) icms_dc=3 , 0.3191840 secs] [Times: user=0.24 sys=0.02, real=0.32 secs]
2013-10-07T01:24:56.005+0900: [GC [ParNew: 235848K->26176K(235968K), 0.4839400 secs] 8035228K->7860300K(8165824K) icms_dc=3 , 0.4840480 secs] [Times: user=0.31 sys=0.03, real=0.49 secs]
2013-10-07T01:25:07.282+0900: [GC [ParNew: 235750K->26176K(235968K), 0.3423250 secs] 8069875K->7895852K(8165824K) icms_dc=9 , 0.3424380 secs] [Times: user=0.21 sys=0.06, real=0.34 secs]
2013-10-07T01:25:19.853+0900: [GC [ParNew (promotion failed): 235745K->235745K(235968K), 0.3339710 secs][CMS2013-10-07T01:25:29.750+0900: [CMS-concurrent-mark: 12.929/88.767 secs] [Times: user=14.30 sys=3.74, real=88.77 secs] (concurrent mode failure): 7899125K->2882954K(7929856K), 42.8279810 secs] 8105422K->2882954K(8165824K), [CMS Perm : 31956K->31861K(53340K)] icms_dc=9 , 43.1621090 secs] [Times: user=10.40 sys=1.89, real=43.16 secs]
2013-10-07T01:26:08.288+0900: [GC [1 CMS-initial-mark: 2882954K(7929856K)] 2978434K(8165824K), 0.0965830 secs] [Times: user=0.04 sys=0.00, real=0.09 secs]
Heap par new generation total 235968K, used 197697K [0x000606e0, 0x000616e0, 0x000616e0) eden space 209792K, 94% used [0x000606e0, 0x000612f10718, 0x000613ae) from space 26176K, 0% used [0x00061547, 0x00061547, 0x000616e0) to space 26176K, 0% used [0x000613ae, 0x000613ae, 0x00061547) concurrent mark-sweep generation total 7929856K, used 2882954K [0x000616e0, 0x0007fae0, 0x0007fae0) concurrent-mark-sweep perm gen total 53340K, used 31960K [0x0007fae0, 0x0007fe217000, 0x0008) What is wrong here? please give me some suggestions. Sincerely, Prakash
Re: Upcoming HBase bay area user and dev meetups
While we're on the topic of upcoming meetups, there's also a meetup at Facebook's NYC office the week of Strata/Hadoop World (10/28). There's still room for about 50 attendees. http://www.meetup.com/HBase-NYC/events/135434632/ J-D On Mon, Oct 7, 2013 at 2:10 PM, Enis Söztutar e...@apache.org wrote: Hi guys, I just wanted to give a heads up on upcoming bay area user and dev meetups which will happen on the same day, October 24th. ( special thanks Stack for pushing this.) The user meetup will start at 6:30, and the talks scheduled so far are: + Steven Noels will talk about using the Lily Indexer to search your HBase content: http://ngdata.github.io/hbase-indexer/ + St.Ack will talk about what is in hbase-0.96.0 + Enis will talk about Mapreduce over HBase snapshots (HBASE-8369) There will be food and beers as usual. The event page is at http://www.meetup.com/hbaseusergroup/events/140759692/. Please write me or Stack off-list if you want to give a talk. There is still room for one more talk. The dev meetup will start at 4pm. Some of the suggested topics include: + When is 0.98.0? + When is 1.0? What makes for an HBase 1.0. + Assignment Manager + What next on MTTR? The event page is at: http://www.meetup.com/hackathon/events/144366512/. Feel free to suggest / bring up topics that you think is important for post-0.96. Enis
Re: You Are Dead Exception due to promotion failure
Swapping and Java simply don't go well together. You need to ensure that the committed memory is smaller than the available memory. Also see http://hbase.apache.org/book.html#perf.os.swap I haven't looked closely at your GC output, but even if CMS was kicking in as early as it's supposed to, the fact that you are swapping might just screw up everything. J-D On Mon, Oct 7, 2013 at 3:13 PM, prakash kadel prakash.ka...@gmail.com wrote: BTW, what will happen in the above situation if I disable swap entirely? Currently it starts swapping at 90%. Sincerely On Tue, Oct 8, 2013 at 7:09 AM, prakash kadel prakash.ka...@gmail.com wrote: Thanks, yup, it seems so. I have 48 GB of memory. I see it swaps at that point. BTW, why is the CMS not kicking in early? Do you have any idea? Sincerely On Tue, Oct 8, 2013 at 3:00 AM, Jean-Daniel Cryans jdcry...@apache.org wrote: This line: [CMS-concurrent-mark: 12.929/88.767 secs] [Times: user=14.30 sys=3.74, real=88.77 secs] is suspicious. Are you swapping? J-D On Mon, Oct 7, 2013 at 8:34 AM, prakash kadel prakash.ka...@gmail.com wrote: Also, why is the CMS not kicking in early? I have set -XX:+UseCMSInitiatingOccupancyOnly.
Sincerely, Prakash On Tue, Oct 8, 2013 at 12:32 AM, prakash kadel prakash.ka...@gmail.com wrote: Hello, I am getting this YADE all the time. HBASE_HEAPSIZE=8000 Settings: -ea -XX:+UseConcMarkSweepGC -XX:MaxGCPauseMillis=200 -XX:+HeapDumpOnOutOfMemoryError -XX:+CMSIncrementalMode -XX:+UseParNewGC -XX:CMSInitiatingOccupancyFraction=50 -XX:+UseCMSInitiatingOccupancyOnly -XX:NewSize=256m -XX:MaxNewSize=256m It seems there is a promotion failure and the CMS takes too long:
2013-10-07T01:22:55.784+0900: [GC [ParNew: 235968K->26176K(235968K), 0.3219980 secs] 7709485K->7538063K(8165824K) icms_dc=0 , 0.3221100 secs] [Times: user=0.27 sys=0.01, real=0.33 secs]
2013-10-07T01:23:07.361+0900: [GC [ParNew: 235842K->26176K(235968K), 0.1899680 secs] 7747729K->7578713K(8165824K) icms_dc=0 , 0.1900700 secs] [Times: user=0.26 sys=0.02, real=0.19 secs]
2013-10-07T01:23:20.154+0900: [GC [ParNew: 235803K->26176K(235968K), 0.2428200 secs] 7788341K->7615284K(8165824K) icms_dc=0 , 0.2429570 secs] [Times: user=0.25 sys=0.02, real=0.24 secs]
2013-10-07T01:23:34.594+0900: [GC [ParNew: 235889K->26176K(235968K), 0.2440980 secs] 7824998K->7651179K(8165824K) icms_dc=0 , 0.2442130 secs] [Times: user=0.20 sys=0.03, real=0.25 secs]
2013-10-07T01:23:47.666+0900: [GC [ParNew: 235906K->26176K(235968K), 0.2998100 secs] 7860909K->7686832K(8165824K) icms_dc=3 , 0.3020280 secs] [Times: user=0.23 sys=0.04, real=0.30 secs]
2013-10-07T01:23:57.216+0900: [GC [1 CMS-initial-mark: 7660656K(7929856K)] 7788778K(8165824K), 3.7665320 secs] [Times: user=0.07 sys=0.06, real=3.77 secs]
2013-10-07T01:24:05.508+0900: [GC [ParNew: 235811K->26176K(235968K), 0.4632860 secs] 7896468K->7721167K(8165824K) icms_dc=3 , 0.4634100 secs] [Times: user=0.21 sys=0.03, real=0.46 secs]
2013-10-07T01:24:19.889+0900: [GC [ParNew: 235812K->26176K(235968K), 0.3531980 secs] 7930804K->7755633K(8165824K) icms_dc=3 , 0.3533230 secs] [Times: user=0.24 sys=0.06, real=0.35 secs]
2013-10-07T01:24:32.832+0900: [GC [ParNew: 235968K->26176K(235968K), 0.6298370 secs] 7965425K->7790643K(8165824K) icms_dc=3 , 0.6299530 secs] [Times: user=0.23 sys=0.03, real=0.63 secs]
2013-10-07T01:24:43.629+0900: [GC [ParNew: 235800K->26176K(235968K), 0.3190580 secs] 8000268K-782K(8165824K) icms_dc=3 , 0.3191840 secs] [Times: user=0.24 sys=0.02, real=0.32 secs]
2013-10-07T01:24:56.005+0900: [GC [ParNew: 235848K->26176K(235968K), 0.4839400 secs] 8035228K->7860300K(8165824K) icms_dc=3 , 0.4840480 secs] [Times: user=0.31 sys=0.03, real=0.49 secs]
2013-10-07T01:25:07.282+0900: [GC [ParNew: 235750K->26176K(235968K), 0.3423250 secs] 8069875K->7895852K(8165824K) icms_dc=9 , 0.3424380 secs] [Times: user=0.21 sys=0.06, real=0.34 secs]
2013-10-07T01:25:19.853+0900: [GC [ParNew (promotion failed): 235745K->235745K(235968K), 0.3339710 secs][CMS2013-10-07T01:25:29.750+0900: [CMS-concurrent-mark: 12.929/88.767 secs] [Times: user=14.30 sys=3.74, real=88.77 secs] (concurrent mode failure): 7899125K->2882954K(7929856K), 42.8279810 secs] 8105422K->2882954K(8165824K), [CMS Perm : 31956K->31861K(53340K)] icms_dc=9 , 43.1621090 secs] [Times: user=10.40 sys=1.89, real=43.16 secs]
2013-10-07T01:26:08.288+0900: [GC [1 CMS-initial-mark: 2882954K(7929856K)] 2978434K(8165824K), 0.0965830 secs] [Times: user=0.04 sys=0.00, real=0.09 secs]
Heap
 par new generation total 235968K, used 197697K [0x000606e0, 0x000616e0, 0x000616e0)
  eden space 209792K, 94% used [0x000606e0
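One note on the flags quoted in this thread: -XX:+CMSIncrementalMode (iCMS, visible as the icms_dc entries in the log) changes how CMS schedules its concurrent work and has often been advised against on multi-core server hardware, which fits a concurrent mark that takes 88 seconds of wall time. A hedged sketch of the direction the hbase-env.sh settings could take (flag values are illustrative starting points, not tuned recommendations):

```shell
# hbase-env.sh (sketch, assumption: multi-core server hardware):
# drop incremental CMS so the occupancy fraction is honored by the
# full-time concurrent collector, and keep ParNew for the young gen.
export HBASE_OPTS="-ea -XX:+UseParNewGC -XX:+UseConcMarkSweepGC \
  -XX:CMSInitiatingOccupancyFraction=60 -XX:+UseCMSInitiatingOccupancyOnly \
  -XX:+HeapDumpOnOutOfMemoryError"
```

Together with the swap advice above (committed memory below available memory), this removes two of the usual suspects for multi-second young-gen pauses.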
Re: hbase.master parameter?
hbase.master was removed when we added zookeeper, so now a client will do a lookup in ZK instead of talking to a pre-determined master. So in a way, hbase.zookeeper.quorum is what replaces hbase.master. FWIW that was done in 0.20.0, which was released in September of 2009, so hbase.master was removed 4 years ago. J-D On Fri, Oct 4, 2013 at 8:11 AM, Jay Vyas jayunit...@gmail.com wrote: Oh wow. Looking in the source, this really is an old parameter. It appears that the answer to my question is that these are the master parameters:
src/main/resources/hbase-default.xml:<name>hbase.master.port</name>
src/main/resources/hbase-default.xml: <name>hbase.master.info.port</name>
src/main/resources/hbase-default.xml: <name>hbase.master.info.bindAddress</name>
src/main/resources/hbase-default.xml: <name>hbase.master.dns.interface</name>
src/main/resources/hbase-default.xml: <name>hbase.master.dns.nameserver</name>
src/main/resources/hbase-default.xml: <name>hbase.master.logcleaner.ttl</name>
src/main/resources/hbase-default.xml: <name>hbase.master.logcleaner.plugins</name>
src/main/resources/hbase-default.xml: <value>org.apache.hadoop.hbase.master.cleaner.TimeToLiveLogCleaner</value>
src/main/resources/hbase-default.xml: <name>hbase.master.keytab.file</name>
src/main/resources/hbase-default.xml: <name>hbase.master.kerberos.principal</name>
src/main/resources/hbase-default.xml: <name>hbase.master.hfilecleaner.plugins</name>
src/main/resources/hbase-default.xml: <value>org.apache.hadoop.hbase.master.cleaner.TimeToLiveHFileCleaner</value>
src/test/resources/hbase-site.xml: <name>hbase.master.event.waiting.time</name>
src/test/resources/hbase-site.xml:<name>hbase.master.info.port</name>
src/test/resources/hbase-site.xml:<description>The port for the hbase master web UI
src/test/resources/hbase-site.xml: <name>hbase.master.lease.thread.wakefrequency</name>
On Fri, Oct 4, 2013 at 11:06 AM, Jay Vyas jayunit...@gmail.com wrote: Thanks for the feedback! So - are you sure it has no effect? By obsolete - do we mean deprecated?
In this case - which parameters have replaced it, and how [specifically]? Any help on this issue would be appreciated, because I'm seeing an effect when I have the parameter; I will check my HBase version and confirm. On Fri, Oct 4, 2013 at 2:05 AM, Harsh J ha...@cloudera.com wrote: That property hasn't been in effect since 0.90 (as far as I can remember). Ever since we switched master discovery to ZK, the property has been obsolete. On Fri, Oct 4, 2013 at 5:13 AM, Jay Vyas jayunit...@gmail.com wrote: What happened to the hbase.master parameter? I don't see it in the docs... was it deprecated? It appears to still have an effect in 0.94.7 -- Jay Vyas http://jayunit100.blogspot.com -- Harsh J -- Jay Vyas http://jayunit100.blogspot.com -- Jay Vyas http://jayunit100.blogspot.com
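Put concretely, a modern client configuration carries no hbase.master property at all; the client finds the active master and the regions through ZooKeeper. A minimal hbase-site.xml sketch (host names are placeholders):

```xml
<configuration>
  <!-- The client asks ZooKeeper where the master and regions live;
       there has been no hbase.master property since 0.20.0. -->
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>zk1.example.com,zk2.example.com,zk3.example.com</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
  </property>
</configuration>
```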
Re: HBase stucked because HDFS fails to replicate blocks
I like the way you were able to dig down into multiple logs and present us the information, but it looks more like GC than an HDFS failure. In your region server log, go back to the first FATAL and see if it got a session expired from ZK, and other messages like a client not being able to talk to a server for some amount of time. If that's the case, then what you are seeing is the result of IO fencing by the master. J-D On Wed, Oct 2, 2013 at 10:15 AM, Ionut Ignatescu ionut.ignate...@gmail.com wrote: Hi, I have a Hadoop/HBase cluster that runs Hadoop 1.1.2 and HBase 0.94.7. I noticed an issue that stops normal cluster operation. My use case: I have several MR jobs that read data from one HBase table in the map phase and write data into 3 different tables during the reduce phase. I create the table handlers on my own; I don't use TableOutputFormat. The only way out I found is to restart the region server daemon on the region server with problems. On the namenode: cat namenode.2013-10-02 | grep blk_3136705509461132997_43329
Wed Oct 02 13:32:17 2013 GMT namenode 3852-0@namenode:0 [INFO] (IPC Server handler 29 on 22700) org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.allocateBlock: /hbase/.logs/datanode1,60020,1380637389766/datanode1%2C60020%2C1380637389766.1380720737247. blk_3136705509461132997_43329
Wed Oct 02 13:33:38 2013 GMT namenode 3852-0@namenode:0 [INFO] (IPC Server handler 13 on 22700) org.apache.hadoop.hdfs.server.namenode.FSNamesystem: commitBlockSynchronization(lastblock=blk_3136705509461132997_43329, newgenerationstamp=43366, newlength=40045568, newtargets=[10.81.18.101:50010], closeFile=false, deleteBlock=false)
On the region server: cat regionserver.2013-10-02 | grep 1380720737247
Wed Oct 02 13:32:17 2013 GMT regionserver 5854-0@datanode1:0 [INFO] (regionserver60020.logRoller) org.apache.hadoop.hbase.regionserver.wal.HLog: Roll /hbase/.logs/datanode1,60020,1380637389766/datanode1%2C60020%2C1380637389766.1380720701436, entries=149, filesize=63934833.
for /hbase/.logs/datanode1,60020,1380637389766/datanode1%2C60020%2C1380637389766.1380720737247 Wed Oct 02 13:33:37 2013 GMT regionserver 5854-0@datanode1:0 [WARN] (DataStreamer for file /hbase/.logs/datanode1,60020,1380637389766/datanode1%2C60020%2C1380637389766.1380720737247 block blk_3136705509461132997_43329) org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_3136705509461132997_43329 bad datanode[0] 10.80.40.176:50010 Wed Oct 02 13:33:37 2013 GMT regionserver 5854-0@datanode1:0 [WARN] (DataStreamer for file /hbase/.logs/datanode1,60020,1380637389766/datanode1%2C60020%2C1380637389766.1380720737247 block blk_3136705509461132997_43329) org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_3136705509461132997_43329 in pipeline 10.80.40.176:50010, 10.81.111.8:50010, 10.81.18.101:50010: bad datanode 10.80.40.176:50010 Wed Oct 02 13:33:43 2013 GMT regionserver 5854-0@datanode1:0 [INFO] (regionserver60020.logRoller) org.apache.hadoop.hdfs.DFSClient: Could not complete file /hbase/.logs/datanode1,60020,1380637389766/datanode1%2C60020%2C1380637389766.1380720737247 retrying... Wed Oct 02 13:33:43 2013 GMT regionserver 5854-0@datanode1:0 [INFO] (regionserver60020.logRoller) org.apache.hadoop.hdfs.DFSClient: Could not complete file /hbase/.logs/datanode1,60020,1380637389766/datanode1%2C60020%2C1380637389766.1380720737247 retrying... Wed Oct 02 13:33:44 2013 GMT regionserver 5854-0@datanode1:0 [INFO] (regionserver60020.logRoller) org.apache.hadoop.hdfs.DFSClient: Could not complete file /hbase/.logs/datanode1,60020,1380637389766/datanode1%2C60020%2C1380637389766.1380720737247 retrying... 
cat regionserver.2013-10-02 | grep 1380720737247 | grep 'Could not complete' | wc -l 5640 In datanode logs, that runs on the same host with region server: cat datanode.2013-10-02 | grep blk_3136705509461132997_43329 Wed Oct 02 13:32:17 2013 GMT datanode 5651-0@datanode1:0 [INFO] (org.apache.hadoop.hdfs.server.datanode.DataXceiver@ca6b1e3) org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_3136705509461132997_43329 src: /10.80.40.176:36721 dest: / 10.80.40.176:50010 Wed Oct 02 13:33:37 2013 GMT datanode 5651-0@datanode1:0 [INFO] (org.apache.hadoop.hdfs.server.datanode.DataXceiver@ca6b1e3) org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration( 10.80.40.176:50010, storageID=DS-812180968-10.80.40.176-50010-1380263000454, infoPort=50075, ipcPort=50020): Exception writing block blk_3136705509461132997_43329 to mirror 10.81.111.8:50010 Wed Oct 02 13:33:37 2013 GMT datanode 5651-0@datanode1:0 [INFO] (org.apache.hadoop.hdfs.server.datanode.DataXceiver@ca6b1e3) org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in receiveBlock for block blk_3136705509461132997_43329 java.io.IOException: Connection reset by peer Wed Oct 02 13:33:38 2013 GMT datanode 5651-0@datanode1:0 [INFO] (PacketResponder 2 for Block blk_3136705509461132997_43329)
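The "go back to the first FATAL" advice above is easy to script. A sketch of the kind of grep that surfaces the fencing evidence (the sample log lines below are fabricated for illustration; the pattern is the useful part):

```shell
# Fabricated sample of what a GC-paused region server log can look like
# (real logs would be grepped in place, e.g. regionserver.2013-10-02).
cat > rs-sample.log <<'EOF'
Wed Oct 02 13:33:36 2013 GMT regionserver [WARN] util.Sleeper: We slept 52340ms instead of 3000ms
Wed Oct 02 13:33:37 2013 GMT regionserver [FATAL] regionserver.HRegionServer: ABORTING region server: unhandled exception
Wed Oct 02 13:33:37 2013 GMT regionserver [INFO] zookeeper.ClientCnxn: Unable to reconnect, session has expired
EOF

# Find the first FATAL plus the usual GC-pause companions:
# a "We slept" warning and a ZK session expiry around the same time.
grep -n -E 'FATAL|session.*expired|We slept' rs-sample.log
```

If all three show up within a few seconds of each other, the block-recovery noise in the HDFS logs is the symptom, not the cause.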
Re: Replication
That means that the master cluster isn't able to see any region servers in the slave cluster... is cluster b up? Can you create tables? J-D On Fri, Sep 27, 2013 at 3:23 AM, Arnaud Lamy al...@ltutech.com wrote: Hi, I tried to configure replication with 2 boxes (a and b). A hosts HBase and ZK, and b only HBase. A is on zk:/hbase and b on zk:/hbase_b. I used the start-hbase.sh script to start HBase, and I changed HBASE_MANAGES_ZK=false on both. A is master and B is slave. I added a peer on A, and when I list it I have: 1 localhost:2181:/hbase_b ENABLED I created my table on A and B, and added some data on A, but nothing arrived on B. When I look at my logs I have: 2013-09-15 23:59:43,682 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Getting 0 rs from peer cluster # 1 That means there's no slave plugged into my master. There's no time difference between A and B, for info. I'm stuck (can't find anything on Google). Do you have any idea why it doesn't work? Arnaud
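For comparison, a typical 0.94-era setup on the master cluster looks like the hbase shell session below (the peer id, quorum host, and table/family names are illustrative; hbase.replication must also be set to true in hbase-site.xml on both clusters before the region servers start):

```ruby
# hbase shell on cluster A (sketch, names are placeholders)
add_peer '1', 'b-zk-host:2181:/hbase_b'
list_peers

# Edits only ship for families explicitly scoped for replication:
disable 'mytable'
alter 'mytable', {NAME => 'f1', REPLICATION_SCOPE => '1'}
enable 'mytable'
```

A peer that lists as ENABLED but yields "Getting 0 rs from peer cluster" usually means the quorum/znode triplet in add_peer doesn't point at a live slave cluster, which is exactly what J-D is probing above.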
Re: What is causing my mappers to execute so damn slow?
Your details are missing important bits like your configuration, Hadoop/HBase versions, etc. Doing those random reads inside your MR job, especially if they are reading cold data, will indeed make it slower. Just to get an idea, if you skip doing the Gets, how fast does it become? J-D On Fri, Sep 27, 2013 at 10:33 AM, Pavan Sudheendra pavan0...@gmail.com wrote: Hi everyone, I posted this question many times before, and I've given full details on Stack Overflow: http://stackoverflow.com/q/19056712/938959 Please, I need someone to guide me in the right direction here. Help much appreciated! -- Regards- Pavan
Re: What is causing my mappers to execute so damn slow?
I don't think there's a CDH that includes Hadoop 1.2.1. So either your code is doing something slow or it's the reading itself. For the latter, make sure you go through http://hbase.apache.org/book.html#perf.reading and we also recently had this thread on the list where you can see some live performance debugging: http://www.mail-archive.com/user@hbase.apache.org/msg27174.html. For example, make sure you're not running on the local job tracker. J-D On Fri, Sep 27, 2013 at 11:07 AM, Pavan Sudheendra pavan0...@gmail.com wrote: Hi Jean, HBase 0.94.6 and Hadoop 1.2.1, Cloudera distributions. I in fact tried that out: in place of doing the get operations, I created stub data and returned that instead. It was practically the same speed. Nothing changed. After 20 mins or so, when I check the job status, it has hardly reached 1,000,000 rows. On Fri, Sep 27, 2013 at 11:12 PM, Jean-Daniel Cryans jdcry...@apache.org wrote: Your details are missing important bits like your configuration, Hadoop/HBase versions, etc. Doing those random reads inside your MR job, especially if they are reading cold data, will indeed make it slower. Just to get an idea, if you skip doing the Gets, how fast does it become? J-D On Fri, Sep 27, 2013 at 10:33 AM, Pavan Sudheendra pavan0...@gmail.com wrote: Hi everyone, I posted this question many times before, and I've given full details on Stack Overflow: http://stackoverflow.com/q/19056712/938959 Please, I need someone to guide me in the right direction here. Help much appreciated! -- Regards- Pavan -- Regards- Pavan
Re: Export API using start and stop row key !
You'd need to use 0.94 (or CDH4.2+, since you mention being on CDH) to have access to TableInputFormat.SCAN_ROW_START and SCAN_ROW_STOP; then all you need to do is copy Export's code and add what you're missing. J-D On Tue, Sep 24, 2013 at 5:42 PM, karunakar lkarunaka...@gmail.com wrote: Hi Experts, I would like to fetch data from an HBase table using the map reduce Export API. I see that I can fetch data using start and stop time, but I don't see any information regarding start and stop row keys. Can any expert guide me or give me an example in order to fetch the first 1000 rows (or a start and stop row key) using the Export API, which I can then import into a different table? Hadoop 2.0.0-cdh4.1.2 HBase 0.92.1-cdh4.1.2 Please let me know if you need more information. Thank you. -- View this message in context: http://apache-hbase.679495.n3.nabble.com/Export-API-using-start-and-stop-row-key-tp4051182.html Sent from the HBase User mailing list archive at Nabble.com.
Re: Hbase Compression
On flushing we do some cleanup, like removing deleted data that was already in the MemStore, or extra versions. Could it be that you are overwriting recently written data? 48MB is the size of the MemStore that accumulated while the flushing happened. J-D On Tue, Sep 24, 2013 at 3:50 AM, aiyoh79 tcheng...@gmail.com wrote: Hi, I am using hbase 0.94.11 and I feel a bit confused when looking at the log file below:
13/09/24 13:11:00 INFO regionserver.Store: Flushed , sequenceid=687077, memsize=122.1m, into tmp file hdfs://192.168.123.123:54310/hbase/usertable/b19289cf9b1400c6daddc347337bac03/.tmp/13f0d91efe784372a796585a6c1e05d3
13/09/24 13:11:00 INFO regionserver.Store: Added hdfs://192.168.123.123:54310/hbase/usertable/b19289cf9b1400c6daddc347337bac03/family/13f0d91efe784372a796585a6c1e05d3, entries=432620, sequenceid=687077, filesize=64.4m
13/09/24 13:11:00 INFO regionserver.HRegion: Finished memstore flush of ~128.2m/134402240, currentsize=48.0m/50366240 for region usertable,user4\xB4\xB0,1379998895119.b19289cf9b1400c6daddc347337bac03. in 1163ms, sequenceid=687077, compaction requested=false
It seems like it will first flush into a tmp file and the memsize is 122.1m, but when it is finally added, the size is 64.4m. Lastly, there are 2 more parameters, which are 128.2m and 48.0m for currentsize. I never specify the hbase.regionserver.codecs property in my hbase-site.xml file, so is the size difference still because of compression? Thanks, aiyoh79 -- View this message in context: http://apache-hbase.679495.n3.nabble.com/Hbase-Compression-tp4051122.html Sent from the HBase User mailing list archive at Nabble.com.
Re: Hbase ports
On Mon, Sep 23, 2013 at 9:14 AM, John Foxinhead john.foxinh...@gmail.com wrote: Hi all. I'm doing a project for my university, so I have to know exactly how all the HBase ports work. Studying the documentation I found that ZooKeeper accepts connections on port 2181, the HBase master on port 60000, and the HBase region servers on port 60020. I didn't understand the importance of port 60010 on the master and port 60030 on the region servers. Can I not use them? From the documentation (http://hbase.apache.org/book.html#config.files): hbase.regionserver.info.port The port for the HBase RegionServer web UI. Set to -1 if you do not want the RegionServer UI to run. Default: 60030 You can look for the other port in there too. More important: if I launch HBase in pseudo-distributed mode, running all processes on localhost, what ports are used for each of the processes if I launch 1, 2, 3 or more backup masters, and if I launch a few region servers (less than 10) or a lot of region servers (10, 20, 100)? It'll clash; you'll have to have a different hbase-site.xml for each process you want to start. J-D
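So each extra process on localhost needs its own RPC and info ports. A sketch of the properties a second region server's hbase-site.xml would override (values are illustrative offsets from the defaults):

```xml
<!-- hbase-site.xml overrides for a second region server on the same host -->
<property>
  <name>hbase.regionserver.port</name>
  <value>60021</value>  <!-- default is 60020 -->
</property>
<property>
  <name>hbase.regionserver.info.port</name>
  <value>60031</value>  <!-- default is 60030 (web UI); -1 disables the UI -->
</property>
```

A backup master would likewise override hbase.master.port (default 60000) and hbase.master.info.port (default 60010). If your distribution ships bin/local-master-backup.sh and bin/local-regionservers.sh, those scripts apply this kind of port offset for you.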
Re: openTSDB lose large amount of data when the client are writing
Could happen if a region moves since locks aren't persisted, but if I were you I'd ask on the opentsdb mailing list first. J-D On Thu, Sep 19, 2013 at 10:09 AM, Tianying Chang tich...@ebaysf.com wrote: Hi, I have a customer who use openTSDB. Recently we found that only less than 10% data are written, rest are are lost. By checking the RS log, there are many row lock related issues, like below. It seems large amount of write to tsdb that need row lock caused the problem. Anyone else see similar problem? Is it a bug of openTSDB? Or it is due to HBase exposed a vulnerable API? org.apache.hadoop.hbase.UnknownRowLockException: Invalid row lock at org.apache.hadoop.hbase.regionserver.HRegionServer.getLockFromId(HRegionServer.java:2732) at org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:2071) at sun.reflect.GeneratedMethodAccessor28.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:616) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:320) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1426) 13/09/18 12:08:30 ERROR regionserver.HRegionServer: org.apache.hadoop.hbase.UnknownRowLockException: -6180307918863136448 at org.apache.hadoop.hbase.regionserver.HRegionServer.unlockRow(HRegionServer.java:2765) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:616) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:320) Thanks Tian-Ying
Re: Bulkload into empty table with configureIncrementalLoad()
You need to create the table with pre-splits, see http://hbase.apache.org/book.html#perf.writing J-D On Thu, Sep 19, 2013 at 9:52 AM, Dolan Antenucci antenucc...@gmail.comwrote: I have about 1 billion values I am trying to load into a new HBase table (with just one column and column family), but am running into some issues. Currently I am trying to use MapReduce to import these by first converting them to HFiles and then using LoadIncrementalHFiles.doBulkLoad(). I also use HFileOutputFormat.configureIncrementalLoad() as part of my MR job. My code is essentially the same as this example: https://github.com/Paschalis/HBase-Bulk-Load-Example/blob/master/src/cy/ac/ucy/paschalis/hbase/bulkimport/Driver.java The problem I'm running into is that only 1 reducer is created by configureIncrementalLoad(), and there is not enough space on this node to handle all this data. configureIncrementalLoad() should start one reducer for every region the table has, so apparently the table only has 1 region -- maybe because it is empty and brand new (my understanding of how regions work is not crystal clear)? The cluster has 5 region servers, so I'd at least like that many reducers to handle this loading. On a side note, I also tried the command line tool, completebulkload, but am running into other issues with this (timeouts, possible heap issues) -- probably due to only one server being assigned the task of inserting all the records (i.e. I look at the region servers' logs, and only one of the servers has log entries; the rest are idle). Any help is appreciated -Dolan Antenucci
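Since configureIncrementalLoad() starts one reducer per region, an empty, freshly created table (one region) gives exactly one reducer. Computing evenly spaced split keys for the pre-split table is simple arithmetic; below is a runnable sketch in plain Python (a corrected rendering of the getHexSplits idea that has circulated on this list, assuming hex row keys such as MD5 hashes):

```python
def hex_splits(start_key, end_key, num_regions, width=16):
    """Return num_regions - 1 evenly spaced hex split keys over
    [start_key, end_key). Region i covers [lo + i*step, lo + (i+1)*step);
    the returned keys are the interior boundaries, zero-padded to width."""
    lo = int(start_key, 16)
    hi = int(end_key, 16)
    step = (hi - lo) // num_regions
    return ["%0*x" % (width, lo + step * i) for i in range(1, num_regions)]

# 4 regions over the full 64-bit hex key space -> 3 split keys
splits = hex_splits("0000000000000000", "ffffffffffffffff", 4)
print(splits)
```

Each returned string would become one element of the byte[][] splits passed to HBaseAdmin.createTable(desc, splits), so the reducers (one per region) get balanced key ranges from the start.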
Re: HBase Negation or NOT operator
You can always remove the NOT clause by changing the statement, but I'm wondering what your use case really is. HBase doesn't have secondary indexes so, unless you are doing a short-ish scan (let's say a million rows), it means you want to do a full table scan and that doesn't scale. J-D On Tue, Sep 17, 2013 at 1:34 AM, Ashwin Jain ashvyn.j...@gmail.com wrote: Hello All, Does HBase not support an SQL NOT operator on complex filters? I would like to filter out whatever matches a complex nested filter. my use case is to parse a query like this(below) and build a HBase filter from it. (field1=value1 AND NOT ((field2=value2 OR field3=value3) AND field4=value4)) How to go about this , any ideas? What will be a better approach - implement a custom filter that excludes a row qualified by another filter or to convert input query into an opposite query. Thanks, Ashwin
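Removing the NOT clause is mechanical: push the negation inward with De Morgan's laws until only leaf comparisons are negated, and a negated equality becomes a not-equals (which HBase can express per leaf, e.g. via CompareOp.NOT_EQUAL, with AND/OR mapping to FilterList MUST_PASS_ALL/MUST_PASS_ONE). A runnable sketch in plain Python over a tuple-based expression tree (the tree shape is illustrative, not an HBase API):

```python
def negate(expr):
    """Push NOT through an expression tree using De Morgan's laws.
    Leaves look like ('=', field, value) or ('!=', field, value);
    internal nodes are ('AND', ...) or ('OR', ...)."""
    op = expr[0]
    if op == 'AND':                      # NOT(a AND b) -> NOT a OR NOT b
        return ('OR',) + tuple(negate(e) for e in expr[1:])
    if op == 'OR':                       # NOT(a OR b) -> NOT a AND NOT b
        return ('AND',) + tuple(negate(e) for e in expr[1:])
    if op == '=':                        # NOT(f = v) -> f != v
        return ('!=', expr[1], expr[2])
    if op == '!=':                       # NOT(f != v) -> f = v
        return ('=', expr[1], expr[2])
    raise ValueError("unknown operator: %r" % op)

# The negated part of the example query: NOT ((f2=v2 OR f3=v3) AND f4=v4)
q = ('AND', ('OR', ('=', 'f2', 'v2'), ('=', 'f3', 'v3')), ('=', 'f4', 'v4'))
print(negate(q))
```

The caveat in the reply stands regardless of the rewrite: without secondary indexes, the resulting filter still drives a scan.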
Re: user_permission ERROR: Unknown table
Ah I see, well unless you setup Secure HBase there won't be any perms enforcement. So in which way is your application failing to use Selector? Do you have an error message or stack trace handy? J-D On Tue, Sep 17, 2013 at 5:43 AM, BG bge...@mitre.org wrote: Well we are trying to find out why our application works when we use 'Selectors' table. When we use 'Selectors2' it works just fine. So we wanted to see if it was a permission error. That is why we tried out user_permissions, but when they gave errors we wondered if that might enforce that maybe it is a permissions problem. bg -- View this message in context: http://apache-hbase.679495.n3.nabble.com/user-permission-ERROR-Unknown-table-tp4050797p4050838.html Sent from the HBase User mailing list archive at Nabble.com.
Re: show processlist equivalent in Hbase
(putting cdh user in BCC, please don't cross-post) The web UIs for both the master and the region server have a section called Tasks, which has a bunch of links like this: Tasks: Show All Monitored Tasks | Show non-RPC Tasks | Show All RPC Handler Tasks | Show Active RPC Calls | Show Client Operations | View as JSON J-D On Tue, Sep 17, 2013 at 5:41 AM, Dhanasekaran Anbalagan bugcy...@gmail.com wrote: Hi Guys, I want to know: is there a tool equivalent to MySQL's "show processlist" in HBase? The HBase Master webpage only shows requestsPerSecond and table details. I want to know which processes are generating load. Please guide me. -Dhanasekaran. Did I learn something today? If not, I wasted it.
Re: Command to delete based on column Family + rowkey
HBASE-8753 doesn't seem related. Right now there's nothing in the shell that does the equivalent of this: Delete.deleteFamily(byte [] family) But it's possible to run java code in the jruby shell so in the end you can still do it, just takes more lines. J-D On Mon, Sep 16, 2013 at 1:45 AM, Ted Yu yuzhih...@gmail.com wrote: Have you looked at https://issues.apache.org/jira/browse/HBASE-8753 ? Cheers On Sep 16, 2013, at 12:37 AM, Ramasubramanian ramasubramanian.naraya...@gmail.com wrote: Hi, Thanks…but the requirement is to delete the fields for a single row key… can u pls help? regards, Rams On 10-Sep-2013, at 4:56 PM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: This? hbase(main):002:0 help alter Alter column family schema; pass table name and a dictionary specifying new column family schema. Dictionaries are described on the main help command output. Dictionary must include name of column family to alter. For example, To change or add the 'f1' column family in table 't1' from defaults to instead keep a maximum of 5 cell VERSIONS, do: hbase alter 't1', NAME = 'f1', VERSIONS = 5 To delete the 'f1' column family in table 't1', do: hbase alter 't1', NAME = 'f1', METHOD = 'delete' or a shorter version: hbase alter 't1', 'delete' = 'f1' 2013/9/10 Ramasubramanian Narayanan ramasubramanian.naraya...@gmail.com Manish, I need to delete all the columns for a particular column family of a given rowkey... I don't want to specify the column name (qualifier name) one by one to delete. Pls let me know is there any way to delete like that... regards, Rams On Tue, Sep 10, 2013 at 2:06 PM, manish dunani manishd...@gmail.com wrote: If you want to delete rowkey for particular columnfamily then you need to mention individually:: delete 't','333','TWO:qualifier_name' This will definitely delete the records which you are looking for. Please revert back if it is not work. 
On Tue, Sep 10, 2013 at 1:40 PM, manish dunani manishd...@gmail.com wrote: hey rama, Try this:: *deleteall 't','333'* * * I hope it will definitely works for you!! On Tue, Sep 10, 2013 at 1:31 PM, Ramasubramanian Narayanan ramasubramanian.naraya...@gmail.com wrote: Dear All, Requirement is to delete all columns which belongs to a column family and for a particular rowkey. Have tried with the below command but record is not getting deleted. * hbase deleteall 't1', 'r1', 'c1'* * * *Test result :* * * 3) Scan the table 't' hbase(main):025:0 scan 't' ROW COLUMN+CELL 111 column=ONE:ename, timestamp=1378459582478, value= 111 column=ONE:eno, timestamp=1378459582335, value=1000 111 column=ONE:sal, timestamp=1378459582515, value=1500 111 column=TWO:ename, timestamp=1378459582655, value= 111 column=TWO:eno, timestamp=1378459582631, value=4000 222 column=ONE:ename, timestamp=1378459582702, value= 222 column=ONE:eno, timestamp=1378459582683, value=2000 222 column=ONE:sal, timestamp=1378459582723, value=2500 222 column=TWO:ename, timestamp=1378459582779, value= 222 column=TWO:eno, timestamp=1378459582754, value=4000 222 column=TWO:sal, timestamp=1378459582798, value=7500 333 column=ONE:ename, timestamp=1378459582880, value=sss 333 column=ONE:eno, timestamp=1378459582845, value=9000 333 column=ONE:sal, timestamp=1378459582907, value=6500 333 column=TWO:ename, timestamp=1378459582950, value=zzz 333 column=TWO:eno, timestamp=1378459582931, value= 333 column=TWO:sal, timestamp=1378459582968, value=6500 3 row(s) in 0.0440 seconds - 4) Delete the records from the table 't' in the rowkey '333' in the column family 'TWO' hbase(main):027:0 deleteall 't','333','TWO' 0 row(s) in 0.0060 seconds - 5) After deleting scan the table
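J-D's suggestion of calling the Java client API from the jruby shell can be sketched as follows, using the table name, rowkey, and family from this thread (an untested sketch against the 0.94 client API, to be typed into a running hbase shell):

```ruby
# hbase shell is JRuby, so the Java client classes are directly available.
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.HTable
import org.apache.hadoop.hbase.client.Delete
import org.apache.hadoop.hbase.util.Bytes

conf = HBaseConfiguration.create
table = HTable.new(conf, 't')

# Delete.deleteFamily marks every column of family 'TWO' in row '333'
d = Delete.new(Bytes.toBytes('333'))
d.deleteFamily(Bytes.toBytes('TWO'))
table.delete(d)
table.close
```

This does in a few lines what the shell's deleteall cannot express in this version: dropping one whole column family for a single rowkey without naming each qualifier.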
Re: user_permission ERROR: Unknown table
What are you trying to do bg? If you want to setup user permissions you also need to have a secure HBase (the link that Ted posted) which involves Kerberos. J-D On Mon, Sep 16, 2013 at 1:33 PM, Ted Yu yuzhih...@gmail.com wrote: See http://hbase.apache.org/book.html#d0e5135 On Mon, Sep 16, 2013 at 1:06 PM, BG bge...@mitre.org wrote: Thanks.. Do I need to do this. We do NOT have kerberos running. bg -- View this message in context: http://apache-hbase.679495.n3.nabble.com/user-permission-ERROR-Unknown-table-tp4050797p4050804.html Sent from the HBase User mailing list archive at Nabble.com.
Re: Information about hbase 0.96
Release date is: when it gets released. We are currently going through release candidates and as soon as one gets accepted we release it. I'd like to say it's gonna happen this month but who knows. There's probably one or two presentations online that explain what's in 0.96.0, but the source of truth at the moment is: https://issues.apache.org/jira/issues/?jql=project%20%3D%20HBASE%20AND%20(%20fixVersion%20%3D%20%220.96.0%22%20or%20fixVersion%20%3D%20%220.95.0%22%20%20or%20fixVersion%20%3D%20%220.95.1%22%20or%20fixVersion%20%3D%20%220.95.2%22%20)%20AND%20(status%20%3D%20Resolved%20OR%20status%20%3D%20Closed)%20ORDER%20BY%20issuetype%20DESC%2C%20priority%20DESC J-D On Thu, Sep 12, 2013 at 10:05 AM, Vimal Jain vkj...@gmail.com wrote: Hi, Where can i get information about hbase 0.96 like what are its additional features , its release date ? -- Thanks and Regards, Vimal Jain
Re: High cpu usage on a region server
Or roll back to CDH 4.2's HBase. They are fully compatible. J-D On Thu, Sep 12, 2013 at 10:25 AM, lars hofhansl la...@apache.org wrote: Not that I am aware of. Reducing the HFile block size will lessen this problem (but then cause other issues). It's just a fix to the RegexStringFilter. You can just recompile that and deploy it to the RegionServers (need to make sure it's in the class path before the HBase jars). Probably easier to roll a new release. It's a shame we did not see this earlier. -- Lars From: OpenSource Dev dev.opensou...@gmail.com To: user@hbase.apache.org; lars hofhansl la...@apache.org Sent: Thursday, September 12, 2013 9:52 AM Subject: Re: High cpu usage on a region server Thanks Lars. Are there any other workarounds for this issue until we get the fix? If not we might have to do the patch and roll out a custom pkg. On Thu, Sep 12, 2013 at 8:36 AM, lars hofhansl la...@apache.org wrote: Yep... Very likely HBASE-9428:
8 threads: java.lang.Thread.State: RUNNABLE at java.util.Arrays.copyOf(Arrays.java:2786) at java.lang.StringCoding.decode(StringCoding.java:178) at java.lang.String.<init>(String.java:483) at org.apache.hadoop.hbase.filter.RegexStringComparator.compareTo(RegexStringComparator.java:96) ...
4 threads: java.lang.Thread.State: RUNNABLE at sun.nio.cs.ISO_8859_1$Decoder.decodeArrayLoop(ISO_8859_1.java:79) at sun.nio.cs.ISO_8859_1$Decoder.decodeLoop(ISO_8859_1.java:106) at java.nio.charset.CharsetDecoder.decode(CharsetDecoder.java:544) at java.lang.StringCoding$StringDecoder.decode(StringCoding.java:140) at java.lang.StringCoding.decode(StringCoding.java:179) at java.lang.String.<init>(String.java:483) at org.apache.hadoop.hbase.filter.RegexStringComparator.compareTo(RegexStringComparator.java:96)
It's also consistent with what you see: lots of garbage (hence tweaking your GC options had a significant effect). The fix is in 0.94.12, which is in RC right now, probably to be released early next week.
-- Lars From: OpenSource Dev dev.opensou...@gmail.com To: user@hbase.apache.org Sent: Thursday, September 12, 2013 8:15 AM Subject: Re: High cpu usage on a region server A server started getting busy last night, but this time it took ~5 hrs to get from 15% busy to 75% busy. It is not running 80% flat-out yet. But this is still very high compared to other servers that are running under ~25% cpu usage. The only change I made yesterday was the addition of -XX:+UseParNewGC to the hbase startup command. http://pastebin.com/VRmujgyH On Wed, Sep 11, 2013 at 2:28 PM, Stack st...@duboce.net wrote: Can you thread dump the busy server and pastebin it? Thanks, St.Ack On Wed, Sep 11, 2013 at 1:49 PM, OpenSource Dev dev.opensou...@gmail.com wrote: Hi, I'm using HBase 0.94.6 (CDH 4.3) for Opentsdb. So far I have had no issues with writes/puts. The system handles up to 800k puts per second without issue. On average we do 250k puts per second. I am having a problem with Reads; I've isolated where the problem is but have not been able to find the root cause. I have 16 machines running hbase-regionserver, each has ~35 regions. Once in a while CPU goes flat out at 80% on 1 region server. These are the things I've noticed in ganglia: hbase.regionserver.request - evenly distributed. Not seeing any spikes on the busy server hbase.regionserver.blockCacheSize - between 500MB and 1000MB hbase.regionserver.compactionQueueSize - avg 2 or less hbase.regionserver.blockCacheHitRatio - 30% on busy node, 60% on other nodes JVM Heap size is set to 16GB and I'm using -XX:+UseParNewGC -XX:+UseConcMarkSweepGC I've noticed the system load moves to a different region server, sometimes within a minute, if the busy region server is restarted. Any suggestions on what could be causing the load and/or what other metrics I should check? Thank you!
Re: Performance analysis in Hbase
Yeah there isn't a whole lot of documentation about metrics. Could it be that you are still running on a default 1GB heap and you are pounding it with multiple clients? Try raising the heap size? FWIW I gave a presentation at HBaseCon with Kevin O'Dell about HBase operations which could shed some light: Video: http://www.cloudera.com/content/cloudera/en/resources/library/hbasecon/hbasecon-2013--apache-hbase-meet-ops-ops-meet-apache-hbase-video.html Slides: http://www.slideshare.net/cloudera/operations-session-6 J-D On Tue, Sep 10, 2013 at 8:40 AM, Vimal Jain vkj...@gmail.com wrote: Can someone please throw some light on this aspect of Hbase? On Thu, Sep 5, 2013 at 11:04 AM, Vimal Jain vkj...@gmail.com wrote: Just to add more information, I got the following link which explains metrics related to RS: http://hbase.apache.org/book.html#rs_metrics Is there any resource which explains these metrics in detail? (In the official guide, there is just one line for each metric.) On Thu, Sep 5, 2013 at 10:06 AM, Vimal Jain vkj...@gmail.com wrote: Hi, I am running Hbase in *pseudo distributed mode on top of HDFS.* So far, it's been running fine. In the past I had some memory-related issues (long GC pauses). So I wanted to know if there is a way through the GUI (web UI on 60010, 60030) or CLI (shell) to get the health of Hbase (with reference to its memory consumption, CPU starvation if any). Please provide some resources where I can look for this information. -- Thanks and Regards, Vimal Jain
Re: Getting column values in batches for a single row
Scan.setBatch does what you are looking for, since with a Get there's no way to iterate over multiple calls: https://github.com/apache/hbase/blob/0.94.2/src/main/java/org/apache/hadoop/hbase/client/Scan.java#L306 Just make sure to make the Scan start at the row you want and stop right after it. J-D On Mon, Sep 9, 2013 at 12:28 PM, Sam William sa...@stumbleupon.com wrote: Hi, I have a table which is wide (with a single family) and the column qualifiers are timestamps. I'd like to do a get on a rowkey, but I don't need to read all of the columns. I want to read the first n values and then read more in batches if need be. Is there a way to do this? I'm on version 0.94.2. Thanks
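The "start at the row you want and stop right after it" advice can be sketched in plain Java. The only non-obvious part is the exclusive stop row: the smallest key strictly greater than a row is the row key with a 0x00 byte appended. The commented Scan lines are illustrative 0.94-era client calls and are not compiled here:

```java
import java.util.Arrays;

public class SingleRowScan {
    // The smallest row key strictly greater than `row` is `row` with a 0x00
    // byte appended; using it as the (exclusive) stop row confines a Scan to
    // exactly one row.
    static byte[] stopRowFor(byte[] row) {
        return Arrays.copyOf(row, row.length + 1); // copyOf pads with 0x00
    }

    public static void main(String[] args) {
        byte[] row = "user42".getBytes();
        byte[] stop = stopRowFor(row);
        System.out.println(stop.length);           // one byte longer than the row
        System.out.println(stop[stop.length - 1]); // trailing 0x00
        // Illustrative 0.94-era client usage (assumed, not compiled here):
        //   Scan scan = new Scan(row, stopRowFor(row));
        //   scan.setBatch(100); // at most 100 columns per call to next()
        //   ResultScanner scanner = table.getScanner(scan);
    }
}
```

With setBatch set, each call to ResultScanner.next() returns at most that many columns of the wide row, which gives the batched read the poster asked for.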
Re: HBase distributed mode issue
What's your /etc/hosts on the master like? HBase does a simple lookup to get the machine's hostname and it seems your node reports itself as being localhost. On Tue, Sep 3, 2013 at 6:23 AM, Omkar Joshi omkar.jo...@lntinfotech.com wrote: I'm trying to set up a 2-node HBase cluster in distributed mode. Somehow, my regionserver/slave is connecting to 'localhost' for the master despite adding the appropriate property for the master in hbase-site.xml. The detailed thread (in the mail, the files etc. would look cluttered, hence, providing the thread to an external site) is here: http://stackoverflow.com/questions/18587512/hbase-distributed-mode Regards, Omkar Joshi The contents of this e-mail and any attachment(s) may contain confidential or privileged information for the intended recipient(s). Unintended recipients are prohibited from taking action on the basis of information in this e-mail and using or disseminating the information, and must notify the sender and delete it from their system. LT Infotech will not accept responsibility or liability for the accuracy or completeness of, or the presence of any virus or disabling code in this e-mail
Re: counter Increment gives DonotRetryException
You probably put a string in there that was a number, and increment expects an 8-byte long. For example, if you did: put 't1', '9row27', 'columnar:column1', '1' Then did an increment on that, it would fail. J-D On Thu, Aug 29, 2013 at 4:42 AM, yeshwanth kumar yeshwant...@gmail.com wrote: I am a newbie to HBase, going through the Counters topic. Whenever I perform an increment like incr 't1','9row27','columnar:column1',1 it gives an ERROR: org.apache.hadoop.hbase.DoNotRetryIOException: org.apache.hadoop.hbase.DoNotRetryIOException: Attempted to increment field that isn't 64 bits wide looking for some help
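A quick way to see the width mismatch J-D describes, using only the JDK (ByteBuffer stands in here for HBase's Bytes.toBytes(long), which produces the same big-endian 8-byte encoding):

```java
import java.nio.ByteBuffer;

public class CounterWidth {
    public static void main(String[] args) {
        // What the shell's  put 't1','9row27','columnar:column1','1'  stores:
        // the one-byte string "1".
        byte[] asString = "1".getBytes();
        // What incr expects: an 8-byte big-endian long, the same encoding
        // HBase's Bytes.toBytes(1L) would produce (ByteBuffer used so this
        // example needs only the JDK).
        byte[] asLong = ByteBuffer.allocate(8).putLong(1L).array();
        System.out.println(asString.length + " vs " + asLong.length);
    }
}
```

This prints "1 vs 8": the shell put stored a 1-byte value, and increment refuses anything that isn't exactly 64 bits wide, hence the DoNotRetryIOException.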
Re: [Question: replication] why only one regionserver is used during replication? 0.94.9
Region servers replicate data written to them, so look at how your regions are distributed. J-D On Tue, Aug 27, 2013 at 11:29 AM, Demai Ni nid...@gmail.com wrote: hi, guys, I am using hbase 0.94.9 and set up replication from a 4-node master (3 regservers) to a 3-node slave (2 regservers). I can tell that all source regservers can successfully replicate data. However, it seems for each particular table, only one regserver will handle its replication at any given time. For example, I am using YCSB to load 1,000,000 rows with workloada, with 16 threads. During the load period, I looked at the ageOfLastShippedOp and sizeOfLogQueue. I can tell one of the regservers from the Master is doing the replication. While the values of both age and sizeOfLog are growing, the other two regservers don't come in to help. So does that mean: for each table and process, only one regionserver will do the replication regardless of how long the queue is? Or did I miss some setup configuration? Thanks. Demai
Re: Is downgrade from 0.96.0 to 0.94.6 possible?
FYI you'll be in the same situation with 0.95.2, actually worse since it's really just a developer preview release. But if you meant try in its strict sense, i.e. use it on a test cluster, then yes please do. The more people we get to try it out the better 0.96.0 will be. J-D On Thu, Aug 22, 2013 at 9:58 PM, Xiong LIU liuxiongh...@gmail.com wrote: Thanks, Stack. I will try 0.95.2 ahead. Best Wishes On Fri, Aug 23, 2013 at 11:28 AM, Stack st...@duboce.net wrote: On Thu, Aug 22, 2013 at 8:00 PM, Xiong LIU liuxiongh...@gmail.com wrote: We are considering upgrading our hbase cluster from version 0.94.6 to 0.96.0 once 0.96.0 is out. I want to know whether any possible failure may happen during the upgrade process, and if it does happen, is it possible to downgrade to 0.94.6? No. We do not have anyone working on making it so you can roll back. Is there any best practice for upgrading 0.94.x to 0.96.0? Don't be the first (smile). Ask later after we get some experience moving folks through the upgrade. The upgrade process has had little exercise as of this date. St.Ack
Re: Replication queue?
You can find a lot here: http://hbase.apache.org/replication.html And how many logs you can queue is how much disk space you have :) On Tue, Aug 20, 2013 at 7:23 AM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: Hi, If I have a master - slave replication, and the master went down, replication will start back where it was when the master comes back online. Fine. If I have a master - slave replication, and the slave went down, is the data queued until the slave comes back online and then sent? If so, how big can this queue be, and how long can the slave be down? Same questions for master - master... I guess for this one, it's like the 1 line above and it's fine, right? Thanks, JM
Re: Major Compaction in 0.90.6
On Mon, Aug 19, 2013 at 11:52 PM, Monish r monishs...@gmail.com wrote: Hi Jean, s/Jean/Jean-Daniel ;) Thanks for the explanation. Just a clarification on the third answer: in our current cluster (0.90.6), I find that irrespective of whether TTL is set or not, major compaction rewrites the hfile for the region (there is only one hfile for that region) on every manual major compaction trigger. Can you enable DEBUG logs? You'd see why the major compaction is triggered. log: 2013-08-19 14:15:29,926 INFO org.apache.hadoop.hbase.regionserver.Store: Completed major compaction of 1 file(s), new file=hdfs://x.x.x.x:9000/hbase/NOTIFICATION_HISTORY/b00086bca62ee55796a960002291aca4/n/4754838096619480671 I find a new file is created for every major compaction trigger. Regards, R.Monish On Mon, Aug 19, 2013 at 11:52 PM, Jean-Daniel Cryans jdcry...@apache.org wrote: Inline. J-D On Mon, Aug 19, 2013 at 2:48 AM, Monish r monishs...@gmail.com wrote: Hi guys, I have the following questions about HBASE 0.90.6: 1. Does hbase use only one compaction thread to handle both major and minor compactions? Yes, look at CompactSplitThread 2. If hbase uses multiple compaction threads, which configuration parameter defines the number of compaction threads? It doesn't in 0.90.6, but CompactSplitThread lists those for 0.92+: hbase.regionserver.thread.compaction.large hbase.regionserver.thread.compaction.small 3. After hbase.majorcompaction.interval from the last major compaction, if major compaction is executed on a table already major compacted, does hbase skip all the table regions from major compaction? Determining if something is major-compacted is definitely not at the table level. In 0.90.6, MajorCompactionChecker will ask HRegion.isMajorCompaction() to check if it needs to major compact again, which in turn checks every Store.
FWIW if you have TTL turned on it will still major compact a major compacted file; HFiles don't have an index of what's deleted or TTL'd and it doesn't do a full read of each file to check. Regards, R.Monish
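To follow up on the "enable DEBUG logs" suggestion above: in this era of HBase the log level is controlled through log4j.properties on each region server. A minimal fragment (an assumed sketch, not quoted from the thread; restart the region server to apply):

```properties
# log4j.properties on each region server: turn on DEBUG for HBase classes
# so the reason for each major compaction shows up in the logs
log4j.logger.org.apache.hadoop.hbase=DEBUG
```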
Re: HDFS Restart with Replication
Doing a bin/stop-hbase.sh is the way to go, then on the Hadoop side you do stop-all.sh. I think your ordering is correct but I'm not sure you are using the right commands. J-D On Fri, Aug 2, 2013 at 8:27 AM, Patrick Schless patrick.schl...@gmail.com wrote: Ah, I bet the issue is that I stopped the HMaster, but not the Region Servers, then restarted HDFS. What's the correct order of operations for bouncing everything? On Thu, Aug 1, 2013 at 5:21 PM, Jean-Daniel Cryans jdcry...@apache.org wrote: Can you follow the life of one of those blocks through the Namenode and datanode logs? I'd suggest you start by doing a fsck on one of those files with the option that gives the block locations first. By the way, why do you have split logs? Are region servers dying every time you try out something? On Thu, Aug 1, 2013 at 3:16 PM, Patrick Schless patrick.schl...@gmail.com wrote: Yup, 14 datanodes, all check back in. However, all of the corrupt files seem to be splitlogs from data05. This is true even though I've done several restarts (each restart adding a few missing blocks). There's nothing special about data05, and it seems to be in the cluster, the same as anyone else. On Thu, Aug 1, 2013 at 5:04 PM, Jean-Daniel Cryans jdcry...@apache.org wrote: I can't think of a way your missing blocks would be related to HBase replication; there's something else going on. Are all the datanodes checking back in? J-D On Thu, Aug 1, 2013 at 2:17 PM, Patrick Schless patrick.schl...@gmail.com wrote: I'm running: CDH4.1.2 HBase 0.92.1 Hadoop 2.0.0 Is there an issue with restarting a standby cluster with replication running? I am doing the following on the standby cluster: - stop hmaster - stop name_node - start name_node - start hmaster When the name node comes back up, it's reliably missing blocks.
I started with 0 missing blocks, and have run through this scenario a few times, and am up to 46 missing blocks, all from the table that is the standby for our production table (in a different datacenter). The missing blocks all are from the same table, and look like: blk_-2036986832155369224 /hbase/splitlog/data01.sea01.staging.tdb.com ,60020,1372703317824_hdfs%3A%2F%2Fname-node.sea01.staging.tdb.com %3A8020%2Fhbase%2F.logs%2Fdata05.sea01.staging.tdb.com %2C60020%2C1373557074890-splitting%2Fdata05.sea01.staging.tdb.com %252C60020%252C1373557074890.1374960698485/tempodb-data/c9cdd64af0bfed70da154c219c69d62d/recovered.edits/01366319450.temp Do I have to stop replication before restarting the standby? Thanks, Patrick
Re: HDFS Restart with Replication
Ah then doing bin/hbase-daemon.sh stop master on the master node is the equivalent, but don't stop the region server themselves as the master will take care of it. Doing a stop on the master and the region servers will screw things up. J-D On Fri, Aug 2, 2013 at 3:28 PM, Patrick Schless patrick.schl...@gmail.com wrote: Doesn't stop-hbase.sh (and its ilk) require the server to be able to manage the clients (using unpassworded SSH keys, for instance)? I don't have that set up (for security reasons). I use capistrano for all these sort of coordination tasks. On Fri, Aug 2, 2013 at 12:07 PM, Jean-Daniel Cryans jdcry...@apache.orgwrote: Doing a bin/stop-hbase.sh is the way to go, then on the Hadoop side you do stop-all.sh. I think your ordering is correct but I'm not sure you are using the right commands. J-D On Fri, Aug 2, 2013 at 8:27 AM, Patrick Schless patrick.schl...@gmail.com wrote: Ah, I bet the issue is that I'm stopped the HMaster, but not the Region Servers, then restarting HDFS. What's the correct order of operations for bouncing everything? On Thu, Aug 1, 2013 at 5:21 PM, Jean-Daniel Cryans jdcry...@apache.org wrote: Can you follow the life of one of those blocks though the Namenode and datanode logs? I'd suggest you start by doing a fsck on one of those files with the option that gives the block locations first. By the way why do you have split logs? Are region servers dying every time you try out something? On Thu, Aug 1, 2013 at 3:16 PM, Patrick Schless patrick.schl...@gmail.com wrote: Yup, 14 datanodes, all check back in. However, all of the corrupt files seem to be splitlogs from data05. This is true even though I've done several restarts (each restart adding a few missing blocks). There's nothing special about data05, and it seems to be in the cluster, the same as anyone else. 
On Thu, Aug 1, 2013 at 5:04 PM, Jean-Daniel Cryans jdcry...@apache.org wrote: I can't think of a way how your missing blocks would be related to HBase replication, there's something else going on. Are all the datanodes checking back in? J-D On Thu, Aug 1, 2013 at 2:17 PM, Patrick Schless patrick.schl...@gmail.com wrote: I'm running: CDH4.1.2 HBase 0.92.1 Hadoop 2.0.0 Is there an issue with restarting a standby cluster with replication running? I am doing the following on the standby cluster: - stop hmaster - stop name_node - start name_node - start hmaster When the name node comes back up, it's reliably missing blocks. I started with 0 missing blocks, and have run through this scenario a few times, and am up to 46 missing blocks, all from the table that is the standby for our production table (in a different datacenter). The missing blocks all are from the same table, and look like: blk_-2036986832155369224 /hbase/splitlog/ data01.sea01.staging.tdb.com ,60020,1372703317824_hdfs%3A%2F%2Fname-node.sea01.staging.tdb.com %3A8020%2Fhbase%2F.logs%2Fdata05.sea01.staging.tdb.com %2C60020%2C1373557074890-splitting%2Fdata05.sea01.staging.tdb.com %252C60020%252C1373557074890.1374960698485/tempodb-data/c9cdd64af0bfed70da154c219c69d62d/recovered.edits/01366319450.temp Do I have to stop replication before restarting the standby? Thanks, Patrick
Re: HDFS Restart with Replication
I can't think of a way how your missing blocks would be related to HBase replication, there's something else going on. Are all the datanodes checking back in? J-D On Thu, Aug 1, 2013 at 2:17 PM, Patrick Schless patrick.schl...@gmail.com wrote: I'm running: CDH4.1.2 HBase 0.92.1 Hadoop 2.0.0 Is there an issue with restarting a standby cluster with replication running? I am doing the following on the standby cluster: - stop hmaster - stop name_node - start name_node - start hmaster When the name node comes back up, it's reliably missing blocks. I started with 0 missing blocks, and have run through this scenario a few times, and am up to 46 missing blocks, all from the table that is the standby for our production table (in a different datacenter). The missing blocks all are from the same table, and look like: blk_-2036986832155369224 /hbase/splitlog/data01.sea01.staging.tdb.com ,60020,1372703317824_hdfs%3A%2F%2Fname-node.sea01.staging.tdb.com %3A8020%2Fhbase%2F.logs%2Fdata05.sea01.staging.tdb.com %2C60020%2C1373557074890-splitting%2Fdata05.sea01.staging.tdb.com %252C60020%252C1373557074890.1374960698485/tempodb-data/c9cdd64af0bfed70da154c219c69d62d/recovered.edits/01366319450.temp Do I have to stop replication before restarting the standby? Thanks, Patrick
Re: HDFS Restart with Replication
Can you follow the life of one of those blocks through the Namenode and datanode logs? I'd suggest you start by doing a fsck on one of those files with the option that gives the block locations first. By the way, why do you have split logs? Are region servers dying every time you try out something? On Thu, Aug 1, 2013 at 3:16 PM, Patrick Schless patrick.schl...@gmail.com wrote: Yup, 14 datanodes, all check back in. However, all of the corrupt files seem to be splitlogs from data05. This is true even though I've done several restarts (each restart adding a few missing blocks). There's nothing special about data05, and it seems to be in the cluster, the same as anyone else. On Thu, Aug 1, 2013 at 5:04 PM, Jean-Daniel Cryans jdcry...@apache.org wrote: I can't think of a way your missing blocks would be related to HBase replication; there's something else going on. Are all the datanodes checking back in? J-D On Thu, Aug 1, 2013 at 2:17 PM, Patrick Schless patrick.schl...@gmail.com wrote: I'm running: CDH4.1.2 HBase 0.92.1 Hadoop 2.0.0 Is there an issue with restarting a standby cluster with replication running? I am doing the following on the standby cluster: - stop hmaster - stop name_node - start name_node - start hmaster When the name node comes back up, it's reliably missing blocks. I started with 0 missing blocks, and have run through this scenario a few times, and am up to 46 missing blocks, all from the table that is the standby for our production table (in a different datacenter).
The missing blocks all are from the same table, and look like: blk_-2036986832155369224 /hbase/splitlog/data01.sea01.staging.tdb.com ,60020,1372703317824_hdfs%3A%2F%2Fname-node.sea01.staging.tdb.com %3A8020%2Fhbase%2F.logs%2Fdata05.sea01.staging.tdb.com %2C60020%2C1373557074890-splitting%2Fdata05.sea01.staging.tdb.com %252C60020%252C1373557074890.1374960698485/tempodb-data/c9cdd64af0bfed70da154c219c69d62d/recovered.edits/01366319450.temp Do I have to stop replication before restarting the standby? Thanks, Patrick
Re: Can't solve the Unable to load realm info from SCDynamicStore error
Unable to load realm info from SCDynamicStore is only a warning and a red herring. What seems to be happening is that your shell can't reach zookeeper. Are Zookeeper and HBase running? What other health checks have you done? J-D On Tue, Jul 30, 2013 at 10:28 PM, Seth Edwards sethaedwa...@gmail.com wrote: I am somewhat new to HBase and was using it fine locally. At some point I started getting Unable to load realm info from SCDynamicStore when I would try to run HBase in standalone mode. I'm on Mac OSX 10.8.4. I have gone through many steps mentioned on Stack Overflow, changing configurations in hbase-env.sh. I've tried this on hbase version 0.94.7 and 0.94.9. Here is a gist of the stack trace I receive when trying to create a table with the shell https://gist.github.com/Sedward/2570beade8c9528682c3
Re: Excessive .META scans
Can you tell who's doing it? You could enable IPC debug for a few secs to see who's coming in with scans. You could also try to disable pre-fetching: set hbase.client.prefetch.limit to 0. Also, is it even causing a problem, or are you just worried it might since it doesn't look normal? J-D On Mon, Jul 29, 2013 at 10:32 AM, Varun Sharma va...@pinterest.com wrote: Hi folks, We are seeing an issue with hbase 0.94.3 on CDH 4.2.0 with excessive .META. reads... In the steady state where there are no client crashes and there are no region server crashes/region movement, the server holding .META. is serving an incredibly large # of read requests on the .META. table. From my understanding, in the steady state, region locations should be indefinitely cached in the client. The client is running a workload of multiput(s), puts, gets and coprocessor calls. Thanks Varun
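The prefetch knob J-D mentions is a client-side setting; a minimal hbase-site.xml fragment on the client (property name as given in the thread) would look like:

```xml
<property>
  <name>hbase.client.prefetch.limit</name>
  <value>0</value>
  <!-- 0 disables prefetching of region locations from .META. -->
</property>
```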
Re: Altering table column family attributes without disabling the table
You could always set hbase.online.schema.update.enable to true on your master, restart it (but not the cluster), and you could do what you are describing... but it's a risky feature to use before 0.96.0. Did you also set hbase.replication to true? If not, you'll have to do it on the region servers and the master via a rolling restart. J-D
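A sketch of the two properties mentioned above, as hbase-site.xml fragments (assumed placement; as J-D notes, the online schema switch was risky to use before 0.96.0):

```xml
<!-- on the master: allow altering column-family attributes without
     disabling the table first -->
<property>
  <name>hbase.online.schema.update.enable</name>
  <value>true</value>
</property>

<!-- on the master and every region server (rolling restart to apply) -->
<property>
  <name>hbase.replication</name>
  <value>true</value>
</property>
```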
Re: Bulk Load on HBase 0.95.1-hadoop1
0.95.1 is a developer preview release; if you are just starting with HBase please grab the stable release from 0.94, for example http://mirrors.sonic.net/apache/hbase/stable/ J-D On Thu, Jul 18, 2013 at 1:51 PM, Jonathan Cardoso jonathancar...@gmail.com wrote: I was trying to follow the instructions from this website (http://www.thecloudavenue.com/2013/04/bulk-loading-data-in-hbase.html) to insert a lot of data into HBase using MapReduce, but as with other approaches I found on the web I have always the same problem: I have compile errors because classes from the package org.apache.hadoop.hbase.mapreduce.* cannot be found. To run a mapreduce task using HBase do I have to downgrade the version of my HBase to, for example, 0.92? *Jonathan Cardoso** ** Universidade Federal de Goias*
Re: several doubts about region split?
Inline. J-D On Wed, Jul 17, 2013 at 7:10 AM, yonghu yongyong...@gmail.com wrote: Thanks for your quick response! For question one, what will be the latency? How long do we need to wait until the daughter regions are again online? Usually a matter of 1-2 seconds. regards! Yong On Wed, Jul 17, 2013 at 4:05 PM, Ted Yu yuzhih...@gmail.com wrote: bq. Does it mean the region which will be split is not available anymore? Right. bq. What happened to the read and write requests to that region? The requests wouldn't be served by the hosting region server until the daughter regions become online. Will try to dig up an answer to question #2. In short, the load balancer is supposed to offload one of the daughter regions if continuous write load incurs. Cheers On Wed, Jul 17, 2013 at 6:53 AM, yonghu yongyong...@gmail.com wrote: Dear all, From the HBase reference book, it mentions that when a RegionServer splits regions, it will offline the split region and then add the daughter regions to META, open the daughters on the parent's hosting RegionServer and then report the split to the Master. I have several questions: 1. What does offline mean? Does it mean the region which will be split is not available anymore? What happens to the read and write requests to that region? 2. From the description, if I understand right, it means that now the RegionServer will contain two Regions (one RegionServer for both daughter and parent regions) instead of one RegionServer for the daughter and one for the parent. If so, what are the benefits of this approach? The hot-spot problem is still there. It's not a load problem, it's a data problem. We're splitting when we have enough data. Then HBase relies on the master doing some balancing on the cluster. Moreover, this approach will be a big problem if we use the HBase default split approach. Suppose we bulk load data into an HBase cluster; initially every write request will be accepted by only one RegionServer.
After some write requests, the RegionServer cannot respond to any write requests as it reaches its disk volume threshold. Hence, some data must be moved from one RegionServer to the other RegionServer. The question is: why don't we do it at region split time? Since you read the reference book, you will also find in there that we recommend never bulk loading data into a table with only 1 region. You should always create your tables with pre-defined splits if you plan on importing a lot of data. J-D
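The pre-splitting advice can be sketched as a small helper that computes evenly spaced hex split keys for uniformly distributed (e.g. MD5-prefixed) row keys. HexSplits is a hypothetical helper, and the Admin call in the comment is illustrative, not compiled here:

```java
import java.math.BigInteger;

public class HexSplits {
    // Hypothetical helper: numRegions - 1 evenly spaced 16-hex-digit split
    // keys over the full 64-bit space, for row keys that start with a
    // uniformly distributed hash such as MD5.
    static byte[][] hexSplits(int numRegions) {
        byte[][] splits = new byte[numRegions - 1][];
        BigInteger step = BigInteger.ONE.shiftLeft(64)      // 2^64 possible prefixes
                .divide(BigInteger.valueOf(numRegions));
        for (int i = 1; i < numRegions; i++) {
            splits[i - 1] = String.format("%016x",
                    step.multiply(BigInteger.valueOf(i))).getBytes();
        }
        return splits;
    }

    public static void main(String[] args) {
        for (byte[] s : hexSplits(16)) {
            System.out.println(new String(s));
        }
        // With the 0.94 client this would feed something like
        //   admin.createTable(tableDescriptor, hexSplits(16));
        // so the table starts life with 16 regions instead of one.
    }
}
```

For 16 regions this produces "1000000000000000" through "f000000000000000", the zero-padded form of the splits shown in the shell example elsewhere in this thread.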
Re: Memory leak in HBase replication ?
Yeah, WARN won't give us anything; please try to get us a fat log. Post it on pastebin or such. Thx, J-D On Wed, Jul 17, 2013 at 11:03 AM, Anusauskas, Laimonas lanusaus...@corp.untd.com wrote: J-D, I have log level org.apache=WARN and there is only the following in the logs before GC happens: 2013-07-17 10:56:45,830 ERROR org.apache.hadoop.hbase.regionserver.metrics.SchemaMetrics: Inconsistent configuration. Previous configuration for using table name in metrics: true, new configuration: false 2013-07-17 10:56:47,395 WARN org.apache.hadoop.io.compress.snappy.LoadSnappy: Snappy native library is available I'll try upping the log level to DEBUG to see if that shows anything and will run jstack. Thanks, Limus
Re: Memory leak in HBase replication ?
1GB is a pretty small heap and it could be that the default size for logs to replicate is set too high. The default for replication.source.size.capacity is 64MB. Can you set it much lower on your master cluster (on each RS), like 2MB, and see if it makes a difference? The logs and the jstack seem to correlate in that sense. Thx, J-D On Wed, Jul 17, 2013 at 1:40 PM, Anusauskas, Laimonas lanusaus...@corp.untd.com wrote: And here is the jstack output. http://pastebin.com/JKnQYqRg
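The tuning J-D suggests, as an hbase-site.xml fragment on each source-cluster region server (property name and the 2MB value are the ones from this thread):

```xml
<property>
  <name>replication.source.size.capacity</name>
  <!-- max batch of WAL entries shipped per replication call; default 64MB -->
  <value>2097152</value>
</property>
```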
Re: Memory leak in HBase replication ?
Yes... your master cluster must have a helluva backlog to replicate :) Seems to make a good argument to lower the default setting. What do you think? J-D On Wed, Jul 17, 2013 at 3:37 PM, Anusauskas, Laimonas lanusaus...@corp.untd.com wrote: Thanks, setting replication.source.size.capacity to 2MB resolved this. I see heap growing to about 700MB but then going down, and full GC is only triggered occasionally. And while the primary cluster has very little load (<100 requests/sec) the standby cluster is now pretty loaded at 5K requests/sec, presumably because it has to replicate all the pending changes. So perhaps this is the issue that happens when the standby cluster goes away for a while and then has to catch up. Really appreciate the help. Limus
Re: HBase Standalone against multiple drives
The local filesystem implementation doesn't support multiple drives AFAIK, so your best bet is to RAID your disks if that's really something you want to do. Else, you have to use HDFS. J-D On Tue, Jul 16, 2013 at 8:55 AM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: Hi, In standalone mode, HBase does not use HDFS -- it uses the local filesystem instead but is there a way to give it more than one directory to write on? Or is pseudo-distributed mode the only way to do that? Thanks, JM
Re: Replication - some timestamps off by 1 ms
Are those incremented cells? J-D On Thu, Jul 11, 2013 at 10:23 AM, Patrick Schless patrick.schl...@gmail.com wrote: I have had replication running for about a week now, and have had a lot of data flowing to our slave cluster over that time. Now, I'm running the verifyrep MR job over a 1-hour period a couple days ago (which should be fully replicated), and I'm seeing a small number of BADROWS. Spot-checking a few of them, the issue seems to be that the rows are present, and have the same values, but a single cell in the row will be off by 1ms. For instance, the log reports this error: java.lang.Exception: This result was different: keyvalues={01e581745c6a43aba01adf105af4e4a92013071015/data:!\xDF\xE0\x01/1373470622986/Put/vlen=8, 01e581745c6a43aba01adf105af4e4a92013071015/data:s\xC0\x01/1373470923084/Put/vlen=8, 01e581745c6a43aba01adf105af4e4a92013071015/data:+\x07\xA0\x01/1373471223717/Put/vlen=8, 01e581745c6a43aba01adf105af4e4a92013071015/data:/\x9B\x80\x01/1373471523316/Put/vlen=8, 01e581745c6a43aba01adf105af4e4a92013071015/data:4/`\x01/1373471822913/Put/vlen=8} compared to keyvalues={01e581745c6a43aba01adf105af4e4a92013071015/data:!\xDF\xE0\x01/1373470622986/Put/vlen=8, 01e581745c6a43aba01adf105af4e4a92013071015/data:s\xC0\x01/1373470923084/Put/vlen=8, 01e581745c6a43aba01adf105af4e4a92013071015/data:+\x07\xA0\x01/1373471223716/Put/vlen=8, 01e581745c6a43aba01adf105af4e4a92013071015/data:/\x9B\x80\x01/1373471523316/Put/vlen=8, 01e581745c6a43aba01adf105af4e4a92013071015/data:4/`\x01/1373471822913/Put/vlen=8} Some diffing reduces the issue down to: 01e581745c6a43aba01adf105af4e4a92013071015/data:+\x07\xA0\x01/1373471223717/Put/vlen=8 compared to 01e581745c6a43aba01adf105af4e4a92013071015/data:+\x07\xA0\x01/1373471223716/Put/vlen=8. I'm assuming that the value before /Put is the cell's timestamp, which means that the copies are off by 1ms. Any idea what could cause this? So far (the job is still running), the problem seems rare (about 0.05% of rows). Thanks, Patrick
Re: Replication - some timestamps off by 1 ms
Yeah increments won't work. I guess the warning isn't really visible but one place you can see it is: $ ./bin/hadoop jar ../hbase/hbase.jar An example program must be given as the first argument. Valid program names are: CellCounter: Count cells in HBase table completebulkload: Complete a bulk data load. copytable: Export a table from local cluster to peer cluster export: Write table data to HDFS. import: Import data written by Export. importtsv: Import data in TSV format. rowcounter: Count rows in HBase table verifyrep: Compare the data from tables in two different clusters. WARNING: It doesn't work for incrementColumnValues'd cells since the timestamp is changed after being appended to the log. The problem is that increments' timestamps are different in the WAL and in the final KV that's stored in HBase. J-D On Thu, Jul 11, 2013 at 12:19 PM, Patrick Schless patrick.schl...@gmail.com wrote: It's possible, but I'm not sure. This is a live system, and we do use increment, and it's a smaller portion of our writes into HBase. I can try to duplicate it, but I can't say how these specific cells got written. Would incremented cells not get replicated correctly? On Thu, Jul 11, 2013 at 12:53 PM, Jean-Daniel Cryans jdcry...@apache.orgwrote: Are those incremented cells? J-D On Thu, Jul 11, 2013 at 10:23 AM, Patrick Schless patrick.schl...@gmail.com wrote: I have had replication running for about a week now, and have had a lot of data flowing to our slave cluster over that time. Now, I'm running the verifyrep MR job over a 1-hour period a couple days ago (which should be fully replicated), and I'm seeing a small number of BADROWS. Spot-checking a few of them, the issue seems to be that the rows are present, and have the same values, but a single cell in the row will be off by 1ms. 
For instance, the log reports this error: java.lang.Exception: This result was different: keyvalues={01e581745c6a43aba01adf105af4e4a92013071015/data:!\xDF\xE0\x01/1373470622986/Put/vlen=8, 01e581745c6a43aba01adf105af4e4a92013071015/data:s\xC0\x01/1373470923084/Put/vlen=8, 01e581745c6a43aba01adf105af4e4a92013071015/data:+\x07\xA0\x01/1373471223717/Put/vlen=8, 01e581745c6a43aba01adf105af4e4a92013071015/data:/\x9B\x80\x01/1373471523316/Put/vlen=8, 01e581745c6a43aba01adf105af4e4a92013071015/data:4/`\x01/1373471822913/Put/vlen=8} compared to keyvalues={01e581745c6a43aba01adf105af4e4a92013071015/data:!\xDF\xE0\x01/1373470622986/Put/vlen=8, 01e581745c6a43aba01adf105af4e4a92013071015/data:s\xC0\x01/1373470923084/Put/vlen=8, 01e581745c6a43aba01adf105af4e4a92013071015/data:+\x07\xA0\x01/1373471223716/Put/vlen=8, 01e581745c6a43aba01adf105af4e4a92013071015/data:/\x9B\x80\x01/1373471523316/Put/vlen=8, 01e581745c6a43aba01adf105af4e4a92013071015/data:4/`\x01/1373471822913/Put/vlen=8} Some diffing reduces the issue down to: 01e581745c6a43aba01adf105af4e4a92013071015/data:+\x07\xA0\x01/1373471223717/Put/vlen=8 compared to 01e581745c6a43aba01adf105af4e4a92013071015/data:+\x07\xA0\x01/1373471223716/Put/vlen=8. I'm assuming that the value before /Put is the cell's timestamp, which means that the copies are off by 1ms. Any idea what could cause this? So far (the job is still running), the problem seems rare (about 0.05% of rows). Thanks, Patrick
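J-D's warning in the thread explains the 1 ms drift: an increment's timestamp is rewritten after the edit is appended to the WAL, so the replicated copy can disagree with the local one. A minimal sketch (hypothetical, not the actual verifyrep code) of comparing two rows' cells while ignoring timestamps, which is one way such increment drift could be kept from counting as a BADROW:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not HBase code: a timestamp-insensitive row
// comparison. Cell stands in for HBase's KeyValue; two cell lists are
// "equal" if row, qualifier and value match, even when the timestamps
// differ (e.g. by the 1 ms increment drift seen in the thread).
public class LenientRowCompare {
    public static final class Cell {
        final String row;
        final String qualifier;
        final long ts;
        final byte[] value;
        public Cell(String row, String qualifier, long ts, byte[] value) {
            this.row = row;
            this.qualifier = qualifier;
            this.ts = ts;
            this.value = value;
        }
    }

    // Compare cell lists positionally, ignoring the timestamp field.
    public static boolean sameIgnoringTimestamps(List<Cell> a, List<Cell> b) {
        if (a.size() != b.size()) {
            return false;
        }
        for (int i = 0; i < a.size(); i++) {
            Cell x = a.get(i);
            Cell y = b.get(i);
            if (!x.row.equals(y.row)
                    || !x.qualifier.equals(y.qualifier)
                    || !Arrays.equals(x.value, y.value)) {
                return false;
            }
        }
        return true;
    }
}
```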
Re: Replication - some timestamps off by 1 ms
Yeah verifyrep is a pretty basic tool, there's tons of room for improvement. For the moment I guess you can ignore the 8 bytes cells that aren't printable strings. Feel free to hack around that MR job and maybe contribute back? The use case for which I built it had loads of tables and the ones that had ICVs pretty much only had that, so it was easy to verify just a couple of tables to have a good idea of how it was doing. J-D On Thu, Jul 11, 2013 at 2:36 PM, Patrick Schless patrick.schl...@gmail.com wrote: Interesting (thanks for the info). I don't suppose there's an easy way to filter those incremented cells out, so the response from verifyRep is meaningful? :) On Thu, Jul 11, 2013 at 3:44 PM, Jean-Daniel Cryans jdcry...@apache.orgwrote: Yeah increments won't work. I guess the warning isn't really visible but one place you can see it is: $ ./bin/hadoop jar ../hbase/hbase.jar An example program must be given as the first argument. Valid program names are: CellCounter: Count cells in HBase table completebulkload: Complete a bulk data load. copytable: Export a table from local cluster to peer cluster export: Write table data to HDFS. import: Import data written by Export. importtsv: Import data in TSV format. rowcounter: Count rows in HBase table verifyrep: Compare the data from tables in two different clusters. WARNING: It doesn't work for incrementColumnValues'd cells since the timestamp is changed after being appended to the log. The problem is that increments' timestamps are different in the WAL and in the final KV that's stored in HBase. J-D On Thu, Jul 11, 2013 at 12:19 PM, Patrick Schless patrick.schl...@gmail.com wrote: It's possible, but I'm not sure. This is a live system, and we do use increment, and it's a smaller portion of our writes into HBase. I can try to duplicate it, but I can't say how these specific cells got written. Would incremented cells not get replicated correctly? 
On Thu, Jul 11, 2013 at 12:53 PM, Jean-Daniel Cryans jdcry...@apache.orgwrote: Are those incremented cells? J-D On Thu, Jul 11, 2013 at 10:23 AM, Patrick Schless patrick.schl...@gmail.com wrote: I have had replication running for about a week now, and have had a lot of data flowing to our slave cluster over that time. Now, I'm running the verifyrep MR job over a 1-hour period a couple days ago (which should be fully replicated), and I'm seeing a small number of BADROWS. Spot-checking a few of them, the issue seems to be that the rows are present, and have the same values, but a single cell in the row will be off by 1ms. For instance, the log reports this error: java.lang.Exception: This result was different: keyvalues={01e581745c6a43aba01adf105af4e4a92013071015/data:!\xDF\xE0\x01/1373470622986/Put/vlen=8, 01e581745c6a43aba01adf105af4e4a92013071015/data:s\xC0\x01/1373470923084/Put/vlen=8, 01e581745c6a43aba01adf105af4e4a92013071015/data:+\x07\xA0\x01/1373471223717/Put/vlen=8, 01e581745c6a43aba01adf105af4e4a92013071015/data:/\x9B\x80\x01/1373471523316/Put/vlen=8, 01e581745c6a43aba01adf105af4e4a92013071015/data:4/`\x01/1373471822913/Put/vlen=8} compared to keyvalues={01e581745c6a43aba01adf105af4e4a92013071015/data:!\xDF\xE0\x01/1373470622986/Put/vlen=8, 01e581745c6a43aba01adf105af4e4a92013071015/data:s\xC0\x01/1373470923084/Put/vlen=8, 01e581745c6a43aba01adf105af4e4a92013071015/data:+\x07\xA0\x01/1373471223716/Put/vlen=8, 01e581745c6a43aba01adf105af4e4a92013071015/data:/\x9B\x80\x01/1373471523316/Put/vlen=8, 01e581745c6a43aba01adf105af4e4a92013071015/data:4/`\x01/1373471822913/Put/vlen=8} Some diffing reduces the issue down to: 01e581745c6a43aba01adf105af4e4a92013071015/data:+\x07\xA0\x01/1373471223717/Put/vlen=8 compared to 01e581745c6a43aba01adf105af4e4a92013071015/data:+\x07\xA0\x01/1373471223716/Put/vlen=8. I'm assuming that the value before /Put is the cell's timestamp, which means that the copies are off by 1ms. Any idea what could cause this? 
So far (the job is still running), the problem seems rare (about 0.05% of rows). Thanks, Patrick
Re: optimizing block cache requests + eviction
Do you know if it's a data or meta block? J-D On Mon, Jul 8, 2013 at 4:28 PM, Viral Bajaria viral.baja...@gmail.com wrote: I was able to reproduce the same regionserver asking for the same local block over 300 times within the same 2 minute window by running one of my heavy workloads. Let me try and gather some stack dumps. I agree that jstack crashing the jvm is concerning but there is nothing in the errors to know why it happened. I will keep that conversation out of here. As an addendum, I am using asynchbase as my client. Not sure if the arrival of multiple requests for rowkeys that could be in the same non-cached block causes hbase to queue up a non-cached block read via SCR and since the box is under load, it queues up multiple of these and makes the problem worse. Thanks, Viral On Mon, Jul 8, 2013 at 3:53 PM, Andrew Purtell apurt...@apache.org wrote: but unless the behavior you see is the _same_ regionserver asking for the _same_ block many times consecutively, it's probably workload related.
Re: optimizing block cache requests + eviction
meta blocks are at the end: http://hbase.apache.org/book.html#d2617e12979, a way to tell would be by logging from the HBase side but then I guess it's hard to reconcile with which file we're actually reading from... Regarding your second question, you are asking if we cache HDFS blocks? We don't, since we don't even know about HDFS blocks. The BlockReader seeks into the file and returns whatever data is asked for. J-D On Mon, Jul 8, 2013 at 4:45 PM, Viral Bajaria viral.baja...@gmail.com wrote: Good question. When I looked at the logs, it's not clear from it whether it's reading a meta or data block. Is there any kind of log line that indicates that ? Given that it's saying that it's reading from a startOffset I would assume this is a data block. A question that comes to mind, is this read doing a seek to that position directly or is it going to cache the block ? Looks like it is not caching the block if it's reading directly from a given offset. Or am I wrong ? Following is a sample line that I used while debugging: 2013-07-08 22:58:55,221 DEBUG org.apache.hadoop.hdfs.DFSClient: New BlockReaderLocal for file /mnt/data/current/subdir34/subdir26/blk_-448970697931783518 of size 67108864 startOffset 13006577 length 54102287 short circuit checksum true On Mon, Jul 8, 2013 at 4:37 PM, Jean-Daniel Cryans jdcry...@apache.org wrote: Do you know if it's a data or meta block?
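A small illustration of why a non-zero startOffset implies a seek rather than a whole-block read: locating a byte position inside fixed-size HDFS blocks is simple arithmetic. This is illustrative only, not DFSClient internals:

```java
// Illustrative arithmetic only, not DFSClient code: with a fixed HDFS
// block size, an absolute file offset maps to a block index and an
// offset within that block. A read like the DEBUG line above starts
// mid-block (offset 13006577 of a 67108864-byte block) rather than
// streaming the block from byte 0.
public class BlockOffset {
    public static final long BLOCK_SIZE = 64L * 1024 * 1024; // 67108864

    public static long blockIndex(long fileOffset) {
        return fileOffset / BLOCK_SIZE;
    }

    public static long offsetInBlock(long fileOffset) {
        return fileOffset % BLOCK_SIZE;
    }
}
```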
Re: stop_replication dangerous?
Yeah that package documentation ought to be changed. Mind opening a jira? Thx, J-D On Mon, Jul 1, 2013 at 1:51 PM, Patrick Schless patrick.schl...@gmail.com wrote: The first two tutorials for enabling replication that google gives me [1], [2] take very different tones with regard to stop_replication. The HBase docs [1] make it sound fine to start and stop replication as desired. The Cloudera docs [2] say it may cause data loss. Which is true? If data loss is possible, are we talking about data loss in the primary cluster, or data loss in the standby cluster (presumably would require reinitializing the sync with a new CopyTable). Thanks, Patrick [1] http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/replication/package-summary.html#requirements [2] http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/4.2.0/CDH4-Installation-Guide/cdh4ig_topic_20_11.html
Re: 答复: flushing + compactions after config change
On Thu, Jun 27, 2013 at 4:27 PM, Viral Bajaria viral.baja...@gmail.com wrote: Hey JD, Thanks for the clarification. I also came across a previous thread which sort of talks about a similar problem. http://mail-archives.apache.org/mod_mbox/hbase-user/201204.mbox/%3ccagptdnfwnrsnqv7n3wgje-ichzpx-cxn1tbchgwrpohgcos...@mail.gmail.com%3E I guess my problem is also similar to the fact that my writes are well distributed and at a given time I could be writing to a lot of regions. Some of the regions receive very little data but since the flush algorithm chooses at random what to flush when too many hlogs is hit, it will flush

It's not random, it picks the region with the most data in its memstores.

a region with less than 10mb of data causing too many small files. This in turn causes compaction storms where even though major compactions is disabled, some of the minor get upgraded to major and that's when things start getting worse.

I doubt that it's the fact that it's a major compaction that it's making everything worse. When a minor gets promoted into a major it's because we're already going to compact all the files, so we might as well get rid of some deletes at the same time. They are all getting selected because the files are within the selection ratio. I would not focus on this to resolve your problem.

My compaction queues are still the same and so I doubt I will be coming out of this storm without bumping up max hlogs for now. Reducing regions per server is one option but then I will be wasting my resources since the servers at current load are at 30% CPU and 25% RAM. Maybe I can bump up heap space and give more memory to the memstore. Sorry, I am just thinking out loud.

I haven't been closely following this thread, but have you posted a log snippet somewhere? It's usually much more telling and we eliminate a few levels of interpretation. Make sure it's at DEBUG, and that you grab a few hours of activity. Get the GC log for the same time as well. 
Drop this on a web server or pastebin if it fits. Thx, J-D
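J-D's point about flush selection can be sketched as follows. This is a hypothetical illustration, not the actual MemStoreFlusher code: when the "too many hlogs" limit forces a flush, the region holding the most memstore data is the candidate, not a random one.

```java
import java.util.Map;

// Hypothetical illustration of the selection J-D describes, not the
// real MemStoreFlusher: given per-region memstore sizes, the flush
// candidate is the region holding the most data.
public class FlushPick {
    public static String regionWithBiggestMemstore(Map<String, Long> memstoreBytes) {
        String best = null;
        long bestSize = -1;
        for (Map.Entry<String, Long> e : memstoreBytes.entrySet()) {
            if (e.getValue() > bestSize) {
                bestSize = e.getValue();
                best = e.getKey();
            }
        }
        return best; // null if the map is empty
    }
}
```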
Re: 答复: flushing + compactions after config change
On Fri, Jun 28, 2013 at 2:39 PM, Viral Bajaria viral.baja...@gmail.com wrote: On Fri, Jun 28, 2013 at 9:31 AM, Jean-Daniel Cryans jdcry...@apache.orgwrote: On Thu, Jun 27, 2013 at 4:27 PM, Viral Bajaria viral.baja...@gmail.com wrote: It's not random, it picks the region with the most data in its memstores. That's weird, because I see some of my regions which receive the least amount of data in a given time period flushing before the regions that are receiving data continuously. The reason I know this is because of the write pattern. Some of my tables are in catch-up mode i.e. I am ingesting data from the past and they always have something to do. While some tables are not in catch-up mode and are just sitting idle for most of the time. Yet I see high number of flushes for those regions too. I doubt that it's the fact that it's a major compaction that it's making everything worse. When a minor gets promoted into a major it's because we're already going to compact all the files, so we might as well get rid of some deletes at the same time. They are all getting selected because the files are within the selection ratio. I would not focus on this to resolve your problem. I meant worse for my writes not for HBase as a whole. I haven't been closely following this thread, but have you posted a log snippet somewhere? It's usually much more telling and we eliminate a few levels of interpretation. Make sure it's at DEBUG, and that you grab a few hours of activity. Get the GC log for the same time as well. Drop this on a web server or pastebin if it fits. The only log snippet that I posted was the flushing action. Also that log was not everything, I had grep'd a few lines out. Let me collect some more stats here and post it again. I just enabled GC logging on this server, deployed the wrong config out initially which had no GC logging. I am not sure how GC logs will help here given that I am at less than 50% heap space used and so I would doubt a stop the world GC happening. 
Are you trying to look for some other information ? Just trying to cover all the bases. J-D
Re: 答复: flushing + compactions after config change
No, all your data eventually makes it into the log, just potentially not as quickly :) J-D On Thu, Jun 27, 2013 at 2:06 PM, Viral Bajaria viral.baja...@gmail.com wrote: Thanks Azuryy. Look forward to it. Does DEFERRED_LOG_FLUSH impact the number of WAL files that will be created ? Tried looking around but could not find the details. On Thu, Jun 27, 2013 at 7:53 AM, Azuryy Yu azury...@gmail.com wrote: your JVM options are not enough. I will give you some detail when I go back to the office tomorrow. --Send from my Sony mobile.
Re: HBase Replication is talking to the wrong peer
Did you find what the issue was? From your other thread it looks like you got it working. Thx, J-D On Mon, Jun 17, 2013 at 11:48 PM, Asaf Mesika asaf.mes...@gmail.com wrote: Hi, I have two clusters set up in a lab, each has 1 Master and 3 RS. I'm inserting roughly 15GB into the master cluster, but I see between 5 - 10 minutes delay between the master and slave clusters (ageOfLastShippedOp). On my Graphite I see that replicateLogEntries_num_ops is increasing in one region server (IP 85) of the slave cluster, out of 3 (IPs 83,84,85). I ran a grep on the logs of each region server of the master, and saw a Chosen peer message saying the following: RS ip 74: Chosen peer 83 RS ip 75: Chosen peer 85 RS ip 76: Chosen peer 85 So first problem: Why are only two slave RS (83,85) receiving replicated log entries instead of 3? Second and biggest problem: I ran netstat -tnp and grepped for 83,84,85 on the RS ip 74, and saw that it is in fact talking with RS 85! This was correlated with the Graphite graph of replicateLogEntries_num_ops which showed that only RS 85 was receiving replicated log entries. To me it looks like a bug. Anyone have any ideas how to solve these two issues?
Re: Replication not suited for intensive write applications?
Given that the region server writes to a single WAL at a time, doing it with multiple threads might be hard. You also have to manage the correct position up in ZK. It might be easier with multiple WALs. In any case, inserting at such a rate might not be doable over long periods of time. How long were your benchmarks running for exactly? (can't find it in your first email) You could also fancy doing regular bulk loads (say, every 30 minutes) and consider shipping the same files to the other cluster. Do you have a real use case in mind? Thanks, J-D On Sat, Jun 22, 2013 at 11:33 PM, Asaf Mesika asaf.mes...@gmail.com wrote: bq. I'm not sure if it's really a problem tho. Let's say the maximum throughput achieved by writing with k client threads is 30 MB/sec, where k = the number of region servers. If you are consistently writing to HBase more than 30 MB/sec - let's say 40 MB/sec with 2k threads - then you can't use HBase replication and must write your own solution. One way I started thinking about is to somehow declare that for a specific table, order of Puts is not important (say each write is unique), thus you can spawn multiple threads for replicating a WAL file. On Sat, Jun 22, 2013 at 12:18 AM, Jean-Daniel Cryans jdcry...@apache.org wrote: I think that the same way writing with more clients helped throughput, writing with only 1 replication thread will hurt it. The clients in both cases have to read something (a file from HDFS or the WAL) then ship it, meaning that you can utilize the cluster better since a single client isn't consistently writing. I agree with Asaf's assessment that it's possible that you can write faster into HBase than you can replicate from it if your clients are using the write buffers and have a bigger aggregate throughput than replication's. I'm not sure if it's really a problem tho. J-D On Fri, Jun 21, 2013 at 3:05 PM, lars hofhansl la...@apache.org wrote: Hmm... Yes. Was worth a try :) Should've checked and I even wrote that part of the code. 
I have no good explanation then, and also no good suggestion about how to improve this. From: Asaf Mesika asaf.mes...@gmail.com To: user@hbase.apache.org user@hbase.apache.org; lars hofhansl la...@apache.org Sent: Friday, June 21, 2013 5:50 AM Subject: Re: Replication not suited for intensive write applications? On Fri, Jun 21, 2013 at 2:38 PM, lars hofhansl la...@apache.org wrote: Another thought... I assume you only write to a single table, right? How large are your rows on average? I'm writing to 2 tables: Avg row size for 1st table is 1500 bytes, and the second is around 800 bytes Replication will send 64mb blocks by default (or 25000 edits, whatever is smaller). The default HTable buffer is 2mb only, so the slave RS receiving a block of edits (assuming it is a full block), has to do 32 rounds of splitting the edits per region in order to apply them. In the ReplicationSink.java (0.94.6) I see that HTable.batch() is used, which writes directly to RS without buffers?

private void batch(byte[] tableName, List<Row> rows) throws IOException {
  if (rows.isEmpty()) {
    return;
  }
  HTableInterface table = null;
  try {
    table = new HTable(tableName, this.sharedHtableCon, this.sharedThreadPool);
    table.batch(rows);
    this.metrics.appliedOpsRate.inc(rows.size());
  } catch (InterruptedException ix) {
    throw new IOException(ix);
  } finally {
    if (table != null) {
      table.close();
    }
  }
}

There is no setting specifically targeted at the buffer size for replication, but maybe you could increase hbase.client.write.buffer to 64mb (67108864) on the slave cluster and see whether that makes a difference. If it does we can (1) add a setting to control the ReplicationSink HTable's buffer size, or (2) just have it match the replication buffer size replication.source.size.capacity. -- Lars From: lars hofhansl la...@apache.org To: user@hbase.apache.org user@hbase.apache.org Sent: Friday, June 21, 2013 1:48 AM Subject: Re: Replication not suited for intensive write applications? 
Thanks for checking... Interesting. So talking to 3RSs as opposed to only 1 before had no effect on the throughput? Would be good to explore this a bit more. Since our RPC is not streaming, latency will affect throughput. In this case there is latency while all edits are shipped to the RS in the slave cluster and then extra latency when applying the edits there (which are likely not local to that RS). A true streaming API should be better. If that is the case compression *could* help (but that is a big if). The single thread shipping the edits to the slave should not be an issue as the edits are actually applied
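Lars's buffer suggestion from the thread could be tried with an hbase-site.xml fragment like the one below on the slave cluster. hbase.client.write.buffer is a standard client setting, but whether the ReplicationSink's HTable honors it is exactly the open question above, so treat this as an experiment rather than a verified fix:

```xml
<!-- Experimental, per the thread: raise the client write buffer on the
     slave cluster toward replication's 64 MB shipment size. -->
<property>
  <name>hbase.client.write.buffer</name>
  <value>67108864</value>
</property>
```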
Re: removing ttl
TTL is enforced when compactions are running so there's no need to rewrite the data. The alter is sufficient. J-D On Mon, Jun 24, 2013 at 4:15 PM, Kireet kir...@feedly.com wrote: I need to remove the TTL setting from an existing HBase table and remove the TTL from all existing rows. I think this is the proper command for removing the TTL setting: alter 't', {NAME => 'cf', TTL => '2147483647'} After doing this, do I need to rewrite all the existing data to remove the TTL for each cell, perhaps using the IdentityMapper M/R job? Thanks Kireet
Re: Replication not suited for intensive write applications?
I think that the same way writing with more clients helped throughput, writing with only 1 replication thread will hurt it. The clients in both cases have to read something (a file from HDFS or the WAL) then ship it, meaning that you can utilize the cluster better since a single client isn't consistently writing. I agree with Asaf's assessment that it's possible that you can write faster into HBase than you can replicate from it if your clients are using the write buffers and have a bigger aggregate throughput than replication's. I'm not sure if it's really a problem tho. J-D On Fri, Jun 21, 2013 at 3:05 PM, lars hofhansl la...@apache.org wrote: Hmm... Yes. Was worth a try :) Should've checked and I even wrote that part of the code. I have no good explanation then, and also no good suggestion about how to improve this. From: Asaf Mesika asaf.mes...@gmail.com To: user@hbase.apache.org user@hbase.apache.org; lars hofhansl la...@apache.org Sent: Friday, June 21, 2013 5:50 AM Subject: Re: Replication not suited for intensive write applications? On Fri, Jun 21, 2013 at 2:38 PM, lars hofhansl la...@apache.org wrote: Another thought... I assume you only write to a single table, right? How large are your rows on average? I'm writing to 2 tables: Avg row size for 1st table is 1500 bytes, and the second is around 800 bytes Replication will send 64mb blocks by default (or 25000 edits, whatever is smaller). The default HTable buffer is 2mb only, so the slave RS receiving a block of edits (assuming it is a full block), has to do 32 rounds of splitting the edits per region in order to apply them. In the ReplicationSink.java (0.94.6) I see that HTable.batch() is used, which writes directly to RS without buffers?

private void batch(byte[] tableName, List<Row> rows) throws IOException {
  if (rows.isEmpty()) {
    return;
  }
  HTableInterface table = null;
  try {
    table = new HTable(tableName, this.sharedHtableCon, this.sharedThreadPool);
    table.batch(rows);
    this.metrics.appliedOpsRate.inc(rows.size());
  } catch (InterruptedException ix) {
    throw new IOException(ix);
  } finally {
    if (table != null) {
      table.close();
    }
  }
}

There is no setting specifically targeted at the buffer size for replication, but maybe you could increase hbase.client.write.buffer to 64mb (67108864) on the slave cluster and see whether that makes a difference. If it does we can (1) add a setting to control the ReplicationSink HTable's buffer size, or (2) just have it match the replication buffer size replication.source.size.capacity. -- Lars From: lars hofhansl la...@apache.org To: user@hbase.apache.org user@hbase.apache.org Sent: Friday, June 21, 2013 1:48 AM Subject: Re: Replication not suited for intensive write applications? Thanks for checking... Interesting. So talking to 3RSs as opposed to only 1 before had no effect on the throughput? Would be good to explore this a bit more. Since our RPC is not streaming, latency will affect throughput. In this case there is latency while all edits are shipped to the RS in the slave cluster and then extra latency when applying the edits there (which are likely not local to that RS). A true streaming API should be better. If that is the case compression *could* help (but that is a big if). The single thread shipping the edits to the slave should not be an issue as the edits are actually applied by the slave RS, which will use multiple threads to apply the edits in the local cluster. Also my first reply - upon re-reading it - sounded a bit rough, that was not intended. -- Lars - Original Message - From: Asaf Mesika asaf.mes...@gmail.com To: user@hbase.apache.org user@hbase.apache.org; lars hofhansl la...@apache.org Cc: Sent: Thursday, June 20, 2013 10:16 PM Subject: Re: Replication not suited for intensive write applications? Thanks for taking the time to answer! My answers are inline. On Fri, Jun 21, 2013 at 1:47 AM, lars hofhansl la...@apache.org wrote: I see. 
In HBase you have machines for both CPU (to serve requests) and storage (to hold the data). If you only grow your cluster for CPU and you keep all RegionServers 100% busy at all times, you are correct. Maybe you need to increase replication.source.size.capacity and/or replication.source.nb.capacity (although I doubt that this will help here). I was thinking of giving it a shot, but theoretically it should not have an effect, since I'm not doing anything in parallel, right? Also a replication source will pick region servers from the target at random (10% of them at default). That has two effects: 1. Each source will pick exactly one RS at the target: ceil (3*0.1)=1 2. With such a small cluster setup the likelihood is high that two or more RSs in the
Re: heap memory running
24GB is often cited as an upper limit, but YMMV. It also depends if you need memory for MapReduce, if you are using it. J-D On Wed, Jun 19, 2013 at 3:17 PM, prakash kadel prakash.ka...@gmail.com wrote: hi every one, i am quite new to hbase and java. I have a few questions. 1. on the web ui for hbase i have the following entry in the region server mining,60020,13711358624 Fri Jun 14 00:04:22 GMT 2013 requestsPerSecond=0, numberOfOnlineRegions=106, usedHeapMB=5577, maxHeapMB=7933 when the hbase is idle with no requests the usedHeapMB hovers around 5000MB, shouldn't it go down after some idle time? what is occupying the heap when no requests are being made? 2. I have assigned 8GB for heap on a 48GB machine, i dont mind assigning more of it to hbase. What is the recommended size for the heap? Sincerely, Prakash
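For question 2, the region server heap is set in conf/hbase-env.sh. A hedged example, assuming you decide to grow the 8 GB heap while staying well under the ~24 GB ceiling J-D mentions and leaving room for MapReduce and the OS on a 48 GB box (value in MB; size to your own workload):

```sh
# conf/hbase-env.sh -- example only; tune for your workload.
# 16 GB heap: larger than the current 8 GB, well under the ~24 GB upper
# limit cited above, leaving memory for MapReduce and the OS page cache.
export HBASE_HEAPSIZE=16384
```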
Re: client bursts followed by drops
None of your attachments made it across, this mailing list often (but not always) strips them. Are you able to jstack when the drops happen and the queue time is high? This could be https://issues.apache.org/jira/browse/HBASE-5898 but it seems a long stretch without more info. You could also try to see if a more recent version exhibits the same behavior. J-D On Wed, Jun 19, 2013 at 1:09 AM, Amit Mor amit.mor.m...@gmail.com wrote: Attachment of JMX requests metric showing bursts on RS On Wed, Jun 19, 2013 at 12:48 AM, Amit Mor amit.mor.m...@gmail.com wrote: Hello, We use hbase 0.94.2 and we are seeing (mostly reading) latency issues, and with an interesting twist: the client (100 threads) gets stuck waiting on HBase, then stops sending RPCs and then seems to be freed. When freed, while trying to flush all requests it had queued, it causes another vicious cycle of burst - drop - burst. I am seeing about 40K requests per second per RS. The cluster is mostly read, no compaction and split storms. Just bursts and bursts. We set 30 rpc handlers per RS (avg scan size is 5K) and they most of the time seem to be WAITING. IPC at DEBUG revealed (see below) that sometimes the queueTime is very big and sometimes the responseTime is very big, but at a very low percentage. The bursts seem to be periodic and uncorrelated to GC activity or CPU (the bursts appear the moment the RS is onlined and the heap is free) The row keys are murmur3 hashed, and I don't really see any hotspotting Any idea what might cause those bursts ? Thanks, Amit
Re: RPC Replication Compression
Replication doesn't need to know about compression at the RPC level so it won't refer to it and as far as I can tell you need to set compression only on the master cluster and the slave will figure it out. Looking at the code tho, I'm not sure it works the same way it used to work before everything went protobuf. I would give 2 internets to whoever tests 0.95.1 with RPC compression turned on and compares results with non-compressed RPC. See http://hbase.apache.org/book.html#rpc.configs J-D On Tue, Jun 4, 2013 at 5:22 AM, Asaf Mesika asaf.mes...@gmail.com wrote: If RPC has compression abilities, how come Replication, which also works in RPC does not get it automatically? On Tue, Jun 4, 2013 at 12:34 PM, Anoop John anoop.hb...@gmail.com wrote: 0.96 will support HBase RPC compression Yes Replication between master and slave will enjoy it as well (important since bandwidth between geographically distant data centers is scarce and more expensive) But I can not see it is being utilized in replication. May be we can do improvements in this area. I can see possibilities. -Anoop- On Tue, Jun 4, 2013 at 1:51 PM, Asaf Mesika asaf.mes...@gmail.com wrote: Hi, Just wanted to make sure if I read in the internet correctly: 0.96 will support HBase RPC compression thus Replication between master and slave will enjoy it as well (important since bandwidth between geographically distant data centers is scarce and more expensive)
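For anyone taking J-D up on the 0.95.1 experiment, the rpc.configs section of the book he links describes a client-side property along these lines. Shown as a sketch; verify the exact property name and codec class against your version's documentation:

```xml
<!-- Sketch of enabling RPC compression for the test J-D proposes;
     check the property name and codec against the book's rpc.configs
     section for your exact version. -->
<property>
  <name>hbase.client.rpc.compressor</name>
  <value>org.apache.hadoop.io.compress.GzipCodec</value>
</property>
```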
Re: Best practices for loading data into hbase
You cannot use the local job tracker (that is, the one that gets started if you don't have one running) with the TotalOrderPartitioner. You'll need to fully install hadoop on that vmware node. Google that error to find other relevant comments. J-D On Fri, May 31, 2013 at 1:19 PM, David Poisson david.pois...@ca.fujitsu.com wrote: Hi, We are still very new at all of this hbase/hadoop/mapreduce stuff. We are looking for the best practices that will fit our requirements. We are currently using the latest cloudera vmware's (single node) for our development tests. The problem is as follows: We have multiple sources in different format (xml, csv, etc), which are dumps of existing systems. As one might think, there will be an initial import of the data into hbase and afterwards, the systems would most likely dump whatever data they have accumulated since the initial import into hbase or since the last data dump. Another thing, we would require to have an intermediary step, so that we can ensure all of a source's data can be successfully processed, something which would look like: XML data file --(MR JOB)-- Intermediate (hbase table or hfile?) --(MR JOB)-- production tables in hbase We're guessing we can't use something like a transaction in hbase, so we thought about using a intermediate step: Is that how things are normally done? As we import data into hbase, we will be populating several tables that links data parts together (account X in System 1 == account Y in System 2) as tuples in 3 tables. Currently, this is being done by a mapreduce job which reads the XML source and uses multiTableOutputFormat to put data into those 3 hbase tables. This method isn't that fast using our test sample (2 minutes for 5Mb), so we are looking at optimizing the loading of data. 
We have been researching bulk loading but we are unsure of a couple of things: Once we process an xml file and we populate our 3 production hbase tables, could we bulk load another xml file and append this new data to our 3 tables or would it write over what was written before? In order to bulk load, we need to output a file using HFileOutputFormat. Since MultiHFileOutputFormat doesn't seem to officially exist yet (still in the works, right?), should we process our input xml file with 3 MapReduce jobs instead of 1 and output an hfile for each, which could then become our intermediate step (if all 3 hfiles were created without errors, then the process was successful: bulk load in hbase)? Can you experiment with bulk loading on a vmware? We're experiencing problems with the partition file not being found, with the following exception:

java.lang.Exception: java.lang.IllegalArgumentException: Can't read partitions file
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:404)
Caused by: java.lang.IllegalArgumentException: Can't read partitions file
    at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:108)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:70)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:130)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:588)

We also tried another idea on how to speed things up: What if instead of doing individual puts, we passed a list of puts to put() (eg: htable.put(putList) ). Internally in hbase, would there be less overhead vs multiple calls to put()? It seems to be faster, however since we're not using context.write, I'm guessing this will lead to problems later on, right? Turning off WAL on puts to speed things up isn't an option, since data loss would be unacceptable, even if the chances of a failure occurring are slim. Thanks, David
Re: HBase is not running.
Ah yeah the master advertised itself as: Attempting connect to Master server at ip72-215-225-9.at.at.cox.net,46122,1369408257140 So the region server cannot find it since that's the public address and nothing's reachable through that. Now you really need to fix your networking :) J-D On Fri, May 24, 2013 at 8:21 AM, Yves S. Garret yoursurrogate...@gmail.com wrote: Ok, weird, it still seems to be looking towards Cox. Here is my hbase-site.xml file: http://bin.cakephp.org/view/628322266 On Thu, May 23, 2013 at 7:35 PM, Jean-Daniel Cryans jdcry...@apache.org wrote: No, I meant hbase.master.ipc.address and hbase.regionserver.ipc.address. See https://issues.apache.org/jira/browse/HBASE-8148. J-D On Thu, May 23, 2013 at 4:34 PM, Yves S. Garret yoursurrogate...@gmail.com wrote: Do you mean hbase.master.info.bindAddress and hbase.regionserver.info.bindAddress? I couldn't find anything else in the docs. But having said that, both are set to 0.0.0.0 by default. Also, I checked out 127.0.0.1:60010 and 0.0.0.0:60010, no web gui. On Thu, May 23, 2013 at 7:19 PM, Jean-Daniel Cryans jdcry...@apache.org wrote: It should only be a matter of network configuration and not a matter of whether you are a Hadoop expert or not. HBase is just trying to get the machine's hostname and bind to it and in your case it's given something it cannot use. It's unfortunate. IIUC your machine is hosted on cox.net? And it seems that while providing that machine they at some point set it up so that its hostname would resolve to a public address. Sounds like a misconfiguration. Anyways, you can edit your /etc/hosts so that your hostname points to 127.0.0.1 or, since you are using 0.94.7, set both hbase.master.ipc.address and hbase.regionserver.ipc.address to 0.0.0.0 in your hbase-site.xml so that it binds on the wildcard address instead. J-D On Thu, May 23, 2013 at 4:07 PM, Yves S. Garret yoursurrogate...@gmail.com wrote: How weird.
Admittedly I'm not terribly knowledgeable about Hadoop and all of its sub-projects, but I don't recall ever setting any networking info to something other than localhost. What would cause this? On Thu, May 23, 2013 at 6:26 PM, Jean-Daniel Cryans jdcry...@apache.org wrote: That's your problem: Caused by: java.net.BindException: Problem binding to ip72-215-225-9.at.at.cox.net/72.215.225.9:0 : Cannot assign requested address Either it's a public address and you can't bind to it or someone else is using it. J-D On Thu, May 23, 2013 at 3:24 PM, Yves S. Garret yoursurrogate...@gmail.com wrote: Here is my dump of the sole log file in the logs directory: http://bin.cakephp.org/view/2116332048 On Thu, May 23, 2013 at 6:20 PM, Jean-Daniel Cryans jdcry...@apache.org wrote: On Thu, May 23, 2013 at 2:50 PM, Jay Vyas jayunit...@gmail.com wrote: 1) Should hbase-master be changed to localhost? Maybe Try changing /etc/hosts to match the actual non loopback ip of your machine... (i.e. just run Ifconfig | grep 1 and see what ip comes out :)) and make sure your /etc/hosts matches the file in my blog post, (you need hbase-master to be defined in your /etc/hosts...). hbase.master was dropped around 2009 now that we have zookeeper. So you can set it to whatever you want, it won't change anything :) 2) zookeeper parent seems bad.. Change hbase-rootdir to hbase (in hbase.rootdir) so that it's consistent with what you defined in zookeeper parent node. Those two are really unrelated, /hbase is the default so no need to override it, and I'm guessing that hbase.rootdir is somewhere writable so that's all good. Now, regarding the Check the value configured in 'zookeeper.znode.parent, it's triggered when the client wants to read the /hbase znode in ZooKeeper but it's unable to. If it doesn't exist, it might be because your HBase is homed elsewhere. It could also be that HBase isn't running at all so the Master never got to create it. 
BTW you can start the shell with -d and it's gonna give more info and dump all the stack traces. Going by this thread I would guess that HBase isn't running so the shell won't help. Another way to check is pointing your browser to localhost:60010 and see if the master is responding. If not, time to open up the log and see what's up. J-D
Re: Not able to connect to Hbase remotly
It says your event_data table isn't assigned anywhere on the cluster. Was it disabled? J-D On Fri, May 24, 2013 at 6:06 AM, Vimal Jain vkj...@gmail.com wrote: Hi Tariq/Jyothi, Sorry to trouble you again. I think this problem is solved, but I am not able to figure out why I need to put an entry for zookeeper's location in the client's /etc/hosts file. I have configured everything as IP addresses on the Hbase server, so why does this /etc/hosts come into the picture, as I understand it's only required for name resolution. Appreciate your help in this case. On Wed, May 22, 2013 at 2:56 PM, Vimal Jain vkj...@gmail.com wrote: Hi, I have Hbase configured in pseudo-distributed mode on Machine A. I would like to connect to it through a Java program running on Machine B. But I am unable to do so. What configurations are required in Java for this? Please help. -- Thanks and Regards, Vimal Jain -- Thanks and Regards, Vimal Jain
Re: HBase is not running.
This is a machine identity problem. HBase simply uses the normal Java APIs and asks "who am I?". The answer it gets is ip72-215-225-9.at.at.cox.net. Changing this should only be a matter of DNS configs, starting with /etc/hosts. What is your machine's hostname exactly (run hostname)? When you ping it, what does it return? That should get you started. Does your machine even have a local IP when you run ifconfig? If not, all you can do is force everything to localhost in your network configs. It also means you cannot use HBase in a distributed fashion. Changing the code seems like a waste of time; HBase is inherently distributed and it relies on machines having their network correctly configured. Your time might be better spent using a VM on your own machine. J-D On Fri, May 24, 2013 at 12:38 PM, Yves S. Garret yoursurrogate...@gmail.com wrote: That seems to be the case. The thing that I don't get is whether I missed any global setting in order to make everything turn towards localhost. What am I missing? I'll scour the HBase docs again. On Fri, May 24, 2013 at 1:17 PM, Jay Vyas jayunit...@gmail.com wrote: Yes ... get hostname and /etc/hosts synced up properly and I bet that will fix it On Fri, May 24, 2013 at 12:41 PM, Jean-Daniel Cryans jdcry...@apache.org wrote: Ah yeah the master advertised itself as: Attempting connect to Master server at ip72-215-225-9.at.at.cox.net,46122,1369408257140 So the region server cannot find it since that's the public address and nothing's reachable through that. Now you really need to fix your networking :) J-D On Fri, May 24, 2013 at 8:21 AM, Yves S. Garret yoursurrogate...@gmail.com wrote: Ok, weird, it still seems to be looking towards Cox. Here is my hbase-site.xml file: http://bin.cakephp.org/view/628322266 On Thu, May 23, 2013 at 7:35 PM, Jean-Daniel Cryans jdcry...@apache.org wrote: No, I meant hbase.master.ipc.address and hbase.regionserver.ipc.address. See https://issues.apache.org/jira/browse/HBASE-8148.
J-D On Thu, May 23, 2013 at 4:34 PM, Yves S. Garret yoursurrogate...@gmail.com wrote: Do you mean hbase.master.info.bindAddress and hbase.regionserver.info.bindAddress? I couldn't find anything else in the docs. But having said that, both are set to 0.0.0.0 by default. Also, I checked out 127.0.0.1:60010 and 0.0.0.0:60010, no web gui. On Thu, May 23, 2013 at 7:19 PM, Jean-Daniel Cryans jdcry...@apache.org wrote: It should only be a matter of network configuration and not a matter of whether you are a Hadoop expert or not. HBase is just trying to get the machine's hostname and bind to it and in your case it's given something it cannot use. It's unfortunate. IIUC your machine is hosted on cox.net? And it seems that while providing that machine they at some point set it up so that its hostname would resolve to a public address. Sounds like a misconfiguration. Anyways, you can edit your /etc/hosts so that your hostname points to 127.0.0.1 or, since you are using 0.94.7, set both hbase.master.ipc.address and hbase.regionserver.ipc.address to 0.0.0.0 in your hbase-site.xml so that it binds on the wildcard address instead. J-D On Thu, May 23, 2013 at 4:07 PM, Yves S. Garret yoursurrogate...@gmail.com wrote: How weird. Admittedly I'm not terribly knowledgeable about Hadoop and all of its sub-projects, but I don't recall ever setting any networking info to something other than localhost. What would cause this? On Thu, May 23, 2013 at 6:26 PM, Jean-Daniel Cryans jdcry...@apache.org wrote: That's your problem: Caused by: java.net.BindException: Problem binding to ip72-215-225-9.at.at.cox.net/72.215.225.9:0 : Cannot assign requested address Either it's a public address and you can't bind to it or someone else is using it. J-D On Thu, May 23, 2013 at 3:24 PM, Yves S. 
Garret yoursurrogate...@gmail.com wrote: Here is my dump of the sole log file in the logs directory: http://bin.cakephp.org/view/2116332048 On Thu, May 23, 2013 at 6:20 PM, Jean-Daniel Cryans jdcry...@apache.org wrote: On Thu, May 23, 2013 at 2:50 PM, Jay Vyas jayunit...@gmail.com wrote: 1) Should hbase-master be changed to localhost? Maybe Try changing /etc/hosts to match the actual non loopback ip of your machine... (i.e. just run Ifconfig | grep 1 and see what ip comes out :)) and make sure your /etc/hosts matches the file in my blog post, (you need hbase-master to be defined in your /etc/hosts...). hbase.master was dropped around 2009 now that we have zookeeper. So you can set it to whatever you want, it won't change
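J-D's point above is that HBase just asks the JVM for the machine's own hostname. To see concretely what it gets back, here is a tiny stdlib-only sketch (the class name WhoAmI is made up) that performs the same lookup; run on the affected machine, it shows the name and address HBase will advertise:

```java
// Sketch of what HBase effectively asks at startup: the machine's own
// hostname via the standard Java API. No HBase dependency; if the printed
// name resolves to a public address (like ip72-215-225-9.at.at.cox.net),
// HBase will try to bind there.
import java.net.InetAddress;
import java.net.UnknownHostException;

public class WhoAmI {
    public static void main(String[] args) {
        try {
            InetAddress me = InetAddress.getLocalHost();
            System.out.println("hostname = " + me.getHostName());
            System.out.println("address = " + me.getHostAddress());
        } catch (UnknownHostException e) {
            // The failure mode discussed in this thread: the local
            // hostname does not resolve at all.
            System.out.println("hostname lookup failed: " + e.getMessage());
        }
    }
}
```

If the printed address is not one that `ifconfig` shows on the machine, the fix is in /etc/hosts (or DNS), exactly as suggested above.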
Re: RS crash upon replication
fwiw, stop_replication is a kill switch, not a general way to start and stop replicating, and start_replication may put you in an inconsistent state: hbase(main):001:0 help 'stop_replication' Stops all the replication features. The state in which each stream stops in is undetermined. WARNING: start/stop replication is only meant to be used in critical load situations. On Thu, May 23, 2013 at 1:17 AM, Amit Mor amit.mor.m...@gmail.com wrote: No, the server came out fine just because after the crash (RS's - the masters were still running), I immediately pulled the brakes with stop_replication. Then I started the RS's and they came back fine (not replicating). Once I hit 'start_replication' again they crashed for the second time. Eventually I deleted the heavily nested replication znodes and the 'start_replication' succeeded. I didn't patch 8207 because I'm on CDH with the Cloudera Manager Parcels thing and I'm still trying to figure out how to replace their jars with mine in a clean and non-intrusive way On Thu, May 23, 2013 at 10:33 AM, Varun Sharma va...@pinterest.com wrote: Actually, it seems like something else was wrong here - the servers came up just fine on trying again - so could not really reproduce the issue. Amit: Did you try patching 8207 ? Varun On Wed, May 22, 2013 at 5:40 PM, Himanshu Vashishtha hv.cs...@gmail.com wrote: That sounds like a bug for sure. Could you create a jira with logs/znode dump/steps to reproduce it? Thanks, himanshu On Wed, May 22, 2013 at 5:01 PM, Varun Sharma va...@pinterest.com wrote: It seems I can reproduce this - I did a few rolling restarts and got screwed with NoNode exceptions - I am running 0.94.7 which has the fix but my nodes don't contain hyphens - nodes are no longer coming back up... Thanks Varun On Wed, May 22, 2013 at 3:02 PM, Himanshu Vashishtha hv.cs...@gmail.com wrote: I'd suggest to please patch the code with 8207; cdh4.2.1 doesn't have it.
With hyphens in the name, ReplicationSource gets confused and tries to set data in a znode which doesn't exist. Thanks, Himanshu On Wed, May 22, 2013 at 2:42 PM, Amit Mor amit.mor.m...@gmail.com wrote: yes, indeed - hyphens are part of the host name (annoying legacy stuff in my company). It's hbase-0.94.2-cdh4.2.1. I have no idea if 0.94.6 was backported by Cloudera into their flavor of 0.94.2, but the mysterious occurrence of the percent sign in zkcli (ls /hbase/replication/rs/va-p-hbase-02-d,60020,1369249862401/1-va-p-hbase-02-e,60020,1369042377129-va-p-hbase-02-c,60020,1369042377731-va-p-hbase-02-d,60020,1369233252475/va-p-hbase-02-e%2C60020%2C1369042377129.1369227474895) might be a sign of such a problem. How deep should my rmr in zkcli be (an example would be most welcomed :)? I have no serious problem running copyTable with a time period corresponding to the outage and then starting the sync back up again. One question though: how did it cause a crash? On Thu, May 23, 2013 at 12:32 AM, Varun Sharma va...@pinterest.com wrote: I believe there were cascading failures which got these deep nodes containing still-to-be-replicated WAL(s) - I suspect there is either some parsing bug or something which is causing the replication source to not work - also, which version are you using - does it have https://issues.apache.org/jira/browse/HBASE-8207 - since you use hyphens in your paths. One way to get back up is to delete these nodes, but then you lose data in these WAL(s)... On Wed, May 22, 2013 at 2:22 PM, Amit Mor amit.mor.m...@gmail.com wrote: va-p-hbase-02-d,60020,1369249862401 On Thu, May 23, 2013 at 12:20 AM, Varun Sharma va...@pinterest.com wrote: Basically ls /hbase/rs and what do you see for va-p-02-d ? On Wed, May 22, 2013 at 2:19 PM, Varun Sharma va...@pinterest.com wrote: Can you do ls /hbase/rs and see what you get for 02-d - instead of looking in /replication/, could you look in /hbase/replication/rs - I want to see if the timestamps are matching or not ?
Varun On Wed, May 22, 2013 at 2:17 PM, Varun Sharma va...@pinterest.com wrote: I see - so it looks okay - there's just a lot of deep nesting in there - if you look into these nodes by doing ls, you should see a bunch of WAL(s) which still need to be replicated... Varun On Wed, May 22, 2013 at 2:16 PM, Varun Sharma va...@pinterest.com wrote: 2013-05-22
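Since an example of the rmr was asked for above: the delete has to cover the whole failed-over queue znode under the dead region server's node, that is, the node whose name is the long concatenated server list, not just the WAL children inside it. A sketch using the path quoted earlier in the thread; this is destructive and permanently drops any WALs in that queue that were not yet replicated:

```shell
# Inside a zkCli.sh session; the queue path is the one quoted above.
rmr /hbase/replication/rs/va-p-hbase-02-d,60020,1369249862401/1-va-p-hbase-02-e,60020,1369042377129-va-p-hbase-02-c,60020,1369042377731-va-p-hbase-02-d,60020,1369233252475
```

As Varun notes, anything deleted this way is data loss on the replication stream, so a copyTable over the outage window afterwards is the way to reconcile.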
Re: hbase region server shutdown after datanode connection exception
You are looking at it the wrong way. Per http://hbase.apache.org/book.html#trouble.general, always walk up the log to the first exception. In this case it's a session timeout. Whatever happens next is most probably a side effect of that. To help debug your issue, I would suggest reading this section of the reference guide: http://hbase.apache.org/book.html#trouble.rs.runtime J-D On Tue, May 21, 2013 at 7:17 PM, Cheng Su scarcer...@gmail.com wrote: Hi all. I have a small hbase cluster with 3 physical machines. On 192.168.1.80, there are HMaster and a region server. On 81 and 82, there is a region server on each. The region server on 80 can't sync its HLog after a datanode access exception, and started to shut down. The datanode itself was not shut down and responds to other requests normally. I'll paste the logs below. My questions are: 1. Why does this exception cause a region server shutdown? Can I prevent it? 2. Are there any tools (a shell command is best, like hadoop dfsadmin -report) that can monitor an hbase region server, to check whether it is alive or dead? I have done some research showing that nagios/ganglia can do such things. But actually I just want to know whether the region server is alive or dead, so they are a little overqualified. And I'm not using CDH, so I can't use Cloudera Manager, I think. Here are the logs. HBase master: 2013-05-21 17:03:32,675 ERROR org.apache.hadoop.hbase.master.HMaster: Region server hadoop01,60020,1368774173179 reported a fatal error: ABORTING region server hadoop01,60020,1368774173179: regionserver:60020-0x3eb14c67540002 regionserver:60020-0x3eb14c67540002 received expired from ZooKeeper, aborting Cause: org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:369) at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:266) at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:521) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:497) Region Server: 2013-05-21 17:00:16,895 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 12ms for sessionid 0x3eb14c67540002, closing socket connection and attempting reconnect 2013-05-21 17:00:35,896 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 12ms for sessionid 0x13eb14ca4bb, closing socket connection and attempting reconnect 2013-05-21 17:03:31,498 WARN org.apache.hadoop.hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block blk_9188414668950016309_4925046 java.net.SocketTimeoutException: 63000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.1.80:57020 remote=/192.168.1.82:50010] at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164) at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155) at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128) at java.io.DataInputStream.readFully(DataInputStream.java:178) at java.io.DataInputStream.readLong(DataInputStream.java:399) at org.apache.hadoop.hdfs.protocol.DataTransferProtocol$PipelineAck.readFields(DataTransferProtocol.java:124) at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2784) 2013-05-21 17:03:31,520 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_9188414668950016309_4925046 bad datanode[0] 192.168.1.82:50010 2013-05-21 17:03:32,315 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server /192.168.1.82:2100 2013-05-21 17:03:32,316 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to hadoop03/192.168.1.82:2100, initiating session 2013-05-21 17:03:32,317 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server hadoop03/192.168.1.82:2100, sessionid = 0x13eb14ca4bb, negotiated timeout = 18 2013-05-21 17:03:32,497 FATAL org.apache.hadoop.hbase.regionserver.wal.HLog: Could not sync. Requesting close of hlog java.io.IOException: Reflection at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:230) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1091) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1195) at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.run(HLog.java:1057) at java.lang.Thread.run(Thread.java:662) Caused by: java.lang.reflect.InvocationTargetException
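On question 1 above: the abort on session expiry is by design (a region server whose ZooKeeper session has expired must stop so its regions can be reassigned), but lengthening the session timeout gives the server more room to ride out GC pauses or slow I/O. A hedged hbase-site.xml sketch; the 90-second value is purely illustrative, and the timeout actually negotiated is also capped by the ZooKeeper server's own min/max session timeout settings:

```xml
<!-- hbase-site.xml: illustrative value, not a recommendation.
     The negotiated timeout is bounded by the ZooKeeper ensemble's
     minSessionTimeout/maxSessionTimeout. -->
<property>
  <name>zookeeper.session.timeout</name>
  <value>90000</value>
</property>
```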
Re: HBase is not running.
On Thu, May 23, 2013 at 2:50 PM, Jay Vyas jayunit...@gmail.com wrote: 1) Should hbase-master be changed to localhost? Maybe Try changing /etc/hosts to match the actual non loopback ip of your machine... (i.e. just run Ifconfig | grep 1 and see what ip comes out :)) and make sure your /etc/hosts matches the file in my blog post, (you need hbase-master to be defined in your /etc/hosts...). hbase.master was dropped around 2009 now that we have zookeeper. So you can set it to whatever you want, it won't change anything :) 2) zookeeper parent seems bad.. Change hbase-rootdir to hbase (in hbase.rootdir) so that it's consistent with what you defined in zookeeper parent node. Those two are really unrelated, /hbase is the default so no need to override it, and I'm guessing that hbase.rootdir is somewhere writable so that's all good. Now, regarding the Check the value configured in 'zookeeper.znode.parent, it's triggered when the client wants to read the /hbase znode in ZooKeeper but it's unable to. If it doesn't exist, it might be because your HBase is homed elsewhere. It could also be that HBase isn't running at all so the Master never got to create it. BTW you can start the shell with -d and it's gonna give more info and dump all the stack traces. Going by this thread I would guess that HBase isn't running so the shell won't help. Another way to check is pointing your browser to localhost:60010 and see if the master is responding. If not, time to open up the log and see what's up. J-D
Re: HBase is not running.
That's your problem: Caused by: java.net.BindException: Problem binding to ip72-215-225-9.at.at.cox.net/72.215.225.9:0 : Cannot assign requested address Either it's a public address and you can't bind to it or someone else is using it. J-D On Thu, May 23, 2013 at 3:24 PM, Yves S. Garret yoursurrogate...@gmail.com wrote: Here is my dump of the sole log file in the logs directory: http://bin.cakephp.org/view/2116332048 On Thu, May 23, 2013 at 6:20 PM, Jean-Daniel Cryans jdcry...@apache.org wrote: On Thu, May 23, 2013 at 2:50 PM, Jay Vyas jayunit...@gmail.com wrote: 1) Should hbase-master be changed to localhost? Maybe Try changing /etc/hosts to match the actual non loopback ip of your machine... (i.e. just run Ifconfig | grep 1 and see what ip comes out :)) and make sure your /etc/hosts matches the file in my blog post, (you need hbase-master to be defined in your /etc/hosts...). hbase.master was dropped around 2009 now that we have zookeeper. So you can set it to whatever you want, it won't change anything :) 2) zookeeper parent seems bad.. Change hbase-rootdir to hbase (in hbase.rootdir) so that it's consistent with what you defined in zookeeper parent node. Those two are really unrelated, /hbase is the default so no need to override it, and I'm guessing that hbase.rootdir is somewhere writable so that's all good. Now, regarding the Check the value configured in 'zookeeper.znode.parent, it's triggered when the client wants to read the /hbase znode in ZooKeeper but it's unable to. If it doesn't exist, it might be because your HBase is homed elsewhere. It could also be that HBase isn't running at all so the Master never got to create it. BTW you can start the shell with -d and it's gonna give more info and dump all the stack traces. Going by this thread I would guess that HBase isn't running so the shell won't help. Another way to check is pointing your browser to localhost:60010 and see if the master is responding. If not, time to open up the log and see what's up.
J-D
Re: HBase is not running.
It should only be a matter of network configuration and not a matter of whether you are a Hadoop expert or not. HBase is just trying to get the machine's hostname and bind to it and in your case it's given something it cannot use. It's unfortunate. IIUC your machine is hosted on cox.net? And it seems that while providing that machine they at some point set it up so that its hostname would resolve to a public address. Sounds like a misconfiguration. Anyways, you can edit your /etc/hosts so that your hostname points to 127.0.0.1 or, since you are using 0.94.7, set both hbase.master.ipc.address and hbase.regionserver.ipc.address to 0.0.0.0 in your hbase-site.xml so that it binds on the wildcard address instead. J-D On Thu, May 23, 2013 at 4:07 PM, Yves S. Garret yoursurrogate...@gmail.com wrote: How weird. Admittedly I'm not terribly knowledgeable about Hadoop and all of its sub-projects, but I don't recall ever setting any networking info to something other than localhost. What would cause this? On Thu, May 23, 2013 at 6:26 PM, Jean-Daniel Cryans jdcry...@apache.org wrote: That's your problem: Caused by: java.net.BindException: Problem binding to ip72-215-225-9.at.at.cox.net/72.215.225.9:0 : Cannot assign requested address Either it's a public address and you can't bind to it or someone else is using it. J-D On Thu, May 23, 2013 at 3:24 PM, Yves S. Garret yoursurrogate...@gmail.com wrote: Here is my dump of the sole log file in the logs directory: http://bin.cakephp.org/view/2116332048 On Thu, May 23, 2013 at 6:20 PM, Jean-Daniel Cryans jdcry...@apache.org wrote: On Thu, May 23, 2013 at 2:50 PM, Jay Vyas jayunit...@gmail.com wrote: 1) Should hbase-master be changed to localhost? Maybe Try changing /etc/hosts to match the actual non loopback ip of your machine... (i.e. just run Ifconfig | grep 1 and see what ip comes out :)) and make sure your /etc/hosts matches the file in my blog post, (you need hbase-master to be defined in your /etc/hosts...).
hbase.master was dropped around 2009 now that we have zookeeper. So you can set it to whatever you want, it won't change anything :) 2) zookeeper parent seems bad.. Change hbase-rootdir to hbase (in hbase.rootdir) so that it's consistent with what you defined in zookeeper parent node. Those two are really unrelated, /hbase is the default so no need to override it, and I'm guessing that hbase.rootdir is somewhere writable so that's all good. Now, regarding the Check the value configured in 'zookeeper.znode.parent, it's triggered when the client wants to read the /hbase znode in ZooKeeper but it's unable to. If it doesn't exist, it might be because your HBase is homed elsewhere. It could also be that HBase isn't running at all so the Master never got to create it. BTW you can start the shell with -d and it's gonna give more info and dump all the stack traces. Going by this thread I would guess that HBase isn't running so the shell won't help. Another way to check is pointing your browser to localhost:60010 and see if the master is responding. If not, time to open up the log and see what's up. J-D
Re: HBase is not running.
No, I meant hbase.master.ipc.address and hbase.regionserver.ipc.address. See https://issues.apache.org/jira/browse/HBASE-8148. J-D On Thu, May 23, 2013 at 4:34 PM, Yves S. Garret yoursurrogate...@gmail.com wrote: Do you mean hbase.master.info.bindAddress and hbase.regionserver.info.bindAddress? I couldn't find anything else in the docs. But having said that, both are set to 0.0.0.0 by default. Also, I checked out 127.0.0.1:60010 and 0.0.0.0:60010, no web gui. On Thu, May 23, 2013 at 7:19 PM, Jean-Daniel Cryans jdcry...@apache.org wrote: It should only be a matter of network configuration and not a matter of whether you are a Hadoop expert or not. HBase is just trying to get the machine's hostname and bind to it and in your case it's given something it cannot use. It's unfortunate. IIUC your machine is hosted on cox.net? And it seems that while providing that machine they at some point set it up so that its hostname would resolve to a public address. Sounds like a misconfiguration. Anyways, you can edit your /etc/hosts so that your hostname points to 127.0.0.1 or, since you are using 0.94.7, set both hbase.master.ipc.address and hbase.regionserver.ipc.address to 0.0.0.0 in your hbase-site.xml so that it binds on the wildcard address instead. J-D On Thu, May 23, 2013 at 4:07 PM, Yves S. Garret yoursurrogate...@gmail.com wrote: How weird. Admittedly I'm not terribly knowledgeable about Hadoop and all of its sub-projects, but I don't recall ever setting any networking info to something other than localhost. What would cause this? On Thu, May 23, 2013 at 6:26 PM, Jean-Daniel Cryans jdcry...@apache.org wrote: That's your problem: Caused by: java.net.BindException: Problem binding to ip72-215-225-9.at.at.cox.net/72.215.225.9:0 : Cannot assign requested address Either it's a public address and you can't bind to it or someone else is using it. J-D On Thu, May 23, 2013 at 3:24 PM, Yves S.
Garret yoursurrogate...@gmail.com wrote: Here is my dump of the sole log file in the logs directory: http://bin.cakephp.org/view/2116332048 On Thu, May 23, 2013 at 6:20 PM, Jean-Daniel Cryans jdcry...@apache.org wrote: On Thu, May 23, 2013 at 2:50 PM, Jay Vyas jayunit...@gmail.com wrote: 1) Should hbase-master be changed to localhost? Maybe Try changing /etc/hosts to match the actual non loopback ip of your machine... (i.e. just run Ifconfig | grep 1 and see what ip comes out :)) and make sure your /etc/hosts matches the file in my blog post, (you need hbase-master to be defined in your /etc/hosts...). hbase.master was dropped around 2009 now that we have zookeeper. So you can set it to whatever you want, it won't change anything :) 2) zookeeper parent seems bad.. Change hbase-rootdir to hbase (in hbase.rootdir) so that it's consistent with what you defined in zookeeper parent node. Those two are really unrelated, /hbase is the default so no need to override it, and I'm guessing that hbase.rootdir is somewhere writable so that's all good. Now, regarding the Check the value configured in 'zookeeper.znode.parent, it's triggered when the client wants to read the /hbase znode in ZooKeeper but it's unable to. If it doesn't exist, it might be because your HBase is homed elsewhere. It could also be that HBase isn't running at all so the Master never got to create it. BTW you can start the shell with -d and it's gonna give more info and dump all the stack traces. Going by this thread I would guess that HBase isn't running so the shell won't help. Another way to check is pointing your browser to localhost:60010 and see if the master is responding. If not, time to open up the log and see what's up. J-D
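For reference, the two properties from HBASE-8148 recommended above go into hbase-site.xml like this. This is a sketch for a single-machine setup; binding to the wildcard address is not something you would normally want on a multi-homed production box:

```xml
<!-- hbase-site.xml: bind the master and region server RPC endpoints
     to the wildcard address instead of the resolved hostname
     (workaround for a hostname that resolves to an unusable address). -->
<property>
  <name>hbase.master.ipc.address</name>
  <value>0.0.0.0</value>
</property>
<property>
  <name>hbase.regionserver.ipc.address</name>
  <value>0.0.0.0</value>
</property>
```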
Re: How does client connects to Hbase server
I guess you are referring to http://hbase.apache.org/book.html#client_dependencies ? The thing is, by default hbase.zookeeper.quorum is localhost, so your client will look at your local machine to find HBase if you don't configure anything. J-D On Tue, May 21, 2013 at 10:50 AM, Vimal Jain vkj...@gmail.com wrote: Hi, I am a newbie to both the Hadoop and Hbase technologies. I have set up Hbase properly in standalone mode. I am unable to understand the workflow when a client (a Java program accessing Hbase) connects to the Hbase server. Documentation and books say that the client should have hbase-site.xml in its classpath, or that I should provide zookeeper information explicitly in the configuration object. I am doing neither, but still my program correctly connects to the Hbase server and saves data. How is that happening? -- Thanks and Regards, Vimal Jain
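To make the dependency explicit instead of relying on the localhost default, the client can name the ZooKeeper quorum itself. A minimal sketch of an hbase-site.xml placed on the client's classpath; "zkhost" is a made-up placeholder hostname, and 2181 is the standard ZooKeeper client port:

```xml
<!-- hbase-site.xml on the client classpath; "zkhost" is a placeholder. -->
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>zkhost</value>
</property>
<property>
  <name>hbase.zookeeper.property.clientPort</name>
  <value>2181</value>
</property>
```

With nothing configured, the client falls back to localhost for the quorum, which is exactly why a client running on the same machine as a standalone HBase "just works".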
Re: What is in BlockCache?
The reference guide has a pretty good section about this: http://hbase.apache.org/book.html#block.cache What do you think is missing in order to fully answer your question? Thx, J-D On Mon, May 20, 2013 at 5:07 AM, yun peng pengyunm...@gmail.com wrote: Hi, All, I am wondering what exactly is stored in the BlockCache: Is it the same raw blocks as in the HFile? Or does HBase merge several raw blocks and store the merged block in the cache to serve future queries? To be more specific: a get operation entails loading block b1 from hfile f1 and block b2 from hfile f2, and the overlapping parts of the two blocks are merged to produce the final result. So in the BlockCache, does HBase store b1 and b2 separately, or does it store the merged form? Thanks, Yun
Re: PleaseHoldException when Master is clearly running as JPS
I see: 2013-05-21 17:15:07,914 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_FAILED_OPEN, server=hbase-regionserver1,60020,1369170595340, region=70236052/-ROOT- Over and over. Look in the region server logs, you should see fat stack traces on why it's failing to open the -ROOT- region. Maybe related to your gluster setup. J-D On Tue, May 21, 2013 at 2:18 PM, Jay Vyas jayunit...@gmail.com wrote: https://gist.github.com/anonymous/5623327 -- all logs for starting up hbase and master On Tue, May 21, 2013 at 5:13 PM, Mohammad Tariq donta...@gmail.com wrote: No prob. I was referring to this : 127.0.0.1 hbase-master 192.168.122.200 hbase-master I was thinking that this is your HBase master. Correct me if I'm wrong. Could you please show me your logs? Warm Regards, Tariq cloudfront.blogspot.com On Wed, May 22, 2013 at 2:36 AM, Jay Vyas jayunit...@gmail.com wrote: Hmmm... what do you mean, have your hostname in there? Sorry -- just curious about which hostname you are referring to...? I'm now getting a new exception: 13/05/21 17:02:44 INFO client.HConnectionManager$HConnectionImplementation: getMaster attempt 2 of 7 failed; retrying after sleep of 1002 On Tue, May 21, 2013 at 4:57 PM, Mohammad Tariq donta...@gmail.com wrote: OK. You already have your hostname in there. But it is appearing twice. Comment out 127.0.0.1 hbase-master. This might be a reason. I did not notice that you are on a distributed setup. RS IPs and hostnames are fine. Warm Regards, Tariq cloudfront.blogspot.com On Wed, May 22, 2013 at 2:21 AM, Jay Vyas jayunit...@gmail.com wrote: Hi Kevin : So you don't have any region servers defined in your /etc/hosts ? On Tue, May 21, 2013 at 4:46 PM, Mohammad Tariq donta...@gmail.com wrote: Sorry, my bad.
By that I meant 127.0.0.1 hostname. To me it seems like HBase is not able to connect to localhost using 127.0.0.1 Warm Regards, Tariq cloudfront.blogspot.com On Wed, May 22, 2013 at 2:12 AM, Jay Vyas jayunit...@gmail.com wrote: Thanks, but adding 127.0.0.1 localhost to the top seems redundant... right? I did so but still no luck :(. 1) OS? This is fedora 16. 2) any thoughts on why the PleaseHoldException is being triggered? On Tue, May 21, 2013 at 4:32 PM, Mohammad Tariq donta...@gmail.com wrote: OS? Add 127.0.0.1 localhost and see if it makes any difference. Warm Regards, Tariq cloudfront.blogspot.com On Wed, May 22, 2013 at 1:57 AM, Jay Vyas jayunit...@gmail.com wrote: #This is my /etc/hosts file --- 127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4 #::1 localhost localhost.localdomain localhost6 localhost6.localdomain6 127.0.0.1 hbase-master 192.168.122.200 hbase-master 192.168.122.201 hbase-regionserver1 192.168.122.202 hbase-regionserver2 192.168.122.203 hbase-regionserver3 On Tue, May 21, 2013 at 4:25 PM, Mohammad Tariq donta...@gmail.com wrote: Hello Jay, Please change the line containing 127.0.1.1 in your /etc/hosts to 127.0.0.1 and see if it works. Warm Regards, Tariq cloudfront.blogspot.com On Wed, May 22, 2013 at 1:47 AM, Jay Vyas jayunit...@gmail.com wrote: Hi folks: Hope someone can shed some light on this - I cannot run hbase shell create commands because of the PleaseHoldException on a fresh install of hbase. I'm not finding much in the error logs, and all nodes appear to be up and running, including hbase master. Version: hbase-0.94.7 Error: I am getting the Please hold exception... on my hbase shell. When running create 't1','f1'... But... But wait :) there's more! ...
Clearly, the hbase master is running:

[root@hbase-master ~]# jps
11896 HQuorumPeer
12914 Jps
9894 Main
5879 Main
** 12279 HMaster **
5779 Main
11714 ZKServerTool
12058 HRegionServer
12860 Main
8369 Main

And finally - here is a dump of the output from the shell --- any thoughts?

[root@hbase-master ~]# hbaseinstall/hbase-0.94.7/bin/hbase shell -d <<EOF
create 't1','f1'
EOF
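The thread above boils down to hbase-master resolving to both 127.0.0.1 and 192.168.122.200 in /etc/hosts, which can make the master advertise a loopback address to the rest of the cluster. A quick hypothetical check (the function name is made up) that flags hostnames mapped to more than one address:

```python
def find_conflicting_hosts(hosts_text):
    """Flag hostnames that /etc/hosts maps to more than one address --
    e.g. hbase-master bound to both 127.0.0.1 and a LAN IP."""
    mappings = {}
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        parts = line.split()
        ip, names = parts[0], parts[1:]
        for name in names:
            mappings.setdefault(name, set()).add(ip)
    return {name: ips for name, ips in mappings.items() if len(ips) > 1}

hosts = """
127.0.0.1 localhost localhost.localdomain
127.0.0.1 hbase-master
192.168.122.200 hbase-master
192.168.122.201 hbase-regionserver1
"""
assert find_conflicting_hosts(hosts) == {
    "hbase-master": {"127.0.0.1", "192.168.122.200"}}
```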
Re: Questions about HBase replication
On Mon, May 20, 2013 at 3:48 PM, Varun Sharma va...@pinterest.com wrote: Thanks JD for the response... I was just wondering if issues have ever been seen with regard to moving over a large number of WALs entirely from one region server to another, since that would double the replication-related load on the one server which takes over. We only move the znodes; no data is actually being re-written. Another side question: after the WAL has been replicated, is it purged immediately or soonish from ZooKeeper? The WAL's znode reference is deleted immediately. The actual WAL will be deleted according to the chain of log cleaners. J-D
Re: Questions about HBase replication
Yes, but the region server now has 2X the number of WALs to replicate and could suffer higher replication lag as a result... In my experience this hasn't been an issue. Keep in mind that the RS will only replicate what was in the queue when it was recovered and nothing more. It means you have one more thread reading from a likely remote disk (low penalty), then it has to build its own set of edits to replicate (unless you are already severely CPU contended, that won't be an issue), then it has to send those edits to the other cluster (unless you are already filling that machine's pipe, it won't be an issue). Was there anything else you were thinking about? Would you rather spread those logs to a bunch of machines? J-D
Re: Cached an already cached block (HBASE-5285)
It would be nice if you could isolate the use case that triggers the issue so that we can reproduce it. You could also be hitting HBASE-6479 if you still have HFileV1 files around. J-D On Sun, May 5, 2013 at 10:49 PM, Viral Bajaria viral.baja...@gmail.com wrote: On Sun, May 5, 2013 at 10:45 PM, ramkrishna vasudevan ramkrishna.s.vasude...@gmail.com wrote: Just to confirm, you are getting this with LruBlockCache? If with LruBlockCache then the issue is critical, because we have faced a similar issue with OffHeapCache, but that is not yet stable as far as I know. Regards Ram Yes, it's with the LRU cache. My bad, I should have copy/pasted the stack trace too. Here you go:

java.io.IOException: java.lang.RuntimeException: Cached an already cached block
	at org.apache.hadoop.hbase.regionserver.HRegionServer.convertThrowableToIOE(HRegionServer.java:1192)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.convertThrowableToIOE(HRegionServer.java:1181)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:2041)
	at sun.reflect.GeneratedMethodAccessor30.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
	at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1426)
Caused by: java.lang.RuntimeException: Cached an already cached block
	at org.apache.hadoop.hbase.io.hfile.LruBlockCache.cacheBlock(LruBlockCache.java:279)
	at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:353)
	at org.apache.hadoop.hbase.util.CompoundBloomFilter.contains(CompoundBloomFilter.java:98)
	at org.apache.hadoop.hbase.regionserver.StoreFile$Reader.passesGeneralBloomFilter(StoreFile.java:1511)
	at org.apache.hadoop.hbase.regionserver.StoreFile$Reader.passesBloomFilter(StoreFile.java:1383)
	at org.apache.hadoop.hbase.regionserver.StoreFileScanner.shouldUseScanner(StoreFileScanner.java:373)
	at org.apache.hadoop.hbase.regionserver.StoreScanner.selectScannersFrom(StoreScanner.java:257)
	at org.apache.hadoop.hbase.regionserver.StoreScanner.getScannersNoCompaction(StoreScanner.java:221)
	at org.apache.hadoop.hbase.regionserver.StoreScanner.<init>(StoreScanner.java:119)
	at org.apache.hadoop.hbase.regionserver.Store.getScanner(Store.java:1963)
	at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.<init>(HRegion.java:3517)
	at org.apache.hadoop.hbase.regionserver.HRegion.instantiateRegionScanner(HRegion.java:1700)
	at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1692)
	at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1668)
	at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4406)
	at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4380)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:2039)
Re: Extract a whole table for a given time(stamp)
You can use the Export MR job provided with HBase; it lets you set a time range: http://hbase.apache.org/book.html#export J-D On Mon, May 6, 2013 at 10:27 AM, Gaurav Pandit pandit.gau...@gmail.com wrote: Hi Hbase users, We have a use case where we need to know how data looked at a given time in the past. The data is stored in HBase of course, with multiple versions. And the goal is to be able to extract all records (rowkey, columns) as of a given timestamp, to a file. I am trying to figure out the best way to achieve this. The options I know are: 1. Write a *Java* client using the HBase Java API, and scan the hbase table. 2. Do the same, but over the *Thrift* HBase API using Perl (since our environment is mostly Perl). 3. Use *Hive* to point to the HBase table, and use Sqoop to extract data from the Hive table and onto the client / RDBMS. 4. Use *Pig* to extract data from the HBase table, dump it on HDFS, and move the file over to the client. So far, I have successfully implemented option (2). I am still running some tests to see how it performs, but it works fine as such. My questions are: 1. Is option (3) or (4) even possible? I am not sure if we can access the table for a given timestamp over Pig or Hive. 2. Is there any other better way of achieving this? Thanks! Gaurav
Re: Extract a whole table for a given time(stamp)
Obviously I don't know much about your use case, so hopefully this won't turn into a game of yes but I also need X ;) It sounds like you don't have a lot of data to retrieve? Since your first and second options are to scan the whole table, it might be that the table itself is small. If it's small then any option is good and it's just a matter of writing some code. Then, options 3 and 4 will write multiple files unless you use only 1 reducer, so, since you already need to merge files, you could consider having a post step that converts the multiple sequence files into one tsv file. Or you could have your own version of Export that has a single reducer that writes in the tsv format. The possibilities are endless. Hope this helps, J-D On Mon, May 6, 2013 at 10:40 AM, Gaurav Pandit pandit.gau...@gmail.com wrote: Thanks J-D. Wouldn't the export utility export the data in sequence file format? My goal is to generate data in some sort of delimited plain text file and hand it over to the caller. - Gaurav On Mon, May 6, 2013 at 1:33 PM, Jean-Daniel Cryans jdcry...@apache.org wrote: You can use the Export MR job provided with HBase, it lets you set a time range: http://hbase.apache.org/book.html#export J-D On Mon, May 6, 2013 at 10:27 AM, Gaurav Pandit pandit.gau...@gmail.com wrote: Hi Hbase users, We have a use case where we need to know how data looked at a given time in the past. The data is stored in HBase of course, with multiple versions. And the goal is to be able to extract all records (rowkey, columns) as of a given timestamp, to a file. I am trying to figure out the best way to achieve this. The options I know are: 1. Write a *Java* client using the HBase Java API, and scan the hbase table. 2. Do the same, but over the *Thrift* HBase API using Perl (since our environment is mostly Perl). 3. Use *Hive* to point to the HBase table, and use Sqoop to extract data from the Hive table and onto the client / RDBMS. 4.
Use *Pig* to extract data from the HBase table, dump it on HDFS, and move the file over to the client. So far, I have successfully implemented option (2). I am still running some tests to see how it performs, but it works fine as such. My questions are: 1. Is option (3) or (4) even possible? I am not sure if we can access the table for a given timestamp over Pig or Hive. 2. Is there any other better way of achieving this? Thanks! Gaurav
Re: Extract a whole table for a given time(stamp)
You could save some time by using http://hbase.apache.org/book.html#copytable J-D On Mon, May 6, 2013 at 11:19 AM, Gaurav Pandit pandit.gau...@gmail.com wrote: Thanks for your inputs, J-D, Shahab. Sorry if I was ambiguous in stating what I wanted to do. Just to restate the goal in one line: Extract all rows (with rowkey, columns) from an HBase table as of a given time using HBase timestamps/versions, in a plain text file format. J-D, we have about 5 million rows (but each could have multiple versions) for now. So I think scanning the whole table is okay for now, but it seems it may not be the best option for a big table. Also, as I mentioned earlier, I think Hive/Pig does not let you access HBase for a timestamp. If they could do that, it's the approach I wanted to take. But your suggestion of using *export* got me thinking, and the following may work out well: 1. Export the HBase table for a given timestamp using the *export* utility. 2. Import the file into another temp HBase table. 3. Use Pig/Hive to extract the table and put it on an HDFS file in plain text (or onto an RDBMS). 4. Let the client retrieve the file. Shahab, in my case, I was talking about using the internal timestamp. But thanks for your input - I was unaware of the Pig DBStorage loader! It may come in handy in some other scenario. Thanks, Gaurav On Mon, May 6, 2013 at 1:50 PM, Shahab Yunus shahab.yu...@gmail.com wrote: Gaurav, when you say that you want older versions of the data, are you talking about filtering on the internal timestamps (and hence the internal versioning mechanism), or does your data have a separate column for versioning (basically custom versioning)? If the latter, then you can use Pig. It can dump your data directly into an RDBMS like MySQL too, since a DBStorage loader/store is available. Might not be totally applicable to your issue but just wanted to share a thought. Regards, Shahab On Mon, May 6, 2013 at 1:40 PM, Gaurav Pandit pandit.gau...@gmail.com wrote: Thanks J-D.
Wouldn't the export utility export the data in sequence file format? My goal is to generate data in some sort of delimited plain text file and hand it over to the caller. - Gaurav On Mon, May 6, 2013 at 1:33 PM, Jean-Daniel Cryans jdcry...@apache.org wrote: You can use the Export MR job provided with HBase, it lets you set a time range: http://hbase.apache.org/book.html#export J-D On Mon, May 6, 2013 at 10:27 AM, Gaurav Pandit pandit.gau...@gmail.com wrote: Hi Hbase users, We have a use case where we need to know how data looked at a given time in the past. The data is stored in HBase of course, with multiple versions. And the goal is to be able to extract all records (rowkey, columns) as of a given timestamp, to a file. I am trying to figure out the best way to achieve this. The options I know are: 1. Write a *Java* client using the HBase Java API, and scan the hbase table. 2. Do the same, but over the *Thrift* HBase API using Perl (since our environment is mostly Perl). 3. Use *Hive* to point to the HBase table, and use Sqoop to extract data from the Hive table and onto the client / RDBMS. 4. Use *Pig* to extract data from the HBase table, dump it on HDFS, and move the file over to the client. So far, I have successfully implemented option (2). I am still running some tests to see how it performs, but it works fine as such. My questions are: 1. Is option (3) or (4) even possible? I am not sure if we can access the table for a given timestamp over Pig or Hive. 2. Is there any other better way of achieving this? Thanks! Gaurav
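For readers implementing option (1) or (2) themselves: the core of an "as of timestamp" extract is, per cell, picking the newest version whose timestamp is not newer than the cutoff - which is what a scan with a time range of [0, cutoff+1) and one version gives you. A small self-contained sketch of that selection rule (plain Python, not the HBase API):

```python
def as_of(versions, cutoff_ts):
    """Given {timestamp: value} for one cell, return the value that was
    current at cutoff_ts (the newest version not newer than the cutoff)."""
    eligible = [ts for ts in versions if ts <= cutoff_ts]
    if not eligible:
        return None  # the cell did not exist yet at that time
    return versions[max(eligible)]

cell = {100: "a", 200: "b", 300: "c"}  # three versions of one column
assert as_of(cell, 250) == "b"   # version at ts=200 was current at ts=250
assert as_of(cell, 50) is None   # cell not yet written at ts=50
assert as_of(cell, 300) == "c"
```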
Re: HBase - prioritizing writes over reads?
Short answer is no, there's no knob or configuration to do that. Longer answer is it depends. Are the reads and writes going to different regions/tables? If so, disable the balancer and take charge of it yourself by segregating the offending regions on their own RSs. I also see you have the requirement to take incoming data no matter what. Well, this currently cannot be guaranteed in HBase, since an RS failure will incur some limited unavailability while the ZK session times out, the logs are replayed and the regions are reassigned. I don't know what kind of SLA you have, but it sounds like even without your reads problem you need to do something client-side to take care of this. Local buffers maybe? It would work as long as you don't need to serve that new data right away (unless you also start serving from the local buffer, but it's getting complicated). Hope this helps, J-D On Wed, Apr 24, 2013 at 3:25 AM, kzurek kzu...@proximetry.pl wrote: Is it possible to prioritize writes over reads in HBase? I'm facing some I/O read related issues that influence my write clients and the cluster in general (constantly growing store files on some RSs). Due to the fact that I cannot let myself lose/skip incoming data, I would like to guarantee that in case of extensive reads I will be able to limit incoming read requests, so that write requests won't be influenced. Is it possible? If so, what would be the best way to do that, and where should it be placed - on the client or the cluster side? -- View this message in context: http://apache-hbase.679495.n3.nabble.com/HBase-prioritizing-writes-over-reads-tp4042838.html Sent from the HBase User mailing list archive at Nabble.com.
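The "local buffers" idea from the reply can be sketched as a client-side queue that only discards data once a flush succeeds, retrying with backoff across a short RS outage. A hypothetical sketch (the names and retry policy are made up, and `send` stands in for whatever batch-put call the client library provides):

```python
import time

class BufferedWriter:
    """Hypothetical client-side buffer: queue puts locally and retry the
    flush, so short server unavailability does not lose incoming data."""
    def __init__(self, send, max_retries=3, backoff_s=0.01):
        self.send = send          # callable that writes a batch; may raise
        self.buffer = []
        self.max_retries = max_retries
        self.backoff_s = backoff_s

    def put(self, row):
        self.buffer.append(row)  # accept the data unconditionally

    def flush(self):
        for attempt in range(self.max_retries):
            try:
                self.send(list(self.buffer))
                self.buffer.clear()  # only drop data after a successful send
                return True
            except IOError:
                time.sleep(self.backoff_s * (2 ** attempt))  # back off, retry
        return False  # data stays buffered; the caller keeps it

failures = [IOError(), IOError()]  # simulate a region being reassigned
stored = []
def flaky_send(batch):
    if failures:
        raise failures.pop()
    stored.extend(batch)

w = BufferedWriter(flaky_send)
w.put("row1"); w.put("row2")
assert w.flush() is True
assert stored == ["row1", "row2"]  # nothing lost across the two failures
```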
Re: writing and reading from a region at once
Inline. J-D On Thu, Apr 25, 2013 at 1:09 PM, Aaron Zimmerman azimmer...@sproutsocial.com wrote: Hi, If a region is being written to, and a scanner takes a lease out on the region, what will happen to the writes? Is there a concept of Transaction Isolation Levels? There's MVCC, so reads can happen while someone else is writing. What you should expect from HBase is read committed. I don't see errors in Puts while the tables are being scanned? But it seems that I'm losing writes somewhere, is it possible the writes could fail silently? Is it temporary while you're scanning or there's really data missing at the end of the day? The former might happen on some older HBase versions while the latter should never happen unless you lower the durability level yourself and have machine failures. J-D
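The MVCC behavior described above - readers see committed writes only, and a scanner's view is fixed by the read point it takes when it opens, so concurrent writes are invisible to it but never lost - can be illustrated with a toy model (plain Python, not HBase's actual MultiVersionConsistencyControl):

```python
class MvccStore:
    """Toy MVCC: each write gets a sequence number and only becomes
    visible once committed; a scanner fixes its read point at open time."""
    def __init__(self):
        self.seq = 0
        self.writes = []  # entries: {"seq", "key", "value", "committed"}

    def begin_write(self, key, value):
        self.seq += 1
        entry = {"seq": self.seq, "key": key, "value": value, "committed": False}
        self.writes.append(entry)
        return entry

    def commit(self, entry):
        entry["committed"] = True

    def read_point(self):
        # highest sequence number up to which everything is committed
        committed = [w["seq"] for w in self.writes if w["committed"]]
        return max(committed, default=0)

    def scan(self, read_point):
        visible = {}
        for w in self.writes:
            if w["committed"] and w["seq"] <= read_point:
                visible[w["key"]] = w["value"]
        return visible

store = MvccStore()
w1 = store.begin_write("r1", "a"); store.commit(w1)
rp = store.read_point()             # a scanner opens here
w2 = store.begin_write("r2", "b")   # write in flight during the scan
assert store.scan(rp) == {"r1": "a"}  # the in-flight write is invisible...
store.commit(w2)
assert store.scan(store.read_point()) == {"r1": "a", "r2": "b"}  # ...not lost
```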
Re: HBaseStorage. Inconsistent result.
Can you run a RowCounter a bunch of times to see if it exhibits the same issue? It would tell us whether it's HBase or Pig that causes the issue. http://hbase.apache.org/book.html#rowcounter J-D On Tue, Apr 9, 2013 at 3:58 AM, Eugene Morozov emoro...@griddynamics.com wrote: Hello everyone. I have the following script:

pages = LOAD 'hbase://mmpages' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('t:d', '-loadKey');
pages2 = FOREACH pages GENERATE $0;
pages3 = DISTINCT pages2;
g_pages = GROUP pages3 all PARALLEL 1;
s_pages = FOREACH g_pages GENERATE 'count', COUNT(pages3);
DUMP s_pages;

It just calculates the number of keys in the table. The issue is that it gives me different results. I did two runs: * first one - 7 tasks in parallel (I launched the same script 7 times trying to imitate a heavy workload) * second one - 9 tasks in parallel. All 7 runs in the first batch and 8 of the 9 in the second give me the correct result, which is: Input(s): Successfully read 246419854 records (102194 bytes) from: hbase://mmpages ... (count,246419854) But the last one of the second run gives a different result: Input(s): Successfully read 246419853 records (102194 bytes) from: hbase://mmpages ... (count,246419853) The number of bytes read is the same, but the number of rows is different. There was definitely no change in mmpages. We do not use standard Put/Delete - only bulkImport - and there was no major compaction run on this table. Even if one had run, it wouldn't have deleted anything, because the TTL of this table is '2147483647'. Moreover, this table is for debug purposes - nobody uses it but me. The original issue I hit was actually the same, but with my own HBaseStorage, which gives much less consistent results.
For example, for the 7 parallel runs it gives me:
--(count,246419854)
--(count,246419173) : Successfully read 246419173 records (2333164 bytes) from: hbase://mmpages
--(count,246419854) : Successfully read 246419854 records (2333164 bytes) from: hbase://mmpages
--(count,246419854) : Successfully read 246419854 records (2333164 bytes) from: hbase://mmpages
--(count,246419173) : Successfully read 246419173 records (2333164 bytes) from: hbase://mmpages
--(count,246418816) : Successfully read 246418816 records (2333164 bytes) from: hbase://mmpages
--(count,246418690)
-- and one job failed due to a lease exception.
During the run with my own HBaseStorage I see many map tasks killed with a "lease does not exist" exception, though the job usually finishes successfully. As you can see, the number of bytes read is exactly the same every time, but the numbers of rows read are different. I got exactly the same with the native HBaseStorage, though the difference is really small. But anyway, I didn't expect to see that the original HBaseStorage could also do the trick. And now my question is more about org.apache...HBaseStorage than about my own HBaseStorage. Any advice on how to prove anything regarding the native org.apache...HBaseStorage, to fix it, or to do more experiments on the matter would be really appreciated. -- Eugene Morozov Developer at Grid Dynamics Skype: morozov.evgeny www.griddynamics.com emoro...@griddynamics.com
Re: hbase-0.94.6.1 balancer issue
Samir, When you say And at what point balancer will start redistribute regions to second server, do you mean that when you look at the master's web UI you see that one region server has 0 regions? That would be a problem. Otherwise, that line you posted in your original message should be repeated for each table, and globally the regions should all be correctly distributed... unless there's an edge case where, when you have only tables with 1 region, it puts them all on the same server :) Thx, J-D On Fri, Apr 12, 2013 at 12:37 PM, Samir Ahmic ahmic.sa...@gmail.com wrote: Thanks for explaining, Jean-Marc. We have been using 0.90.4 for a very long time and balancing was based on the total number of regions. That is why I was surprised by the balancer log on 0.94. Well, I'm more an ops guy than a dev - I handle what others develop :) Regards On Fri, Apr 12, 2013 at 6:24 PM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: Hi Samir, Since regions are balanced per table, as soon as you have more than one region in your table, the balancer will start to balance the regions over the servers. You can split some of those tables and you will start to see HBase balance them. This is normal behavior for 0.94. I don't know for versions before that. Also, are you sure you need 48 tables? And not fewer tables with more CFs? JM 2013/4/12 Samir Ahmic ahmic.sa...@gmail.com Hi, JM I have 48 tables and as you said it is 1 region per table since I did not reach the splitting limit yet. So is this normal behavior in the 0.94.6.1 version? And at what point will the balancer start to redistribute regions to the second server? Thanks Samir On Fri, Apr 12, 2013 at 6:06 PM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: Hi Samir, Regions are balanced per table. So if you have 48 regions within the same table, they should be split about 24 on each server. But if you have 48 tables with 1 region each, then for each table the balancer will see only 1 region and will display the message you saw. Have you looked at the UI?
What do you have in it? Can you please confirm whether you have 48 tables or 1 table? Thanks, JM 2013/4/12 Samir Ahmic ahmic.sa...@gmail.com Hi, all I'm evaluating hbase-0.94.6.1 and I have 48 regions on a 2 node cluster. I was restarting one of the RSs and after that tried to balance the cluster by running balancer from the shell. After running the command, regions were not distributed to the second RS, and I found this line in the master log: 2013-04-12 16:45:15,589 INFO org.apache.hadoop.hbase.master.LoadBalancer: Skipping load balancing because balanced cluster; servers=2 *regions=1 *average=0.5 mostloaded=1 leastloaded=0 It looks to me like the wrong number of regions is reported by the balancer, and that causes the skipping of load balancing. In the hbase shell I see all 48 tables that I have, and everything else looks fine. Did someone else see this type of behavior? Did something change around the balancer in hbase-0.94.6.1? Regards Samir
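The per-table balancing behavior discussed in this thread can be sketched as follows: a table is considered balanced when no server holds more than ceil(regions/servers) of that table's regions, so 48 single-region tables are each trivially "balanced" no matter where they sit. A simplified model (an illustration of the reported behavior, not the actual LoadBalancer code):

```python
import math

def balance_per_table(tables, servers):
    """Per-table balancing sketch: a table is 'balanced' when no server
    holds more than ceil(regions/servers) of ITS regions -- so tables of
    one region each are trivially balanced wherever they sit.
    `tables` maps table name -> region count; returns the tables skipped."""
    skipped = []
    for name, region_count in tables.items():
        most_loaded = region_count  # worst case: all regions on one server
        ceiling = math.ceil(region_count / servers)
        if most_loaded <= ceiling:
            skipped.append(name)  # "Skipping load balancing because balanced"
    return skipped

tables = {"t%d" % i: 1 for i in range(48)}  # 48 tables, 1 region each
assert len(balance_per_table(tables, servers=2)) == 48  # every table skipped

# one table with 48 regions would NOT be skipped if they all sit on one server
assert balance_per_table({"big": 48}, servers=2) == []
```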
Re: HBase and Hadoop version
On Tue, Mar 26, 2013 at 6:56 AM, Robert Hamilton rhamil...@whalesharkmedia.com wrote: I am evaluating HBase 0.94.5 on a test cluster that happens to be running Hadoop 0.20.2-cdh3u5 I've seen the compatibility warnings but I'm just doing a first look at the features and not even thinking about production for the moment. So nothing disastrous will happen even in the worst case. My question is, what should I expect to go wrong with this particular version mismatch? Nothing, in fact I've deployed pretty much that setup in production before. J-D
Re: How to prevent major compaction when doing bulk load provisioning?
On Fri, Mar 22, 2013 at 12:12 AM, Nicolas Seyvet nicolas.sey...@gmail.com wrote: @J-D: Thanks, this sounds very likely. One more thing, from the logs of one slave, I can see the following: 2013-03-21 22:27:15,041 INFO org.apache.hadoop.hbase.regionserver.Store: Completed major compaction of 9 file(s) in f of rc_nise,$,1363860406830.5689430f7a27cc511f99dcb62001edc6. into 5418126f3d154ef3aca8027e04512279, size=8.3g; total size for store is 8.3g [...] 2013-03-21 23:34:31,836 INFO org.apache.hadoop.hbase.regionserver.Store: Completed major compaction of 5 file(s) in f of rc_nise,$,1363860406830.5689430f7a27cc511f99dcb62001edc6. into 3bdeb58c57af4ee1a92d22865e707416, size=8.3g; total size for store is 8.3g Aren't those signs that a major compaction also occurred? And if so, what could have triggered it? If the compaction algo selects all the files for compaction, it gets upgraded into a major compaction because it's essentially the same thing. On Thu, Mar 21, 2013 at 8:06 PM, Nicolas Seyvet nicolas.sey...@gmail.com wrote: @Ram: You are entirely correct, I made the exact same mistake of mixing up major and minor compactions. Looking closely, what I see is that at around 200 HFiles per region it starts minor compacting files in groups of 10 HFiles. The problem is that this minor compacting never stops, even when there are about 20 HFiles left. It just keeps going, taking more and more time (I guess because the files to compact are getting bigger). Of course, in parallel we keep on adding more and more data. @J-D: It seems to me that it would be better if you were able to do a single load for all your files. Yes, I agree... but that is not what we are testing; our use case is to use 1min batch files.
Re: Question about compactions
On Thu, Mar 21, 2013 at 6:46 AM, Brennon Church bren...@getjar.com wrote: Hello all, As I understand it, a common performance tweak is to disable major compactions so that you don't end up with storms taking things out at inconvenient times. I'm thinking that I should just write a quick script to rotate through all of our regions, one at a time, and compact them. Again, if I'm understanding this correctly we should not end up with storms as they'll only happen one at a time, and each one doesn't run for long. Does that seem reasonable, or am I missing something? My hope is to run the script regularly. FWIW major compacting isn't even needed if you don't update or delete cells, so do consider that too. The problem with scheduling major compactions yourself is that, since the command is async, you can still end up with a storm of compactions if you just blindly issue major_compact for all your regions. Things like adding wait time work, but then let's say you want the compactions to run only between 2 and 4 AM: you can run out of time. What I have seen to circumvent this is to only do a subset of the regions at a time. You can also use JMX to monitor the compaction queue on each RS and make sure you are not just piling them up, but this requires some more work. Corollary question... I recently added drives to our nodes and since I did this while they were all still running, basically just restarting the datanode underneath to pick up the new spindles, I'm fairly sure I've thrown data locality out the window, based on the changed pattern of network traffic. Interesting but unlikely. Even restarting HBase shouldn't do that unless it was wrongly restarted. Each RS publishes a locality index (hdfsBlocksLocalityIndex) that you can find via JMX or in their web UI, are they close to 100% or way down? Also which version are you on?
If I'm right, manually running major compactions against all of the regions should resolve that, as the underlying data would all get written locally. Again, does that make sense? Major compacting would do that yes, but first check if you need it at all I think. J-D
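The "subset of regions at a time" advice can be sketched as a batching loop that stops when the maintenance window would be overrun. Everything here is hypothetical (region names, timings, and the `compact` callback standing in for the async major_compact call):

```python
def compact_in_window(regions, batch_size, window_s, est_s_per_region, compact):
    """Sketch of compacting a subset of regions per night: issue at most
    batch_size regions and stop early if the estimated time would overrun
    the maintenance window. `compact` stands in for the async admin call."""
    done, elapsed = [], 0.0
    for region in regions[:batch_size]:
        if elapsed + est_s_per_region > window_s:
            break  # leave the rest for the next window
        compact(region)
        elapsed += est_s_per_region
        done.append(region)
    return done

issued = []
regions = ["r%02d" % i for i in range(20)]
# a 2-hour window with ~20 min per region fits 6 compactions a night
first_night = compact_in_window(regions, batch_size=8, window_s=7200,
                                est_s_per_region=1200, compact=issued.append)
assert first_night == ["r00", "r01", "r02", "r03", "r04", "r05"]
```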
Re: How to prevent major compaction when doing bulk load provisioning?
You are likely just hitting the threshold for a minor compaction, and by picking up all the files (I'm making a guess that it does) it gets upgraded to a major compaction. The threshold is 3 by default. So after loading 3 files you should get a compaction per region, then every other 2 loadings you will trigger another per region. It seems to me that it would be better if you were able to do a single load for all your files. J-D On Thu, Mar 21, 2013 at 6:29 AM, Nicolas Seyvet nicolas.sey...@gmail.com wrote: Hi, We are using code similar to https://github.com/jrkinley/hbase-bulk-import-example/ in order to benchmark our HBase cluster. We are running a CDH4 installation, and HBase is version 0.92.1-cdh4.1.1. The cluster is composed of 12 slaves, 1 master and 1 secondary master. During the bulk load insert, roughly within 3 hours after the start (~200 GB), we notice a large drop in the insert rate. At the same time, there is a spike in IO and CPU usage. Connecting to a Region Server (RS), the Monitored Task section shows that a compaction has started. I have set hbase.hregion.max.filesize to 107374182400 (100 GB), and disabled automatic major compaction (hbase.hregion.majorcompaction is set to 0). What we are doing is that we have 1000 files of synthetic data (csv), where each row in a file is one row to insert into HBase; each file contains 600K rows (or 600K events). Our loader works in the following way:

1. Look for a file
2. When a file is found, prepare a job for that file
3. Launch the job
4. Wait for completion
5. Compute the insert rate (number of rows / time)
6. Repeat from 1 until there are no more files.

What I understand of the bulk load M/R job is that it produces one HFile for each Region. Questions: - How is HStoreFileSize calculated? - What do HStoreFileSize, storeFileSize and hbase.hregion.max.filesize have in common? - Can the number of HFiles trigger a major compaction? Thx for the help. I hope my questions make sense. /Nicolas
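The selection rule described in the reply - need at least the threshold number of files (3 by default), compact a bounded group, and upgrade to a major compaction when the selection happens to cover every file in the store - can be sketched like this (a simplified model, not HBase's actual ratio-based compaction selection):

```python
def select_compaction(file_sizes, min_files=3, max_files=10):
    """Sketch of the rule described above: need at least min_files to
    compact, take at most max_files (smallest first), and if the selection
    covers every file in the store, the compaction is 'major'."""
    if len(file_sizes) < min_files:
        return None, False  # not enough files, no compaction triggered
    selected = sorted(file_sizes)[:max_files]
    is_major = len(selected) == len(file_sizes)  # all files -> upgraded
    return selected, is_major

# after 3 bulk loads each region has 3 files -> all selected -> major
sel, major = select_compaction([10, 12, 11])
assert major is True

# with 200 files only a group of 10 is picked -> minor
sel, major = select_compaction(list(range(200)))
assert len(sel) == 10 and major is False
```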
Re: How to prevent major compaction when doing bulk load provisioning?
On Thu, Mar 21, 2013 at 12:06 PM, Nicolas Seyvet nicolas.sey...@gmail.com wrote: @Ram: You are entirely correct, I made the exact same mistake of mixing up major and minor compactions. Looking closely, what I see is that at around 200 HFiles per region it starts minor compacting files in groups of 10 HFiles. The problem is that this minor compacting never stops, even when there are about 20 HFiles left. It just keeps going, taking more and more time (I guess because the files to compact are getting bigger). Of course, in parallel we keep on adding more and more data. @J-D: It seems to me that it would be better if you were able to do a single load for all your files. Yes, I agree... but that is not what we are testing; our use case is to use 1min batch files. I worked on a very similar use case recently and would recommend against doing bulk loads like this. The way bulk loaded files are treated by the compaction selection algorithm is broken when loads are done in a continuous fashion. The solution to this is in HBASE-7842[1] but it is still being worked on. What you are seeing is that the files picked up for compactions will often include the bigger already-compacted files. As those files get bigger, compactions will take longer and longer, up to a point where the data selected for compaction is greater than your compacting capacity. The workaround would be to use the normal API, as files will be more properly selected for compaction, but it won't be as fast/efficient as the continuous bulk load solution should be if the selection algo weren't broken. J-D 1. https://issues.apache.org/jira/browse/HBASE-7842
Re: Question about compactions
On Thu, Mar 21, 2013 at 1:44 PM, Brennon Church bren...@getjar.com wrote: Hello, Here's the data locality index values for all 8 nodes: hdfsBlocksLocalityIndex=45 hdfsBlocksLocalityIndex=57 hdfsBlocksLocalityIndex=55 hdfsBlocksLocalityIndex=55 hdfsBlocksLocalityIndex=58 hdfsBlocksLocalityIndex=47 hdfsBlocksLocalityIndex=45 hdfsBlocksLocalityIndex=42 Those seem pretty bad to me. Yeah, considering that you have 8 nodes and probably use a replication factor of 3, then I would expect you to be at least 38% local in case of a wrongful restart (but then minor compactions probably ran and that brought you up). I'm running HBase v. 0.92.0 I'd considered the async problem, and was going to add some basic checks into the script to not submit additional compactions to the queue if I saw that it had anything in it already. For the moment, it seems my best bet is to run through the major compactions for everything to regain locality. Going forward, we may or may not need the major compactions on a regular basis. I can tell you it's been several months since we turned them off, and performance has been reasonable. FWIW your data should be cached now so major compacting will do no good (unless you mostly do full table scans, in which case the caching doesn't do anything for you). You shouldn't see a big difference turning major compactions off if you don't delete/update a lot. Thanks. --Brennon On 3/21/13 10:49 AM, Jean-Daniel Cryans wrote: On Thu, Mar 21, 2013 at 6:46 AM, Brennon Church bren...@getjar.com wrote: Hello all, As I understand it, a common performance tweak is to disable major compactions so that you don't end up with storms taking things out at inconvenient times. I'm thinking that I should just write a quick script to rotate through all of our regions, one at a time, and compact them. Again, if I'm understanding this correctly we should not end up with storms as they'll only happen one at a time, and each one doesn't run for long. 
Does that seem reasonable, or am I missing something? My hope is to run the script regularly. FWIW major compacting isn't even needed if you don't update or delete cells so do consider that too. The problem with scheduling major compactions yourself is that, since the command is async, you can still end up with a storm of compactions if you just blindly issue major_compact for all your regions. Things like adding wait time works but then let's say you want the compactions to run only between 2 and 4AM then you can run out of time. What I have seen to circumvent this is to only do a subset of the regions at a time. You can also use JMX to monitor the compaction queue on each RS and make sure you are not just piling them up, but this requires some more work. Corollary question... I recently added drives to our nodes and since I did this while they were all still running, basically just restarting the datanode underneath to pick up the new spindles, I'm fairly sure I've thrown data locality out the window, based on the changed pattern of network traffic. Interesting but unlikely. Even restarting HBase shouldn't do that unless it was wrongly restarted. Each RS publishes a locality index (hdfsBlocksLocalityIndex) that you can find via JMX or in their web UI, are they close to 100% or way down? Also which version are you on? If I'm right, manually running major compactions against all of the regions should resolve that, as the underlying data would all get written locally. Again, does that make sense? Major compacting would do that yes, but first check if you need it at all I think. J-D