Incremental backup of hbase with export not working

2013-12-02 Thread oc tsdb
Hi ,

In order to take an incremental backup using HBase export, we followed

http://hbase.apache.org/book/ops_mgt.html#import

A few things that need clarification are:

1. What does version mean? Is it the same version number which we give
during creation of the HBase table?

2. What if we don't specify the version and just specify start and end
timestamps?


Kindly provide us an example of how to take an incremental HBase backup using
export in an interval.


We did some experiments with version and start time combinations, and the results
are as follows:

1. We created a table with version=1 and tested the import CLI using the
same version (version=1) and start-end times. Even though data is
present between the start and end intervals, we didn't get any data.

2. Without specifying the version, we got all the data irrespective of the
start and end times.

Kindly clarify how to specify the version and timestamp range to match
our requirements.
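
For reference, the Export MapReduce job described in that chapter essentially
drives a Scan, so the versions argument maps to Scan.setMaxVersions() and the
start/end arguments map to Scan.setTimeRange(). A minimal sketch of that
combination (the time range values below are placeholders, not from this thread):

import org.apache.hadoop.hbase.client.Scan;

public class ExportScanSketch {
  public static void main(String[] args) throws Exception {
    Scan scan = new Scan();
    // versions = number of cell versions per column to read,
    // not the table's schema version; 1 keeps only the latest cell.
    scan.setMaxVersions(1);
    // Only cells whose timestamps fall within [start, end) are included.
    long start = 1385856000000L; // placeholder epoch millis
    long end   = 1385942400000L; // placeholder epoch millis
    scan.setTimeRange(start, end);
    System.out.println(scan);
  }
}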

Thanks,
Oc.tsdb


Hbase Region Size

2013-12-02 Thread Vineet Mishra
Hi

Can anyone tell me the Java API for getting the region size of a table?

Thanks!


Merging Hbase Region for a Table

2013-12-02 Thread Vineet Mishra
Hi

I have some 2000+ auto-created regions which I want to bring down to a
smaller number.
I am using HBase 0.94; is there a way I can merge the regions without
losing or dirtying up the data?

Thanks!


Re: Merging Hbase Region for a Table

2013-12-02 Thread Jean-Marc Spaggiari
Hi Vineet.

For 0.94 you can only offline-merge.

http://hbase.apache.org/book/ops.regionmgt.html#ops.regionmgt.merge

JM





Re: Hbase Region Size

2013-12-02 Thread Jean-Marc Spaggiari
Hi Vineet,

If you want the entire table size, I don't think there is any API for that.
If you want the size of the table on disk (compressed), then you are
better off using the HDFS API.

JM
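
For reference, a minimal sketch of the HDFS route mentioned above, summing a
table's on-disk (compressed) footprint. The table name is a placeholder, and it
assumes the 0.94 layout where each table lives directly under hbase.rootdir:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.ContentSummary;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class TableDiskSize {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    // In 0.94 a table's files live under <hbase.rootdir>/<tableName>.
    Path tableDir = new Path(conf.get("hbase.rootdir"), "my-table"); // placeholder table
    FileSystem fs = tableDir.getFileSystem(conf);
    ContentSummary summary = fs.getContentSummary(tableDir);
    System.out.println("on-disk size: " + summary.getLength() + " bytes");
  }
}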





Re: Hbase Region Size

2013-12-02 Thread Mike Axiak
Are you looking to get the MAX_FILESIZE parameter? If so, there's nothing in
the client, but HBaseAdmin has what you need [1].

   HTableDescriptor myDescriptor =
       hbaseAdmin.getTableDescriptor(Bytes.toBytes("my-table"));
   System.out.println("my-table has a max region size of " +
       myDescriptor.getMaxFileSize());


1:
http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html





Re: Hbase Region Size

2013-12-02 Thread Jean-Marc Spaggiari
Same for a single region. If it's compressed, you might want to look into
HDFS directly...





Re: Hbase Region Size

2013-12-02 Thread Vineet Mishra
Actually I am looking for the size of the region, not the whole table.
Since HBase internally checks the max file size to split regions
autonomously, there should be some way to get it.





Re: What is HBase compaction-queue-size at all?

2013-12-02 Thread Jean-Marc Spaggiari
   - Is it the number of Stores of the regionserver that need to be major
   compacted, or the number that are currently being compacted?


This is the number that are currently in the pipe. It doesn't mean they are
compacting right now, but they are queued for compaction, and not necessarily
major compaction. A compaction is major only when all the store files are
compacted together.

I was noticing that at some point it got regionserver
compaction-queue-size = 4 (I check it from Ambari). That's theoretically
impossible, since I have only one Store to write to (sequential key) at any
time, so incurring only one major compaction seems more reasonable.

Why is this impossible? A store file is a dump of HBase memory blocks
written to disk, even if you write to a single region and a single table
with keys that are all close by (even if it's all the same exact key). When
the blocks in memory reach a threshold, they are written to disk. When more
than x files (3 is the default) are on disk, a compaction is launched.
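
For reference, the threshold of x files (3 by default) mentioned above is
hbase.hstore.compactionThreshold. It normally lives in hbase-site.xml; the
sketch below only names the property programmatically, and the value 5 is an
illustration, not a recommendation:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class CompactionThresholdSketch {
  public static void main(String[] args) {
    Configuration conf = HBaseConfiguration.create();
    // A minor compaction is considered once a store accumulates this many store files.
    conf.setInt("hbase.hstore.compactionThreshold", 5);
    System.out.println(conf.get("hbase.hstore.compactionThreshold"));
  }
}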

   - Even more confusing: isn't multi-threading enabled in earlier versions,
   so that each compaction job is allocated to its own thread? If so, why
   does a compaction queue waiting for processing exist at all?

Yes, compaction is done on a separate thread, but there is one single
queue. You don't want to take 100% of your RS resources to do compactions...

Depending on whether you are doing mostly writes and almost no reads, you might
want to tweak some parameters. You might also want to look into bulk
loading...

Last, maybe you should review your key design and distribution.

And last again ;) What is your table definition? Multiplying the column
families can also sometimes lead to this kind of issue...

JM




2013/12/2 林煒清 thesuperch...@gmail.com

 Does anyone know what the compaction queue size means?

 By doc's definition:

 *9.2.5.* hbase.regionserver.compactionQueueSize Size of the compaction
 queue. This is the number of stores in the region that have been targeted
 for compaction.


- Is it the number of Stores of the regionserver that need to be major
compacted, or the number that are currently being compacted?

 I have a job writing data in a hotspot style using a sequential key (not
 distributed) with 1 family, so there is 1 Store per region.

 I was noticing that at some point it got regionserver
 compaction-queue-size = 4 (I check it from Ambari). That's theoretically
 impossible, since I have only one Store to write to (sequential key) at any
 time, so incurring only one major compaction seems more reasonable.


- Then I dug into the logs and found nothing hinting at a queue size > 0:
every major compaction just says This selection was in queue for 0sec. I
don't really understand what that means. Is it saying HBase has nothing in
the compaction queue?

 2013-11-26 12:28:00,778 INFO
 [regionserver60020-smallCompactions-1385440028938] regionserver.HStore:
 Completed major compaction of 3 file(s) in f1 of myTable.key.md5 into
 md5(size=607.8 M), total size for store is 645.8 M. This selection was
 in queue for 0sec, and took 39sec to execute.


- Even more confusing: isn't multi-threading enabled in earlier versions, so
that each compaction job is allocated to its own thread? If so, why does a
compaction queue waiting for processing exist at all?



Re: Hbase Region Size

2013-12-02 Thread Jean-Marc Spaggiari
Hum. I need to check, but I'm not sure whether HBase does the MAX_FILESIZE
check against the compressed size of the region or against the uncompressed
size. I would guess it's against the compressed size, but I will double-check
in the code to confirm.

Are you looking for the compressed size? Or the regular size?





Re: Merging Hbase Region for a Table

2013-12-02 Thread Jean-Marc Spaggiari
Hi Vineet,

You need to take HBase down, but HDFS needs to stay up if you want to be able
to access the files in it.

Since I'm using Cloudera Manager I usually just use it to stop HBase, but
from the command line you should be able to do something like
bin/stop-hbase.sh .

Also, don't do this on a production cluster.


2013/12/2 Vineet Mishra clearmido...@gmail.com

 Ok! So can you tell me in the Offline merge, how to put the cluster down?
 Should we stop the HDFS or Region Server, and if so how to achieve that?


 



Re: Hbase Region Size

2013-12-02 Thread Jean-Marc Spaggiari
Hi Vineet,

So to get the size you can get the list of stores for the region and call
getStoreSizeUncompressed on each store. I'm not 100% sure this method is
accessible from outside the class. If it's not accessible, it might be good
to have an easy way to get this information.

You can take a look at ConstantSizeRegionSplitPolicy.java and Store.java to
see how it's done.

I will take a deeper look when I get off the plane ;)

JM







Re: HBase ExportSnapshot

2013-12-02 Thread Ted Yu
Can you pastebin master log during operation #2 ?

There have been at least two fixes since 0.94.10, listed below.
It would be nice if you could verify this behavior using 0.94.14.

Cheers

r1515967 | mbertozzi | 2013-08-20 13:49:38 -0700 (Tue, 20 Aug 2013) | 1 line

HBASE-8760 possible loss of data in snapshot taken after region split

r1507792 | mbertozzi | 2013-07-28 05:17:39 -0700 (Sun, 28 Jul 2013) | 1 line

HBASE-9060 ExportSnapshot job fails if target path contains percentage
character (Jerry He)


On Mon, Dec 2, 2013 at 9:19 AM, oc tsdb oc.t...@gmail.com wrote:

 Hi,

 We have cluster with 4 data nodes and HBase version is 0.94.10.

 We have created snapshot for all hbase tables and trying to export snapshot
 in two ways.

 option 1.Export snapshot into same cluster hdfs

  hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot
  hbase_tbl_snapshot_name -copy-to hdfs:/hbase_backup -mappers 16;

 Here we are getting full data ( .archive + .hbase-snapshot) exported to
 hdfs:/hbase_backup

 option 2.Export snapshot to local filesystem
 command :
  hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot
  hbase_tbl_snapshot_name -copy-to file:///tmp/hbase_backup -mappers 16;

 But with option 2 we are only getting .hbase-snapshot exported to the local
 dir (/tmp/hbase_backup); the .archive files are not exported. Is this
 expected behavior, or is something wrong with option 2?

 Thanks
 OC



Filter - Dynamic Jar Load - FilterList not using DynamicClassLoader

2013-12-02 Thread Federico Gaule
Hi everyone,

I've tried to use dynamic jar loading (
https://issues.apache.org/jira/browse/HBASE-1936) but there seems to be an
issue with FilterList.
Here is some log from my app where I send a Get with a FilterList
containing an AFilter and a BFilter.

2013-12-02 13:55:42,564 DEBUG
org.apache.hadoop.hbase.util.DynamicClassLoader: Class d.p.AFilter not
found - using dynamical class loader
2013-12-02 13:55:42,564 DEBUG
org.apache.hadoop.hbase.util.DynamicClassLoader: Finding class:
d.p.AFilter
2013-12-02 13:55:42,564 DEBUG
org.apache.hadoop.hbase.util.DynamicClassLoader: Loading new jar
files, if any
2013-12-02 13:55:42,677 DEBUG
org.apache.hadoop.hbase.util.DynamicClassLoader: Finding class again:
d.p.AFilter
2013-12-02 13:55:43,004 ERROR
org.apache.hadoop.hbase.io.HbaseObjectWritable: Can't find class
d.p.BFilter
java.lang.ClassNotFoundException: d.p.BFilter
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:247)
at 
org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:820)
at 
org.apache.hadoop.hbase.io.HbaseObjectWritable.getClassByName(HbaseObjectWritable.java:792)
at 
org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:679)
at 
org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:594)
at 
org.apache.hadoop.hbase.filter.FilterList.readFields(FilterList.java:324)
at org.apache.hadoop.hbase.client.Get.readFields(Get.java:405)
at 
org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:690)
at 
org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:594)
at org.apache.hadoop.hbase.client.Action.readFields(Action.java:101)
at 
org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:690)
at 
org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:594)
at 
org.apache.hadoop.hbase.client.MultiAction.readFields(MultiAction.java:116)
at 
org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:690)
at 
org.apache.hadoop.hbase.ipc.Invocation.readFields(Invocation.java:126)
at 
org.apache.hadoop.hbase.ipc.HBaseServer$Connection.processData(HBaseServer.java:1311)
at 
org.apache.hadoop.hbase.ipc.HBaseServer$Connection.readAndProcess(HBaseServer.java:1226)
at 
org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:748)
at 
org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.doRunLoop(HBaseServer.java:539)
at 
org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.run(HBaseServer.java:514)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)

AFilter is not found at first, so it tries the DynamicClassLoader; but when it
tries to load BFilter, it uses URLClassLoader and fails without
checking for dynamic jars.


I think the issue is related to FilterList#readFields:

public void readFields(final DataInput in) throws IOException {
  byte opByte = in.readByte();
  operator = Operator.values()[opByte];
  int size = in.readInt();
  if (size > 0) {
    filters = new ArrayList<Filter>(size);
    for (int i = 0; i < size; i++) {
      Filter filter = (Filter) HbaseObjectWritable.readObject(in, conf);
      filters.add(filter);
    }
  }
}

HbaseObjectWritable#readObject uses a conf (created by calling
HBaseConfiguration.create()), which I suppose doesn't include a
DynamicClassLoader instance.
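
To illustrate that suspicion, here is a minimal sketch (not from the original
mail) of the lookup that appears to fail: a plain Configuration resolves
classes through its own class loader only, so a filter jar known just to the
DynamicClassLoader would not be visible. The class name d.p.BFilter is simply
the placeholder from the log above.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class DynamicLoadCheck {
  public static void main(String[] args) {
    // A fresh conf only knows the classpath of its own class loader,
    // not any jars registered with HBase's DynamicClassLoader.
    Configuration conf = HBaseConfiguration.create();
    try {
      conf.getClassByName("d.p.BFilter"); // placeholder filter from the log
      System.out.println("filter visible to this conf's class loader");
    } catch (ClassNotFoundException e) {
      System.out.println("not visible: " + e.getMessage());
    }
  }
}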

Cheers


Re: AsyncHBase 1.5.0-rc1 available for download and testing (HBase 0.96 compatibility inside)

2013-12-02 Thread tsuna
On Fri, Nov 29, 2013 at 9:26 PM, Ted Yu yuzhih...@gmail.com wrote:
 In HACKING, a sample command is given:
 $ HBASE_HOME=~/src/hbase make integration ARGS='test f'

 This means the integration tests need to be run on one of the servers where
 HBase is deployed, right ?

It defaults to using localhost as the ZK quorum specification, but
you can specify whatever else you want:

$ HBASE_HOME=~/src/hbase make integration ARGS='table family
quorumSpec znodePath'

I'm thinking of cutting the 1.5 release today, please let me know if
you want a bit more time to test it before the release is cut.

-- 
Benoit tsuna Sigoure


Re: AsyncHBase 1.5.0-rc1 available for download and testing (HBase 0.96 compatibility inside)

2013-12-02 Thread Ted Yu
bq. I'm thinking of cutting the 1.5 release today

Please go ahead - my testing would focus on backward compatibility.

Cheers





Re: Online/Realtime query with filter and join?

2013-12-02 Thread Doug Meil

You are going to want to figure out a rowkey (or a set of tables with
rowkeys) to restrict the number of I/Os. If you just slap Impala in front
of HBase (or even Phoenix, for that matter) you could write SQL against it,
but if it winds up doing a full scan of an HBase table underneath you
won't get your < 100ms response time.

Note:  I'm not saying you can't do this with Impala or Phoenix, I'm just
saying start with the rowkeys first so that you limit the I/O.  Then start
adding frameworks as needed (and/or build a schema with Phoenix in the
same rowkey exercise).

Such response-time requirements make me think that this is for application
support, so why the requirement for SQL? Might want to start writing it as
a Java program first.









On 11/29/13 4:32 PM, Mourad K mourad...@gmail.com wrote:

You might want to consider something like Impala or Phoenix, I presume
you are trying to do some report query for dashboard or UI?
MapReduce is certainly not adequate as there is too much latency on
startup. If you want to give this a try, cdh4 and Impala are a good start.

Mouradk

On 29 Nov 2013, at 10:33, Ramon Wang ra...@appannie.com wrote:

 The general performance requirement for each query is less than 100 ms,
 that's the average level. Sounds crazy, but yes we need to find a way
for
 it.
 
 Thanks
 Ramon
 
 
 On Fri, Nov 29, 2013 at 5:01 PM, yonghu yongyong...@gmail.com wrote:
 
 The question is what you mean of real-time. What is your performance
 request? In my opinion, I don't think the MapReduce is suitable for the
 real time data processing.
 
 
 On Fri, Nov 29, 2013 at 9:55 AM, Azuryy Yu azury...@gmail.com wrote:
 
 you can try phoniex.
 On 2013-11-29 3:44 PM, Ramon Wang ra...@appannie.com wrote:
 
 Hi Folks
 
 It seems to be impossible, but I still want to check if there is a
way
 we
 can do complex query on HBase with Order By, JOIN.. etc like we
 have
 with normal RDBMS, we are asked to provided such a solution for it,
any
 ideas? Thanks for your help.
 
 BTW, i think maybe impala from CDH would be a way to go, but haven't
 got
 time to check it yet.
 
 Thanks
 Ramon
 



Re: Online/Realtime query with filter and join?

2013-12-02 Thread Pradeep Gollakota
In addition to Impala and Phoenix, I'm going to throw PrestoDB into the
mix. :)

http://prestodb.io/





Re: Online/Realtime query with filter and join?

2013-12-02 Thread Viral Bajaria
Pradeep, correct me if I am wrong, but PrestoDB has not released the HBase
plugin as yet, or did they and maybe I missed the announcement?

I agree with what Doug is saying here: you can't achieve < 100ms on every
kind of query on HBase unless and until you design the rowkey in a way that
helps you reduce your I/O. A full scan of a table with billions of rows and
columns can take forever, but good indexing (via rowkey or secondary
indexes) could help speed things up.

Thanks,
Viral





Re: Online/Realtime query with filter and join?

2013-12-02 Thread James Taylor
I agree with Doug Meil's advice. Start with your row key design. In
Phoenix, your PRIMARY KEY CONSTRAINT defines your row key. You should lead
with the columns that you'll filter against most frequently. Then, take a
look at adding secondary indexes to speed up queries against other columns.

Thanks,
James
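
A minimal sketch of the kind of Phoenix schema being described (the table,
columns, and ZooKeeper quorum below are made up for illustration, and the
exact JDBC URL and driver depend on the Phoenix version in use):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class PhoenixSchemaSketch {
  public static void main(String[] args) throws Exception {
    // Hypothetical quorum; Phoenix connects through ZooKeeper.
    Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost");
    Statement stmt = conn.createStatement();
    // The PRIMARY KEY CONSTRAINT becomes the HBase row key; lead with the
    // columns filtered most often (here: host, then event_time).
    stmt.execute("CREATE TABLE metrics ("
        + " host VARCHAR NOT NULL,"
        + " event_time DATE NOT NULL,"
        + " val DOUBLE"
        + " CONSTRAINT pk PRIMARY KEY (host, event_time))");
    // Secondary index to speed up queries that filter on val.
    stmt.execute("CREATE INDEX metrics_val_idx ON metrics (val)");
    conn.close();
  }
}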





Re: Online/Realtime query with filter and join?

2013-12-02 Thread Pradeep Gollakota
@Viral I'm not sure... I just know that they mentioned on the front page
that PrestoDB can query HBase tables.





Consequent deletes more than ~256 rows not working

2013-12-02 Thread Mrudula Madiraju
Hi,
 
I have a simple HBase table of approx. 1000 rows.
If I invoke htable.delete() on this table in a while loop, it doesn't
throw any error or exception.
But at the end of the operation I see that it has actually deleted only about
256 rows.
Repeating the operation deletes another 256 or so, and finally after 3 or 4
runs all rows get deleted.
 
This is true of the htable.batch API and even the htable.delete API.
 
I have tried changing ulimit/nproc settings, invoking flush, setting autocommit,
and invoking a major compaction.
I also waited it out, to see whether the delete would complete in the background
five minutes after the first run. Nothing works.

I searched the mailing list and see that there are threads on delete followed by
put not working, etc.
But this is a different case.

Anyone who knows what's happening? Looking forward to any pointers! 
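
For reference, a minimal sketch of the kind of delete loop being described
(the table name, the full-table scan, and the explicit flush are assumptions
for illustration, not the poster's actual code):

import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

public class DeleteAllRows {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "mytable"); // hypothetical table name
    try {
      List<Delete> deletes = new ArrayList<Delete>();
      ResultScanner scanner = table.getScanner(new Scan());
      for (Result r : scanner) {
        deletes.add(new Delete(r.getRow()));
      }
      scanner.close();
      table.delete(deletes);  // successful deletes are removed from the list
      table.flushCommits();   // make sure nothing stays buffered client-side
    } finally {
      table.close();
    }
  }
}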

Regards,
Mrudula

Re: Consequent deletes more than ~256 rows not working

2013-12-02 Thread Ted Yu
Which HBase release are you using ?

In your while loop, you used the same set of row keys for each attempt ?

Thanks




Makes search indexes

2013-12-02 Thread James Pettyjohn

Hi, general strategy and schemata approach question.

I've got a lot of different data in a relational db that I'm trying to make
searchable. One example is searching for people by email
address. I have 6 tables with maybe tens of millions of records,
and none of it standardized. So it's mixed case, and a field may hold
multiple emails or something which isn't an email address at all.

Doing that as a one-off isn't too bad, but the data will be added to,
and PKs will get phased out and split into multiple PKs, etc. I also
want this on a number of other fields that will need different
transformations applied to the data and that come from their own set of
tables.

I could do this a number of ways, but I'm not satisfied with any of them,
and I doubt that such a generic problem has no tools already
somewhat suited to the task.

The best tools for this may not be HBase but I'd like to
put my HBase cluster to work on this and have it available to
MR jobs.

Best, James


Re: Consequent deletes more than ~256 rows not working

2013-12-02 Thread Azuryy Yu
It would be better to paste a piece of your code.





Re: HBase ExportSnapshot

2013-12-02 Thread oc tsdb
We see the same logs for both options:

2013-12-02 09:47:41,311 INFO
org.apache.hadoop.hbase.master.snapshot.TakeSnapshotHandler: Running FLUSH
table snapshot tsdb_snap_backup C_M_SNAPSHOT_TABLE on table tsdb
2013-12-02 09:47:41,312 INFO org.apache.hadoop.hbase.util.FSUtils:
FileSystem doesn't support getDefaultReplication
2013-12-02 09:47:41,312 INFO org.apache.hadoop.hbase.util.FSUtils:
FileSystem doesn't support getDefaultBlockSize
2013-12-02 09:47:41,337 INFO org.apache.hadoop.hbase.procedure.Procedure:
Starting procedure 'tsdb_snap_backup'
2013-12-02 09:47:41,724 INFO org.apache.hadoop.hbase.procedure.Procedure:
Procedure 'tsdb_snap_backup' execution completed
2013-12-02 09:47:41,724 INFO
org.apache.hadoop.hbase.procedure.ZKProcedureUtil: Clearing all znodes for
procedure tsdb_snap_backupincluding nodes /hbase/online-snapshot/acquired
/hbase/online-snapshot/reached /hbase/online-snapshot/abort
2013-12-02 09:47:41,730 INFO
org.apache.hadoop.hbase.master.snapshot.EnabledTableSnapshotHandler: Done
waiting - snapshot for tsdb_snap_backup finished!

It seems we can't export complete snapshot data directly to the local file
system using the 'ExportSnapshot' command.
If we want to copy outside the cluster, do we first need to export to HDFS
and then use the hadoop get command to copy to the local file system?
Is this correct?
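
For reference, the hadoop get step mentioned above can also be done
programmatically; a minimal sketch (the paths are placeholders, and it assumes
the client is configured to reach the cluster's NameNode):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CopySnapshotToLocal {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration(); // picks up core-site.xml / hdfs-site.xml
    Path src = new Path("hdfs:///hbase_backup");     // placeholder HDFS export dir
    Path dst = new Path("file:///tmp/hbase_backup"); // placeholder local dir
    FileSystem fs = src.getFileSystem(conf);
    fs.copyToLocalFile(false, src, dst); // false = keep the HDFS copy
    fs.close();
  }
}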

What is the difference between the two commands below?
hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot
 hbase_tbl_snapshot_name -copy-to   file:///tmp/hbase_backup -mappers 16;

 hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot
 hbase_tbl_snapshot_name -copy-to   hdfs:/hbase_backup -mappers 16;

Thanks
-OC






Re: HBase ExportSnapshot

2013-12-02 Thread Ted Yu
The log you pasted was for taking snapshot.
Do you have log from ExportSnapshot ?

bq. What is the difference between below two commands?

This is the code that determines output FileSystem:

FileSystem outputFs = FileSystem.get(outputRoot.toUri(), conf);

For the 'file:///tmp/hbase_backup' argument, outputFs would be an instance of
org.apache.hadoop.fs.LocalFileSystem.
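
A minimal sketch of that scheme resolution (the paths are the ones from the
thread; resolving the hdfs: path assumes fs.defaultFS points at the cluster's
NameNode):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class OutputFsCheck {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // file: resolves to LocalFileSystem, hdfs: to DistributedFileSystem.
    for (String root : new String[] { "file:///tmp/hbase_backup", "hdfs:/hbase_backup" }) {
      FileSystem outputFs = FileSystem.get(new Path(root).toUri(), conf);
      System.out.println(root + " -> " + outputFs.getClass().getName());
    }
  }
}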

Cheers





Re: Hbase Region Size

2013-12-02 Thread Asaf Mesika
With this method, you can get the RegionLoad for each region:

private Map<String, RegionLoad> getRegionsLoad() {
    try {
        Map<String, RegionLoad> regionsNameToLoad = new HashMap<String, RegionLoad>();
        ClusterStatus clusterStatus = hAdmin.getClusterStatus();
        for (ServerName serverName : clusterStatus.getServers()) {
            HServerLoad load = clusterStatus.getLoad(serverName);
            Map<byte[], RegionLoad> regionsLoad = load.getRegionsLoad();
            for (Map.Entry<byte[], RegionLoad> entry : regionsLoad.entrySet()) {
                RegionLoad regionLoad = entry.getValue();
                regionsNameToLoad.put(regionLoad.getNameAsString(), regionLoad);
            }
        }
        return regionsNameToLoad;
    } catch (IOException e1) {
        throw new RuntimeException("Failed while fetching cluster load: "
                + e1.getMessage(), e1);
    }
}

Then you can use it via:

regionLoad.getStorefileSizeMB()

and there are other methods on RegionLoad.

Check it out.
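
For example, a short sketch building on the method above (printing the on-disk
store file size of every region; the sizes are whatever the cluster reports):

Map<String, RegionLoad> loads = getRegionsLoad();
for (Map.Entry<String, RegionLoad> e : loads.entrySet()) {
    // Storefile size is the on-disk (possibly compressed) size of the region's store files.
    System.out.println(e.getKey() + " -> " + e.getValue().getStorefileSizeMB() + " MB");
}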


Asaf





Re: HBase ExportSnapshot

2013-12-02 Thread oc tsdb
Here are the snapshot export logs.

master log:
===
2013-12-02 21:54:30,840 INFO org.apache.hadoop.hbase.master.LoadBalancer:
Skipping load balancing because balanced cluster; servers=1 regions=1
average=1.0 mostloaded=1 leastloaded=1
2013-12-02 21:54:30,841 INFO org.apache.hadoop.hbase.master.LoadBalancer:
Skipping load balancing because balanced cluster; servers=1 regions=1
average=1.0 mostloaded=1 leastloaded=1
2013-12-02 21:54:30,841 INFO org.apache.hadoop.hbase.master.LoadBalancer:
Skipping load balancing because balanced cluster; servers=1 regions=1
average=1.0 mostloaded=1 leastloaded=1
2013-12-02 21:54:30,841 INFO org.apache.hadoop.hbase.master.LoadBalancer:
Skipping load balancing because balanced cluster; servers=1 regions=1
average=1.0 mostloaded=1 leastloaded=1

snapshot export console log:
=

2013-12-02 21:54:30,841 INFO org.apache.hadoop.hbase.master.LoadBalancer:
Skipping load balancing because balanced cluster; servers=1 regions=1
average=1.0 mostloaded=1 leastloaded=1
2013-12-02 21:54:30,841 INFO org.apache.hadoop.hbase.master.LoadBalancer:
Skipping load balancing because balanced cluster; servers=1 regions=1
average=1.0 mostloaded=1 leastloaded=1

at
org.apache.hadoop.hbase.snapshot.SnapshotReferenceUtil.visitReferencedFiles(SnapshotReferenceUtil.java:101)
at
org.apache.hadoop.hbase.snapshot.ExportSnapshot.getSnapshotFiles(ExportSnapshot.java:385)
at
org.apache.hadoop.hbase.snapshot.ExportSnapshot.run(ExportSnapshot.java:633)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at
org.apache.hadoop.hbase.snapshot.ExportSnapshot.innerMain(ExportSnapshot.java:705)
at
org.apache.hadoop.hbase.snapshot.ExportSnapshot.main(ExportSnapshot.java:709)
13/12/02 21:54:24 INFO util.FSVisitor: No families under region
directory:hdfs://
site.com:54310/data_full_backup_2013-12-02_21.49.20/.hbase-snapshot/tsdb-meta_snap_backup/f06335933b32019c4369f95001d996fb
13/12/02 21:54:24 INFO util.FSVisitor: No logs under directory:hdfs://
site.com:54310/data_full_backup_2013-12-02_21.49.20/.hbase-snapshot/tsdb-meta_snap_backup/.logs
13/12/02 21:54:24 WARN snapshot.ExportSnapshot: There are 0 store file to
be copied. There may be no data in the table.
13/12/02 21:54:25 INFO util.FSVisitor: No families under region
directory:hdfs://
site.com:54310/data_full_backup_2013-12-02_21.49.20/.hbase-snapshot/tsdb-tree_snap_backup/c40c34c4312ccb3302fbaf62caa91b9c
13/12/02 21:54:25 INFO util.FSVisitor: No logs under directory:hdfs://
site.com:54310/data_full_backup_2013-12-02_21.49.20/.hbase-snapshot/tsdb-tree_snap_backup/.logs
13/12/02 21:54:25 WARN snapshot.ExportSnapshot: There are 0 store file to
be copied. There may be no data in the table.
Exception in thread main java.io.FileNotFoundException: Unable to open
link: org.apache.hadoop.hbase.io.HFileLink locations=[hdfs://
site.com:54310/data_full_backup_2013-12-02_21.49.20/tsdb-uid/f9e5e554f111dc0679dfc8069b282ff7/id/ed071cd010534856adc4be997498d645,
hdfs://
site.com:54310/data_full_backup_2013-12-02_21.49.20/.tmp/tsdb-uid/f9e5e554f111dc0679dfc8069b282ff7/id/ed071cd010534856adc4be997498d645,
hdfs://
site.com:54310/data_full_backup_2013-12-02_21.49.20/.archive/tsdb-uid/f9e5e554f111dc0679dfc8069b282ff7/id/ed071cd010534856adc4be997498d645
]
at
org.apache.hadoop.hbase.io.FileLink.getFileStatus(FileLink.java:376)
at
org.apache.hadoop.hbase.snapshot.ExportSnapshot$1.storeFile(ExportSnapshot.java:390)
at
org.apache.hadoop.hbase.util.FSVisitor.visitRegionStoreFiles(FSVisitor.java:115)
at
org.apache.hadoop.hbase.util.FSVisitor.visitTableStoreFiles(FSVisitor.java:81)
at
org.apache.hadoop.hbase.snapshot.SnapshotReferenceUtil.visitTableStoreFiles(SnapshotReferenceUtil.java:116)
at
org.apache.hadoop.hbase.snapshot.SnapshotReferenceUtil.visitReferencedFiles(SnapshotReferenceUtil.java:101)
at
org.apache.hadoop.hbase.snapshot.ExportSnapshot.getSnapshotFiles(ExportSnapshot.java:385)
at
org.apache.hadoop.hbase.snapshot.ExportSnapshot.run(ExportSnapshot.java:633)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at
org.apache.hadoop.hbase.snapshot.ExportSnapshot.innerMain(ExportSnapshot.java:705)
at
org.apache.hadoop.hbase.snapshot.ExportSnapshot.main(ExportSnapshot.java:709)

Basically, while exporting to local I don't see the .archive directory. Why?

Please comment on this:
It seems we can't export complete snapshot data directly to the local file
system using the 'ExportSnapshot' command.
If we want to copy outside the cluster, do we first need to export to HDFS
and then use the hadoop get command to copy to the local file system?
Is this correct?

Thanks
-OC



Re: AsyncHBase 1.5.0-rc1 available for download and testing (HBase 0.96 compatibility inside)

2013-12-02 Thread Viral Bajaria
Depends on what version of HBase you are using. If you are using HBase
0.95+, with 1.5.0 asynchbase you will be able to use any filters that exist
in HBase. Though you might need to add the required classes, since
asynchbase needs them to serialize to the old RPC protocol for versions
before 0.95 and uses protobufs for 0.95 and later.

Thanks,
Viral
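
For reference, the kind of value-based scan asked about in the quoted question
below would look like this with the stock HBase client (asynchbase's own filter
classes may differ; the table, family, and qualifier names are placeholders,
and the comparison is on raw bytes, so the stored encoding of the values
matters):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class WaterLevelScan {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "sensors"); // placeholder table
    Scan scan = new Scan();
    // Keep only rows whose f:waterlevel value is greater than 20.
    // Bytes.toBytes(20L) assumes the values were written as 8-byte longs.
    SingleColumnValueFilter filter = new SingleColumnValueFilter(
        Bytes.toBytes("f"), Bytes.toBytes("waterlevel"),
        CompareOp.GREATER, Bytes.toBytes(20L));
    filter.setFilterIfMissing(true); // skip rows that lack the column entirely
    scan.setFilter(filter);
    ResultScanner scanner = table.getScanner(scan);
    for (Result r : scanner) {
      System.out.println(Bytes.toString(r.getRow()));
    }
    scanner.close();
    table.close();
  }
}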


On Mon, Dec 2, 2013 at 9:04 PM, ripsacCTO ankur...@gmail.com wrote:

 Hi Tsuna,

 Just wanted to know if there is any way of scanning using column value,
 with filters on the column value.

 Eg. Suppose I have column values against qualifier waterlevel as
 40,20,30,10 etc.

 If my requirement is that I want to fetch data like: give me data with
 water level > 20.

 Is such a query possible. Please suggest some examples.

 Does the latest version of asyncbase support the above ? If yes , please
 suggest with some example code base.

 Thanks in advance.


 On Tuesday, October 29, 2013 10:27:13 AM UTC+5:30, tsuna wrote:

 Hi all,
 The first release candidate of AsyncHBase post-singularity is now
 available for download.  AsyncHBase remains true to its initial
 promise: the API is still backward compatible, but under the hood it
 continues to work with all production releases of HBase of the past
 few years.

 This release was tested against HBase 0.89, 0.90, 0.92, 0.94, and
 0.96.  While 0.2x probably still works, I didn't take the time to test
 it, because… well, really, you shouldn't be using such ancient version
 of HBase.  Really.

 The Maven build was broken by the addition of protobufs in the build
 process.  Any Maven fans out there who would like to help fix it?
 Without it I can't easily publish new artifacts to Maven repo.


 Here is the relevant excerpt of the NEWS file:

 This release introduces compatibility with HBase 0.96 and up, and adds
 a dependency on Google's protobuf-java library.  Note that HBase 0.95.x,
 which was a developer preview release train, is NOT supported.

 Please note that support for explicit row locks has been removed from
 HBase 0.95 and up.  While the classes and functionality remain usable
 when using earlier versions of HBase, an `UnsupportedOperationException'
 will be raised if one attempt to send a `RowLockRequest' to a newer
 version of HBase.

 Please note that while AsyncHBase never made any guarantees about the
 exact order in which multiple edits are applied within a batch, the order
 is now different when talking to HBase 0.96 and up.

 New public APIs:
   - Scanners can now use a variety of different filters via the new
 `ScanFilter' interfaces and its various implementations.
   - It's possible to specify specific families to scan via `setFamilies'.
   - Scanners can put an upper bound on the amount of data fetched by RPC
 via the new `setMaxNumKeyValues' (works with HBase 0.96 and up only).
   - HBaseRpc now has a `failfast()' and a `setFailfast(boolean)' pair
 of methods to allow RPCs to fail as soon as their encounter an
 issue out of the ordinary (e.g. not just a `NotSuchRegionException').
   - `GetRequest' has additional constructor overloads that make its API
 more uniform with that of other RPCs.

 Noteworthy bug fixes:
   - DeleteRequest wasn't honoring its timestamp if one was given (#58).
   - When a connection attempt fails, buffered RPCs weren't cleaned up
 or retried properly.
   - When one RPC fails because of another one (e.g. we fail to send an
 RPC because a META lookup failed), the asynchronous exception that
 is given to the callback now properly carries the original RPC that
 failed.
   - There was an unlikely race condition that could cause an NPE while
 trying to retrieve the ROOT region from ZooKeeper.


 Pre-compiled JAR: http://tsunanet.net/~tsuna/asynchbase/asynchbase-1.5.0-
 rc1.jar
 Source: https://github.com/tsuna/asynchbase
 Javadoc: http://tsunanet.net/~tsuna/asynchbase/1.5.0/org/hbase/
 async/HBaseClient.html

 $ git diff --stat v1.4.1.. | tail -n 1
  70 files changed, 4824 insertions(+), 487 deletions(-)

 $ git shortlog v1.4.1..
 Andrey Stepachev (1):
   Add support for multiple families/qualifiers in scanners.

 Benoit Sigoure (65):
   Start v1.5.0.
   Add Viral to AUTHORS for his work on ScanFilter.
   Document ScanFilter and prevent it from being subclassed
 externally.
   Convert the regexp key filtering mechanism to the ScanFilter.
   Document how to run integration tests.
   Enhance filters a bit and add integration tests.
   Add a new helper function to produce better errors during tests.
   Mention new scanner filters in NEWS.
   Allow RPCs to fail-fast.
   Update NEWS / THANKS.
   Update suasync to 1.3.2.
   Properly clean up when connection fails before being opened.
   Properly report which RPC has failed in HasFailedRpcException.
   Fix a small race condition when looking up the ROOT region.
   Add HBase protocol buffers to the compilation process.