Incremental backup of hbase with export not working
Hi, in order to take incremental backups of HBase using Export, we followed http://hbase.apache.org/book/ops_mgt.html#import. A few things need clarification: 1. What does "version" mean? Is it the same version number we give when creating the HBase table? 2. What happens if we don't specify the version and just specify start and end timestamps? Kindly provide us an example of how to take an incremental HBase backup using Export over an interval. We did some experiments with version and start-time combinations, with the following results: 1. We created a table with version=1 and tested the Export CLI using the same version (version=1) and start/end times. Even though data is present between the start and end of the interval, we didn't get any data. 2. Without specifying the version, we got all the data, irrespective of the start and end times. Kindly clarify how to specify the version and the timestamp range to match our requirements. Thanks, Oc.tsdb
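For reference, the Export tool's documented usage takes an optional maximum number of cell versions and an optional start/end time window in epoch milliseconds; a sketch of an interval export, with an illustrative table name, output path, and timestamps, might look like this:

hbase org.apache.hadoop.hbase.mapreduce.Export <tablename> <outputdir> [<versions> [<starttime> [<endtime>]]]
hbase org.apache.hadoop.hbase.mapreduce.Export mytable /backups/mytable-20131202-10 1 1385971200000 1385974800000

Here <versions> is the maximum number of cell versions the export scan will return (distinct from the VERSIONS attribute set at table creation), and only cells whose timestamps fall within the window are exported (start inclusive, end exclusive, as far as I can tell).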
Hbase Region Size
Hi, can anyone tell me the Java API for getting the region size of a table? Thanks!
Merging Hbase Region for a Table
Hi, I have 2000+ auto-created regions which I want to bring down to a smaller number. I am using HBase 0.94; is there a way I can merge the regions without losing or dirtying up the data? Thanks!
Re: Merging Hbase Region for a Table
Hi Vineet. For 0.94 you can only offline-merge. http://hbase.apache.org/book/ops.regionmgt.html#ops.regionmgt.merge JM
Re: Hbase Region Size
Hi Vineet, If you want the entire table size I don't think there is any API for that. If you want the size of the table on disk (compressed) then you are better off using the HDFS API. JM
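A minimal sketch of the HDFS-API approach, assuming the default 0.94 layout where a table's files live under <hbase.rootdir>/<tablename> (the path below is illustrative):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class TableSizeOnDisk {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        FileSystem fs = FileSystem.get(conf);
        // Illustrative path: <hbase.rootdir>/<tablename>, e.g. /hbase/mytable on a default install.
        Path tableDir = new Path("/hbase/mytable");
        // Sums the length of every file under the table directory, i.e. the compressed on-disk size.
        long bytesOnDisk = fs.getContentSummary(tableDir).getLength();
        System.out.println("mytable occupies " + bytesOnDisk + " bytes on HDFS");
    }
}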
Re: Hbase Region Size
Are you looking to get the MAX_FILESIZE parameter? If so, there's nothing in the client, but HBaseAdmin has what you need [1]. HTableDescriptor myDescriptor = hbaseAdmin.getTableDescriptor(Bytes.toBytes("my-table")); System.out.println("my-table has a max region size of " + myDescriptor.getMaxFileSize()); 1: http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html
Re: Hbase Region Size
Same for a single region. If it's compressed, you might want to look into HDFS directly...
Re: Hbase Region Size
Actually I am looking for the size of a region, not the whole table. HBase internally does a max-file-size check to split regions autonomously, so there should be some way to get it.
Re: What is HBase compaction-queue-size at all?
- Is it the *number of Stores* of the regionserver that need to be major compacted? Or the number currently *being* compacted? This is the number that are currently in the pipe. Doesn't mean they are compacting right now, but they are queued for compaction, and not necessarily major compaction. Major is only if all the regions need to compact.
I was discovering that at some time it got *regionserver compaction-queue-size = 4* (I checked it from Ambari). That's theoretically impossible since I have only *one Store* to write (sequential key) at any time, so incurring only one major compaction seems more reasonable. Why is this impossible? A store file is a dump of HBase memory blocks written to disk. Even if you write to a single region, single table, with keys all close by (even if it's all the same exact key), when the block in memory reaches a threshold it is written to disk. When more than x of them (3 is the default) are on disk, a compaction is launched.
- Just more confusing is: isn't multi-threading enabled in earlier versions, so that each compaction job is allocated to a thread? If so, why is there a compaction queue waiting for processing? Yes, compaction is done on a separate thread, but there is one single queue. You don't want to take 100% of your RS resources to do compactions... Depending on whether you are doing mostly writes and almost no reads, you might want to tweak some parameters. You might also want to look into bulk loading. Last, maybe you should review your key and its distribution. And last again ;) what is your table definition? Multiplying the column families can also sometimes lead to this kind of issue. JM
2013/12/2 林煒清 thesuperch...@gmail.com Anyone know what compaction queue size means? By the doc's definition: *9.2.5.* hbase.regionserver.compactionQueueSize Size of the compaction queue. This is the number of stores in the region that have been targeted for compaction. - Is it the *number of Stores* of the regionserver that need to be major compacted? Or the number currently *being* compacted? I have a job writing data in a hotspot style using a sequential key (non-distributed) with 1 family, so 1 Store per region. I was discovering that at some time it got *regionserver compaction-queue-size = 4* (I checked it from Ambari). That's theoretically impossible since I have only *one Store* to write (sequential key) at any time, so incurring only one major compaction seems more reasonable. - Then I dug into the logs and found nothing hinting at a queue size above 0: every major compaction just says *This selection was in queue for 0sec*. I don't really understand what that means. Is it saying HBase has nothing in the compaction queue? 2013-11-26 12:28:00,778 INFO [regionserver60020-smallCompactions-1385440028938] regionserver.HStore: Completed major compaction of 3 file(s) in f1 of myTable.key.md5 into md5(size=607.8 M), total size for store is 645.8 M. *This selection was in queue for 0sec*, and took 39sec to execute. - Just more confusing is: isn't multi-threading enabled in earlier versions, so that each compaction job is allocated to a thread? If so, why is there a compaction queue waiting for processing?
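For reference, the 0.94-era knobs behind the behaviour discussed above live in hbase-site.xml; a sketch with illustrative values (not recommendations), where hbase.hstore.compactionThreshold is the "3 files" default mentioned and the two thread settings size the small/large compaction pools:

<property>
  <name>hbase.hstore.compactionThreshold</name>
  <value>5</value> <!-- minimum number of StoreFiles in a Store before a compaction is queued; default 3 -->
</property>
<property>
  <name>hbase.regionserver.thread.compaction.small</name>
  <value>2</value> <!-- threads serving the small-compaction queue; default 1 -->
</property>
<property>
  <name>hbase.regionserver.thread.compaction.large</name>
  <value>2</value> <!-- threads serving the large-compaction queue; default 1 -->
</property>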
Re: Hbase Region Size
Hum. I need to check, but I'm not sure whether HBase does the MAX_FILESIZE check against the compressed size of the region or against the uncompressed size. My guess is that it's against the compressed size, but I will double-check the code to confirm. Are you looking for the compressed size, or the regular size?
Re: Merging Hbase Region for a Table
Hi Vineet, You need to take HBase down, but HDFS needs to stay up if you want to be able to access the files in it. Since I'm using Cloudera Manager I usually just use it to stop HBase, but from the command line you should be able to do something like bin/stop-hbase.sh. Also, don't do this on a production cluster. 2013/12/2 Vineet Mishra clearmido...@gmail.com Ok! So can you tell me, for the offline merge, how to put the cluster down? Should we stop HDFS or the region servers, and if so how do we achieve that?
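For reference, the offline merge tool documented at the link above is invoked roughly like this, with HBase fully stopped and HDFS still running (table and region names are placeholders); it merges two regions per run:

bin/hbase org.apache.hadoop.hbase.util.Merge <tablename> <region1> <region2>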
Re: Hbase Region Size
Hi Vineet, So to get the size you can get the list of stores for the region and call getStoreSizeUncompressed on each store. I'm not 100% sure this method is accessible from outside the class. If it's not accessible, it might be good to have an easy way to get this information. You can take a look at ConstantSizeRegionSplitPolicy.java and Store.java to see how it's done. I will take a deeper look when I get off the plane ;) JM
Re: HBase ExportSnapshot
Can you pastebin the master log during operation #2? There have been at least two fixes since 0.94.10, listed below. It would be nice if you could verify this behavior using 0.94.14. Cheers r1515967 | mbertozzi | 2013-08-20 13:49:38 -0700 (Tue, 20 Aug 2013) | 1 line HBASE-8760 possible loss of data in snapshot taken after region split r1507792 | mbertozzi | 2013-07-28 05:17:39 -0700 (Sun, 28 Jul 2013) | 1 line HBASE-9060 ExportSnapshot job fails if target path contains percentage character (Jerry He) On Mon, Dec 2, 2013 at 9:19 AM, oc tsdb oc.t...@gmail.com wrote: Hi, We have a cluster with 4 data nodes and HBase version 0.94.10. We have created snapshots for all HBase tables and are trying to export a snapshot in two ways. Option 1: export the snapshot into the same cluster's HDFS: hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot hbase_tbl_snapshot_name -copy-to hdfs:/hbase_backup -mappers 16. Here we get the full data (.archive + .hbase-snapshot) exported to hdfs:/hbase_backup. Option 2: export the snapshot to the local filesystem: hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot hbase_tbl_snapshot_name -copy-to file:///tmp/hbase_backup -mappers 16. But with option 2 we only get .hbase-snapshot exported to the local dir (/tmp/hbase_backup); the .archive files are not exported. Is this expected behavior, or is something wrong in option 2? Thanks OC
Filter - Dynamic Jar Load - FilterList not using DynamicClassLoader
Hi everyone, I've tried to use dynamic jar loading (https://issues.apache.org/jira/browse/HBASE-1936) but seem to have an issue with FilterList. Here is some log from my app where I send a Get with a FilterList containing an AFilter and a BFilter.
2013-12-02 13:55:42,564 DEBUG org.apache.hadoop.hbase.util.DynamicClassLoader: Class d.p.AFilter not found - using dynamical class loader
2013-12-02 13:55:42,564 DEBUG org.apache.hadoop.hbase.util.DynamicClassLoader: Finding class: d.p.AFilter
2013-12-02 13:55:42,564 DEBUG org.apache.hadoop.hbase.util.DynamicClassLoader: Loading new jar files, if any
2013-12-02 13:55:42,677 DEBUG org.apache.hadoop.hbase.util.DynamicClassLoader: Finding class again: d.p.AFilter
2013-12-02 13:55:43,004 ERROR org.apache.hadoop.hbase.io.HbaseObjectWritable: Can't find class d.p.BFilter
java.lang.ClassNotFoundException: d.p.BFilter
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:247)
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:820)
at org.apache.hadoop.hbase.io.HbaseObjectWritable.getClassByName(HbaseObjectWritable.java:792)
at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:679)
at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:594)
at org.apache.hadoop.hbase.filter.FilterList.readFields(FilterList.java:324)
at org.apache.hadoop.hbase.client.Get.readFields(Get.java:405)
at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:690)
at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:594)
at org.apache.hadoop.hbase.client.Action.readFields(Action.java:101)
at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:690)
at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:594)
at org.apache.hadoop.hbase.client.MultiAction.readFields(MultiAction.java:116)
at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:690)
at org.apache.hadoop.hbase.ipc.Invocation.readFields(Invocation.java:126)
at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.processData(HBaseServer.java:1311)
at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.readAndProcess(HBaseServer.java:1226)
at org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:748)
at org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.doRunLoop(HBaseServer.java:539)
at org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.run(HBaseServer.java:514)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
AFilter is not found by the standard class loader, so HBase falls back to the DynamicClassLoader for it; but when it tries to load BFilter, it uses the URLClassLoader and fails without checking the dynamic jars.
I think the issue is related to FilterList#readFields:
public void readFields(final DataInput in) throws IOException {
  byte opByte = in.readByte();
  operator = Operator.values()[opByte];
  int size = in.readInt();
  if (size > 0) {
    filters = new ArrayList<Filter>(size);
    for (int i = 0; i < size; i++) {
      Filter filter = (Filter) HbaseObjectWritable.readObject(in, conf);
      filters.add(filter);
    }
  }
}
*HbaseObjectWritable#readObject* uses a conf (created by calling HBaseConfiguration.create()) which I suppose doesn't include a DynamicClassLoader instance. Cheers
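For reference, a minimal sketch of the kind of client call being described, assuming AFilter and BFilter are the poster's custom filters and that an HTable handle named 'table' already exists:

// Build a Get whose FilterList wraps two custom filters, as in the log above.
Get get = new Get(Bytes.toBytes("some-row"));                         // illustrative row key
FilterList filters = new FilterList(FilterList.Operator.MUST_PASS_ALL);
filters.addFilter(new AFilter());                                      // d.p.AFilter, shipped in a dynamic jar
filters.addFilter(new BFilter());                                      // d.p.BFilter, shipped in a dynamic jar
get.setFilter(filters);
Result result = table.get(get);                                        // server-side deserialization is where it fails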
Re: AsyncHBase 1.5.0-rc1 available for download and testing (HBase 0.96 compatibility inside)
On Fri, Nov 29, 2013 at 9:26 PM, Ted Yu yuzhih...@gmail.com wrote: In HACKING, a sample command is given: $ HBASE_HOME=~/src/hbase make integration ARGS='test f' This means the integration tests need to be run on one of the servers where HBase is deployed, right ? It defaults to using localhost as the ZK quorum specification, but you can specify whatever else you want: $ HBASE_HOME=~/src/hbase make integration ARGS='table family quorumSpec znodePath' I'm thinking of cutting the 1.5 release today, please let me know if you want a bit more time to test it before the release is cut. -- Benoit tsuna Sigoure
Re: AsyncHBase 1.5.0-rc1 available for download and testing (HBase 0.96 compatibility inside)
bq. I'm thinking of cutting the 1.5 release today Please go ahead - my testing would focus on backward compatibility. Cheers
Re: Online/Realtime query with filter and join?
You are going to want to figure out a rowkey (or a set of tables with rowkeys) to restrict the number of I/Os. If you just slap Impala in front of HBase (or even Phoenix, for that matter) you could write SQL against it, but if it winds up doing a full scan of an HBase table underneath you won't get your 100ms response time. Note: I'm not saying you can't do this with Impala or Phoenix, I'm just saying start with the rowkeys first so that you limit the I/O. Then start adding frameworks as needed (and/or build a schema with Phoenix in the same rowkey exercise). Such response-time requirements make me think that this is for application support, so why the requirement for SQL? Might want to start writing it as a Java program first. On 11/29/13 4:32 PM, Mourad K mourad...@gmail.com wrote: You might want to consider something like Impala or Phoenix. I presume you are trying to do some report query for a dashboard or UI? MapReduce is certainly not adequate as there is too much latency on startup. If you want to give this a try, CDH4 and Impala are a good start. Mouradk On 29 Nov 2013, at 10:33, Ramon Wang ra...@appannie.com wrote: The general performance requirement for each query is less than 100 ms, that's the average level. Sounds crazy, but yes we need to find a way for it. Thanks Ramon On Fri, Nov 29, 2013 at 5:01 PM, yonghu yongyong...@gmail.com wrote: The question is what you mean by real-time. What is your performance requirement? In my opinion, I don't think MapReduce is suitable for real-time data processing. On Fri, Nov 29, 2013 at 9:55 AM, Azuryy Yu azury...@gmail.com wrote: You can try Phoenix. On 2013-11-29 3:44 PM, Ramon Wang ra...@appannie.com wrote: Hi Folks, It seems to be impossible, but I still want to check if there is a way we can do complex queries on HBase with ORDER BY, JOIN, etc. like we have with a normal RDBMS; we are asked to provide such a solution, any ideas? Thanks for your help. BTW, I think maybe Impala from CDH would be a way to go, but haven't got time to check it yet. Thanks Ramon
Re: Online/Realtime query with filter and join?
In addition to Impala and Phoenix, I'm going to throw PrestoDB into the mix. :) http://prestodb.io/
Re: Online/Realtime query with filter and join?
Pradeep, correct me if I am wrong, but PrestoDB has not released its HBase plugin yet, or did they and maybe I missed the announcement? I agree with what Doug is saying here: you can't achieve 100ms on every kind of query on HBase unless you design the rowkey in a way that helps you reduce your I/O. A full scan of a table with billions of rows and columns can take forever, but good indexing (via the rowkey or secondary indexes) can help speed things up. Thanks, Viral
Re: Online/Realtime query with filter and join?
I agree with Doug Meil's advice. Start with your row key design. In Phoenix, your PRIMARY KEY CONSTRAINT defines your row key. You should lead with the columns that you'll filter against most frequently. Then take a look at adding secondary indexes to speed up queries against other columns. Thanks, James
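As a sketch of what this looks like in Phoenix (the table, columns, and ZooKeeper quorum below are made up; assumes the Phoenix JDBC driver is on the classpath):

import java.sql.Connection;
import java.sql.DriverManager;

public class PhoenixSchemaSketch {
    public static void main(String[] args) throws Exception {
        Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-host"); // illustrative quorum
        // The PRIMARY KEY CONSTRAINT becomes the HBase row key, leading with the most-filtered column.
        conn.createStatement().execute(
            "CREATE TABLE events (" +
            "  customer_id VARCHAR NOT NULL, " +
            "  event_time  DATE NOT NULL, " +
            "  reading     INTEGER " +
            "  CONSTRAINT pk PRIMARY KEY (customer_id, event_time))");
        // A secondary index to speed up queries that filter on a non-key column.
        conn.createStatement().execute("CREATE INDEX reading_idx ON events (reading)");
        conn.close();
    }
}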
Re: Online/Realtime query with filter and join?
@Viral I'm not sure... I just know that they mentioned on the front page that PrestoDB can query HBase tables.
Consequent deletes more than ~256 rows not working
Hi, I have a simple HBase table of approx. 1000 rows. If I invoke htable.delete() on this table in a while loop, it doesn't throw any error or exception. But at the end of the operation I see that it has actually deleted only about 256 rows. Repeating the operation deletes another 256 or so, and finally, after 3 or 4 runs, all rows get deleted. This is true of the htable.batch API and the htable.delete API. I have tried changing ulimit/nproc settings, invoking flush, setting autocommit, and invoking a major compaction. I also waited it out, to see whether the deletes from the first run would complete in the background after 5 minutes. Nothing works. I searched the mailing list and see that there are threads on delete-followed-by-put not working etc., but this is a different case. Anyone know what's happening? Looking forward to any pointers! Regards, Mrudula
Re: Consequent deletes more than ~256 rows not working
Which HBase release are you using? In your while loop, did you use the same set of row keys for each attempt? Thanks
Makes search indexes
Hi, a general strategy and schema approach question. I've got a lot of different data in a relational DB that I'm trying to make searchable. One thing, for example, is searching for people by email address. I have 6 tables that might hold it, tens of millions of records, and none of it standardized: it's mixed case and may have multiple emails in one field, or something which isn't an email address at all. To do that as a one-off isn't too bad, but the data will be added to, and PKs will get phased out and split into multiple PKs etc. I also want this on a number of other fields that will need different transformations applied to the data and come from their own set of tables. I could do this a number of ways, but I'm not satisfied with any of them, and I doubt such a generic problem has no tools already somewhat suited to it. The best tool for this may not be HBase, but I'd like to put my HBase cluster to work on it and have the result available to MR jobs. Best, James
Re: Consequent deletes more than ~256 rows not working
It would be better to paste a piece of your code.
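For reference, a minimal sketch of the kind of delete loop being described, on the 0.94-era client API (the table name is illustrative; assumes the row keys come from a scan of the same table):

import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

public class DeleteAllRows {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "mytable");           // illustrative table name
        try {
            ResultScanner scanner = table.getScanner(new Scan());
            List<Delete> deletes = new ArrayList<Delete>();
            for (Result r : scanner) {
                deletes.add(new Delete(r.getRow()));           // delete the whole row
            }
            scanner.close();
            table.delete(deletes);                             // submits all deletes as one batch
        } finally {
            table.close();
        }
    }
}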
Re: HBase ExportSnapshot
We see the same logs for both options:
2013-12-02 09:47:41,311 INFO org.apache.hadoop.hbase.master.snapshot.TakeSnapshotHandler: Running FLUSH table snapshot tsdb_snap_backup C_M_SNAPSHOT_TABLE on table tsdb
2013-12-02 09:47:41,312 INFO org.apache.hadoop.hbase.util.FSUtils: FileSystem doesn't support getDefaultReplication
2013-12-02 09:47:41,312 INFO org.apache.hadoop.hbase.util.FSUtils: FileSystem doesn't support getDefaultBlockSize
2013-12-02 09:47:41,337 INFO org.apache.hadoop.hbase.procedure.Procedure: Starting procedure 'tsdb_snap_backup'
2013-12-02 09:47:41,724 INFO org.apache.hadoop.hbase.procedure.Procedure: Procedure 'tsdb_snap_backup' execution completed
2013-12-02 09:47:41,724 INFO org.apache.hadoop.hbase.procedure.ZKProcedureUtil: Clearing all znodes for procedure tsdb_snap_backup including nodes /hbase/online-snapshot/acquired /hbase/online-snapshot/reached /hbase/online-snapshot/abort
2013-12-02 09:47:41,730 INFO org.apache.hadoop.hbase.master.snapshot.EnabledTableSnapshotHandler: Done waiting - snapshot for tsdb_snap_backup finished!
It seems we can't export complete snapshot data directly to the local file system using the 'ExportSnapshot' command. If we want to copy it outside the cluster, we first need to export it to HDFS and then use the hadoop get command to copy it to the local file system. Is this correct? What is the difference between the two commands below? hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot hbase_tbl_snapshot_name -copy-to file:///tmp/hbase_backup -mappers 16; hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot hbase_tbl_snapshot_name -copy-to hdfs:/hbase_backup -mappers 16; Thanks -OC
Re: HBase ExportSnapshot
The log you pasted was for taking the snapshot. Do you have the log from ExportSnapshot? bq. What is the difference between the two commands below? This is the code that determines the output FileSystem: FileSystem outputFs = FileSystem.get(outputRoot.toUri(), conf); For the 'file:///tmp/hbase_backup' argument, outputFs would be an instance of org.apache.hadoop.fs.LocalFileSystem. Cheers
Re: Hbase Region Size
In this method, you can get each region's load:
private Map<String, RegionLoad> getRegionsLoad() {
  try {
    Map<String, RegionLoad> regionsNameToLoad = new HashMap<String, RegionLoad>();
    ClusterStatus clusterStatus = hAdmin.getClusterStatus();
    for (ServerName serverName : clusterStatus.getServers()) {
      HServerLoad load = clusterStatus.getLoad(serverName);
      Map<byte[], RegionLoad> regionsLoad = load.getRegionsLoad();
      for (Map.Entry<byte[], RegionLoad> entry : regionsLoad.entrySet()) {
        RegionLoad regionLoad = entry.getValue();
        regionsNameToLoad.put(regionLoad.getNameAsString(), regionLoad);
      }
    }
    return regionsNameToLoad;
  } catch (IOException e1) {
    throw new RuntimeException("Failed while fetching cluster load: " + e1.getMessage(), e1);
  }
}
Then you can use it via regionLoad.getStorefileSizeMB(), and there are other methods on RegionLoad. Check it out. Asaf
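A hypothetical usage of the helper above, printing the store file size of every region (method names as given in the post):

for (Map.Entry<String, RegionLoad> entry : getRegionsLoad().entrySet()) {
    System.out.println(entry.getKey() + ": " + entry.getValue().getStorefileSizeMB() + " MB of store files");
}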
Re: HBase ExportSnapshot
Here are the snapshot export logs.
master log:
===
2013-12-02 21:54:30,840 INFO org.apache.hadoop.hbase.master.LoadBalancer: Skipping load balancing because balanced cluster; servers=1 regions=1 average=1.0 mostloaded=1 leastloaded=1
2013-12-02 21:54:30,841 INFO org.apache.hadoop.hbase.master.LoadBalancer: Skipping load balancing because balanced cluster; servers=1 regions=1 average=1.0 mostloaded=1 leastloaded=1
2013-12-02 21:54:30,841 INFO org.apache.hadoop.hbase.master.LoadBalancer: Skipping load balancing because balanced cluster; servers=1 regions=1 average=1.0 mostloaded=1 leastloaded=1
2013-12-02 21:54:30,841 INFO org.apache.hadoop.hbase.master.LoadBalancer: Skipping load balancing because balanced cluster; servers=1 regions=1 average=1.0 mostloaded=1 leastloaded=1
snapshot export console log:
=
2013-12-02 21:54:30,841 INFO org.apache.hadoop.hbase.master.LoadBalancer: Skipping load balancing because balanced cluster; servers=1 regions=1 average=1.0 mostloaded=1 leastloaded=1
2013-12-02 21:54:30,841 INFO org.apache.hadoop.hbase.master.LoadBalancer: Skipping load balancing because balanced cluster; servers=1 regions=1 average=1.0 mostloaded=1 leastloaded=1
at org.apache.hadoop.hbase.snapshot.SnapshotReferenceUtil.visitReferencedFiles(SnapshotReferenceUtil.java:101)
at org.apache.hadoop.hbase.snapshot.ExportSnapshot.getSnapshotFiles(ExportSnapshot.java:385)
at org.apache.hadoop.hbase.snapshot.ExportSnapshot.run(ExportSnapshot.java:633)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.hbase.snapshot.ExportSnapshot.innerMain(ExportSnapshot.java:705)
at org.apache.hadoop.hbase.snapshot.ExportSnapshot.main(ExportSnapshot.java:709)
13/12/02 21:54:24 INFO util.FSVisitor: No families under region directory:hdfs://site.com:54310/data_full_backup_2013-12-02_21.49.20/.hbase-snapshot/tsdb-meta_snap_backup/f06335933b32019c4369f95001d996fb
13/12/02 21:54:24 INFO util.FSVisitor: No logs under directory:hdfs://site.com:54310/data_full_backup_2013-12-02_21.49.20/.hbase-snapshot/tsdb-meta_snap_backup/.logs
13/12/02 21:54:24 WARN snapshot.ExportSnapshot: There are 0 store file to be copied. There may be no data in the table.
13/12/02 21:54:25 INFO util.FSVisitor: No families under region directory:hdfs://site.com:54310/data_full_backup_2013-12-02_21.49.20/.hbase-snapshot/tsdb-tree_snap_backup/c40c34c4312ccb3302fbaf62caa91b9c
13/12/02 21:54:25 INFO util.FSVisitor: No logs under directory:hdfs://site.com:54310/data_full_backup_2013-12-02_21.49.20/.hbase-snapshot/tsdb-tree_snap_backup/.logs
13/12/02 21:54:25 WARN snapshot.ExportSnapshot: There are 0 store file to be copied. There may be no data in the table.
Exception in thread main java.io.FileNotFoundException: Unable to open link: org.apache.hadoop.hbase.io.HFileLink locations=[hdfs://site.com:54310/data_full_backup_2013-12-02_21.49.20/tsdb-uid/f9e5e554f111dc0679dfc8069b282ff7/id/ed071cd010534856adc4be997498d645, hdfs://site.com:54310/data_full_backup_2013-12-02_21.49.20/.tmp/tsdb-uid/f9e5e554f111dc0679dfc8069b282ff7/id/ed071cd010534856adc4be997498d645, hdfs://site.com:54310/data_full_backup_2013-12-02_21.49.20/.archive/tsdb-uid/f9e5e554f111dc0679dfc8069b282ff7/id/ed071cd010534856adc4be997498d645]
at org.apache.hadoop.hbase.io.FileLink.getFileStatus(FileLink.java:376)
at org.apache.hadoop.hbase.snapshot.ExportSnapshot$1.storeFile(ExportSnapshot.java:390)
at org.apache.hadoop.hbase.util.FSVisitor.visitRegionStoreFiles(FSVisitor.java:115)
at org.apache.hadoop.hbase.util.FSVisitor.visitTableStoreFiles(FSVisitor.java:81)
at org.apache.hadoop.hbase.snapshot.SnapshotReferenceUtil.visitTableStoreFiles(SnapshotReferenceUtil.java:116)
at org.apache.hadoop.hbase.snapshot.SnapshotReferenceUtil.visitReferencedFiles(SnapshotReferenceUtil.java:101)
at org.apache.hadoop.hbase.snapshot.ExportSnapshot.getSnapshotFiles(ExportSnapshot.java:385)
at org.apache.hadoop.hbase.snapshot.ExportSnapshot.run(ExportSnapshot.java:633)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.hbase.snapshot.ExportSnapshot.innerMain(ExportSnapshot.java:705)
at org.apache.hadoop.hbase.snapshot.ExportSnapshot.main(ExportSnapshot.java:709)
Basically, while exporting to local I don't see the .archive directory. Why? Please comment on this: it seems we can't export complete snapshot data directly to the local file system using the 'ExportSnapshot' command. If we want to copy it outside the cluster, we first need to export it to HDFS and then use the hadoop get command to copy it to the local file system. Is this correct? Thanks -OC
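For reference, the two-step workflow described above would look roughly like this (the destination paths and mapper count are illustrative):

hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot hbase_tbl_snapshot_name -copy-to hdfs:/hbase_backup -mappers 16
hadoop fs -get /hbase_backup /tmp/hbase_backup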
Re: AsyncHBase 1.5.0-rc1 available for download and testing (HBase 0.96 compatibility inside)
Depends on what version of HBase you are using. If you are using HBase 0.95+, with 1.5.0 asynchbase you will be able to use any filters that exist in HBase. Though you might need to add the required classes, since asynchbase needs them to serialize to the old RPC protocol for versions before 0.95, and it uses protobufs for 0.95 and later. Thanks, Viral On Mon, Dec 2, 2013 at 9:04 PM, ripsacCTO ankur...@gmail.com wrote: Hi Tsuna, Just wanted to know if there is any way of scanning using column values, with filters on a column value. E.g. suppose I have column values against qualifier waterlevel such as 40, 20, 30, 10 etc. My requirement is to fetch data with, say, water level greater than 20. Is such a query possible? Please suggest some examples. Does the latest version of asynchbase support the above? If yes, please suggest some example code. Thanks in advance. On Tuesday, October 29, 2013 10:27:13 AM UTC+5:30, tsuna wrote: Hi all, The first release candidate of AsyncHBase post-singularity is now available for download. AsyncHBase remains true to its initial promise: the API is still backward compatible, but under the hood it continues to work with all production releases of HBase of the past few years. This release was tested against HBase 0.89, 0.90, 0.92, 0.94, and 0.96. While 0.2x probably still works, I didn't take the time to test it, because… well, really, you shouldn't be using such an ancient version of HBase. Really. The Maven build was broken by the addition of protobufs in the build process. Any Maven fans out there who would like to help fix it? Without it I can't easily publish new artifacts to the Maven repo. Here is the relevant excerpt of the NEWS file: This release introduces compatibility with HBase 0.96 and up, and adds a dependency on Google's protobuf-java library. Note that HBase 0.95.x, which was a developer preview release train, is NOT supported. Please note that support for explicit row locks has been removed from HBase 0.95 and up. While the classes and functionality remain usable when using earlier versions of HBase, an `UnsupportedOperationException' will be raised if one attempts to send a `RowLockRequest' to a newer version of HBase. Please note that while AsyncHBase never made any guarantees about the exact order in which multiple edits are applied within a batch, the order is now different when talking to HBase 0.96 and up. New public APIs: - Scanners can now use a variety of different filters via the new `ScanFilter' interface and its various implementations. - It's possible to specify specific families to scan via `setFamilies'. - Scanners can put an upper bound on the amount of data fetched by RPC via the new `setMaxNumKeyValues' (works with HBase 0.96 and up only). - HBaseRpc now has a `failfast()' and a `setFailfast(boolean)' pair of methods to allow RPCs to fail as soon as they encounter an issue out of the ordinary (e.g. not just a `NotSuchRegionException'). - `GetRequest' has additional constructor overloads that make its API more uniform with that of other RPCs. Noteworthy bug fixes: - DeleteRequest wasn't honoring its timestamp if one was given (#58). - When a connection attempt fails, buffered RPCs weren't cleaned up or retried properly. - When one RPC fails because of another one (e.g. we fail to send an RPC because a META lookup failed), the asynchronous exception that is given to the callback now properly carries the original RPC that failed.
- There was an unlikely race condition that could cause an NPE while trying to retrieve the ROOT region from ZooKeeper. Pre-compiled JAR: http://tsunanet.net/~tsuna/asynchbase/asynchbase-1.5.0-rc1.jar Source: https://github.com/tsuna/asynchbase Javadoc: http://tsunanet.net/~tsuna/asynchbase/1.5.0/org/hbase/async/HBaseClient.html $ git diff --stat v1.4.1.. | tail -n 1 70 files changed, 4824 insertions(+), 487 deletions(-) $ git shortlog v1.4.1.. Andrey Stepachev (1): Add support for multiple families/qualifiers in scanners. Benoit Sigoure (65): Start v1.5.0. Add Viral to AUTHORS for his work on ScanFilter. Document ScanFilter and prevent it from being subclassed externally. Convert the regexp key filtering mechanism to the ScanFilter. Document how to run integration tests. Enhance filters a bit and add integration tests. Add a new helper function to produce better errors during tests. Mention new scanner filters in NEWS. Allow RPCs to fail-fast. Update NEWS / THANKS. Update suasync to 1.3.2. Properly clean up when connection fails before being opened. Properly report which RPC has failed in HasFailedRpcException. Fix a small race condition when looking up the ROOT region. Add HBase protocol buffers to the compilation process.