Pausing .archive file cleaner
Hi, I am writing a custom ExportSnapshot utility where I cannot guarantee the order in which files are exported to the destination location. So archive files (or any other hfiles or log files) could get copied before the links are created in the .hbase-snapshot directory. As I understand it, in such situations an archived file could get deleted before I create the corresponding link file in the .hbase-snapshot dir. Can the archive file cleaner thread be paused for some time? Is there any configuration to do that? Thanks, Siddharth
Re: Co-Processors in HBase 0.95.2 version
Here is an example: http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/coprocessor/AggregateImplementation.html On Sep 22, 2013, at 10:14 PM, yeshwanth kumar yeshwant...@gmail.com wrote: Hi, I am facing some difficulty writing coprocessors in HBase 0.95.2 and am looking for tutorials and examples. Can anyone provide some examples of how coprocessors relate to protobufs? Thanks
Re: Pausing .archive file cleaner
No, at the moment you cannot stop the archiving process. Can't you just copy the .hbase-snapshot dir first? It should also be small enough to do as a pre-MR step, since all the files are empty. Matteo On Mon, Sep 23, 2013 at 7:48 AM, Siddharth Karandikar siddharth.karandi...@gmail.com wrote: Hi, I am writing a custom ExportSnapshot utility where I cannot guarantee the order in which files are exported to the destination location. So archive files (or any other hfiles or log files) could get copied before the links are created in the .hbase-snapshot directory. As I understand it, in such situations an archived file could get deleted before I create the corresponding link file in the .hbase-snapshot dir. Can the archive file cleaner thread be paused for some time? Is there any configuration to do that? Thanks, Siddharth
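As a rough illustration of Matteo's suggestion (copy the small .hbase-snapshot metadata directory before the MapReduce copy of the HFiles), here is a minimal sketch using the plain Hadoop FileSystem API. The class name, method name and root paths are illustrative, not part of the stock ExportSnapshot:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.FileUtil;
    import org.apache.hadoop.fs.Path;

    public class SnapshotMetaCopy {
      // Copy <srcRoot>/.hbase-snapshot to <dstRoot>/.hbase-snapshot before any
      // HFiles/WALs are exported, so the snapshot reference files already exist
      // at the destination when the data files start arriving.
      public static void copySnapshotDir(Configuration conf, Path srcRoot, Path dstRoot)
          throws java.io.IOException {
        FileSystem srcFs = srcRoot.getFileSystem(conf);
        FileSystem dstFs = dstRoot.getFileSystem(conf);
        Path src = new Path(srcRoot, ".hbase-snapshot");
        Path dst = new Path(dstRoot, ".hbase-snapshot");
        // The snapshot dir holds only tiny reference/metadata files, so a plain
        // recursive copy is cheap compared to the MR copy of the data.
        FileUtil.copy(srcFs, src, dstFs, dst, false /* deleteSource */, true /* overwrite */, conf);
      }
    }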
Re: quieting HBase metrics
Hi All, I have the same query. Tried searching for a solution on the web but found nothing very helpful. Is there some way I could disable metrics for every table in HBase from being sent to Ganglia? Or just any information on how to configure HBase to emit fewer metrics? Thanks, Arati Patro On Thu, Sep 19, 2013 at 4:47 AM, Ron Echeverri recheve...@maprtech.com wrote: In http://hbase.apache.org/book/hbase_metrics.html we see 15.4.2. Warning To Ganglia Users Warning to Ganglia Users: by default, HBase will emit a LOT of metrics per RegionServer which may swamp your installation. Options include either increasing Ganglia server capacity, or configuring HBase to emit fewer metrics. However, although there is documentation to help set up Ganglia metrics for HBase, there doesn't seem to be any that shows how to make HBase emit fewer metrics. Could someone lend me a hand? I have increased the period in hadoop-metrics.properties but that has proven insufficient. thanks rone
Re: Use HBase api -- NullPointerException
I am sorry I am late. The versions are zookeeper-3.4.5 and hbase-0.94.10, and HBase is not managing ZooKeeper. 2013/9/23 Ted Yu yuzhih...@gmail.com What HBase version were they using ? Was HBase managing zookeeper ensemble ? src/main/java/org/apache/hadoop/hbase/zookeeper/ZKConfig.java hasn't changed since 2012-03-29 Cheers On Sun, Sep 22, 2013 at 9:22 PM, kun yan yankunhad...@gmail.com wrote: A friend of mine ran into the following problem when using HBase 0.94. The program runs successfully on Windows, but when it is moved to Linux it does not work and reports the exception below. The HBase code just does simple queries and inserts. Do we need to specify the ZooKeeper address explicitly to make it work?
Caused by: java.lang.NullPointerException
at org.apache.hadoop.hbase.zookeeper.ZKConfig.makeZKProps(ZKConfig.java:60)
at org.apache.hadoop.hbase.zookeeper.ZKConfig.getZKQuorumServersString(ZKConfig.java:245)
at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.init(ZooKeeperWatcher.java:147)
at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.init(ZooKeeperWatcher.java:127)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getZooKeeperWatcher(HConnectionManager.java:1401)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.ensureZookeeperTrackers(HConnectionManager.java:612)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:882)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:857)
at org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:233)
at org.apache.hadoop.hbase.client.HTable.init(HTable.java:173)
at org.apache.hadoop.hbase.client.HTableFactory.createHTableInterface(HTableFactory.java:36)
at org.apache.hadoop.hbase.client.HTablePool.createHTable(HTablePool.java:265)
at org.apache.hadoop.hbase.client.HTablePool.findOrCreateTable(HTablePool.java:195)
at org.apache.hadoop.hbase.client.HTablePool.getTable(HTablePool.java:174)
at com.axon.icloud.datax.config.TableSchemaManager.getHTable(TableSchemaManager.java:288)
at com.axon.icloud.datax.LogProcessTask.getHTable(LogProcessTask.java:466)
at com.axon.icloud.datax.LogProcessTask.importData(LogProcessTask.java:543)
at com.axon.icloud.datax.LogProcessTask.run(LogProcessTask.java:171)
-- In the Hadoop world, I am just a novice exploring the entire Hadoop ecosystem; I hope one day I can contribute my own code YanBit yankunhad...@gmail.com -- In the Hadoop world, I am just a novice exploring the entire Hadoop ecosystem; I hope one day I can contribute my own code YanBit yankunhad...@gmail.com
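For what it's worth, a minimal client sketch that sets the ZooKeeper quorum explicitly instead of relying on an hbase-site.xml being found on the classpath (the host names, table and column names below are placeholders):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class ExplicitZkQuorum {
      public static void main(String[] args) throws Exception {
        // HBaseConfiguration.create() loads hbase-site.xml from the classpath;
        // when that file is missing on the Linux box, set the quorum by hand so
        // the client does not end up with an empty ZooKeeper configuration.
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "zk1.example.com,zk2.example.com,zk3.example.com");
        conf.set("hbase.zookeeper.property.clientPort", "2181");

        HTable table = new HTable(conf, "test_table");
        Put put = new Put(Bytes.toBytes("row1"));
        put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("value"));
        table.put(put);
        table.close();
      }
    }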
Loading hbase-site.xml settings from Hadoop MR job
I'm having an issue where my Hadoop MR job for bulk loading data into Hbase is not reading my hbase-site.xml file -- thus it tries to connect to Zookeeper on localhost. This is on a cluster using CDH4 on Ubuntu 12.04. Here's the code where it attempts to connect to local zookeeper:
Configuration conf = new Configuration(); // from org.apache.hadoop.conf
Job job = new Job(conf);
HTable hTable = new HTable(conf, tableName);
HFileOutputFormat.configureIncrementalLoad(job, hTable);
As suggested by another thread I came across, I've added /etc/hbase/conf/ to my HADOOP_CLASSPATH (in /etc/hadoop/conf/hadoop-env.sh), restarted services, but no improvement. Here is the full classpath: /usr/local/hadoop/lib/hadoop-lzo-0.4.17-SNAPSHOT.jar:/etc/hbase/conf/::/etc/hadoop/conf:/usr/lib/hadoop/lib/*:/usr/lib/hadoop/.//*:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/*:/usr/lib/hadoop-hdfs/.//*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hadoop-yarn/.//*:/usr/lib/hadoop-0.20-mapreduce/./:/usr/lib/hadoop-0.20-mapreduce/lib/*:/usr/lib/hadoop-0.20-mapreduce/.//* Any thoughts on what the problem could be?
Hbase ports
Hi all. I'm doing a project for my university, so I have to know exactly how all the HBase ports work. Studying the documentation I found that Zookeeper accepts connections on port 2181, the HBase master on port 60000 and the HBase regionservers on port 60020. I didn't understand the importance of port 60010 on the master and port 60030 on the regionservers. Can I not use them? More important: if I launch HBase in pseudo-distributed mode, running all processes on localhost, what ports are used for each of the processes if I launch 1, 2, 3 or more backup masters and if I launch a few regionservers (fewer than 10) or a lot of regionservers (10, 20, 100)?
Re: Loading hbase-site.xml settings from Hadoop MR job
Maybe you should put this configuration on your classpath, so it can be reached from your client's environment. 2013/9/23 Shahab Yunus shahab.yu...@gmail.com From where are you running your job? From which machine? This client machine from where you are kicking of this job should have the hbase-site.xml with the correct ZK info in it. It seems that your client/job is having and issue picking up the right ZK, rather than the services running on your non-local cluster. Regards, Shahab On Mon, Sep 23, 2013 at 12:09 PM, Dolan Antenucci antenucc...@gmail.com wrote: I'm having an issue where my Hadoop MR job for bulk loading data into Hbase is not reading my hbase-site.xml file -- thus it tries to connect to Zookeeper on localhost. This is on a cluster using CDH4 on Ubuntu 12.04. Here's the code where it attempts to connect to local zookeeper: Configuration conf = new Configuration(); // from org.apache.hadoop.conf Job job = new Job(conf); HTable hTable = new HTable(conf, tableName); HFileOutputFormat.configureIncrementalLoad(job, hTable); As suggested by another thread I came across, I've added /etc/hbase/conf/ to my HADOOP_CLASSPATH (in /etc/hadoop/conf/hadoop-env.sh), restarted services, but no improvement. Here is the full classpath: /usr/local/hadoop/lib/hadoop-lzo-0.4.17-SNAPSHOT.jar:/etc/hbase/conf/::/etc/hadoop/conf:/usr/lib/hadoop/lib/*:/usr/lib/hadoop/.//*:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/*:/usr/lib/hadoop-hdfs/.//*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hadoop-yarn/.//*:/usr/lib/hadoop-0.20-mapreduce/./:/usr/lib/hadoop-0.20-mapreduce/lib/*:/usr/lib/hadoop-0.20-mapreduce/.//* Any thoughts on what the problem could be?
Re: Loading hbase-site.xml settings from Hadoop MR job
From where are you running your job? From which machine? This client machine from where you are kicking off this job should have the hbase-site.xml with the correct ZK info in it. It seems that your client/job is having an issue picking up the right ZK, rather than the services running on your non-local cluster. Regards, Shahab On Mon, Sep 23, 2013 at 12:09 PM, Dolan Antenucci antenucc...@gmail.com wrote: I'm having an issue where my Hadoop MR job for bulk loading data into Hbase is not reading my hbase-site.xml file -- thus it tries to connect to Zookeeper on localhost. This is on a cluster using CDH4 on Ubuntu 12.04. Here's the code where it attempts to connect to local zookeeper: Configuration conf = new Configuration(); // from org.apache.hadoop.conf Job job = new Job(conf); HTable hTable = new HTable(conf, tableName); HFileOutputFormat.configureIncrementalLoad(job, hTable); As suggested by another thread I came across, I've added /etc/hbase/conf/ to my HADOOP_CLASSPATH (in /etc/hadoop/conf/hadoop-env.sh), restarted services, but no improvement. Here is the full classpath: /usr/local/hadoop/lib/hadoop-lzo-0.4.17-SNAPSHOT.jar:/etc/hbase/conf/::/etc/hadoop/conf:/usr/lib/hadoop/lib/*:/usr/lib/hadoop/.//*:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/*:/usr/lib/hadoop-hdfs/.//*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hadoop-yarn/.//*:/usr/lib/hadoop-0.20-mapreduce/./:/usr/lib/hadoop-0.20-mapreduce/lib/*:/usr/lib/hadoop-0.20-mapreduce/.//* Any thoughts on what the problem could be?
Re: Co-Processors in HBase 0.95.2 version
Yeshwanth, With 0.95.2 coprocessors, instead of serializing Java objects, one uses protobuf. Most tutorials on protobuf are mainly about the data structures and not much about the RPC mechanism, which is what we use in 0.95.2. The RPC is defined like this in a proto file:
service TermIdSearchService {
  rpc getTermIdWithCount(TermList) returns (TermIdCountList);
}
Then it is implemented like this (this is also what you install as the coprocessor on the table):
public class TermIdSearchEndpointV2 extends TermIdSearchProtocol.TermIdSearchService
    implements Coprocessor, CoprocessorService {
  ...
  @Override
  public void getTermIdWithCount(RpcController controller,
      TermIdSearchProtocol.TermList termlist,
      RpcCallback<TermIdSearchProtocol.TermIdCountList> callback) {
    ...
  }
}
Regards, - kiru Kiru Pakkirisamy | webcloudtech.wordpress.com From: yeshwanth kumar yeshwant...@gmail.com To: user@hbase.apache.org Sent: Sunday, September 22, 2013 10:14 PM Subject: Co-Processors in HBase 0.95.2 version Hi, I am facing some difficulty writing coprocessors in HBase 0.95.2 and am looking for tutorials and examples. Can anyone provide some examples of how coprocessors relate to protobufs? Thanks
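To round out Kiru's server-side sketch, here is a hedged sketch of the matching client-side call in 0.95.2 using HTable.coprocessorService(). The TermIdSearchProtocol classes are assumed to be the protoc-generated ones from the .proto above; the table name and the request contents are placeholders:

    import java.io.IOException;
    import java.util.Map;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.coprocessor.Batch;
    import org.apache.hadoop.hbase.ipc.BlockingRpcCallback;
    import org.apache.hadoop.hbase.ipc.ServerRpcController;

    public class TermIdSearchClient {
      public static void main(String[] args) throws Throwable {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "terms");
        // Build the real TermList request here; getDefaultInstance() is a stand-in.
        final TermIdSearchProtocol.TermList request =
            TermIdSearchProtocol.TermList.getDefaultInstance();

        // coprocessorService() invokes the callback once per region between the
        // start and stop rows (null/null = every region) and collects the results.
        Map<byte[], TermIdSearchProtocol.TermIdCountList> results =
            table.coprocessorService(
                TermIdSearchProtocol.TermIdSearchService.class, null, null,
                new Batch.Call<TermIdSearchProtocol.TermIdSearchService,
                               TermIdSearchProtocol.TermIdCountList>() {
                  public TermIdSearchProtocol.TermIdCountList call(
                      TermIdSearchProtocol.TermIdSearchService service) throws IOException {
                    ServerRpcController controller = new ServerRpcController();
                    BlockingRpcCallback<TermIdSearchProtocol.TermIdCountList> callback =
                        new BlockingRpcCallback<TermIdSearchProtocol.TermIdCountList>();
                    service.getTermIdWithCount(controller, request, callback);
                    return callback.get();
                  }
                });
        System.out.println("Responses from " + results.size() + " region(s)");
        table.close();
      }
    }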
Re: Loading hbase-site.xml settings from Hadoop MR job
Hi Shahab, The Hadoop MR job is being started from one of the slaves on the cluster (i.e. where a HBase RegionServer is running, along with a TaskTracker). The hbase-site.xml file on this machine points to the correct ZK quorum. One interesting thing I've noticed is that the right ZK server is used for the following code (that is in the same MR job, just a few lines earlier):
HBaseAdmin admin = new HBaseAdmin(conf);
admin.disableTable(tableName);
admin.deleteTable(tableName);
Could the issue be with HTable or HFileOutputFormat (i.e. maybe they have a config or config-path hard-coded)? On Mon, Sep 23, 2013 at 12:53 PM, Shahab Yunus shahab.yu...@gmail.com wrote: From where are you running your job? From which machine? This client machine from where you are kicking of this job should have the hbase-site.xml with the correct ZK info in it. It seems that your client/job is having and issue picking up the right ZK, rather than the services running on your non-local cluster. Regards, Shahab On Mon, Sep 23, 2013 at 12:09 PM, Dolan Antenucci antenucc...@gmail.com wrote: I'm having an issue where my Hadoop MR job for bulk loading data into Hbase is not reading my hbase-site.xml file -- thus it tries to connect to Zookeeper on localhost. This is on a cluster using CDH4 on Ubuntu 12.04. Here's the code where it attempts to connect to local zookeeper: Configuration conf = new Configuration(); // from org.apache.hadoop.conf Job job = new Job(conf); HTable hTable = new HTable(conf, tableName); HFileOutputFormat.configureIncrementalLoad(job, hTable); As suggested by another thread I came across, I've added /etc/hbase/conf/ to my HADOOP_CLASSPATH (in /etc/hadoop/conf/hadoop-env.sh), restarted services, but no improvement. Here is the full classpath: /usr/local/hadoop/lib/hadoop-lzo-0.4.17-SNAPSHOT.jar:/etc/hbase/conf/::/etc/hadoop/conf:/usr/lib/hadoop/lib/*:/usr/lib/hadoop/.//*:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/*:/usr/lib/hadoop-hdfs/.//*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hadoop-yarn/.//*:/usr/lib/hadoop-0.20-mapreduce/./:/usr/lib/hadoop-0.20-mapreduce/lib/*:/usr/lib/hadoop-0.20-mapreduce/.//* Any thoughts on what the problem could be?
Re: Loading hbase-site.xml settings from Hadoop MR job
Hi Renato, Can you clarify your recommendation? Currently I've added the directory where my hbase-site.xml file lives (/etc/hbase/conf/) to my Hadoop classpath (as described above). Note: from the client machine (where I'm starting my MR job), I generated the above class path by running hadoop classpath. Also worth noting that the /etc/hbase/conf/hbase-site.xml file on this client machine points to the correct ZK quorum. Thanks On Mon, Sep 23, 2013 at 1:06 PM, Renato Marroquín Mogrovejo renatoj.marroq...@gmail.com wrote: Maybe you should putting this configurations within your class path, so it can be reached from your clients env. 2013/9/23 Shahab Yunus shahab.yu...@gmail.com From where are you running your job? From which machine? This client machine from where you are kicking of this job should have the hbase-site.xml with the correct ZK info in it. It seems that your client/job is having and issue picking up the right ZK, rather than the services running on your non-local cluster. Regards, Shahab On Mon, Sep 23, 2013 at 12:09 PM, Dolan Antenucci antenucc...@gmail.com wrote: I'm having an issue where my Hadoop MR job for bulk loading data into Hbase is not reading my hbase-site.xml file -- thus it tries to connect to Zookeeper on localhost. This is on a cluster using CDH4 on Ubuntu 12.04. Here's the code where it attempts to connect to local zookeeper: Configuration conf = new Configuration(); // from org.apache.hadoop.conf Job job = new Job(conf); HTable hTable = new HTable(conf, tableName); HFileOutputFormat.configureIncrementalLoad(job, hTable); As suggested by another thread I came across, I've added /etc/hbase/conf/ to my HADOOP_CLASSPATH (in /etc/hadoop/conf/hadoop-env.sh), restarted services, but no improvement. Here is the full classpath: /usr/local/hadoop/lib/hadoop-lzo-0.4.17-SNAPSHOT.jar:/etc/hbase/conf/::/etc/hadoop/conf:/usr/lib/hadoop/lib/*:/usr/lib/hadoop/.//*:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/*:/usr/lib/hadoop-hdfs/.//*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hadoop-yarn/.//*:/usr/lib/hadoop-0.20-mapreduce/./:/usr/lib/hadoop-0.20-mapreduce/lib/*:/usr/lib/hadoop-0.20-mapreduce/.//* Any thoughts on what the problem could be?
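For what it's worth, the symptom described at the top of this thread (hbase-site.xml on the classpath but ZooKeeper still resolving to localhost) is commonly caused by building the client Configuration with new Configuration() instead of HBaseConfiguration.create(): the former only loads the Hadoop *-site.xml resources, so hbase-site.xml is never read even when it is on the classpath. A minimal sketch of the same setup using HBaseConfiguration.create() (the job name is illustrative):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
    import org.apache.hadoop.mapreduce.Job;

    public class BulkLoadJobSetup {
      public static Job createJob(String tableName) throws Exception {
        // HBaseConfiguration.create() layers hbase-default.xml and hbase-site.xml
        // (found via the classpath) on top of the Hadoop configuration, so the
        // real ZK quorum is picked up instead of the localhost default.
        Configuration conf = HBaseConfiguration.create();
        Job job = new Job(conf, "bulkload-" + tableName);
        HTable hTable = new HTable(job.getConfiguration(), tableName);
        HFileOutputFormat.configureIncrementalLoad(job, hTable);
        return job;
      }
    }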
Re: Hbase ports
On Mon, Sep 23, 2013 at 9:14 AM, John Foxinhead john.foxinh...@gmail.com wrote: Hi all. I'm doing a project for my university, so I have to know exactly how all the HBase ports work. Studying the documentation I found that Zookeeper accepts connections on port 2181, the HBase master on port 60000 and the HBase regionservers on port 60020. I didn't understand the importance of port 60010 on the master and port 60030 on the regionservers. Can I not use them? From the documentation (http://hbase.apache.org/book.html#config.files): hbase.regionserver.info.port - The port for the HBase RegionServer web UI. Set to -1 if you do not want the RegionServer UI to run. Default: 60030. You can look for the other ports in there too. More important: if I launch HBase in pseudo-distributed mode, running all processes on localhost, what ports are used for each of the processes if I launch 1, 2, 3 or more backup masters and if I launch a few regionservers (fewer than 10) or a lot of regionservers (10, 20, 100)? They'll clash; you'll have to have a different hbase-site.xml for each process you want to start. J-D
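As a sketch of what J-D describes, an override hbase-site.xml for a second RegionServer process on the same host might look like this (the port values are illustrative; the property names are the standard ones from the configuration reference). Many HBase releases also ship bin/local-regionservers.sh and bin/local-master-backup.sh, which start extra local processes with automatically offset ports, so it is worth checking whether your version includes them:

    <!-- hbase-site.xml override for a second RegionServer on the same host -->
    <configuration>
      <property>
        <name>hbase.regionserver.port</name>   <!-- RPC port; default 60020 -->
        <value>60021</value>
      </property>
      <property>
        <name>hbase.regionserver.info.port</name>   <!-- web UI; default 60030, -1 disables it -->
        <value>60031</value>
      </property>
    </configuration>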
HBase consultant to help with scaling
We have an in-house Hadoop/HBase cluster that we're trying to use for storing application log data and we're running into issues writing to it at high throughput rates upwards of 15K writes per second. We've changed many configurations and did multiple rounds of performance tuning, but we are always hitting the same throughput limit. What is the best way of finding and hiring a professional HBase consultant that can help? -Omar Baba
Re: HBase consultant to help with scaling
zaloni - Ben Sharma - b...@zaloni.com thirdeye - DJ Das - debjyoti@gmail.com Phone - 408-431-1487 (Mobile) On Mon, Sep 23, 2013 at 11:56 AM, Omar Baba omar.b...@gmail.com wrote: We have an in-house Hadoop/HBase cluster that we're trying to use for storing application log data and we're running into issues writing to it at high throughput rates upwards of 15K writes per second. We've changed many configurations and did multiple rounds of performance tuning, but we are always hitting the same throughput limit. What is the best way of finding and hiring a professional HBase consultant that can help? -Omar Baba -- hubb...@hubbertsmith.com | 385 315 0198 -new mobile # Data Center Storage - my latest book on Amazon http://tinyurl.com/8x29cns | LinkedIN http://tinyurl.com/7v5eu2p
Write TimeSeries Data and Do Time Based Range Scans
Hi All, I have a secondary index (inverted index) table with a rowkey based on the timestamp of an event. Assume the rowkey is the timestamp in epoch form. I also store some extra columns (apart from the main_table rowkey) in that table for filtering. The requirement is to do range-based scans on the basis of the time of an event, hence the index with this rowkey. I cannot use a hashing or MD5-digest solution because then I cannot do range-based scans. And I already have an OpenTSDB-like index in another table for the same dataset. (I have many secondary indexes for the same data set.) Problem: when we increase the write workload during stress testing, the time secondary index becomes a bottleneck due to the famous region hotspotting problem. Solution: I am thinking of adding a prefix of (timestamp in epoch % 10) = bucket to the rowkey. Then my row key will become: <Bucket><Timestamp in Epoch>. By using the above rowkey I can at least alleviate the *WRITE* problem. (I don't think the problem can be fixed permanently because of the use-case requirement; I would love to be proven wrong.) However, with the above row key, when I want to *READ* data, every single range scan has to read data from 10 different regions. This extra load for reads is scaring me a bit. I am wondering if anyone has a better suggestion/approach to solve this problem given the constraints I have. Looking for feedback from the community. -- Thanks Regards, Anil Gupta
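A rough sketch of the bucketed key and the fan-out read described above (the class and method names are made up for illustration; the HBaseWD library mentioned in the reply below packages the same idea):

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.util.Bytes;

    public class BucketedTimeIndex {
      private static final int NUM_BUCKETS = 10;   // matches the "% 10" above

      // Write path: prefix one bucket byte so sequential timestamps spread over
      // NUM_BUCKETS region ranges instead of hammering a single one.
      public static byte[] toIndexKey(long epochMillis) {
        byte bucket = (byte) (epochMillis % NUM_BUCKETS);
        return Bytes.add(new byte[] { bucket }, Bytes.toBytes(epochMillis));
      }

      // Read path: one logical time-range scan fans out into NUM_BUCKETS scans,
      // one per bucket, whose results the caller merges (ordering is per bucket).
      public static List<Result> scanRange(HTable table, long startMillis, long endMillis)
          throws Exception {
        List<Result> merged = new ArrayList<Result>();
        for (int b = 0; b < NUM_BUCKETS; b++) {
          byte bucket = (byte) b;
          Scan scan = new Scan(
              Bytes.add(new byte[] { bucket }, Bytes.toBytes(startMillis)),
              Bytes.add(new byte[] { bucket }, Bytes.toBytes(endMillis)));
          ResultScanner scanner = table.getScanner(scan);
          for (Result r : scanner) {
            merged.add(r);
          }
          scanner.close();
        }
        return merged;
      }
    }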
Re: Write TimeSeries Data and Do Time Based Range Scans
http://blog.sematext.com/2012/04/09/hbasewd-avoid-regionserver-hotspotting-despite-writing-records-with-sequential-keys/ Here you can find the discussion, trade-offs and working code/API (even for M/R) about this and the approach you are trying out. Regards, Shahab On Mon, Sep 23, 2013 at 5:41 PM, anil gupta anilgupt...@gmail.com wrote: Hi All, I have a secondary index(inverted index) table with a rowkey on the basis of Timestamp of an event. Assume the rowkey as TimeStamp in Epoch. I also store some extra(apart from main_table rowkey) columns in that table for doing filtering. The requirement is to do range-based scan on the basis of time of event. Hence, the index with this rowkey. I cannot use Hashing or MD5 digest solution because then i cannot do range based scans. And, i already have a index like OpenTSDB in another table for the same dataset.(I have many secondary Index for same data set.) Problem: When we increase the write workload during stress test. Time secondary index becomes a bottleneck due to the famous Region HotSpotting problem. Solution: I am thinking of adding a prefix of { (TimeStamp in Epoch%10) = bucket} in the rowkey. Then my row key will become: BucketTimeStamp in Epoch By using above rowkey i can at least alleviate *WRITE* problem.(i don't think problem can be fixed permanently because of the use case requirement. I would love to be proven wrong.) However, with the above row key, now when i want to *READ* data, for every single range scans i have to read data from 10 different regions. This extra load for read is scaring me a bit. I am wondering if anyone has better suggestion/approach to solve this problem given the constraints i have. Looking for feedback from community. -- Thanks Regards, Anil Gupta
Re: Write TimeSeries Data and Do Time Based Range Scans
Hi Shahab, If you read my solution carefully, you'll see I am already doing that. Thanks, Anil Gupta On Mon, Sep 23, 2013 at 3:51 PM, Shahab Yunus shahab.yu...@gmail.com wrote: http://blog.sematext.com/2012/04/09/hbasewd-avoid-regionserver-hotspotting-despite-writing-records-with-sequential-keys/ Here you can find the discussion, trade-offs and working code/API (even for M/R) about this and the approach you are trying out. Regards, Shahab On Mon, Sep 23, 2013 at 5:41 PM, anil gupta anilgupt...@gmail.com wrote: Hi All, I have a secondary index(inverted index) table with a rowkey on the basis of Timestamp of an event. Assume the rowkey as TimeStamp in Epoch. I also store some extra(apart from main_table rowkey) columns in that table for doing filtering. The requirement is to do range-based scan on the basis of time of event. Hence, the index with this rowkey. I cannot use Hashing or MD5 digest solution because then i cannot do range based scans. And, i already have a index like OpenTSDB in another table for the same dataset.(I have many secondary Index for same data set.) Problem: When we increase the write workload during stress test. Time secondary index becomes a bottleneck due to the famous Region HotSpotting problem. Solution: I am thinking of adding a prefix of { (TimeStamp in Epoch%10) = bucket} in the rowkey. Then my row key will become: BucketTimeStamp in Epoch By using above rowkey i can at least alleviate *WRITE* problem.(i don't think problem can be fixed permanently because of the use case requirement. I would love to be proven wrong.) However, with the above row key, now when i want to *READ* data, for every single range scans i have to read data from 10 different regions. This extra load for read is scaring me a bit. I am wondering if anyone has better suggestion/approach to solve this problem given the constraints i have. Looking for feedback from community. -- Thanks Regards, Anil Gupta -- Thanks Regards, Anil Gupta
Re: Write TimeSeries Data and Do Time Based Range Scans
Yeah, I saw that. In fact that is why I recommended it to you, as I couldn't infer from your email whether you had already gone through that source or not. It is a source that did the exact same thing and discusses it in much more detail, with concerns aligning with yours (in fact I think some of the authors/creators of that link/group are members of this community as well). Regards, Shahab On Mon, Sep 23, 2013 at 8:41 PM, anil gupta anilgupt...@gmail.com wrote: Hi Shahab, If you read my solution carefully. I am already doing that. Thanks, Anil Gupta On Mon, Sep 23, 2013 at 3:51 PM, Shahab Yunus shahab.yu...@gmail.com wrote: http://blog.sematext.com/2012/04/09/hbasewd-avoid-regionserver-hotspotting-despite-writing-records-with-sequential-keys/ Here you can find the discussion, trade-offs and working code/API (even for M/R) about this and the approach you are trying out. Regards, Shahab On Mon, Sep 23, 2013 at 5:41 PM, anil gupta anilgupt...@gmail.com wrote: Hi All, I have a secondary index(inverted index) table with a rowkey on the basis of Timestamp of an event. Assume the rowkey as TimeStamp in Epoch. I also store some extra(apart from main_table rowkey) columns in that table for doing filtering. The requirement is to do range-based scan on the basis of time of event. Hence, the index with this rowkey. I cannot use Hashing or MD5 digest solution because then i cannot do range based scans. And, i already have a index like OpenTSDB in another table for the same dataset.(I have many secondary Index for same data set.) Problem: When we increase the write workload during stress test. Time secondary index becomes a bottleneck due to the famous Region HotSpotting problem. Solution: I am thinking of adding a prefix of { (TimeStamp in Epoch%10) = bucket} in the rowkey. Then my row key will become: BucketTimeStamp in Epoch By using above rowkey i can at least alleviate *WRITE* problem.(i don't think problem can be fixed permanently because of the use case requirement. I would love to be proven wrong.) However, with the above row key, now when i want to *READ* data, for every single range scans i have to read data from 10 different regions. This extra load for read is scaring me a bit. I am wondering if anyone has better suggestion/approach to solve this problem given the constraints i have. Looking for feedback from community. -- Thanks Regards, Anil Gupta -- Thanks Regards, Anil Gupta
Re: quieting HBase metrics
Me, too! Otis -- HBase Performance Monitoring -- http://sematext.com/spm On Mon, Sep 23, 2013 at 7:10 AM, Arati Patro arati.pa...@gmail.com wrote: Hi All, I have the same query. Tried searching for a solution on the web but found nothing very helpful. Is there some way I could disable metrics for every table in HBase from being sent to Ganglia? Or just any information on how to configure HBase to emit fewer metrics? Thanks, Arati Patro On Thu, Sep 19, 2013 at 4:47 AM, Ron Echeverri recheve...@maprtech.com wrote: In http://hbase.apache.org/book/hbase_metrics.html we see 15.4.2. Warning To Ganglia Users Warning to Ganglia Users: by default, HBase will emit a LOT of metrics per RegionServer which may swamp your installation. Options include either increasing Ganglia server capacity, or configuring HBase to emit fewer metrics. However, although there is documentation to help set up Ganglia metrics for HBase, there doesn't seem to be any that shows how to make HBase emit fewer metrics. Could someone lend me a hand? I have increased the period in hadoop-metrics.properties but that has proven insufficient. thanks rone
Re: quieting HBase metrics
HBase metrics contain different types: RegionServerMetrics and RegionServerDynamicMetrics. Many more metrics are emitted from RegionServerDynamicMetrics, especially when there are many regions on a RegionServer. RegionServerMetrics and RegionServerDynamicMetrics share one metrics context, which makes control difficult. In our production we added a metrics filter in MetricsContext, which can be configured to our needs; it reduces the number of metrics and works well. 2013/9/24 Otis Gospodnetic otis.gospodne...@gmail.com Me, too! Otis -- HBase Performance Monitoring -- http://sematext.com/spm On Mon, Sep 23, 2013 at 7:10 AM, Arati Patro arati.pa...@gmail.com wrote: Hi All, I have the same query. Tried searching for a solution on the web but found nothing very helpful. Is there some way I could disable metrics for every table in HBase from being sent to Ganglia? Or just any information on how to configure HBase to emit fewer metrics? Thanks, Arati Patro On Thu, Sep 19, 2013 at 4:47 AM, Ron Echeverri recheve...@maprtech.com wrote: In http://hbase.apache.org/book/hbase_metrics.html we see 15.4.2. Warning To Ganglia Users Warning to Ganglia Users: by default, HBase will emit a LOT of metrics per RegionServer which may swamp your installation. Options include either increasing Ganglia server capacity, or configuring HBase to emit fewer metrics. However, although there is documentation to help set up Ganglia metrics for HBase, there doesn't seem to be any that shows how to make HBase emit fewer metrics. Could someone lend me a hand? I have increased the period in hadoop-metrics.properties but that has proven insufficient. thanks rone -- Bing Jiang Tel:(86)134-2619-1361 weibo: http://weibo.com/jiangbinglover BLOG: www.binospace.com BLOG: http://blog.sina.com.cn/jiangbinglover Focus on distributed computing, HDFS/HBase
Re: quieting HBase metrics
Thanks, Bing, but I still don't understand *how* to configure that. Could you share your hadoop-metrics.properties file, or any other files you use for this? rone On Mon, Sep 23, 2013 at 8:29 PM, Bing Jiang jiangbinglo...@gmail.com wrote: HBase metrics contain different types: RegionServerMetrics and RegionServerDynamicMetrics. Many more metrics are emitted from RegionServerDynamicMetrics, especially when there are many regions on a RegionServer. RegionServerMetrics and RegionServerDynamicMetrics share one metrics context, which makes control difficult. In our production we added a metrics filter in MetricsContext, which can be configured to our needs; it reduces the number of metrics and works well. 2013/9/24 Otis Gospodnetic otis.gospodne...@gmail.com Me, too! Otis -- HBase Performance Monitoring -- http://sematext.com/spm On Mon, Sep 23, 2013 at 7:10 AM, Arati Patro arati.pa...@gmail.com wrote: Hi All, I have the same query. Tried searching for a solution on the web but found nothing very helpful. Is there some way I could disable metrics for every table in HBase from being sent to Ganglia? Or just any information on how to configure HBase to emit fewer metrics? Thanks, Arati Patro On Thu, Sep 19, 2013 at 4:47 AM, Ron Echeverri recheve...@maprtech.com wrote: In http://hbase.apache.org/book/hbase_metrics.html we see 15.4.2. Warning To Ganglia Users Warning to Ganglia Users: by default, HBase will emit a LOT of metrics per RegionServer which may swamp your installation. Options include either increasing Ganglia server capacity, or configuring HBase to emit fewer metrics. However, although there is documentation to help set up Ganglia metrics for HBase, there doesn't seem to be any that shows how to make HBase emit fewer metrics. Could someone lend me a hand? I have increased the period in hadoop-metrics.properties but that has proven insufficient. thanks rone -- Bing Jiang Tel:(86)134-2619-1361 weibo: http://weibo.com/jiangbinglover BLOG: www.binospace.com BLOG: http://blog.sina.com.cn/jiangbinglover Focus on distributed computing, HDFS/HBase
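For HBase versions that have moved to the Hadoop metrics2 framework (0.95 and later), filtering can be done in configuration alone, without the custom MetricsContext patch Bing describes for the 0.94-era metrics. A heavily hedged sketch of hadoop-metrics2-hbase.properties follows; the sink name, Ganglia host and the exclude patterns are illustrative and need to be matched to the record and metric names your version actually emits:

    hbase.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
    hbase.sink.ganglia.servers=ganglia.example.com:8649
    hbase.sink.ganglia.period=60

    # GlobFilter lets a sink drop whole metric records or individual metrics by
    # name, e.g. the very chatty per-region / per-table dynamic metrics.
    *.record.filter.class=org.apache.hadoop.metrics2.filter.GlobFilter
    *.metric.filter.class=org.apache.hadoop.metrics2.filter.GlobFilter
    hbase.sink.ganglia.record.filter.exclude=*Regions*
    hbase.sink.ganglia.metric.filter.exclude=*table*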