Pausing .archive file cleaner

2013-09-23 Thread Siddharth Karandikar
Hi,

I am writing a custom ExportSnapshot utility where I cannot guarantee
the order in which files are exported to the destination location. So archive
files (or any other HFiles or log files) could get copied before the
links are created in the .hbase-snapshot directory.

So, as I understand, in such situations, there is a possibility that
a file under .archive could get deleted before I create the corresponding link
file in the .hbase-snapshot dir.

Can the .archive file cleaner thread be paused for some time? Is there any
configuration to do that?


Thanks,
Siddharth


Re: Co-Processors in HBase 0.95.2 version

2013-09-23 Thread Ted Yu
Here is an example:
http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/coprocessor/AggregateImplementation.html

On Sep 22, 2013, at 10:14 PM, yeshwanth kumar yeshwant...@gmail.com wrote:

 Hi,
 
 I am facing some difficulty writing coprocessors in HBase 0.95.2 and am
 looking for some tutorials and examples.
 
 Can anyone provide me with some examples of how coprocessors relate to
 protocol buffers?
 
 Thanks


Re: Pausing .archive file cleaner

2013-09-23 Thread Matteo Bertozzi
No, at the moment you cannot stop the archiving process.

Can't you just copy the .hbase-snapshot dir first?
Also, it should be small enough to do as a pre-MR step, since all the
files are empty.

Matteo



On Mon, Sep 23, 2013 at 7:48 AM, Siddharth Karandikar 
siddharth.karandi...@gmail.com wrote:

 Hi,

 I am writing a custom ExportSnapshot utility where I cannot guarantee
 order in which files are exported to destination location. So archive
 files (or any other hfiles or log files) could get copied before the
 links are created in .hbase-snapshot directory.

 So, as I understand, in such situations, there is a possibility that
 .archive file could get deleted before I create corresponding link
 file in .hbase-snapshot dir.

 Can .archive file cleaner thread be paused for some time? Is there any
 configuration to do that?


 Thanks,
 Siddharth



Re: quieting HBase metrics

2013-09-23 Thread Arati Patro
Hi All,


I have the same query. Tried searching for a solution on the web but found
nothing very helpful. Is there some way I could disable metrics for every
table in HBase from being sent to Ganglia? Or just any information on how
to configure HBase to emit fewer metrics?


Thanks,
Arati Patro


On Thu, Sep 19, 2013 at 4:47 AM, Ron Echeverri recheve...@maprtech.com wrote:

 In http://hbase.apache.org/book/hbase_metrics.html we see

 
 15.4.2. Warning To Ganglia Users

 Warning to Ganglia Users: by default, HBase will emit a LOT of metrics per
 RegionServer which may swamp your installation. Options include either
 increasing Ganglia server capacity, or configuring HBase to emit fewer
 metrics.
 

 However, although there is documentation to help set up Ganglia metrics for
 HBase, there doesn't seem to be any that shows how to make HBase emit fewer
 metrics.  Could someone lend me a hand?  I have increased the period in
 hadoop-metrics.properties but that has proven insufficient.

 thanks
 rone
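
For reference, on the hadoop-metrics.properties based setup this thread refers to, the hbase context is configured roughly as sketched below (the Ganglia host is a placeholder). Raising hbase.period slows emission down, and pointing the context at NullContext silences it entirely, though neither gives per-table selection:

# Emit the hbase context to Ganglia with a larger period:
hbase.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
hbase.period=60
hbase.servers=ganglia-host:8649
# ...or silence the hbase context entirely:
# hbase.class=org.apache.hadoop.metrics.spi.NullContext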



Re: Use HBase api -- NullPointerException

2013-09-23 Thread kun yan
Sorry for the late reply. The versions are ZooKeeper 3.4.5 and HBase 0.94.10,
and HBase is not managing ZooKeeper.


2013/9/23 Ted Yu yuzhih...@gmail.com

 What HBase version were they using ?

 Was HBase managing zookeeper ensemble ?

 src/main/java/org/apache/hadoop/hbase/zookeeper/ZKConfig.java hasn't
 changed since 2012-03-29

 Cheers


 On Sun, Sep 22, 2013 at 9:22 PM, kun yan yankunhad...@gmail.com wrote:

  A friend of mine ran into the following problem when using HBase 0.94. The
  program runs successfully on Windows, but when moved to Linux it does not
  work and reports the exception below. The HBase code just does simple
  queries and inserts.
 
  Does he need to specify the ZooKeeper address explicitly to get it working?
 
  Caused by: java.lang.NullPointerException
  at
 org.apache.hadoop.hbase.zookeeper.ZKConfig.makeZKProps(ZKConfig.java:60)
  at
 
 
 org.apache.hadoop.hbase.zookeeper.ZKConfig.getZKQuorumServersString(ZKConfig.java:245)
  at
 
 
 org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.init(ZooKeeperWatcher.java:147)
  at
 
 
 org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.init(ZooKeeperWatcher.java:127)
  at
 
 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getZooKeeperWatcher(HConnectionManager.java:1401)
  at
 
 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.ensureZookeeperTrackers(HConnectionManager.java:612)
  at
 
 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:882)
  at
 
 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:857)
  at org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:233)
  at org.apache.hadoop.hbase.client.HTable.init(HTable.java:173)
  at
 
 
 org.apache.hadoop.hbase.client.HTableFactory.createHTableInterface(HTableFactory.java:36)
  at
 
 org.apache.hadoop.hbase.client.HTablePool.createHTable(HTablePool.java:265)
  at
 
 
 org.apache.hadoop.hbase.client.HTablePool.findOrCreateTable(HTablePool.java:195)
  at
 org.apache.hadoop.hbase.client.HTablePool.getTable(HTablePool.java:174)
  at
 
 
 com.axon.icloud.datax.config.TableSchemaManager.getHTable(TableSchemaManager.java:288)
  at
 com.axon.icloud.datax.LogProcessTask.getHTable(LogProcessTask.java:466)
  at
 com.axon.icloud.datax.LogProcessTask.importData(LogProcessTask.java:543)
  at com.axon.icloud.datax.LogProcessTask.run(LogProcessTask.java:171)
 
  --
 
  In the Hadoop world, I am just a novice, explore the entire Hadoop
  ecosystem, I hope one day I can contribute their own code
 
  YanBit
  yankunhad...@gmail.com
 




-- 

In the Hadoop world, I am just a novice exploring the entire Hadoop
ecosystem. I hope one day I can contribute my own code.

YanBit
yankunhad...@gmail.com
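
If the root cause turns out to be hbase-site.xml not being visible on the Linux classpath, one way to rule it out is to set the quorum explicitly on the client configuration; a minimal sketch (host names are placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

// HBaseConfiguration.create() loads hbase-default.xml/hbase-site.xml from the
// classpath; the explicit settings below override whatever was (or was not) found.
Configuration conf = HBaseConfiguration.create();
conf.set("hbase.zookeeper.quorum", "zk1.example.com,zk2.example.com,zk3.example.com");
conf.set("hbase.zookeeper.property.clientPort", "2181");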


Loading hbase-site.xml settings from Hadoop MR job

2013-09-23 Thread Dolan Antenucci
I'm having an issue where my Hadoop MR job for bulk loading data into Hbase
is not reading my hbase-site.xml file -- thus it tries to connect to
Zookeeper on localhost.  This is on a cluster using CDH4 on Ubuntu 12.04.

Here's the code where it attempts to connect to local zookeeper:
Configuration conf = new Configuration(); // from org.apache.hadoop.conf
Job job = new Job(conf);
HTable hTable = new HTable(conf, tableName);
HFileOutputFormat.configureIncrementalLoad(job, hTable);

As suggested by another thread I came across, I've added /etc/hbase/conf/
to my HADOOP_CLASSPATH (in /etc/hadoop/conf/hadoop-env.sh), restarted
services, but no improvement. Here is the full classpath:

/usr/local/hadoop/lib/hadoop-lzo-0.4.17-SNAPSHOT.jar:/etc/hbase/conf/::/etc/hadoop/conf:/usr/lib/hadoop/lib/*:/usr/lib/hadoop/.//*:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/*:/usr/lib/hadoop-hdfs/.//*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hadoop-yarn/.//*:/usr/lib/hadoop-0.20-mapreduce/./:/usr/lib/hadoop-0.20-mapreduce/lib/*:/usr/lib/hadoop-0.20-mapreduce/.//*

Any thoughts on what the problem could be?
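
One thing worth checking: new Configuration() only picks up the Hadoop core-*.xml files, not hbase-site.xml, even when /etc/hbase/conf/ is on the classpath. A minimal sketch of the same lines built from HBaseConfiguration.create(), which does layer hbase-site.xml on top (tableName is the variable from the snippet above):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
import org.apache.hadoop.mapreduce.Job;

// HBaseConfiguration.create() reads hbase-default.xml and hbase-site.xml from
// the classpath, so the job client sees the real ZooKeeper quorum.
Configuration conf = HBaseConfiguration.create();
Job job = new Job(conf, "hbase-bulk-load");
HTable hTable = new HTable(conf, tableName);
HFileOutputFormat.configureIncrementalLoad(job, hTable);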


Hbase ports

2013-09-23 Thread John Foxinhead
Hi all. I'm doing a project for my university, so I have to know
exactly how all the HBase ports work. Studying the documentation I found
that ZooKeeper accepts connections on port 2181, the HBase master on port 60000
and HBase RegionServers on port 60020. I didn't understand the importance
of port 60010 on the master and port 60030 on the RegionServers. Can I not use them?
More important: if I launch HBase in pseudo-distributed mode, running all
processes on localhost, which ports are used by each process if I
launch 1, 2, 3 or more backup masters and if I launch a few RegionServers
(less than 10) or a lot of RegionServers (10, 20, 100)?


Re: Loading hbase-site.xml settings from Hadoop MR job

2013-09-23 Thread Renato Marroquín Mogrovejo
Maybe you should put this configuration on your classpath, so it
can be reached from your client's env.


2013/9/23 Shahab Yunus shahab.yu...@gmail.com

 From where are you running your job? From which machine? This client
 machine from where you are kicking of this job should have the
 hbase-site.xml with the correct ZK info in it. It seems that your
 client/job is having and issue picking up the right ZK, rather than the
 services running on your non-local cluster.

 Regards,
 Shahab


 On Mon, Sep 23, 2013 at 12:09 PM, Dolan Antenucci antenucc...@gmail.com
 wrote:

  I'm having an issue where my Hadoop MR job for bulk loading data into
 Hbase
  is not reading my hbase-site.xml file -- thus it tries to connect to
  Zookeeper on localhost.  This is on a cluster using CDH4 on Ubuntu 12.04.
 
  Here's the code where it attempts to connect to local zookeeper:
  Configuration conf = new Configuration(); // from
  org.apache.hadoop.conf
  Job job = new Job(conf);
  HTable hTable = new HTable(conf, tableName);
  HFileOutputFormat.configureIncrementalLoad(job, hTable);
 
  As suggested by another thread I came across, I've added
 /etc/hbase/conf/
  to my HADOOP_CLASSPATH (in /etc/hadoop/conf/hadoop-env.sh), restarted
  services, but no improvement. Here is the full classpath:
 
 
 
 /usr/local/hadoop/lib/hadoop-lzo-0.4.17-SNAPSHOT.jar:/etc/hbase/conf/::/etc/hadoop/conf:/usr/lib/hadoop/lib/*:/usr/lib/hadoop/.//*:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/*:/usr/lib/hadoop-hdfs/.//*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hadoop-yarn/.//*:/usr/lib/hadoop-0.20-mapreduce/./:/usr/lib/hadoop-0.20-mapreduce/lib/*:/usr/lib/hadoop-0.20-mapreduce/.//*
 
  Any thoughts on what the problem could be?
 



Re: Loading hbase-site.xml settings from Hadoop MR job

2013-09-23 Thread Shahab Yunus
From where are you running your job? From which machine? This client
machine from where you are kicking off this job should have the
hbase-site.xml with the correct ZK info in it. It seems that your
client/job is having an issue picking up the right ZK, rather than the
services running on your non-local cluster.

Regards,
Shahab


On Mon, Sep 23, 2013 at 12:09 PM, Dolan Antenucci antenucc...@gmail.com wrote:

 I'm having an issue where my Hadoop MR job for bulk loading data into Hbase
 is not reading my hbase-site.xml file -- thus it tries to connect to
 Zookeeper on localhost.  This is on a cluster using CDH4 on Ubuntu 12.04.

 Here's the code where it attempts to connect to local zookeeper:
 Configuration conf = new Configuration(); // from
 org.apache.hadoop.conf
 Job job = new Job(conf);
 HTable hTable = new HTable(conf, tableName);
 HFileOutputFormat.configureIncrementalLoad(job, hTable);

 As suggested by another thread I came across, I've added /etc/hbase/conf/
 to my HADOOP_CLASSPATH (in /etc/hadoop/conf/hadoop-env.sh), restarted
 services, but no improvement. Here is the full classpath:


 /usr/local/hadoop/lib/hadoop-lzo-0.4.17-SNAPSHOT.jar:/etc/hbase/conf/::/etc/hadoop/conf:/usr/lib/hadoop/lib/*:/usr/lib/hadoop/.//*:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/*:/usr/lib/hadoop-hdfs/.//*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hadoop-yarn/.//*:/usr/lib/hadoop-0.20-mapreduce/./:/usr/lib/hadoop-0.20-mapreduce/lib/*:/usr/lib/hadoop-0.20-mapreduce/.//*

 Any thoughts on what the problem could be?



Re: Co-Processors in HBase 0.95.2 version

2013-09-23 Thread Kiru Pakkirisamy
Yeshwanth,
With 0.95.2, coprocessors use protobuf for the endpoint protocol instead of
serializing Java objects.
Most of the tutorials on protobuf are mainly about the data structures and not
much about the RPC mechanism, which is what we use in 0.95.2.

The RPC is defined like this in a proto file -

service TermIdSearchService {
  rpc getTermIdWithCount(TermList)
      returns (TermIdCountList);
}

Then it is implemented like this (this is also what you install as the
coprocessor on the table):

public class TermIdSearchEndpointV2 extends
    TermIdSearchProtocol.TermIdSearchService implements Coprocessor,
    CoprocessorService {
  ...

  @Override
  public void getTermIdWithCount(RpcController controller,
      TermIdSearchProtocol.TermList termlist,
      RpcCallback<TermIdSearchProtocol.TermIdCountList> callback) {
    ...
  }
}


Regards,
- kiru


Kiru Pakkirisamy | webcloudtech.wordpress.com



 From: yeshwanth kumar yeshwant...@gmail.com
To: user@hbase.apache.org 
Sent: Sunday, September 22, 2013 10:14 PM
Subject: Co-Processors in HBase 0.95.2 version
 

Hi,

I am facing some difficulty writing coprocessors in HBase 0.95.2 and am
looking for some tutorials and examples.

Can anyone provide me with some examples of how coprocessors relate to
protocol buffers?

Thanks

Re: Loading hbase-site.xml settings from Hadoop MR job

2013-09-23 Thread Dolan Antenucci
Hi Shahab,
The Hadoop MR job is being started from one of the slaves on the cluster
(i.e. where a HBase RegionServer is running, along with a TaskTracker). The
hbase-site.xml file on this machine points to the correct ZK quorum.

One interesting thing I've noticed is that the right ZK server is used for
the following code (that is in the same MR job, just a few lines earlier):
HBaseAdmin admin = new HBaseAdmin(conf);
admin.disableTable(tableName);
admin.deleteTable(tableName);

Could the issue be with HTable or HFileOutputFormat (i.e. maybe they have a
config or config-path hard-coded)?




On Mon, Sep 23, 2013 at 12:53 PM, Shahab Yunus shahab.yu...@gmail.com wrote:

 From where are you running your job? From which machine? This client
 machine from where you are kicking of this job should have the
 hbase-site.xml with the correct ZK info in it. It seems that your
 client/job is having and issue picking up the right ZK, rather than the
 services running on your non-local cluster.

 Regards,
 Shahab


 On Mon, Sep 23, 2013 at 12:09 PM, Dolan Antenucci antenucc...@gmail.com
 wrote:

  I'm having an issue where my Hadoop MR job for bulk loading data into
 Hbase
  is not reading my hbase-site.xml file -- thus it tries to connect to
  Zookeeper on localhost.  This is on a cluster using CDH4 on Ubuntu 12.04.
 
  Here's the code where it attempts to connect to local zookeeper:
  Configuration conf = new Configuration(); // from
  org.apache.hadoop.conf
  Job job = new Job(conf);
  HTable hTable = new HTable(conf, tableName);
  HFileOutputFormat.configureIncrementalLoad(job, hTable);
 
  As suggested by another thread I came across, I've added
 /etc/hbase/conf/
  to my HADOOP_CLASSPATH (in /etc/hadoop/conf/hadoop-env.sh), restarted
  services, but no improvement. Here is the full classpath:
 
 
 
 /usr/local/hadoop/lib/hadoop-lzo-0.4.17-SNAPSHOT.jar:/etc/hbase/conf/::/etc/hadoop/conf:/usr/lib/hadoop/lib/*:/usr/lib/hadoop/.//*:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/*:/usr/lib/hadoop-hdfs/.//*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hadoop-yarn/.//*:/usr/lib/hadoop-0.20-mapreduce/./:/usr/lib/hadoop-0.20-mapreduce/lib/*:/usr/lib/hadoop-0.20-mapreduce/.//*
 
  Any thoughts on what the problem could be?
 



Re: Loading hbase-site.xml settings from Hadoop MR job

2013-09-23 Thread Dolan Antenucci
Hi Renato,

Can you clarify your recommendation?  Currently I've added the directory
where my hbase-site.xml file lives (/etc/hbase/conf/) to my Hadoop
classpath (as described above). Note: from the client machine (where I'm
starting my MR job), I generated the above class path by running hadoop
classpath.  Also worth noting that the /etc/hbase/conf/hbase-site.xml file
on this client machine points to the correct ZK quorum.

Thanks


On Mon, Sep 23, 2013 at 1:06 PM, Renato Marroquín Mogrovejo 
renatoj.marroq...@gmail.com wrote:

 Maybe you should putting this configurations within your class path, so it
 can be reached from your clients env.


 2013/9/23 Shahab Yunus shahab.yu...@gmail.com

  From where are you running your job? From which machine? This client
  machine from where you are kicking of this job should have the
  hbase-site.xml with the correct ZK info in it. It seems that your
  client/job is having and issue picking up the right ZK, rather than the
  services running on your non-local cluster.
 
  Regards,
  Shahab
 
 
  On Mon, Sep 23, 2013 at 12:09 PM, Dolan Antenucci antenucc...@gmail.com
  wrote:
 
   I'm having an issue where my Hadoop MR job for bulk loading data into
  Hbase
   is not reading my hbase-site.xml file -- thus it tries to connect to
   Zookeeper on localhost.  This is on a cluster using CDH4 on Ubuntu
 12.04.
  
   Here's the code where it attempts to connect to local zookeeper:
   Configuration conf = new Configuration(); // from
   org.apache.hadoop.conf
   Job job = new Job(conf);
   HTable hTable = new HTable(conf, tableName);
   HFileOutputFormat.configureIncrementalLoad(job, hTable);
  
   As suggested by another thread I came across, I've added
  /etc/hbase/conf/
   to my HADOOP_CLASSPATH (in /etc/hadoop/conf/hadoop-env.sh), restarted
   services, but no improvement. Here is the full classpath:
  
  
  
 
 /usr/local/hadoop/lib/hadoop-lzo-0.4.17-SNAPSHOT.jar:/etc/hbase/conf/::/etc/hadoop/conf:/usr/lib/hadoop/lib/*:/usr/lib/hadoop/.//*:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/*:/usr/lib/hadoop-hdfs/.//*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hadoop-yarn/.//*:/usr/lib/hadoop-0.20-mapreduce/./:/usr/lib/hadoop-0.20-mapreduce/lib/*:/usr/lib/hadoop-0.20-mapreduce/.//*
  
   Any thoughts on what the problem could be?
  
 



Re: Hbase ports

2013-09-23 Thread Jean-Daniel Cryans
On Mon, Sep 23, 2013 at 9:14 AM, John Foxinhead john.foxinh...@gmail.com wrote:

 Hi all. I'm doing a project for my university so that i have to know
 perfectly how all the Hbase ports work. Studing the documentation i found
 that Zookeeper accept connection on port 2181, Hbase master on port 60000
 and Hbase regionservers on port 60020. I didn't understand the importance
 of port 60010 on master and port 60030 on regionservers. Can i not use
 them?


From the documentation (http://hbase.apache.org/book.html#config.files):

hbase.regionserver.info.port

The port for the HBase RegionServer web UI. Set to -1 if you do not want the
RegionServer UI to run.

Default: 60030
You can look for the other port in there too.


 More important: if i launch Hbase in pseudo-distribuited mode, running all
 processes on localhost, what ports are used for each of the processes if i
 launch 1, 2, 3 or more backup masters and if i launch few regionservers
 (less than 10) or a lot of regionservers (10, 20, 100)?


It'll clash; you'll have to have a different hbase-site.xml for each process
you want to start.

J-D
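
To make that concrete, a hedged hbase-site.xml sketch for one extra RegionServer on the same host, assuming the standard port properties (the values are examples; backup masters would similarly need distinct hbase.master.port and hbase.master.info.port values):

<property>
  <name>hbase.regionserver.port</name>
  <value>60021</value>
</property>
<property>
  <name>hbase.regionserver.info.port</name>
  <value>60031</value>  <!-- or -1 to disable this process's web UI -->
</property>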


HBase consultant to help with scaling

2013-09-23 Thread Omar Baba
We have an in-house Hadoop/HBase cluster that we're trying to use for
storing application log data and we're running into issues writing to it at
high throughput rates upwards of 15K writes per second.  We've changed many
configurations and did multiple rounds of performance tuning, but we are
always hitting the same throughput limit.  What is the best way of finding
and hiring a professional HBase consultant that can help?

-Omar Baba


Re: HBase consultant to help with scaling

2013-09-23 Thread Hubbert Smith
zaloni - Ben Sharma - b...@zaloni.com
thirdeye - DJ Das - debjyoti@gmail.com - Phone: 408-431-1487 (Mobile)



On Mon, Sep 23, 2013 at 11:56 AM, Omar Baba omar.b...@gmail.com wrote:

 We have an in-house Hadoop/HBase cluster that we're trying to use for
 storing application log data and we're running into issues writing to it at
 high throughput rates upwards of 15K writes per second.  We've changed many
 configurations and did multiple rounds of performance tuning, but we are
 always hitting the same throughput limit.  What is the best way of finding
 and hiring a professional HBase consultant that can help?

 -Omar Baba




-- 
hubb...@hubbertsmith.com  |  385 315 0198 -new mobile #
Data Center Storage - my latest book on Amazon http://tinyurl.com/8x29cns
| LinkedIN http://tinyurl.com/7v5eu2p


Write TimeSeries Data and Do Time Based Range Scans

2013-09-23 Thread anil gupta
Hi All,

I have a secondary index (inverted index) table with a rowkey based on the
timestamp of an event. Assume the rowkey is the timestamp in epoch form.
I also store some extra columns (apart from the main_table rowkey) in that table
for doing filtering.

The requirement is to do range-based scans on the basis of the time of the
event. Hence, the index with this rowkey.
I cannot use a hashing or MD5 digest solution because then I cannot do range-based
scans. And I already have an OpenTSDB-like index in another table
for the same dataset. (I have many secondary indexes for the same data set.)

Problem: When we increase the write workload during the stress test, the time
secondary index becomes a bottleneck due to the famous region hotspotting
problem.
Solution: I am thinking of adding a prefix of { (TimeStamp in Epoch % 10) =
bucket } to the rowkey. Then my row key will become:
 <Bucket><TimeStamp in Epoch>
By using the above rowkey I can at least alleviate the *WRITE* problem. (I don't
think the problem can be fixed permanently because of the use case requirement;
I would love to be proven wrong.)
However, with the above row key, when I want to *READ* data, for every
single range scan I have to read data from 10 different regions. This
extra load for reads is scaring me a bit.

I am wondering if anyone has a better suggestion/approach to solve this
problem given the constraints I have. Looking for feedback from the community.

-- 
Thanks & Regards,
Anil Gupta
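
To make the read/write trade-off concrete, a minimal sketch of the bucketing described above, assuming a 1-byte bucket prefix, millisecond epoch timestamps, and 10 buckets; the read side fans one time range out into one Scan per bucket, which is exactly the extra read load mentioned:

import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class SaltedTimeIndex {
  private static final int BUCKETS = 10;

  // Write path: prefix the epoch timestamp with (epoch % BUCKETS).
  public static byte[] rowKey(long epochMillis) {
    byte bucket = (byte) (epochMillis % BUCKETS);
    return Bytes.add(new byte[] { bucket }, Bytes.toBytes(epochMillis));
  }

  // Read path: one Scan per bucket covering the same time range;
  // the client has to merge the BUCKETS result streams afterwards.
  public static List<Scan> timeRangeScans(long startEpoch, long endEpoch) {
    List<Scan> scans = new ArrayList<Scan>();
    for (int b = 0; b < BUCKETS; b++) {
      byte[] start = Bytes.add(new byte[] { (byte) b }, Bytes.toBytes(startEpoch));
      byte[] stop = Bytes.add(new byte[] { (byte) b }, Bytes.toBytes(endEpoch));
      scans.add(new Scan(start, stop));
    }
    return scans;
  }
}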


Re: Write TimeSeries Data and Do Time Based Range Scans

2013-09-23 Thread Shahab Yunus
http://blog.sematext.com/2012/04/09/hbasewd-avoid-regionserver-hotspotting-despite-writing-records-with-sequential-keys/

Here you can find the discussion, trade-offs and working code/API (even for
M/R) about this and the approach you are trying out.

Regards,
Shahab


On Mon, Sep 23, 2013 at 5:41 PM, anil gupta anilgupt...@gmail.com wrote:

 Hi All,

 I have a secondary index(inverted index) table with a rowkey on the basis
 of Timestamp of an event. Assume the rowkey as TimeStamp in Epoch.
 I also store some extra(apart from main_table rowkey) columns in that table
 for doing filtering.

 The requirement is to do range-based scan on the basis of time of
 event.  Hence, the index with this rowkey.
 I cannot use Hashing or MD5 digest solution because then i cannot do range
 based scans.  And, i already have a index like OpenTSDB in another table
 for the same dataset.(I have many secondary Index for same data set.)

 Problem: When we increase the write workload during stress test. Time
 secondary index becomes a bottleneck due to the famous Region HotSpotting
 problem.
 Solution: I am thinking of adding a prefix of { (TimeStamp in Epoch%10) =
 bucket}  in the rowkey. Then my row key will become:
  BucketTimeStamp in Epoch
 By using above rowkey i can at least alleviate *WRITE* problem.(i don't
 think problem can be fixed permanently because of the use case requirement.
 I would love to be proven wrong.)
 However, with the above row key, now when i want to *READ* data, for every
 single range scans i have to read data from 10 different regions. This
 extra load for read is scaring me a bit.

 I am wondering if anyone has better suggestion/approach to solve this
 problem given the constraints i have.  Looking for feedback from community.

 --
 Thanks  Regards,
 Anil Gupta



Re: Write TimeSeries Data and Do Time Based Range Scans

2013-09-23 Thread anil gupta
Hi Shahab,

If you read my solution carefully, I am already doing that.

Thanks,
Anil Gupta


On Mon, Sep 23, 2013 at 3:51 PM, Shahab Yunus shahab.yu...@gmail.com wrote:


 http://blog.sematext.com/2012/04/09/hbasewd-avoid-regionserver-hotspotting-despite-writing-records-with-sequential-keys/

 Here you can find the discussion, trade-offs and working code/API (even for
 M/R) about this and the approach you are trying out.

 Regards,
 Shahab


 On Mon, Sep 23, 2013 at 5:41 PM, anil gupta anilgupt...@gmail.com wrote:

  Hi All,
 
  I have a secondary index(inverted index) table with a rowkey on the basis
  of Timestamp of an event. Assume the rowkey as TimeStamp in Epoch.
  I also store some extra(apart from main_table rowkey) columns in that
 table
  for doing filtering.
 
  The requirement is to do range-based scan on the basis of time of
  event.  Hence, the index with this rowkey.
  I cannot use Hashing or MD5 digest solution because then i cannot do
 range
  based scans.  And, i already have a index like OpenTSDB in another table
  for the same dataset.(I have many secondary Index for same data set.)
 
  Problem: When we increase the write workload during stress test. Time
  secondary index becomes a bottleneck due to the famous Region HotSpotting
  problem.
  Solution: I am thinking of adding a prefix of { (TimeStamp in
 Epoch%10) =
  bucket}  in the rowkey. Then my row key will become:
   BucketTimeStamp in Epoch
  By using above rowkey i can at least alleviate *WRITE* problem.(i don't
  think problem can be fixed permanently because of the use case
 requirement.
  I would love to be proven wrong.)
  However, with the above row key, now when i want to *READ* data, for
 every
  single range scans i have to read data from 10 different regions. This
  extra load for read is scaring me a bit.
 
  I am wondering if anyone has better suggestion/approach to solve this
  problem given the constraints i have.  Looking for feedback from
 community.
 
  --
  Thanks  Regards,
  Anil Gupta
 




-- 
Thanks & Regards,
Anil Gupta


Re: Write TimeSeries Data and Do Time Based Range Scans

2013-09-23 Thread Shahab Yunus
Yeah, I saw that. In fact that is why I recommended it to you, as I
couldn't infer from your email whether you had already gone through
that source or not. It is a source that did the exact same thing and discusses it
in much more detail, with concerns aligning with yours (in fact I think some
of the authors/creators of that link/group are members of this
community as well).

Regards,
Shahab


On Mon, Sep 23, 2013 at 8:41 PM, anil gupta anilgupt...@gmail.com wrote:

 Hi Shahab,

 If you read my solution carefully. I am already doing that.

 Thanks,
 Anil Gupta


 On Mon, Sep 23, 2013 at 3:51 PM, Shahab Yunus shahab.yu...@gmail.com
 wrote:

 
 
 http://blog.sematext.com/2012/04/09/hbasewd-avoid-regionserver-hotspotting-despite-writing-records-with-sequential-keys/
 
  Here you can find the discussion, trade-offs and working code/API (even
 for
  M/R) about this and the approach you are trying out.
 
  Regards,
  Shahab
 
 
  On Mon, Sep 23, 2013 at 5:41 PM, anil gupta anilgupt...@gmail.com
 wrote:
 
   Hi All,
  
   I have a secondary index(inverted index) table with a rowkey on the
 basis
   of Timestamp of an event. Assume the rowkey as TimeStamp in Epoch.
   I also store some extra(apart from main_table rowkey) columns in that
  table
   for doing filtering.
  
   The requirement is to do range-based scan on the basis of time of
   event.  Hence, the index with this rowkey.
   I cannot use Hashing or MD5 digest solution because then i cannot do
  range
   based scans.  And, i already have a index like OpenTSDB in another
 table
   for the same dataset.(I have many secondary Index for same data set.)
  
   Problem: When we increase the write workload during stress test. Time
   secondary index becomes a bottleneck due to the famous Region
 HotSpotting
   problem.
   Solution: I am thinking of adding a prefix of { (TimeStamp in
  Epoch%10) =
   bucket}  in the rowkey. Then my row key will become:
BucketTimeStamp in Epoch
   By using above rowkey i can at least alleviate *WRITE* problem.(i don't
   think problem can be fixed permanently because of the use case
  requirement.
   I would love to be proven wrong.)
   However, with the above row key, now when i want to *READ* data, for
  every
   single range scans i have to read data from 10 different regions. This
   extra load for read is scaring me a bit.
  
   I am wondering if anyone has better suggestion/approach to solve this
   problem given the constraints i have.  Looking for feedback from
  community.
  
   --
   Thanks  Regards,
   Anil Gupta
  
 



 --
 Thanks  Regards,
 Anil Gupta



Re: quieting HBase metrics

2013-09-23 Thread Otis Gospodnetic
Me, too!

Otis
--
HBase Performance Monitoring -- http://sematext.com/spm



On Mon, Sep 23, 2013 at 7:10 AM, Arati Patro arati.pa...@gmail.com wrote:
 Hi All,


 I have the same query. Tried searching for a solution on the web but found
 nothing very helpful. Is there some way I could disable metrics for every
 table in HBase from being sent to Ganglia? Or just any information on how
 to configure HBase to emit fewer metrics?


 Thanks,
 Arati Patro


 On Thu, Sep 19, 2013 at 4:47 AM, Ron Echeverri recheve...@maprtech.com wrote:

 In http://hbase.apache.org/book/hbase_metrics.html we see

 
 15.4.2. Warning To Ganglia Users

 Warning to Ganglia Users: by default, HBase will emit a LOT of metrics per
 RegionServer which may swamp your installation. Options include either
 increasing Ganglia server capacity, or configuring HBase to emit fewer
 metrics.
 

 However, although there is documentation to help set up Ganglia metrics for
 HBase, there doesn't seem to be any that shows how to make HBase emit fewer
 metrics.  Could someone lend me a hand?  I have increased the period in
 hadoop-metrics.properties but that has proven insufficient.

 thanks
 rone



Re: quieting HBase metrics

2013-09-23 Thread Bing Jiang
HBase metrics contain different types: RegionServerMetrics and
RegionServerDynamicMetrics.
More metrics are emitted from RegionServerDynamicMetrics,
especially when there are many regions on a RegionServer.
RegionServerMetrics and RegionServerDynamicMetrics share one metrics context,
which makes fine-grained control difficult.

In our production setup, we added a metrics filter to the MetricsContext, which
can be configured to our needs; it reduces the amount of metrics emitted and
works well.




2013/9/24 Otis Gospodnetic otis.gospodne...@gmail.com

 Me, too!

 Otis
 --
 HBase Performance Monitoring -- http://sematext.com/spm



 On Mon, Sep 23, 2013 at 7:10 AM, Arati Patro arati.pa...@gmail.com
 wrote:
  Hi All,
 
 
  I have the same query. Tried searching for a solution on the web but
 found
  nothing very helpful. Is there some way I could disable metrics for every
  table in HBase from being sent to Ganglia? Or just any information on how
  to configure HBase to emit fewer metrics?
 
 
  Thanks,
  Arati Patro
 
 
  On Thu, Sep 19, 2013 at 4:47 AM, Ron Echeverri recheve...@maprtech.com
 wrote:
 
  In http://hbase.apache.org/book/hbase_metrics.html we see
 
  
  15.4.2. Warning To Ganglia Users
 
  Warning to Ganglia Users: by default, HBase will emit a LOT of metrics
 per
  RegionServer which may swamp your installation. Options include either
  increasing Ganglia server capacity, or configuring HBase to emit fewer
  metrics.
  
 
  However, although there is documentation to help set up Ganglia metrics
 for
  HBase, there doesn't seem to be any that shows how to make HBase emit
 fewer
  metrics.  Could someone lend me a hand?  I have increased the period in
  hadoop-metrics.properties but that has proven insufficient.
 
  thanks
  rone
 




-- 
Bing Jiang
weibo: http://weibo.com/jiangbinglover
BLOG: www.binospace.com
BLOG: http://blog.sina.com.cn/jiangbinglover
Focus on distributed computing, HDFS/HBase




Re: quieting HBase metrics

2013-09-23 Thread Ron Echeverri
Thanks, Bing, but I still don't understand *how* to configure that. Could
you share your hadoop-metrics.properties file, or any other files you use
for this?

rone


On Mon, Sep 23, 2013 at 8:29 PM, Bing Jiang jiangbinglo...@gmail.com wrote:

 hbase metrics contains different types:RegionServerMetrics and
 RegionServerDynamicMetrics.
 And there are more metrics emitted from RegionServerDynamicMetrics,
 especially in case that there are more regions in RS.
 RegionServerMetrics and RSDM share one metrics context, which makes the
 control become difficult.

 In our production, we add a metrics filter in MetricsContext, which can be
 configured as to our needs, which will reduce the amount of metrics, and
 take a good effect.




 2013/9/24 Otis Gospodnetic otis.gospodne...@gmail.com

 Me, too!

 Otis
 --
 HBase Performance Monitoring -- http://sematext.com/spm



 On Mon, Sep 23, 2013 at 7:10 AM, Arati Patro arati.pa...@gmail.com
 wrote:
  Hi All,
 
 
  I have the same query. Tried searching for a solution on the web but
 found
  nothing very helpful. Is there some way I could disable metrics for
 every
  table in HBase from being sent to Ganglia? Or just any information on
 how
  to configure HBase to emit fewer metrics?
 
 
  Thanks,
  Arati Patro
 
 
  On Thu, Sep 19, 2013 at 4:47 AM, Ron Echeverri recheve...@maprtech.com
 wrote:
 
  In http://hbase.apache.org/book/hbase_metrics.html we see
 
  
  15.4.2. Warning To Ganglia Users
 
  Warning to Ganglia Users: by default, HBase will emit a LOT of metrics
 per
  RegionServer which may swamp your installation. Options include either
  increasing Ganglia server capacity, or configuring HBase to emit fewer
  metrics.
  
 
  However, although there is documentation to help set up Ganglia
 metrics for
  HBase, there doesn't seem to be any that shows how to make HBase emit
 fewer
  metrics.  Could someone lend me a hand?  I have increased the period in
  hadoop-metrics.properties but that has proven insufficient.
 
  thanks
  rone
 




 --
 Bing Jiang
 weibo: http://weibo.com/jiangbinglover
 BLOG: www.binospace.com
 BLOG: http://blog.sina.com.cn/jiangbinglover
 Focus on distributed computing, HDFS/HBase


 2013/9/24 Otis Gospodnetic otis.gospodne...@gmail.com

 Me, too!

 Otis
 --
 HBase Performance Monitoring -- http://sematext.com/spm



 On Mon, Sep 23, 2013 at 7:10 AM, Arati Patro arati.pa...@gmail.com
 wrote:
  Hi All,
 
 
  I have the same query. Tried searching for a solution on the web but
 found
  nothing very helpful. Is there some way I could disable metrics for
 every
  table in HBase from being sent to Ganglia? Or just any information on
 how
  to configure HBase to emit fewer metrics?
 
 
  Thanks,
  Arati Patro
 
 
  On Thu, Sep 19, 2013 at 4:47 AM, Ron Echeverri recheve...@maprtech.com
 wrote:
 
  In http://hbase.apache.org/book/hbase_metrics.html we see
 
  
  15.4.2. Warning To Ganglia Users
 
  Warning to Ganglia Users: by default, HBase will emit a LOT of metrics
 per
  RegionServer which may swamp your installation. Options include either
  increasing Ganglia server capacity, or configuring HBase to emit fewer
  metrics.
  
 
  However, although there is documentation to help set up Ganglia
 metrics for
  HBase, there doesn't seem to be any that shows how to make HBase emit
 fewer
  metrics.  Could someone lend me a hand?  I have increased the period in
  hadoop-metrics.properties but that has proven insufficient.
 
  thanks
  rone
 




 --
 Bing Jiang
 Tel:(86)134-2619-1361
 weibo: http://weibo.com/jiangbinglover
 BLOG: www.binospace.com
 BLOG: http://blog.sina.com.cn/jiangbinglover
 Focus on distributed computing, HDFS/HBase