Re: Change data capture tool for hbase

2013-06-04 Thread yavuz gokirmak
Hi Asaf,

This CDC pattern will be used for directing changes to another system.
Assume I have a table hbase_alarms in HBase with columns
Severity, Source, Time, and I am tracking changes with this CDC tool. Some
external system is putting alarms with their severity and source into the
hbase_alarms table.

Now I have a source system and I need to take some action by tracking changes.
For example, one action may be inserting critical alarms into another
table in an RDBMS as well. Using such a CDC tool, I can write
rules like: if severity=critical and source=router, insert the record into
psql_alarms.
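
For illustration only, a minimal Java sketch of such a rule; the ChangeEvent shape
and every name here are hypothetical, not part of the actual tool:

import java.util.Map;

// Hypothetical change record emitted by the CDC source: a rowkey plus column values.
class ChangeEvent {
    final String rowKey;
    final Map<String, String> columns; // e.g. {"Severity" -> "critical", "Source" -> "router"}
    ChangeEvent(String rowKey, Map<String, String> columns) {
        this.rowKey = rowKey;
        this.columns = columns;
    }
}

// A rule such as "if severity=critical and source=router, forward the record to psql_alarms".
class CriticalRouterRule {
    boolean matches(ChangeEvent e) {
        return "critical".equalsIgnoreCase(e.columns.get("Severity"))
            && "router".equalsIgnoreCase(e.columns.get("Source"));
    }
}

A matching event would then be handed to whatever sink (Flume, JDBC, HTTP) the rule is wired to.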


This is just an example; as I wrote, I am planning to implement this tool as a
Flume source, so I can take any action on any system using Flume sinks
(calling a web service, doing an HTTP request, writing to a file, etc.).

In the RDBMS world the CDC pattern works like a triggering mechanism, but it is
much more efficient than triggers (CDC tools extract change information from
logs asynchronously, therefore they do not lengthen transactions).

regards..



On 4 June 2013 06:57, Asaf Mesika asaf.mes...@gmail.com wrote:

 What's wrong with HBase native Master Slave replicate, or am I missing
 something here?


 On Mon, Jun 3, 2013 at 12:16 PM, yavuz gokirmak ygokir...@gmail.com
 wrote:

  Hi all,
 
  Currently we are working on an HBase change data capture (CDC) tool. I want
  to share our ideas and continue development according to your feedback.
 
  As you know, CDC tools are used for tracking data changes and taking
  actions according to these changes[1].  For example, in relational
  databases, CDC tools are mainly used for replication. You can replicate
  your source system continuously to another location or DB using a CDC tool.
  So whenever an insert/update/delete is done on the source system, you can
  reflect the same operation to the replicated environment.
 
  As I've said, we are working on a CDC tool that can track changes on an
  HBase table and reflect those changes to any other system in real time.
 
  We are trying to implement the tool in a way that it will behave as a
  slave cluster. So if we enable master-master replication in the source
  system, we expect to get all changes and act accordingly. Once the proof-of-
  concept CDC tool is implemented (we need one week) we will convert it to
  a Flume source. So using it as a Flume source we can direct data changes to
  any destination (sink).
 
  This is just a summary.
  Please write your feedback and comments.
 
  Do you know any tool similar to this proposal?
 
  regards.
 
 
 
 
 
  1- http://en.wikipedia.org/wiki/Change_data_capture
 



RPC Replication Compression

2013-06-04 Thread Asaf Mesika
Hi,

Just wanted to make sure I read correctly on the internet: 0.96 will
support HBase RPC compression, and thus replication between master and slave
will benefit from it as well (important since bandwidth between geographically
distant data centers is scarce and more expensive).


Re: RPC Replication Compression

2013-06-04 Thread Anoop John
 0.96 will support HBase RPC compression
Yes

 Replication between master and slave
will enjoy it as well (important since bandwidth between geographically
distant data centers is scarce and more expensive)

But I cannot see it being utilized in replication. Maybe we can do
improvements in this area. I can see possibilities.

-Anoop-


On Tue, Jun 4, 2013 at 1:51 PM, Asaf Mesika asaf.mes...@gmail.com wrote:

 Hi,

 Just wanted to make sure if I read in the internet correctly: 0.96 will
 support HBase RPC compression thus Replication between master and slave
 will enjoy it as well (important since bandwidth between geographically
 distant data centers is scarce and more expensive)



Re: what's the typical scan latency?

2013-06-04 Thread Amit Mor
What's your blockCacheHitCachingRatio? It would tell you the ratio
of scans requested from cache (the default) to the scans actually served from
the block cache. You can get that from the RS web UI. What you are seeing
can map to almost anything, for example: is scanner caching (client side)
enabled? If so, how many rows are cached (how many rows are returned by the
scanner.next RPC call)? What's your HFile block size, block cache % of
total RS heap, max number of RPCs per RS for client connections,
tcpnodelay, your network topology and jitter, number of NICs? Are you using
an HTableInterface connection pool? The HBase client is synchronous, so how do
you achieve concurrency? What about your percentiles? Is 5ms the mean?
Median? Is 20ms only the 99th percentile, etc. etc. etc.? I am far
from considering myself an expert on the general topic of HBase, so take
my tips with a pinch of salt - these are just factors I've considered when
trying to optimize my read latency. Hope that helps.
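
For what it's worth, here is a minimal sketch of the client-side knobs mentioned
above (scanner caching and block-cache usage), assuming the 0.94-era client API;
the table name is just a placeholder:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

public class ScanCachingExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "mytable");   // placeholder table name
        Scan scan = new Scan();
        scan.setCaching(100);      // rows returned per scanner.next() RPC (client-side caching)
        scan.setCacheBlocks(true); // ask the RS to keep the touched blocks in the block cache
        ResultScanner scanner = table.getScanner(scan);
        try {
            for (Result r : scanner) {
                // process r
            }
        } finally {
            scanner.close();
            table.close();
        }
    }
}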


On Tue, Jun 4, 2013 at 4:02 AM, Liu, Raymond raymond@intel.com wrote:

 Thanks Amit

 In my environment, I run dozens of clients to read about 5-20K of data per
 scan concurrently, and the average read latency for cached data is around
 5-20ms.
 So it seems there must be something wrong with my cluster environment or
 application. Or did you run that with multiple clients?


 Depends on so much environment related variables and on data as well.
 But to give you a number after all:
 One of our clusters is on EC2, 6 RS, on m1.xlarge machines (network
 performance 'high' according to aws), with 90% of the time we do reads; our
 avg data size is 2K, block cache at 20K, 100 rows per scan avg, bloom
 filters 'on' at the 'ROW' level, 40% of heap dedicated to block cache (note
 that it contains several other bits and pieces) and I would say our average
 latency for cached data (~97% blockCacheHitCachingRatio) is 3-4ms. File
 system access is much much painful, especially on ec2 m1.xlarge where you
 really can't tell what's going on, as far as I can tell. To tell you the
 truth as I see it, this is an abuse (for our use case) of the HBase store
 and for cache like behavior I would recommend going to something like Redis.


 On Mon, Jun 3, 2013 at 12:13 PM, ramkrishna vasudevan 
 ramkrishna.s.vasude...@gmail.com wrote:

  What is that you are observing now?
 
  Regards
  Ram
 
 
  On Mon, Jun 3, 2013 at 2:00 PM, Liu, Raymond raymond@intel.com
  wrote:
 
   Hi
  
   If all the data is already in RS blockcache.
   Then what's the typical scan latency for scan a few rows
   from a say several GB table ( with dozens of regions ) on a small
   cluster with
  say
   4 RS ?
  
   A few ms? Tens of ms? Or more?
  
   Best Regards,
   Raymond Liu
  
 



Re: RPC Replication Compression

2013-06-04 Thread Asaf Mesika
If RPC has compression abilities, how come Replication, which also works over
RPC, does not get it automatically?


On Tue, Jun 4, 2013 at 12:34 PM, Anoop John anoop.hb...@gmail.com wrote:

  0.96 will support HBase RPC compression
 Yes

  Replication between master and slave
 will enjoy it as well (important since bandwidth between geographically
 distant data centers is scarce and more expensive)

 But I can not see it is being utilized in replication. May be we can do
 improvements in this area. I can see possibilities.

 -Anoop-


 On Tue, Jun 4, 2013 at 1:51 PM, Asaf Mesika asaf.mes...@gmail.com wrote:

  Hi,
 
  Just wanted to make sure if I read in the internet correctly: 0.96 will
  support HBase RPC compression thus Replication between master and slave
  will enjoy it as well (important since bandwidth between geographically
  distant data centers is scarce and more expensive)
 



Using thrift2 interface but getting : 400 Bad Request

2013-06-04 Thread Simon Majou
Hello,

I am using the thrift & thrift2 interfaces (thrift for DDL & thrift2 for the
rest). My requests work with thrift, but with thrift2 I get a 400 error.

Here is my code (CoffeeScript):

  colValue = new types2.TColumnValue family: 'cf', qualifier: 'col', value: 'yoo'
  put = new types2.TPut(row: 'row1', columnValues: [ colValue ])
  client2.put 'test', put, (err, res) ->
    console.log 'put', err, res


Here is what is sent by the put method :

{ row: 'row1',
  columnValues: [ { family: 'cf', qualifier: 'col', value: 'yoo',
timestamp: null } ],
  timestamp: null,
  writeToWal: true }


And here is the reply from the thrift2 daemon:

receive HTTP/1.1 400 Bad Request
Connection: close
Server: Jetty(6.1.26)


There are no logs into thrift2.log when I do my request.

Anyone have any clue ?

Simon


Re: Using thrift2 interface but getting : 400 Bad Request

2013-06-04 Thread Ted Yu
Can you check region server log around that time ?

Thanks

On Jun 4, 2013, at 8:37 AM, Simon Majou si...@majou.org wrote:

 Hello,
 
 I am using the thrift & thrift2 interfaces (thrift for DDL & thrift2 for the
 rest). My requests work with thrift, but with thrift2 I get a 400 error.
 
 Here is my code (coffeescript) :
 
  colValue = new types2.TColumnValue family: 'cf', qualifier:'col',
 value:'yoo'
  put = new types2.TPut(row:'row1', columnValues: [ colValue ])
  client2.put 'test', put, (err, res) ->
console.log 'put', err, res
 
 
 Here is what is sent by the put method :
 
 { row: 'row1',
  columnValues: [ { family: 'cf', qualifier: 'col', value: 'yoo',
 timestamp: null } ],
  timestamp: null,
  writeToWal: true }
 
 
 And here is the reply from the thrift2 daemon:
 
 receive HTTP/1.1 400 Bad Request
 Connection: close
 Server: Jetty(6.1.26)
 
 
 There are no logs into thrift2.log when I do my request.
 
 Anyone have any clue ?
 
 Simon


Re: Using thrift2 interface but getting : 400 Bad Request

2013-06-04 Thread Simon Majou
No logs there either (in fact no logs are written in any log file when I
execute the request)


Simon


On Tue, Jun 4, 2013 at 5:42 PM, Ted Yu yuzhih...@gmail.com wrote:

 Can you check region server log around that time ?

 Thanks

 On Jun 4, 2013, at 8:37 AM, Simon Majou si...@majou.org wrote:

  Hello,
 
  I am using the thrift & thrift2 interfaces (thrift for DDL & thrift2 for the
  rest). My requests work with thrift, but with thrift2 I get a 400 error.
 
  Here is my code (coffeescript) :
 
   colValue = new types2.TColumnValue family: 'cf', qualifier:'col',
  value:'yoo'
   put = new types2.TPut(row:'row1', columnValues: [ colValue ])
   client2.put 'test', put, (err, res) ->
 console.log 'put', err, res
 
 
  Here is what is sent by the put method :
 
  { row: 'row1',
   columnValues: [ { family: 'cf', qualifier: 'col', value: 'yoo',
  timestamp: null } ],
   timestamp: null,
   writeToWal: true }
 
 
  And here is the reply from the thrift2 daemon:
 
  receive HTTP/1.1 400 Bad Request
  Connection: close
  Server: Jetty(6.1.26)
 
 
  There are no logs into thrift2.log when I do my request.
 
  Anyone have any clue ?
 
  Simon



Regarding Indexing columns in HBASE

2013-06-04 Thread Ramasubramanian Narayanan
Hi,

In an HBase table, there are 200 columns, and the read patterns for different
systems involve 70 columns...
In this case, we cannot have 70 columns in the rowkey, which would not
be a good design...

Can you please suggest how to handle this problem?
Also, can we do indexing in HBase apart from the rowkey? (something called a
secondary index)

regards,
Rams


Re: RPC Replication Compression

2013-06-04 Thread Jean-Daniel Cryans
Replication doesn't need to know about compression at the RPC level, so
it won't refer to it, and as far as I can tell you need to set
compression only on the master cluster and the slave will figure it
out.

Looking at the code though, I'm not sure it works the same way it used to
work before everything went protobuf. I would give 2 internets to
whoever tests 0.95.1 with RPC compression turned on and compares
results with non-compressed RPC. See
http://hbase.apache.org/book.html#rpc.configs
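
If someone does pick this up, a hedged sketch of turning the knob on programmatically;
the property name below is taken from the rpc.configs section linked above and should
be verified against the exact 0.95.x/0.96 build being tested:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class RpcCompressionConfig {
    public static Configuration compressedClientConf() {
        Configuration conf = HBaseConfiguration.create();
        // Assumed property name per the book's rpc.configs section; verify for your release.
        conf.set("hbase.client.rpc.compressor", "org.apache.hadoop.io.compress.GzipCodec");
        return conf;
    }
}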

J-D

On Tue, Jun 4, 2013 at 5:22 AM, Asaf Mesika asaf.mes...@gmail.com wrote:
 If RPC has compression abilities, how come Replication, which also works in
 RPC does not get it automatically?


 On Tue, Jun 4, 2013 at 12:34 PM, Anoop John anoop.hb...@gmail.com wrote:

  0.96 will support HBase RPC compression
 Yes

  Replication between master and slave
 will enjoy it as well (important since bandwidth between geographically
 distant data centers is scarce and more expensive)

 But I can not see it is being utilized in replication. May be we can do
 improvements in this area. I can see possibilities.

 -Anoop-


 On Tue, Jun 4, 2013 at 1:51 PM, Asaf Mesika asaf.mes...@gmail.com wrote:

  Hi,
 
  Just wanted to make sure if I read in the internet correctly: 0.96 will
  support HBase RPC compression thus Replication between master and slave
  will enjoy it as well (important since bandwidth between geographically
  distant data centers is scarce and more expensive)
 



Re: Regarding Indexing columns in HBASE

2013-06-04 Thread Shahab Yunus
Just a quick thought: why don't you create different tables and duplicate
data, i.e. go for denormalization and data redundancy? Are all your read
access patterns that require the 70 columns incorporated into one
application/client, or will it be a bunch of different clients/applications?
If that is not the case, then I think why not take advantage of more storage.

Regards,
Shahab


On Tue, Jun 4, 2013 at 12:43 PM, Ramasubramanian Narayanan 
ramasubramanian.naraya...@gmail.com wrote:

 Hi,

 In a HBASE table, there are 200 columns and the read pattern for diffferent
 systems invols 70 columns...
 In the above case, we cannot have 70 columns in the rowkey which will not
 be a good design...

 Can you please suggest how to handle this problem?
 Also can we do indexing in HBASE apart from rowkey? (something called
 secondary index)

 regards,
 Rams



Re: Poor HBase map-reduce scan performance

2013-06-04 Thread Bryan Keller
Thanks Enis, I'll see if I can backport this patch - it is exactly what I was 
going to try. This should solve my scan performance problems if I can get it to 
work.

On May 29, 2013, at 1:29 PM, Enis Söztutar e...@hortonworks.com wrote:

 Hi,
 
 Regarding running raw scans on top of Hfiles, you can try a version of the
 patch attached at https://issues.apache.org/jira/browse/HBASE-8369, which
 enables exactly this. However, the patch is for trunk.
 
 In that, we open one region from snapshot files in each record reader, and
 run a scan through using an internal region scanner. Since this bypasses
 the client + rpc + server daemon layers, it should be able to give optimum
 scan performance.
 
 There is also a tool called HFilePerformanceBenchmark that intends to
 measure raw performance for HFiles. I've had to do a lot of changes to make
 it workable, but it might be worth taking a look to see whether there is
 any perf difference between scanning a sequence file from hdfs vs scanning
 an hfile.
 
 Enis
 
 
 On Fri, May 24, 2013 at 10:50 PM, lars hofhansl la...@apache.org wrote:
 
 Sorry. Haven't gotten to this, yet.
 
 Scanning in HBase being about 3x slower than straight HDFS is in the right
 ballpark, though. It has to do a bit more work.
 
 Generally, HBase is great at honing in to a subset (some 10-100m rows) of
 the data. Raw scan performance is not (yet) a strength of HBase.
 
 So with HDFS you get to 75% of the theoretical maximum read throughput;
 hence with HBase you get to 25% of the theoretical cluster-wide maximum disk
 throughput?
 
 
 -- Lars
 
 
 
 - Original Message -
 From: Bryan Keller brya...@gmail.com
 To: user@hbase.apache.org
 Cc:
 Sent: Friday, May 10, 2013 8:46 AM
 Subject: Re: Poor HBase map-reduce scan performance
 
 FYI, I ran tests with compression on and off.
 
 With a plain HDFS sequence file and compression off, I am getting very
 good I/O numbers, roughly 75% of theoretical max for reads. With snappy
 compression on with a sequence file, I/O speed is about 3x slower. However
 the file size is 3x smaller so it takes about the same time to scan.
 
 With HBase, the results are equivalent (just much slower than a sequence
 file). Scanning a compressed table is about 3x slower I/O than an
 uncompressed table, but the table is 3x smaller, so the time to scan is
 about the same. Scanning an HBase table takes about 3x as long as scanning
 the sequence file export of the table, either compressed or uncompressed.
 The sequence file export file size ends up being just barely larger than
 the table, either compressed or uncompressed
 
 So in sum, compression slows down I/O 3x, but the file is 3x smaller so
 the time to scan is about the same. Adding in HBase slows things down
 another 3x. So I'm seeing 9x faster I/O scanning an uncompressed sequence
 file vs scanning a compressed table.
 
 
 On May 8, 2013, at 10:15 AM, Bryan Keller brya...@gmail.com wrote:
 
 Thanks for the offer Lars! I haven't made much progress speeding things
 up.
 
 I finally put together a test program that populates a table that is
 similar to my production dataset. I have a readme that should describe
 things, hopefully enough to make it useable. There is code to populate a
 test table, code to scan the table, and code to scan sequence files from an
 export (to compare HBase w/ raw HDFS). I use a gradle build script.
 
 You can find the code here:
 
 https://dl.dropboxusercontent.com/u/6880177/hbasetest.zip
 
 
 On May 4, 2013, at 6:33 PM, lars hofhansl la...@apache.org wrote:
 
 The blockbuffers are not reused, but that by itself should not be a
 problem as they are all the same size (at least I have never identified
 that as one in my profiling sessions).
 
 My offer still stands to do some profiling myself if there is an easy
 way to generate data of similar shape.
 
 -- Lars
 
 
 
 
 From: Bryan Keller brya...@gmail.com
 To: user@hbase.apache.org
 Sent: Friday, May 3, 2013 3:44 AM
 Subject: Re: Poor HBase map-reduce scan performance
 
 
 Actually I'm not too confident in my results re block size, they may
 have been related to major compaction. I'm going to rerun before drawing
 any conclusions.
 
 On May 3, 2013, at 12:17 AM, Bryan Keller brya...@gmail.com wrote:
 
 I finally made some progress. I tried a very large HBase block size
 (16mb), and it significantly improved scan performance. I went from 45-50
 min to 24 min. Not great but much better. Before I had it set to 128k.
 Scanning an equivalent sequence file takes 10 min. My random read
 performance will probably suffer with such a large block size
 (theoretically), so I probably can't keep it this big. I care about random
 read performance too. I've read having a block size this big is not
 recommended, is that correct?
 
 I haven't dug too deeply into the code, are the block buffers reused
 or is each new block read a new allocation? Perhaps a buffer pool could
 help here if there isn't one already. When doing a 

Re: Regarding Indexing columns in HBASE

2013-06-04 Thread Ramasubramanian Narayanan
Hi,

The read pattern differs for each application.

Is the below approach fine?

Create one HBase table with a unique rowkey and put all 200 columns into
it...

Create multiple small HBase tables, each holding the columns for one read
access pattern plus the rowkey that maps back to the master table (a hedged
lookup sketch follows the example below)...

e.g.
*Master Table :*
MasterRowkey
Field1
..
..
Field 200

*Link Table1:*
Link1Rowkey
Field1
Field13
Field16
Field67
MasterRowkey (value)
*
*
*Link Table2:*
Link2Rowkey
Field5
Field23
Field56
Field167
MasterRowkey (value)
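
A hedged sketch of the two-step read this layout implies; all table, family and
qualifier names below are assumptions, not a recommendation:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class LinkTableLookup {
    // Two-step read: link-table row -> MasterRowkey value -> full row from the master table.
    public static Result lookup(Configuration conf, String linkRowkey) throws Exception {
        HTable link = new HTable(conf, "link_table1");   // assumed names
        HTable master = new HTable(conf, "master_table");
        try {
            Result linkRow = link.get(new Get(Bytes.toBytes(linkRowkey)));
            byte[] masterKey = linkRow.getValue(Bytes.toBytes("f"), Bytes.toBytes("MasterRowkey"));
            return master.get(new Get(masterKey));
        } finally {
            link.close();
            master.close();
        }
    }
}

The obvious cost is keeping the link tables in sync with the master table on every write.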


regards,
Rams


On Tue, Jun 4, 2013 at 12:51 PM, Shahab Yunus shahab.yu...@gmail.comwrote:

 Just a quick thought, why don't you create different tables and duplicate
 data i.e. go for demoralization and data redundancy. Is your all read
 access patterns that would require 70 columns are incorporated into one
 application/client? Or it will be bunch of different clients/applications?
 If that is not the case then I think why not take advantage of more
 storage.

 Regards,
 Shahab


 On Tue, Jun 4, 2013 at 12:43 PM, Ramasubramanian Narayanan 
 ramasubramanian.naraya...@gmail.com wrote:

  Hi,
 
  In a HBASE table, there are 200 columns and the read pattern for
 diffferent
  systems invols 70 columns...
  In the above case, we cannot have 70 columns in the rowkey which will not
  be a good design...
 
  Can you please suggest how to handle this problem?
  Also can we do indexing in HBASE apart from rowkey? (something called
  secondary index)
 
  regards,
  Rams
 



Re: Regarding Indexing columns in HBASE

2013-06-04 Thread Michel Segel
Quick and dirty...

Create an inverted table for each index
Then you can take the intersection of the result set(s) to get your list of 
rows for further filtering.

There is obviously more to this, but it's the core idea...


Sent from a remote device. Please excuse any typos...

Mike Segel

On Jun 4, 2013, at 11:51 AM, Shahab Yunus shahab.yu...@gmail.com wrote:

 Just a quick thought, why don't you create different tables and duplicate
 data i.e. go for demoralization and data redundancy. Is your all read
 access patterns that would require 70 columns are incorporated into one
 application/client? Or it will be bunch of different clients/applications?
 If that is not the case then I think why not take advantage of more storage.
 
 Regards,
 Shahab
 
 
 On Tue, Jun 4, 2013 at 12:43 PM, Ramasubramanian Narayanan 
 ramasubramanian.naraya...@gmail.com wrote:
 
 Hi,
 
 In a HBASE table, there are 200 columns and the read pattern for diffferent
 systems invols 70 columns...
 In the above case, we cannot have 70 columns in the rowkey which will not
 be a good design...
 
 Can you please suggest how to handle this problem?
 Also can we do indexing in HBASE apart from rowkey? (something called
 secondary index)
 
 regards,
 Rams
 


Re: Regarding Indexing columns in HBASE

2013-06-04 Thread Ramasubramanian Narayanan
Hi Michel,

If you don't mind can you please help explain in detail ...

Also can you pls let me know whether we have secondary index in HBASE?

regards,
Rams


On Tue, Jun 4, 2013 at 1:13 PM, Michel Segel michael_se...@hotmail.comwrote:

 Quick and dirty...

 Create an inverted table for each index
 Then you can take the intersection of the result set(s) to get your list
 of rows for further filtering.

 There is obviously more to this, but its the core idea...


 Sent from a remote device. Please excuse any typos...

 Mike Segel

 On Jun 4, 2013, at 11:51 AM, Shahab Yunus shahab.yu...@gmail.com wrote:

  Just a quick thought, why don't you create different tables and duplicate
  data i.e. go for demoralization and data redundancy. Is your all read
  access patterns that would require 70 columns are incorporated into one
  application/client? Or it will be bunch of different
 clients/applications?
  If that is not the case then I think why not take advantage of more
 storage.
 
  Regards,
  Shahab
 
 
  On Tue, Jun 4, 2013 at 12:43 PM, Ramasubramanian Narayanan 
  ramasubramanian.naraya...@gmail.com wrote:
 
  Hi,
 
  In a HBASE table, there are 200 columns and the read pattern for
 diffferent
  systems invols 70 columns...
  In the above case, we cannot have 70 columns in the rowkey which will
 not
  be a good design...
 
  Can you please suggest how to handle this problem?
  Also can we do indexing in HBASE apart from rowkey? (something called
  secondary index)
 
  regards,
  Rams
 



Re: Regarding Indexing columns in HBASE

2013-06-04 Thread Ian Varley
Rams - you might enjoy this blog post from HBase committer Jesse Yates (from 
last summer):

http://jyates.github.io/2012/07/09/consistent-enough-secondary-indexes.html

Secondary Indexing doesn't exist in HBase core today, but there are various 
proposals and early implementations of it in flight.

In the mean time, as Mike and others have said, if you don't need them to be 
immediately consistent in a real-time write scenario, you can simply write the 
same data into multiple tables in different sort orders. (This is hard in a 
real-time write scenario because, without cross-table transactions, you'd have 
to handle all the cases where the record was written but the index wasn't, or 
vice versa.)

Ian

On Jun 4, 2013, at 12:22 PM, Ramasubramanian Narayanan wrote:

Hi Michel,

If you don't mind can you please help explain in detail ...

Also can you pls let me know whether we have secondary index in HBASE?

regards,
Rams


On Tue, Jun 4, 2013 at 1:13 PM, Michel Segel michael_se...@hotmail.com wrote:

Quick and dirty...

Create an inverted table for each index
Then you can take the intersection of the result set(s) to get your list
of rows for further filtering.

There is obviously more to this, but its the core idea...


Sent from a remote device. Please excuse any typos...

Mike Segel

On Jun 4, 2013, at 11:51 AM, Shahab Yunus shahab.yu...@gmail.com wrote:

Just a quick thought, why don't you create different tables and duplicate
data i.e. go for demoralization and data redundancy. Is your all read
access patterns that would require 70 columns are incorporated into one
application/client? Or it will be bunch of different
clients/applications?
If that is not the case then I think why not take advantage of more
storage.

Regards,
Shahab


On Tue, Jun 4, 2013 at 12:43 PM, Ramasubramanian Narayanan ramasubramanian.naraya...@gmail.com wrote:

Hi,

In a HBASE table, there are 200 columns and the read pattern for
diffferent
systems invols 70 columns...
In the above case, we cannot have 70 columns in the rowkey which will
not
be a good design...

Can you please suggest how to handle this problem?
Also can we do indexing in HBASE apart from rowkey? (something called
secondary index)

regards,
Rams





Re: Regarding Indexing columns in HBASE

2013-06-04 Thread Michael Segel
Ok...

A little bit more detail...

First, it's possible to store your data in multiple tables, each with a different
key.
Not a good idea, for some very obvious reasons...

You could however create a secondary table which is an inverted table, where the
rowkey of the index is the value in the base table and the column name is the
rowkey in the base table, with the value referring back to the base table.

This will work well as long as you're not indexing a column that has a small
finite set of values, like a binary index (Male/Female as an example...).
(It will create a very wide row...)

But in a general case it should work ok.  Note too that you can also still 
create a compound key for the index. 

As an example... you could create an index on manufacturer, model, year, and color,
where the value is the VIN, which would be the rowkey for the base table.

Then if you want to find all of the 2005 Volvo S80's on the road, you can do a 
partial scan of the index setting up start and stop rows.
Then filter the result set based on the state listed on the vehicle's 
registration. 

The idea is that you would fetch the rows from the index query's result set and 
that would be your list that you would use for your next query. 

Again, there is more to this... like if you have multiple indexes on the data, 
you'd take the intersection of the result set(s) and then apply the filters 
that are not indexed.  

The initial key lookups should normally be a simple fetch of a single row, 
yielding you a list of rows in the base table. 
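
To make that concrete, a hedged sketch of such a partial index scan; the index rowkey
layout (manufacturer|model|year|color), the table name and the column family are all
assumptions, and the qualifiers are taken to hold the base-table rowkeys (VINs):

import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class InvertedIndexScan {
    public static List<String> findVins(Configuration conf, String manufacturer,
            String model, String year) throws Exception {
        HTable index = new HTable(conf, "vehicle_idx");  // assumed index table name
        byte[] start = Bytes.toBytes(manufacturer + "|" + model + "|" + year + "|");
        byte[] stop  = Bytes.toBytes(manufacturer + "|" + model + "|" + year + "|~"); // '~' sorts after the color values
        Scan scan = new Scan(start, stop);               // partial scan over the index prefix
        List<String> vins = new ArrayList<String>();
        ResultScanner scanner = index.getScanner(scan);
        try {
            for (Result r : scanner) {
                for (byte[] qualifier : r.getFamilyMap(Bytes.toBytes("f")).keySet()) {
                    vins.add(Bytes.toString(qualifier)); // qualifier = rowkey (VIN) in the base table
                }
            }
        } finally {
            scanner.close();
            index.close();
        }
        return vins;
    }
}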

PLEASE NOTE THE FOLLOWING:

1) This is a general use case example. 
2) YMMV based on the use case
3) YMMV based on the data contained in your underlying table
4) This is one simple way that can work with or without coprocessors 
5) There is more to the solution, I'm painting a very high level solution.

And of course I'm waiting for someone to mention that you could look at Phoenix,
which can implement this or a variation on this to do indexing.

And of course you have other indexing options. 

HTH...

-Mike

On Jun 4, 2013, at 12:30 PM, Ian Varley ivar...@salesforce.com wrote:

 Rams - you might enjoy this blog post from HBase committer Jesse Yates (from 
 last summer):
 
 http://jyates.github.io/2012/07/09/consistent-enough-secondary-indexes.html
 
 Secondary Indexing doesn't exist in HBase core today, but there are various 
 proposals and early implementations of it in flight.
 
 In the mean time, as Mike and others have said, if you don't need them to be 
 immediately consistent in a real-time write scenario, you can simply write 
 the same data into multiple tables in different sort orders. (This is hard in 
 a real-time write scenario because, without cross-table transactions, you'd 
 have to handle all the cases where the record was written but the index 
 wasn't, or vice versa.)
 
 Ian
 
 On Jun 4, 2013, at 12:22 PM, Ramasubramanian Narayanan wrote:
 
 Hi Michel,
 
 If you don't mind can you please help explain in detail ...
 
 Also can you pls let me know whether we have secondary index in HBASE?
 
 regards,
 Rams
 
 
 On Tue, Jun 4, 2013 at 1:13 PM, Michel Segel michael_se...@hotmail.com wrote:
 
 Quick and dirty...
 
 Create an inverted table for each index
 Then you can take the intersection of the result set(s) to get your list
 of rows for further filtering.
 
 There is obviously more to this, but its the core idea...
 
 
 Sent from a remote device. Please excuse any typos...
 
 Mike Segel
 
 On Jun 4, 2013, at 11:51 AM, Shahab Yunus shahab.yu...@gmail.com wrote:
 
 Just a quick thought, why don't you create different tables and duplicate
 data i.e. go for demoralization and data redundancy. Is your all read
 access patterns that would require 70 columns are incorporated into one
 application/client? Or it will be bunch of different
 clients/applications?
 If that is not the case then I think why not take advantage of more
 storage.
 
 Regards,
 Shahab
 
 
 On Tue, Jun 4, 2013 at 12:43 PM, Ramasubramanian Narayanan ramasubramanian.naraya...@gmail.com wrote:
 
 Hi,
 
 In a HBASE table, there are 200 columns and the read pattern for
 diffferent
 systems invols 70 columns...
 In the above case, we cannot have 70 columns in the rowkey which will
 not
 be a good design...
 
 Can you please suggest how to handle this problem?
 Also can we do indexing in HBASE apart from rowkey? (something called
 secondary index)
 
 regards,
 Rams
 
 
 



Scan + Gets are disk bound

2013-06-04 Thread Rahul Ravindran
Hi,

We are relatively new to Hbase, and we are hitting a roadblock on our scan 
performance. I searched through the email archives and applied a bunch of the 
recommendations there, but they did not improve much. So, I am hoping I am 
missing something which you could guide me towards. Thanks in advance.

We are currently writing data and reading in an almost continuous mode (stream 
of data written into an HBase table and then we run a time-based MR on top of 
this Table). We currently were backed up and about 1.5 TB of data was loaded 
into the table and we began performing time-based scan MRs in 10 minute time 
intervals(startTime and endTime interval is 10 minutes). Most of the 10 minute 
interval had about 100 GB of data to process. 

Our workflow was to primarily eliminate duplicates from this table. We have  
maxVersions = 5 for the table. We use TableInputFormat to perform the 
time-based scan to ensure data locality. In the mapper, we check if there 
exists a previous version of the row in a time period earlier to the timestamp 
of the input row. If not, we emit that row. 
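
For reference, a hedged sketch of this kind of time-bounded TableInputFormat setup;
the table name, caching value and the empty mapper are placeholders, and the actual
dedup check is application-specific:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.mapreduce.Job;

public class TimeWindowScanJob {
    public static Job createJob(long startTime, long endTime) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = new Job(conf, "dedup-" + startTime);
        Scan scan = new Scan();
        scan.setTimeRange(startTime, endTime); // the 10-minute window described above
        scan.setCaching(500);                  // rows per RPC; tune for the row size
        scan.setCacheBlocks(false);            // data blocks are not re-read, per the HBASE-4683 discussion
        TableMapReduceUtil.initTableMapperJob("foo", scan, DedupMapper.class,
                ImmutableBytesWritable.class, Result.class, job);
        job.setNumReduceTasks(0);
        return job;
    }

    // Placeholder mapper; the real one would look up earlier versions and emit non-duplicates.
    static class DedupMapper extends TableMapper<ImmutableBytesWritable, Result> {
    }
}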

We looked at https://issues.apache.org/jira/browse/HBASE-4683 and hence turned 
off block cache for this table with the expectation that the block index and 
bloom filter will be cached in the block cache. We expect duplicates to be rare 
and hence hope for most of these checks to be fulfilled by the bloom filter. 
Unfortunately, we notice very slow performance on account of being disk bound. 
Looking at jstack, we notice that most of the time, we appear to be hitting 
disk for the block index. We performed a major compaction and retried and 
performance improved some, but not by much. We are processing data at about 2 
MB per second.

  We are using CDH 4.2.1 HBase 0.94.2 and HDFS 2.0.0 running with 8 
datanodes/regionservers(each with 32 cores, 4x1TB disks and 60 GB RAM). HBase 
is running with 30 GB Heap size, memstore values being capped at 3 GB and flush 
thresholds being 0.15 and 0.2. Blockcache is at 0.5 of total heap size(15 GB). 
We are using SNAPPY for our tables.


A couple of questions:
* Is the performance of the time-based scan bad after a major 
compaction?

* What can we do to help alleviate being disk bound? The typical answer 
of adding more RAM does not seem to have helped, or we are missing some other 
config



Below are some of the metrics from a Regionserver webUI:

requestsPerSecond=5895, numberOfOnlineRegions=60, numberOfStores=60, 
numberOfStorefiles=209, storefileIndexSizeMB=6, rootIndexSizeKB=7131, 
totalStaticIndexSizeKB=415995, totalStaticBloomSizeKB=2514675, 
memstoreSizeMB=0, mbInMemoryWithoutWAL=0, numberOfPutsWithoutWAL=0, 
readRequestsCount=30589690, writeRequestsCount=0, compactionQueueSize=0, 
flushQueueSize=0, usedHeapMB=2688, maxHeapMB=30672, blockCacheSizeMB=1604.86, 
blockCacheFreeMB=13731.24, blockCacheCount=11817, blockCacheHitCount=2759, 
blockCacheMissCount=25373411, blockCacheEvictedCount=7112, 
blockCacheHitRatio=52%, blockCacheHitCachingRatio=72%, 
hdfsBlocksLocalityIndex=91, slowHLogAppendCount=0, 
fsReadLatencyHistogramMean=15409428.56, fsReadLatencyHistogramCount=1559927, 
fsReadLatencyHistogramMedian=230609.5, fsReadLatencyHistogram75th=280094.75, 
fsReadLatencyHistogram95th=9574280.4, fsReadLatencyHistogram99th=100981301.2, 
fsReadLatencyHistogram999th=511591146.03,
 fsPreadLatencyHistogramMean=3895616.6, fsPreadLatencyHistogramCount=42, 
fsPreadLatencyHistogramMedian=954552, fsPreadLatencyHistogram75th=8723662.5, 
fsPreadLatencyHistogram95th=11159637.65, 
fsPreadLatencyHistogram99th=37763281.57, 
fsPreadLatencyHistogram999th=273192813.91, 
fsWriteLatencyHistogramMean=6124343.91, fsWriteLatencyHistogramCount=114, 
fsWriteLatencyHistogramMedian=374379, fsWriteLatencyHistogram75th=431395.75, 
fsWriteLatencyHistogram95th=576853.8, fsWriteLatencyHistogram99th=1034159.75, 
fsWriteLatencyHistogram999th=5687910.29



key size: 20 bytes 

Table description:
{NAME => 'foo', FAMILIES => [{NAME => 'f', DATA_BLOCK_ENCODING => 'NONE',
 BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', COMPRESSION => 'SNAPPY',
 VERSIONS => '5', TTL => '2592000', MIN_VERSIONS => '0',
 KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', ENCODE_ON_DISK => 'true',
 IN_MEMORY => 'false', BLOCKCACHE => 'false'}]}

Re: Explosion in datasize using HBase as a MR sink

2013-06-04 Thread Rob Verkuylen
Finally fixed this, my code was at fault.

Protobufs require a builder object which was a (non static) protected object in 
an abstract class all parsers extend. The mapper calls a parser factory 
depending on the input record. Because we designed the parser instances as 
singletons, the builder object in the abstract class got reused and all data 
got appended to the same builder. Doh! This only shows up in a job, not in 
single tests. Ah well, I've learned a lot  :)
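
A minimal sketch of the pitfall described above; Record is a hand-written stand-in
for a protoc-generated message, only there to show the shared-builder effect:

import java.util.ArrayList;
import java.util.List;

// Stand-in for a generated protobuf message/builder pair (hypothetical; real ones come from protoc).
class Record {
    final List<String> lines;
    Record(List<String> lines) { this.lines = lines; }
    static Builder newBuilder() { return new Builder(); }
    static class Builder {
        private final List<String> lines = new ArrayList<String>();
        Builder addLine(String l) { lines.add(l); return this; }
        Record build() { return new Record(new ArrayList<String>(lines)); }
    }
}

public class ParserPitfall {
    // BUG (as described above): a builder shared by singleton parser instances keeps
    // accumulating repeated fields across every record the mapper feeds it.
    private static final Record.Builder SHARED = Record.newBuilder();

    static Record parseBuggy(String input) {
        return SHARED.addLine(input).build();   // output grows with every call
    }

    // FIX: use a fresh builder (or clear it) per record.
    static Record parseFixed(String input) {
        return Record.newBuilder().addLine(input).build();
    }
}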

@Asaf we will be moving to LoadIncrementalHFiles asap. I had the code ready, 
but obviously it showed the same size problems before the fix.
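
For anyone following along, a hedged sketch of that bulk-load path; the job wiring and
names are assumptions, and the mapper that actually emits the KeyValues is omitted:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class BulkLoadSketch {
    public static void run(Job job, String tableName, String hfileDir) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, tableName);
        // Have the job write HFiles partitioned to match the table's current regions.
        HFileOutputFormat.configureIncrementalLoad(job, table);
        FileOutputFormat.setOutputPath(job, new Path(hfileDir));
        if (job.waitForCompletion(true)) {
            // Move the finished HFiles into the regions, bypassing the memstore/WAL/Put path.
            new LoadIncrementalHFiles(conf).doBulkLoad(new Path(hfileDir), table);
        }
        table.close();
    }
}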

Thnx for the thoughts!

On May 31, 2013, at 22:02, Asaf Mesika asaf.mes...@gmail.com wrote:

 On your data set size, I would go on HFile OutputFormat and then bulk load in 
 into HBase. Why go through the Put flow anyway (memstore, flush, WAL), 
 especially if you have the input ready at your disposal for re-try if 
 something fails?
 Sounds faster to me anyway.
 
 On May 30, 2013, at 10:52 PM, Rob Verkuylen r...@verkuylen.net wrote:
 
 
 On May 30, 2013, at 4:51, Stack st...@duboce.net wrote:
 
 Triggering a major compaction does not alter the overall 217.5GB size?
 
 A major compaction reduces the size from the original 219GB to the 217,5GB, 
 so barely a reduction. 
 80% of the region sizes are 1,4GB before and after. I haven't merged the 
 smaller regions,
 but that still would not bring the size down to the 2,5-5 or so GB I would 
 expect given T2's size.
 
 You have speculative execution turned on in your MR job so its possible you
 write many versions?
 
 I've turned off speculative execution (through conf.set) just for the 
 mappers, since we're not using reducers, should we? 
 I will triple check the actual job settings in the job tracker, since I need 
 to make the settings on a job level.
 
 Does your MR job fail many tasks (and though it fails, until it fails, it
 will have written some subset of the task hence bloating your versions?).
 
 We've had problems with failing mappers, because of zookeeper timeouts on 
 large inserts,
 we increased zookeeper timeout and blockingstorefiles to accommodate. Now we 
 don't
 get failures. This job writes to a cleanly made table, versions set to 1, so 
 there shouldn't be
 extra versions I assume(?).
 
 You are putting everything into protobufs?  Could that be bloating your
 data?  Can you take a smaller subset and dump to the log a string version
 of the pb.  Use TextFormat
 https://developers.google.com/protocol-buffers/docs/reference/java/com/google/protobuf/TextFormat#shortDebugString(com.google.protobuf.MessageOrBuilder)
 
 The protobufs reduce the size to roughly 40% of the original XML data in T1. 
 The MR parser is a port of the python parse code we use going from T1 to T2.
 I've done manual comparisons on 20-30 records from T2.1 and T2 and they are 
 identical, 
 with only minute differences, because of slightly different parsing. I've 
 done these in hbase shell,
 I will try log dumping them too.
 
 It can be informative looking at hfile content.  It could give you a clue
 as to the bloat.  See http://hbase.apache.org/book.html#hfile_tool
 
 I will give this a go and report back. Any other debugging suggestions are 
 more then welcome :)
 
 Thnx, Rob
 
 



Replication is on columnfamily level or table level?

2013-06-04 Thread N Dm
hi, folks,

hbase 0.94.3

By reading several documents, I always had the impression that *replication*
works at the table-*column-family* level. However, when I
set up a table with two column families and replicate them to two
different slaves, the whole table is replicated. Is this a bug? Thanks

Here are the simple steps to recreate.

*Environment: *
Replication Master: hdtest014
Replication Slave 1: hdtest017
Replication Slave 2: hdtest009

*Create Table*: on Master, and the two slaves:  create 't2_dn','cf1','cf2'

*Set up replication on Master* (hdtest014), so that:
Master> list_peers
 PEER_ID CLUSTER_KEY                        STATE
 1       hdtest017.svl.ibm.com:2181:/hbase  ENABLED
 2       hdtest009.svl.ibm.com:2181:/hbase  ENABLED
Master> describe 't2_dn'
DESCRIPTION                                                               ENABLED
 {NAME => 't2_dn', FAMILIES => [{*NAME => 'cf1', REPLICATION_SCOPE => '1'*,  true
 KEEP_DELETED_CELLS => 'false', COMPRESSION => 'NONE', ENCODE_ON_DISK => 'true',
 BLOCKCACHE => 'true', MIN_VERSIONS => '0', DATA_BLOCK_ENCODING => 'NONE',
 IN_MEMORY => 'false', BLOOMFILTER => 'NONE', TTL => '2147483647',
 VERSIONS => '3', BLOCKSIZE => '65536'},
 {*NAME => 'cf2', REPLICATION_SCOPE => '2'*, KEEP_DELETED_CELLS => 'false',
 COMPRESSION => 'NONE', ENCODE_ON_DISK => 'true', BLOCKCACHE => 'true',
 MIN_VERSIONS => '0', DATA_BLOCK_ENCODING => 'NONE', IN_MEMORY => 'false',
 BLOOMFILTER => 'NONE', TTL => '2147483647', VERSIONS => '3',
 BLOCKSIZE => '65536'}]}

1 row(s) in 0.0250 seconds

*Put rows into t2_dn on Master*
put 't2_dn','row1','cf1:q1','val1cf1fromMaster'
put 't2_dn','row1','cf2:q1','val1cf2fromMaster'
put 't2_dn','row2','cf1:q1','val2cf1fromMaster'
put 't2_dn','row3','cf2:q1','val3cf2fromMaster'

*Expecting cf1 to be replicated to slave1, and cf2 to be replicated to slave2.
However, all three clusters got:*
scan 't2_dn'
ROW       COLUMN+CELL
 row1     column=cf1:q1, timestamp=1370382328358, value=val1cf1fromMaster
 row1     column=cf2:q1, timestamp=1370382334303, value=val1cf2fromMaster
 row2     column=cf1:q1, timestamp=1370382351716, value=val2cf1fromMaster
 row3     column=cf2:q1, timestamp=1370382367724, value=val3cf2fromMaster
3 row(s) in 0.0160 seconds

Many thanks

Demai


Re: Explosion in datasize using HBase as a MR sink

2013-06-04 Thread Stack
On Tue, Jun 4, 2013 at 9:58 PM, Rob Verkuylen r...@verkuylen.net wrote:

 Finally fixed this, my code was at fault.

 Protobufs require a builder object which was a (non static) protected
 object in an abstract class all parsers extend. The mapper calls a parser
 factory depending on the input record. Because we designed the parser
 instances as singletons, the builder object in the abstract class got
 reused and all data got appended to the same builder. Doh! This only shows
 up in a job, not in single tests. Ah well, I've learned a lot  :)


Thanks for updating the list Rob.

Yours is a classic, except it is the first time I've heard of someone
protobuffing it... Usually it is a reused Hadoop Writable instance
accumulating.

St.Ack


Re: RPC Replication Compression

2013-06-04 Thread Stack
On Tue, Jun 4, 2013 at 6:48 PM, Jean-Daniel Cryans jdcry...@apache.orgwrote:

 Replication doesn't need to know about compression at the RPC level so
 it won't refer to it and as far as I can tell you need to set
 compression only on the master cluster and the slave will figure it
 out.

 Looking at the code tho, I'm not sure it works the same way it used to
 work before everything went protobuf. I would give 2 internets to
 whoever tests 0.95.1 with RPC compression turned on and compares
 results with non-compressed RPC. See
 http://hbase.apache.org/book.html#rpc.configs


What are you looking for JD?  Faster replication or just less network used?

Looks like we have not had the ability to do compressed rpc before (We
almost did,  the original rpc compression attempt almost got committed to
trunk -- see HBASE-5355 and the referenced follow-on issue -- but was put
aside after the pb stuff went in).

St.Ack


Re: Poor HBase map-reduce scan performance

2013-06-04 Thread Sandy Pratt
Haven't had a chance to write a JIRA yet, but I thought I'd pop in here
with an update in the meantime.

I tried a number of different approaches to eliminate latency and
bubbles in the scan pipeline, and eventually arrived at adding a
streaming scan API to the region server, along with refactoring the scan
interface into an event-driven message receiver interface.  In so doing, I
was able to take scan speed on my cluster from 59,537 records/sec with the
classic scanner to 222,703 records per second with my new scan API.
Needless to say, I'm pleased ;)

More details forthcoming when I get a chance.

Thanks,
Sandy
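
A hedged sketch of the producer/consumer scanner wrapper described in the quoted
exchange below; the class name, queue size and error handling are assumptions, not
the actual code:

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;

public class PrefetchingScanner {
    private static final Result POISON = new Result();   // end-of-stream marker
    private final BlockingQueue<Result> queue = new ArrayBlockingQueue<Result>(10000);

    public PrefetchingScanner(final ResultScanner source) {
        Thread producer = new Thread(new Runnable() {
            public void run() {
                try {
                    for (Result r : source) {
                        queue.put(r);                     // blocks when the consumer falls behind
                    }
                } catch (Exception e) {
                    // a real implementation would propagate this to the consumer
                } finally {
                    try { queue.put(POISON); } catch (InterruptedException ignored) { }
                }
            }
        });
        producer.setDaemon(true);
        producer.start();
    }

    /** Returns the next Result, or null once the underlying scanner is exhausted. */
    public Result next() throws InterruptedException {
        Result r = queue.take();
        return r == POISON ? null : r;
    }
}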

On 5/23/13 3:47 PM, Ted Yu yuzhih...@gmail.com wrote:

Thanks for the update, Sandy.

If you can open a JIRA and attach your producer / consumer scanner there,
that would be great.

On Thu, May 23, 2013 at 3:42 PM, Sandy Pratt prat...@adobe.com wrote:

 I wrote myself a Scanner wrapper that uses a producer/consumer queue to
 keep the client fed with a full buffer as much as possible.  When
scanning
 my table with scanner caching at 100 records, I see about a 24% uplift
in
 performance (~35k records/sec with the ClientScanner and ~44k
records/sec
 with my P/C scanner).  However, when I set scanner caching to 5000, it's
 more of a wash compared to the standard ClientScanner: ~53k records/sec
 with the ClientScanner and ~60k records/sec with the P/C scanner.

 I'm not sure what to make of those results.  I think next I'll shut down
 HBase and read the HFiles directly, to see if there's a drop off in
 performance between reading them directly vs. via the RegionServer.

 I still think that to really solve this there needs to be sliding window
 of records in flight between disk and RS, and between RS and client.
I'm
 thinking there's probably a single batch of records in flight between RS
 and client at the moment.

 Sandy

 On 5/23/13 8:45 AM, Bryan Keller brya...@gmail.com wrote:

 I am considering scanning a snapshot instead of the table. I believe
this
 is what the ExportSnapshot class does. If I could use the scanning code
 from ExportSnapshot then I will be able to scan the HDFS files directly
 and bypass the regionservers. This could potentially give me a huge
boost
 in performance for full table scans. However, it doesn't really address
 the poor scan performance against a table.





Questions about HBase

2013-06-04 Thread Pankaj Gupta
Hi,

I have a few small questions regarding HBase. I've searched the forum but
couldn't find clear answers hence asking them here:


   1. Does Minor compaction remove HFiles in which all entries are out of
   TTL or does only Major compaction do that? I found this jira:
   https://issues.apache.org/jira/browse/HBASE-5199 but I don't know if the
   compaction being talked about there is minor or major.
   2. Is there a way of configuring major compaction to compact only files
   older than a certain time or to compress all the files except the latest
   few? We basically want to use the time based filtering optimization in
   HBase to get the latest additions to the table and since major compaction
   bunches everything into one file, it would defeat the optimization.
   3. Is there a way to warm up the bloom filter and block index cache for
   a table? This is for a case where I always want the bloom filters and index
   to be all in memory, but not the data blocks themselves.
   4. This one is related to what I read in the HBase definitive guide
   bloom filter section
   Given a random row key you are looking for, it is very likely that this
   key will fall in between two block start keys. The only way for HBase to
   figure out if the key actually exists is by loading the block and scanning
   it to find the key.
   The above excerpt seems to imply to me that the search for key inside a
   block is linear and I feel I must be reading it wrong. I would expect the
   scan to be a binary search.


Thanks in Advance,
Pankaj

-- 


*P* | (415) 677-9222 ext. 205 *F *| (415) 677-0895 | pan...@brightroll.com

Pankaj Gupta | Software Engineer

*BrightRoll, Inc. *| Smart Video Advertising | www.brightroll.com


United States | Canada | United Kingdom | Germany


We're hiring! (http://newton.newtonsoftware.com/career/CareerHome.action?clientId=8a42a12b3580e2060135837631485aa7)


Re: Questions about HBase

2013-06-04 Thread ramkrishna vasudevan
Does Minor compaction remove HFiles in which all entries are out of
   TTL, or does only Major compaction do that?
Yes, it applies to minor compactions as well.
Is there a way of configuring major compaction to compact only files
   older than a certain time or to compress all the files except the latest
   few?
In the latest trunk version the compaction algorithm itself can be plugged in.
There are some coprocessor hooks that give control over the scanner that
gets created for compaction, with which we can control the KVs being
selected. But I am not very sure if we can control the files getting
selected for compaction in the older versions.
 The above excerpt seems to imply to me that the search for a key inside a block
is linear and I feel I must be reading it wrong. I would expect the scan to
be a binary search.
Once the data block is identified for a key, we seek to the beginning of
the block and then do a linear search until we reach the exact key that we
are looking for. Internally the data (KVs) are stored as byte
buffers per block, and they follow this pattern:
<keylength><valuelength><key bytearray><value bytearray>
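
A simplified illustration of that linear walk; real HFile KVs carry more fields
(row/family lengths, timestamp, type), so this only shows the length-prefixed layout
and why the in-block search is sequential rather than binary:

import java.nio.ByteBuffer;
import java.util.Arrays;

public class BlockWalk {
    // Walk a buffer laid out as repeated <keylength><valuelength><key bytes><value bytes>
    // records and return the offset of the first matching key, or -1 if absent.
    static int findKeyOffset(ByteBuffer block, byte[] wanted) {
        block.rewind();
        while (block.remaining() > 8) {
            int recordStart = block.position();
            int keyLen = block.getInt();
            int valueLen = block.getInt();
            byte[] key = new byte[keyLen];
            block.get(key);
            if (Arrays.equals(key, wanted)) {
                return recordStart;                       // found by linear scan
            }
            block.position(block.position() + valueLen);  // skip the value, move to the next record
        }
        return -1;
    }
}
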
Is there a way to warm up the bloom filter and block index cache for
   a table?
You always want the bloom and block index to be in cache?


On Wed, Jun 5, 2013 at 7:45 AM, Pankaj Gupta pan...@brightroll.com wrote:

 Hi,

 I have a few small questions regarding HBase. I've searched the forum but
 couldn't find clear answers hence asking them here:


1. Does Minor compaction remove HFiles in which all entries are out of
TTL or does only Major compaction do that? I found this jira:
https://issues.apache.org/jira/browse/HBASE-5199 but I dont' know if
 the
compaction being talked about there is minor or major.
2. Is there a way of configuring major compaction to compact only files
older than a certain time or to compress all the files except the latest
few? We basically want to use the time based filtering optimization in
HBase to get the latest additions to the table and since major
 compaction
bunches everything into one file, it would defeat the optimization.
3. Is there a way to warm up the bloom filter and block index cache for
a table? This is for a case where I always want the bloom filters and
 index
to be all in memory, but not the data blocks themselves.
4. This one is related to what I read in the HBase definitive guide
bloom filter section
Given a random row key you are looking for, it is very likely that this
key will fall in between two block start keys. The only way for HBase to
figure out if the key actually exists is by loading the block and
 scanning
it to find the key.
The above excerpt seems to imply to me that the search for key inside a
block is linear and I feel I must be reading it wrong. I would expect
 the
scan to be a binary search.


 Thanks in Advance,
 Pankaj

 --


 *P* | (415) 677-9222 ext. 205 *F *| (415) 677-0895 | pan...@brightroll.com

 Pankaj Gupta | Software Engineer

 *BrightRoll, Inc. *| Smart Video Advertising | www.brightroll.com


 United States | Canada | United Kingdom | Germany


 We're hiring
 http://newton.newtonsoftware.com/career/CareerHome.action?clientId=8a42a12b3580e2060135837631485aa7
 
 !



Re: Questions about HBase

2013-06-04 Thread Ted Yu
bq.  I found this jira:  https://issues.apache.org/jira/browse/HBASE-5199 but
I dont' know if the
   compaction being talked about there is minor or major.

The optimization above applies to minor compaction selection.

Cheers

On Tue, Jun 4, 2013 at 7:15 PM, Pankaj Gupta pan...@brightroll.com wrote:

 Hi,

 I have a few small questions regarding HBase. I've searched the forum but
 couldn't find clear answers hence asking them here:


1. Does Minor compaction remove HFiles in which all entries are out of
TTL or does only Major compaction do that? I found this jira:
https://issues.apache.org/jira/browse/HBASE-5199 but I dont' know if
 the
compaction being talked about there is minor or major.
2. Is there a way of configuring major compaction to compact only files
older than a certain time or to compress all the files except the latest
few? We basically want to use the time based filtering optimization in
HBase to get the latest additions to the table and since major
 compaction
bunches everything into one file, it would defeat the optimization.
3. Is there a way to warm up the bloom filter and block index cache for
a table? This is for a case where I always want the bloom filters and
 index
to be all in memory, but not the data blocks themselves.
4. This one is related to what I read in the HBase definitive guide
bloom filter section
Given a random row key you are looking for, it is very likely that this
key will fall in between two block start keys. The only way for HBase to
figure out if the key actually exists is by loading the block and
 scanning
it to find the key.
The above excerpt seems to imply to me that the search for key inside a
block is linear and I feel I must be reading it wrong. I would expect
 the
scan to be a binary search.


 Thanks in Advance,
 Pankaj

 --


 *P* | (415) 677-9222 ext. 205 *F *| (415) 677-0895 | pan...@brightroll.com

 Pankaj Gupta | Software Engineer

 *BrightRoll, Inc. *| Smart Video Advertising | www.brightroll.com


 United States | Canada | United Kingdom | Germany


 We're hiring
 http://newton.newtonsoftware.com/career/CareerHome.action?clientId=8a42a12b3580e2060135837631485aa7
 
 !



Re: Questions about HBase

2013-06-04 Thread Ted Yu
bq. But i am not very sure if we can control the files getting selected for
compaction in the older verisons.

Same mechanism is available in 0.94

Take a look
at src/main/java/org/apache/hadoop/hbase/coprocessor/BaseRegionObserver.java
where you would find the following methods (and more):

  public void preCompactSelection(final ObserverContext<RegionCoprocessorEnvironment> c,
      final Store store, final List<StoreFile> candidates, final CompactionRequest request)

  public InternalScanner preCompact(ObserverContext<RegionCoprocessorEnvironment> e,
      final Store store, final InternalScanner scanner) throws IOException {
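
A hedged skeleton of hooking preCompactSelection to drop files from the selection,
assuming the signature quoted above; the "too recent to compact" test is left as a
placeholder because the StoreFile accessors for age/time vary between versions:

import java.util.Iterator;
import java.util.List;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.regionserver.Store;
import org.apache.hadoop.hbase.regionserver.StoreFile;
import org.apache.hadoop.hbase.regionserver.compactions.CompactionRequest;

public class SkipRecentFilesObserver extends BaseRegionObserver {
    @Override
    public void preCompactSelection(ObserverContext<RegionCoprocessorEnvironment> c,
            Store store, List<StoreFile> candidates, CompactionRequest request) {
        // Anything removed from 'candidates' here stays out of this compaction.
        Iterator<StoreFile> it = candidates.iterator();
        while (it.hasNext()) {
            if (isTooRecent(it.next())) {
                it.remove();
            }
        }
    }

    private boolean isTooRecent(StoreFile sf) {
        return false;  // placeholder: plug in your own recency check here
    }
}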

Cheers

On Tue, Jun 4, 2013 at 8:14 PM, ramkrishna vasudevan 
ramkrishna.s.vasude...@gmail.com wrote:

 Does Minor compaction remove HFiles in which all entries are out of
TTL or does only Major compaction do that
 Yes it applies for Minor compactions.
 Is there a way of configuring major compaction to compact only files
older than a certain time or to compress all the files except the latest
few?
 In the latest trunk version the compaction algo itself can be plugged.
  There are some coprocessor hooks that gives control on the scanner that
 gets created for compaction with which we can control the KVs being
 selected. But i am not very sure if we can control the files getting
 selected for compaction in the older verisons.
  The above excerpt seems to imply to me that the search for key inside a
 block
 is linear and I feel I must be reading it wrong. I would expect the scan to
 be a binary search.
 Once the data block is identified for a key, we seek to the beginning of
 the block and then do a linear search until we reach the exact key that we
 are looking out for.  Because internally the data (KVs) are stored as byte
 buffers per block and it follows this pattern
 keylengthvaluelengthkeybytearrayvaluebytearray
 Is there a way to warm up the bloom filter and block index cache for
a table?
 You always want the bloom and block index to be in cache?


 On Wed, Jun 5, 2013 at 7:45 AM, Pankaj Gupta pan...@brightroll.com
 wrote:

  Hi,
 
  I have a few small questions regarding HBase. I've searched the forum but
  couldn't find clear answers hence asking them here:
 
 
 1. Does Minor compaction remove HFiles in which all entries are out of
 TTL or does only Major compaction do that? I found this jira:
 https://issues.apache.org/jira/browse/HBASE-5199 but I dont' know if
  the
 compaction being talked about there is minor or major.
 2. Is there a way of configuring major compaction to compact only
 files
 older than a certain time or to compress all the files except the
 latest
 few? We basically want to use the time based filtering optimization in
 HBase to get the latest additions to the table and since major
  compaction
 bunches everything into one file, it would defeat the optimization.
 3. Is there a way to warm up the bloom filter and block index cache
 for
 a table? This is for a case where I always want the bloom filters and
  index
 to be all in memory, but not the data blocks themselves.
 4. This one is related to what I read in the HBase definitive guide
 bloom filter section
 Given a random row key you are looking for, it is very likely that
 this
 key will fall in between two block start keys. The only way for HBase
 to
 figure out if the key actually exists is by loading the block and
  scanning
 it to find the key.
 The above excerpt seems to imply to me that the search for key inside
 a
 block is linear and I feel I must be reading it wrong. I would expect
  the
 scan to be a binary search.
 
 
  Thanks in Advance,
  Pankaj
 
  --
 
 
  *P* | (415) 677-9222 ext. 205 *F *| (415) 677-0895 |
 pan...@brightroll.com
 
  Pankaj Gupta | Software Engineer
 
  *BrightRoll, Inc. *| Smart Video Advertising | www.brightroll.com
 
 
  United States | Canada | United Kingdom | Germany
 
 
  We're hiring
 
 http://newton.newtonsoftware.com/career/CareerHome.action?clientId=8a42a12b3580e2060135837631485aa7
  
  !
 



Re: Scan + Gets are disk bound

2013-06-04 Thread anil gupta
On Tue, Jun 4, 2013 at 11:48 AM, Rahul Ravindran rahu...@yahoo.com wrote:

 Hi,

 We are relatively new to Hbase, and we are hitting a roadblock on our scan
 performance. I searched through the email archives and applied a bunch of
 the recommendations there, but they did not improve much. So, I am hoping I
 am missing something which you could guide me towards. Thanks in advance.

 We are currently writing data and reading in an almost continuous mode
 (stream of data written into an HBase table and then we run a time-based MR
 on top of this Table). We currently were backed up and about 1.5 TB of data
 was loaded into the table and we began performing time-based scan MRs in 10
 minute time intervals(startTime and endTime interval is 10 minutes). Most
 of the 10 minute interval had about 100 GB of data to process.

 Our workflow was to primarily eliminate duplicates from this table. We
 have  maxVersions = 5 for the table. We use TableInputFormat to perform the
 time-based scan to ensure data locality. In the mapper, we check if there
 exists a previous version of the row in a time period earlier to the
 timestamp of the input row. If not, we emit that row.

 We looked at https://issues.apache.org/jira/browse/HBASE-4683 and hence
 turned off block cache for this table with the expectation that the block
 index and bloom filter will be cached in the block cache. We expect
 duplicates to be rare and hence hope for most of these checks to be
 fulfilled by the bloom filter. Unfortunately, we notice very slow
 performance on account of being disk bound. Looking at jstack, we notice
 that most of the time, we appear to be hitting disk for the block index. We
 performed a major compaction and retried and performance improved some, but
 not by much. We are processing data at about 2 MB per second.

   We are using CDH 4.2.1 HBase 0.94.2 and HDFS 2.0.0 running with 8
 datanodes/regionservers(each with 32 cores, 4x1TB disks and 60 GB RAM).

Anil: You don't have the right balance between disk, CPU and RAM. You have
plenty of CPU and RAM but very few disks. Usually it's better to have a
disk-to-CPU-core ratio near 0.6-0.8; yours is around 0.13. This seems to be the
biggest cause of your problem.

 HBase is running with 30 GB Heap size, memstore values being capped at 3
 GB and flush thresholds being 0.15 and 0.2. Blockcache is at 0.5 of total
 heap size(15 GB). We are using SNAPPY for our tables.


 A couple of questions:
 * Is the performance of the time-based scan bad after a major
 compaction?

Anil: In general, time-based scans (I am assuming you have built your row key
on a timestamp) are not good for HBase because of region hot-spotting.
Have you tried setting scanner caching to a higher number?
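
For example, something like this on the Scan handed to the MR job (a minimal
sketch; the class name and the caching value of 500 are only illustrative, not
a recommendation):

import java.io.IOException;
import org.apache.hadoop.hbase.client.Scan;

public class TimeRangeScanConfig {
  // Build a Scan for one 10-minute window with larger scanner caching,
  // so each RPC to the region server returns more rows.
  public static Scan buildScan(long startTime, long endTime) throws IOException {
    Scan scan = new Scan();
    scan.setTimeRange(startTime, endTime); // only the window the MR job cares about
    scan.setCaching(500);                  // rows per RPC; tune to your row size
    scan.setCacheBlocks(false);            // full scans should not churn the block cache
    return scan;
  }
}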


 * What can we do to help alleviate being disk bound? The typical
 answer of adding more RAM does not seem to have helped, or we are missing
 some other config

Anil: Try adding more disks to your machines.




 Below are some of the metrics from a Regionserver webUI:

 requestsPerSecond=5895, numberOfOnlineRegions=60, numberOfStores=60,
 numberOfStorefiles=209, storefileIndexSizeMB=6, rootIndexSizeKB=7131,
 totalStaticIndexSizeKB=415995, totalStaticBloomSizeKB=2514675,
 memstoreSizeMB=0, mbInMemoryWithoutWAL=0, numberOfPutsWithoutWAL=0,
 readRequestsCount=30589690, writeRequestsCount=0, compactionQueueSize=0,
 flushQueueSize=0, usedHeapMB=2688, maxHeapMB=30672,
 blockCacheSizeMB=1604.86, blockCacheFreeMB=13731.24, blockCacheCount=11817,
 blockCacheHitCount=2759, blockCacheMissCount=25373411,
 blockCacheEvictedCount=7112, blockCacheHitRatio=52%,
 blockCacheHitCachingRatio=72%, hdfsBlocksLocalityIndex=91,
 slowHLogAppendCount=0, fsReadLatencyHistogramMean=15409428.56,
 fsReadLatencyHistogramCount=1559927, fsReadLatencyHistogramMedian=230609.5,
 fsReadLatencyHistogram75th=280094.75, fsReadLatencyHistogram95th=9574280.4,
 fsReadLatencyHistogram99th=100981301.2,
 fsReadLatencyHistogram999th=511591146.03,
  fsPreadLatencyHistogramMean=3895616.6,
 fsPreadLatencyHistogramCount=42, fsPreadLatencyHistogramMedian=954552,
 fsPreadLatencyHistogram75th=8723662.5,
 fsPreadLatencyHistogram95th=11159637.65,
 fsPreadLatencyHistogram99th=37763281.57,
 fsPreadLatencyHistogram999th=273192813.91,
 fsWriteLatencyHistogramMean=6124343.91,
 fsWriteLatencyHistogramCount=114, fsWriteLatencyHistogramMedian=374379,
 fsWriteLatencyHistogram75th=431395.75,
 fsWriteLatencyHistogram95th=576853.8,
 fsWriteLatencyHistogram99th=1034159.75,
 fsWriteLatencyHistogram999th=5687910.29



 key size: 20 bytes

 Table description:
 {NAME = 'foo', FAMILIES = [{NAME = 'f', DATA_BLOCK_ENCODING = 'NONE',
 BLOOMFI true
  LTER = 'ROW', REPLICATION_SCOPE = '0', COMPRESSION = 'SNAPPY',
 VERSIONS = '5', TTL = '
  2592000', MIN_VERSIONS = '0', KEEP_DELETED_CELLS = 'false', BLOCKSIZE
 = '65536', ENCODE_
  ON_DISK = 'true', IN_MEMORY = 'false', BLOCKCACHE = 'false'}]}




-- 
Thanks  Regards,
Anil Gupta


Re: Replication is on columnfamily level or table level?

2013-06-04 Thread Anoop John
Yes, the replication scope can be specified at the CF level. You have used
HCD#setScope(), right?

 S = '3', BLOCKSIZE = '65536'}, {*NAME = 'cf2', REPLICATION_SCOPE =
'2'*,
You set the scope to 2? It looks like you want one CF replicated to one cluster
and the other CF to a different cluster; I don't think that is supported even
now. You can see in the HCD code that there are two constants for scope, 0 and
1, where 1 means replicate and 0 means do not replicate.
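
For completeness, a minimal sketch of putting the scopes back to supported
values with HCD#setScope (table and family names are the ones from this thread;
the disable/modify/enable sequence is the usual admin flow):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class FixReplicationScope {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    try {
      admin.disableTable("t2_dn");
      // Reuse the existing descriptors so the other CF attributes are preserved.
      HTableDescriptor desc = admin.getTableDescriptor(Bytes.toBytes("t2_dn"));
      HColumnDescriptor cf1 = desc.getFamily(Bytes.toBytes("cf1"));
      HColumnDescriptor cf2 = desc.getFamily(Bytes.toBytes("cf2"));
      cf1.setScope(1);                  // 1 = replicate this family to the peers
      cf2.setScope(0);                  // 0 = keep this family local (2 is not valid)
      admin.modifyColumn("t2_dn", cf1);
      admin.modifyColumn("t2_dn", cf2);
      admin.enableTable("t2_dn");
    } finally {
      admin.close();
    }
  }
}

Note that scope 1 ships edits of that family to every enabled peer, so routing
one family to one specific peer and another family to a different peer is not
possible this way.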

-Anoop-

On Wed, Jun 5, 2013 at 3:31 AM, N Dm nid...@gmail.com wrote:

 hi, folks,

 hbase 0.94.3

 By reading several documents, I always have the impression that *
 Replication* works at the table-*column*-*family level*. However, when I
 am setting up a table with two columnfamilies and replicate them to two
 different slavers, the whole table replicated. Is this a bug? Thanks

 Here is the simple steps to receate.

 *Environment: *
 Replication Master: hdtest014
 Replication Slave 1: hdtest017
 Replication Slave 2: hdtest009

 *Create Table*: on Master, and the two slaves:  create 't2_dn','cf1','cf2'

 *setup replication on Master*(hdtest014), so that
 Master list_peers
  PEER_ID CLUSTER_KEY STATE
  1 hdtest017.svl.ibm.com:2181:/hbase ENABLED
  2 hdtest009.svl.ibm.com:2181:/hbase ENABLED
 Master  describe 't2_dn'
 DESCRIPTION
 ENABLED
  {NAME = 't2_dn', FAMILIES = [{*NAME = 'cf1', REPLICATION_SCOPE = '1'*,
 KEEP_DELETED_CELLS = 'fals
 true
  e', COMPRESSION = 'NONE', ENCODE_ON_DISK = 'true', BLOCKCACHE = 'true',
 MIN_VERSIONS = '0',
 DATA
  _BLOCK_ENCODING = 'NONE', IN_MEMORY = 'false', BLOOMFILTER = 'NONE',
 TTL = '2147483647',
 VERSION
  S = '3', BLOCKSIZE = '65536'}, {*NAME = 'cf2', REPLICATION_SCOPE =
 '2'*,
 KEEP_DELETED_CELLS =
 'fa
  lse', COMPRESSION = 'NONE', ENCODE_ON_DISK = 'true', BLOCKCACHE =
 'true', MIN_VERSIONS = '0',
 DA
  TA_BLOCK_ENCODING = 'NONE', IN_MEMORY = 'false', BLOOMFILTER = 'NONE',
 TTL = '2147483647',
 VERSI
  ONS = '3', BLOCKSIZE =
 '65536'}]}

 1 row(s) in 0.0250 seconds

 *Put rows into t2_dn on Master*
 put 't2_dn','row1','cf1:q1','val1cf1fromMaster'
 put 't2_dn','row1','cf2:q1','val1cf2fromMaster'
 put 't2_dn','row2','cf1:q1','val2cf1fromMaster'
 put 't2_dn','row3','cf2:q1','val3cf2fromMaster'

 *Expecting cf1 replicated to slave1, and cf2 replicatedto slave2. Where all
 the three clusters got: *
 scan 't2_dn'
 ROW
 COLUMN+CELL

  row1  column=cf1:q1, timestamp=1370382328358,
 value=val1cf1fromMaster
  row1  column=cf2:q1, timestamp=1370382334303,
 value=val1cf2fromMaster
  row2  column=cf1:q1, timestamp=1370382351716,
 value=val2cf1fromMaster
  row3  column=cf2:q1, timestamp=1370382367724,
 value=val3cf2fromMaster
 3 row(s) in 0.0160 seconds

 Many thanks

 Demai



Re: Scan + Gets are disk bound

2013-06-04 Thread Rahul Ravindran
Our row keys do not contain time. By time-based scans, I mean an MR over the
HBase table where the Scan object has no startRow or endRow but has a startTime
and endTime.

Our row key format is an MD5 of UUID+UUID, so we expect good distribution. We
have pre-split the table initially to prevent any initial hotspotting.
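
For what it's worth, a rough sketch of the kind of pre-split we mean, carving
the leading hash byte into even ranges (the family name, region count and the
even-byte scheme here are illustrative, not our exact setup):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class PresplitHashedTable {
  // Pre-split a table whose row keys start with a hash byte, so writes spread
  // evenly across regions from the first insert onwards.
  public static void create(String tableName, int numRegions) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    try {
      HTableDescriptor desc = new HTableDescriptor(tableName);
      desc.addFamily(new HColumnDescriptor("f"));
      byte[][] splits = new byte[numRegions - 1][];
      for (int i = 1; i < numRegions; i++) {
        splits[i - 1] = new byte[] { (byte) (i * 256 / numRegions) };
      }
      admin.createTable(desc, splits);
    } finally {
      admin.close();
    }
  }
}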
~Rahul.



 From: anil gupta anilgupt...@gmail.com
To: user@hbase.apache.org; Rahul Ravindran rahu...@yahoo.com 
Sent: Tuesday, June 4, 2013 9:31 PM
Subject: Re: Scan + Gets are disk bound
 







On Tue, Jun 4, 2013 at 11:48 AM, Rahul Ravindran rahu...@yahoo.com wrote:

Hi,

We are relatively new to Hbase, and we are hitting a roadblock on our scan 
performance. I searched through the email archives and applied a bunch of the 
recommendations there, but they did not improve much. So, I am hoping I am 
missing something which you could guide me towards. Thanks in advance.

We are currently writing data and reading in an almost continuous mode (stream 
of data written into an HBase table and then we run a time-based MR on top of 
this Table). We currently were backed up and about 1.5 TB of data was loaded 
into the table and we began performing time-based scan MRs in 10 minute time 
intervals(startTime and endTime interval is 10 minutes). Most of the 10 minute 
interval had about 100 GB of data to process. 

Our workflow was to primarily eliminate duplicates from this table. We have  
maxVersions = 5 for the table. We use TableInputFormat to perform the 
time-based scan to ensure data locality. In the mapper, we check if there 
exists a previous version of the row in a time period earlier to the timestamp 
of the input row. If not, we emit that row. 

We looked at https://issues.apache.org/jira/browse/HBASE-4683 and hence turned 
off block cache for this table with the expectation that the block index and 
bloom filter will be cached in the block cache. We expect duplicates to be 
rare and hence hope for most of these checks to be fulfilled by the bloom 
filter. Unfortunately, we notice very slow performance on account of being 
disk bound. Looking at jstack, we notice that most of the time, we appear to 
be hitting disk for the block index. We performed a major compaction and 
retried and performance improved some, but not by much. We are processing data 
at about 2 MB per second.

  We are using CDH 4.2.1 HBase 0.94.2 and HDFS 2.0.0 running with 8 
datanodes/regionservers(each with 32 cores, 4x1TB disks and 60 GB RAM). 
Anil: You dont have the right balance between disk,cpu and ram. You have too 
much of CPU, RAM but very less NUMBER of disks. Usually, its better to have a 
Disk/Cpu_core ratio near 0.6-0.8. Your's is around 0.13. This seems to be the 
biggest reason of your problem.

HBase is running with 30 GB Heap size, memstore values being capped at 3 GB and 
flush thresholds being 0.15 and 0.2. Blockcache is at 0.5 of total heap size(15 
GB). We are using SNAPPY for our tables.


A couple of questions:
        * Is the performance of the time-based scan bad after a major 
compaction?

Anil: In general, TimeBased(i am assuming you have built your rowkey on 
timestamp) scans are not good for HBase because of region hot-spotting. Have 
you tried setting the ScannerCaching to a higher number?


        * What can we do to help alleviate being disk bound? The typical 
answer of adding more RAM does not seem to have helped, or we are missing some 
other config

Anil: Try adding more disks to your machines. 




Below are some of the metrics from a Regionserver webUI:

requestsPerSecond=5895, numberOfOnlineRegions=60, numberOfStores=60, 
numberOfStorefiles=209, storefileIndexSizeMB=6, rootIndexSizeKB=7131, 
totalStaticIndexSizeKB=415995, totalStaticBloomSizeKB=2514675, 
memstoreSizeMB=0, mbInMemoryWithoutWAL=0, numberOfPutsWithoutWAL=0, 
readRequestsCount=30589690, writeRequestsCount=0, compactionQueueSize=0, 
flushQueueSize=0, usedHeapMB=2688, maxHeapMB=30672, blockCacheSizeMB=1604.86, 
blockCacheFreeMB=13731.24, blockCacheCount=11817, blockCacheHitCount=2759, 
blockCacheMissCount=25373411, blockCacheEvictedCount=7112, 
blockCacheHitRatio=52%, blockCacheHitCachingRatio=72%, 
hdfsBlocksLocalityIndex=91, slowHLogAppendCount=0, 
fsReadLatencyHistogramMean=15409428.56, fsReadLatencyHistogramCount=1559927, 
fsReadLatencyHistogramMedian=230609.5, fsReadLatencyHistogram75th=280094.75, 
fsReadLatencyHistogram95th=9574280.4, fsReadLatencyHistogram99th=100981301.2, 
fsReadLatencyHistogram999th=511591146.03,
 fsPreadLatencyHistogramMean=3895616.6, fsPreadLatencyHistogramCount=42, 
fsPreadLatencyHistogramMedian=954552, fsPreadLatencyHistogram75th=8723662.5, 
fsPreadLatencyHistogram95th=11159637.65, 
fsPreadLatencyHistogram99th=37763281.57, 
fsPreadLatencyHistogram999th=273192813.91, 
fsWriteLatencyHistogramMean=6124343.91, fsWriteLatencyHistogramCount=114, 
fsWriteLatencyHistogramMedian=374379, fsWriteLatencyHistogram75th=431395.75, 

Re: Questions about HBase

2013-06-04 Thread Pankaj Gupta
Thanks for the replies. I'll take a look at src/main/java/org/apache/
hadoop/hbase/coprocessor/BaseRegionObserver.java.

@ramkrishna: I do want to have bloom filter and block index all the time.
For good read performance they're critical in my workflow. The worry is
that when HBase is restarted it will take a long time for them to get
populated again and performance will suffer. If there were a way of loading
them quickly to warm up the table, we would be able to restart HBase without
causing a slowdown in processing.
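
One crude workaround (not a real warm-up API, just an idea) would be to run a
throwaway key-only scan right after the restart, so the block index gets pulled
into the block cache as a side effect; bloom blocks are typically only read on
point gets, so they would still load lazily, and the extra disk churn may not
be worth it. A rough sketch, with the class and method names made up:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter;

public class CrudeWarmup {
  // Walk the table returning only the first KV of each row, so very little
  // data comes back to the client while the region servers read (and cache)
  // the index blocks needed to serve the scan.
  public static void warm(String tableName) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, tableName);
    try {
      Scan scan = new Scan();
      scan.setFilter(new FirstKeyOnlyFilter());
      scan.setCaching(1000);
      ResultScanner scanner = table.getScanner(scan);
      try {
        while (scanner.next() != null) {
          // ignore the results; the point is the server-side reads
        }
      } finally {
        scanner.close();
      }
    } finally {
      table.close();
    }
  }
}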


On Tue, Jun 4, 2013 at 9:29 PM, Ted Yu yuzhih...@gmail.com wrote:

 bq. But i am not very sure if we can control the files getting selected for
 compaction in the older verisons.

 Same mechanism is available in 0.94

 Take a look
 at
 src/main/java/org/apache/hadoop/hbase/coprocessor/BaseRegionObserver.java
 where you would find the following methods (and more):

    public void preCompactSelection(final ObserverContext<RegionCoprocessorEnvironment> c,
        final Store store, final List<StoreFile> candidates, final CompactionRequest request)

    public InternalScanner preCompact(ObserverContext<RegionCoprocessorEnvironment> e,
        final Store store, final InternalScanner scanner) throws IOException {

 Cheers

 On Tue, Jun 4, 2013 at 8:14 PM, ramkrishna vasudevan 
 ramkrishna.s.vasude...@gmail.com wrote:

  Does Minor compaction remove HFiles in which all entries are out of
 TTL or does only Major compaction do that
  Yes it applies for Minor compactions.
  Is there a way of configuring major compaction to compact only files
 older than a certain time or to compress all the files except the
 latest
 few?
  In the latest trunk version the compaction algo itself can be plugged.
   There are some coprocessor hooks that gives control on the scanner that
  gets created for compaction with which we can control the KVs being
  selected. But i am not very sure if we can control the files getting
  selected for compaction in the older verisons.
   The above excerpt seems to imply to me that the search for key inside
 a
  block
  is linear and I feel I must be reading it wrong. I would expect the scan
 to
  be a binary search.
  Once the data block is identified for a key, we seek to the beginning of
  the block and then do a linear search until we reach the exact key that
 we
  are looking out for.  Because internally the data (KVs) are stored as
 byte
  buffers per block and it follows this pattern
   <keylength><valuelength><keybytearray><valuebytearray>
  Is there a way to warm up the bloom filter and block index cache for
 a table?
  You always want the bloom and block index to be in cache?
 
 
  On Wed, Jun 5, 2013 at 7:45 AM, Pankaj Gupta pan...@brightroll.com
  wrote:
 
   Hi,
  
   I have a few small questions regarding HBase. I've searched the forum
 but
   couldn't find clear answers hence asking them here:
  
  
  1. Does Minor compaction remove HFiles in which all entries are out
 of
  TTL or does only Major compaction do that? I found this jira:
  https://issues.apache.org/jira/browse/HBASE-5199 but I dont' know
 if
   the
  compaction being talked about there is minor or major.
  2. Is there a way of configuring major compaction to compact only
  files
  older than a certain time or to compress all the files except the
  latest
  few? We basically want to use the time based filtering optimization
 in
  HBase to get the latest additions to the table and since major
   compaction
  bunches everything into one file, it would defeat the optimization.
  3. Is there a way to warm up the bloom filter and block index cache
  for
  a table? This is for a case where I always want the bloom filters
 and
   index
  to be all in memory, but not the data blocks themselves.
  4. This one is related to what I read in the HBase definitive guide
  bloom filter section
  Given a random row key you are looking for, it is very likely that
  this
  key will fall in between two block start keys. The only way for
 HBase
  to
  figure out if the key actually exists is by loading the block and
   scanning
  it to find the key.
  The above excerpt seems to imply to me that the search for key
 inside
  a
  block is linear and I feel I must be reading it wrong. I would
 expect
   the
  scan to be a binary search.
  
  
   Thanks in Advance,
   Pankaj
  
   --
  
  
   *P* | (415) 677-9222 ext. 205 *F *| (415) 677-0895 |
  pan...@brightroll.com
  
   Pankaj Gupta | Software Engineer
  
   *BrightRoll, Inc. *| Smart Video Advertising | www.brightroll.com
  
  
   United States | Canada | United Kingdom | Germany
  
  
   We're hiring
  
 
 http://newton.newtonsoftware.com/career/CareerHome.action?clientId=8a42a12b3580e2060135837631485aa7
   
   !
  
 




-- 


*P* | (415) 677-9222 ext. 205 *F *| (415) 677-0895 | pan...@brightroll.com

Pankaj Gupta | Software Engineer

*BrightRoll, Inc. *| Smart Video Advertising | 

Re: Questions about HBase

2013-06-04 Thread Anoop John
4. This one is related to what I read in the HBase definitive guide
   bloom filter section
   Given a random row key you are looking for, it is very likely that this
   key will fall in between two block start keys. The only way for HBase to
   figure out if the key actually exists is by loading the block and
scanning
   it to find the key.
   The above excerpt seems to imply to me that the search for key inside a
   block is linear and I feel I must be reading it wrong. I would expect the
   scan to be a binary search.

Yes, as Ram said, using the row key we can find the HFile data block where the
key *might* be present; that block is loaded and then we seek to the exact row
key, which is a linear read within the block. You can take a look at the Prefix
Tree encoder, available in 0.95, which tries to avoid this linear read within a
block.
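
For reference, a minimal sketch of switching a family to that encoding once you
are on 0.95+ (the family name is illustrative; PREFIX_TREE does not exist in
0.94):

import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.io.encoding.DataBlockEncoding;

public class PrefixTreeEncodingExample {
  // Enable the prefix-tree data block encoding on a family so lookups inside a
  // block no longer have to walk the KVs purely linearly.
  public static HColumnDescriptor encoded(String family) {
    HColumnDescriptor hcd = new HColumnDescriptor(family);
    hcd.setDataBlockEncoding(DataBlockEncoding.PREFIX_TREE);
    return hcd;
  }
}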

On Wed, Jun 5, 2013 at 9:59 AM, Ted Yu yuzhih...@gmail.com wrote:

 bq. But i am not very sure if we can control the files getting selected for
 compaction in the older verisons.

 Same mechanism is available in 0.94

 Take a look
 at
 src/main/java/org/apache/hadoop/hbase/coprocessor/BaseRegionObserver.java
 where you would find the following methods (and more):

    public void preCompactSelection(final ObserverContext<RegionCoprocessorEnvironment> c,
        final Store store, final List<StoreFile> candidates, final CompactionRequest request)

    public InternalScanner preCompact(ObserverContext<RegionCoprocessorEnvironment> e,
        final Store store, final InternalScanner scanner) throws IOException {

 Cheers

 On Tue, Jun 4, 2013 at 8:14 PM, ramkrishna vasudevan 
 ramkrishna.s.vasude...@gmail.com wrote:

  Does Minor compaction remove HFiles in which all entries are out of
 TTL or does only Major compaction do that
  Yes it applies for Minor compactions.
  Is there a way of configuring major compaction to compact only files
 older than a certain time or to compress all the files except the
 latest
 few?
  In the latest trunk version the compaction algo itself can be plugged.
   There are some coprocessor hooks that gives control on the scanner that
  gets created for compaction with which we can control the KVs being
  selected. But i am not very sure if we can control the files getting
  selected for compaction in the older verisons.
   The above excerpt seems to imply to me that the search for key inside
 a
  block
  is linear and I feel I must be reading it wrong. I would expect the scan
 to
  be a binary search.
  Once the data block is identified for a key, we seek to the beginning of
  the block and then do a linear search until we reach the exact key that
 we
  are looking out for.  Because internally the data (KVs) are stored as
 byte
  buffers per block and it follows this pattern
   <keylength><valuelength><keybytearray><valuebytearray>
  Is there a way to warm up the bloom filter and block index cache for
 a table?
  You always want the bloom and block index to be in cache?
 
 
  On Wed, Jun 5, 2013 at 7:45 AM, Pankaj Gupta pan...@brightroll.com
  wrote:
 
   Hi,
  
   I have a few small questions regarding HBase. I've searched the forum
 but
   couldn't find clear answers hence asking them here:
  
  
  1. Does Minor compaction remove HFiles in which all entries are out
 of
  TTL or does only Major compaction do that? I found this jira:
  https://issues.apache.org/jira/browse/HBASE-5199 but I dont' know
 if
   the
  compaction being talked about there is minor or major.
  2. Is there a way of configuring major compaction to compact only
  files
  older than a certain time or to compress all the files except the
  latest
  few? We basically want to use the time based filtering optimization
 in
  HBase to get the latest additions to the table and since major
   compaction
  bunches everything into one file, it would defeat the optimization.
  3. Is there a way to warm up the bloom filter and block index cache
  for
  a table? This is for a case where I always want the bloom filters
 and
   index
  to be all in memory, but not the data blocks themselves.
  4. This one is related to what I read in the HBase definitive guide
  bloom filter section
  Given a random row key you are looking for, it is very likely that
  this
  key will fall in between two block start keys. The only way for
 HBase
  to
  figure out if the key actually exists is by loading the block and
   scanning
  it to find the key.
  The above excerpt seems to imply to me that the search for key
 inside
  a
  block is linear and I feel I must be reading it wrong. I would
 expect
   the
  scan to be a binary search.
  
  
   Thanks in Advance,
   Pankaj
  
   --
  
  
   *P* | (415) 677-9222 ext. 205 *F *| (415) 677-0895 |
  pan...@brightroll.com
  
   Pankaj Gupta | Software Engineer
  
   *BrightRoll, Inc. *| Smart Video Advertising | www.brightroll.com
  
  
   United States | Canada | United Kingdom | Germany
  
  
   

Re: Questions about HBase

2013-06-04 Thread Asaf Mesika
If you read the HFile v2 document on the HBase site you will understand
completely how the search for a record works, and why there is a linear search
within a block but a binary search to get to the right block.
Also bear in mind that the number of keys in a block is not big, since an HFile
block is 64 KB (65,536 bytes) by default; thus from a 10 GB HFile you only scan
linearly through roughly one 64 KB block.
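
Since the block size bounds that linear scan, it is also a per-family knob; a
minimal sketch (the helper is only illustrative):

import org.apache.hadoop.hbase.HColumnDescriptor;

public class BlockSizeExample {
  // BLOCKSIZE is set per column family. Smaller blocks shorten the linear scan
  // inside each block but grow the block index; 64 KB is the default.
  public static HColumnDescriptor withBlockSize(String family, int blockSizeBytes) {
    HColumnDescriptor hcd = new HColumnDescriptor(family);
    hcd.setBlocksize(blockSizeBytes); // e.g. 16 * 1024 for point-lookup-heavy tables
    return hcd;
  }
}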

On Wednesday, June 5, 2013, Pankaj Gupta wrote:

 Thanks for the replies. I'll take a look at src/main/java/org/apache/
 hadoop/hbase/coprocessor/BaseRegionObserver.java.

 @ramkrishna: I do want to have bloom filter and block index all the time.
 For good read performance they're critical in my workflow. The worry is
 that when HBase is restarted it will take a long time for them to get
 populated again and performance will suffer. If there was a way of loading
 them quickly and warm up the table then we'll be able to restart HBase
 without causing slow down in processing.


 On Tue, Jun 4, 2013 at 9:29 PM, Ted Yu yuzhih...@gmail.com wrote:

  bq. But i am not very sure if we can control the files getting selected
 for
  compaction in the older verisons.
 
  Same mechanism is available in 0.94
 
  Take a look
  at
  src/main/java/org/apache/hadoop/hbase/coprocessor/BaseRegionObserver.java
  where you would find the following methods (and more):
 
    public void preCompactSelection(final ObserverContext<RegionCoprocessorEnvironment> c,
        final Store store, final List<StoreFile> candidates, final CompactionRequest request)

    public InternalScanner preCompact(ObserverContext<RegionCoprocessorEnvironment> e,
        final Store store, final InternalScanner scanner) throws IOException {
 
  Cheers
 
  On Tue, Jun 4, 2013 at 8:14 PM, ramkrishna vasudevan 
  ramkrishna.s.vasude...@gmail.com wrote:
 
   Does Minor compaction remove HFiles in which all entries are out of
  TTL or does only Major compaction do that
   Yes it applies for Minor compactions.
   Is there a way of configuring major compaction to compact only files
  older than a certain time or to compress all the files except the
  latest
  few?
   In the latest trunk version the compaction algo itself can be plugged.
There are some coprocessor hooks that gives control on the scanner
 that
   gets created for compaction with which we can control the KVs being
   selected. But i am not very sure if we can control the files getting
   selected for compaction in the older verisons.
The above excerpt seems to imply to me that the search for key
 inside
  a
   block
   is linear and I feel I must be reading it wrong. I would expect the
 scan
  to
   be a binary search.
   Once the data block is identified for a key, we seek to the beginning
 of
   the block and then do a linear search until we reach the exact key that
  we
   are looking out for.  Because internally the data (KVs) are stored as
  byte
   buffers per block and it follows this pattern
    <keylength><valuelength><keybytearray><valuebytearray>
   Is there a way to warm up the bloom filter and block index cache for
  a table?
   You always want the bloom and block index to be in cache?
  
  
   On Wed, Jun 5, 2013 at 7:45 AM, Pankaj Gupta pan...@brightroll.com
   wrote:
  
Hi,
   
I have a few small questions regarding HBase. I've searched the forum
  but
couldn't find clear answers hence asking them here:
   
   
   1. Does Minor compaction remove HFiles in which all entries are
 out
  of
   TTL or does only Major compaction do that? I found this jira:
   https://issues.apache.org/jira/browse/HBASE-5199 but I dont' know
  if
the
   compaction being talked about there is minor or major.
   2. Is there a way of configuring major compaction to compact only
   files
   older than a certain time or to compress all the files except the
   latest
   few? We basically want to use the time based filtering
 optimization
  in
   HBase to get the latest additions to the table and since major
compaction
   bunches everything into one file, it would defeat the
 optimization.
   3. Is there a way to warm up the bloom filter and block index
 cache
   for
   a table? This is for a case where I always want the bloom filters
  and
index
   to be all in memory, but not the


Re: Questions about HBase

2013-06-04 Thread ramkrishna vasudevan
For the question of whether you will be able to do a warm-up for the bloom and
block cache: I don't think it is possible now.

Regards
Ram


On Wed, Jun 5, 2013 at 10:57 AM, Asaf Mesika asaf.mes...@gmail.com wrote:

 If you will read HFile v2 document on HBase site you will understand
 completely how the search for a record works and why there is linear search
 in the block but binary search to get to the right block.
 Also bear in mind the amount of keys in a blocks is not big since a block
 in HFile by default is 65k, thus from a 10GB HFile you are only fully
 scanning 65k out of it.

 On Wednesday, June 5, 2013, Pankaj Gupta wrote:

  Thanks for the replies. I'll take a look at src/main/java/org/apache/
  hadoop/hbase/coprocessor/BaseRegionObserver.java.
 
  @ramkrishna: I do want to have bloom filter and block index all the time.
  For good read performance they're critical in my workflow. The worry is
  that when HBase is restarted it will take a long time for them to get
  populated again and performance will suffer. If there was a way of
 loading
  them quickly and warm up the table then we'll be able to restart HBase
  without causing slow down in processing.
 
 
  On Tue, Jun 4, 2013 at 9:29 PM, Ted Yu yuzhih...@gmail.com wrote:
 
   bq. But i am not very sure if we can control the files getting selected
  for
   compaction in the older verisons.
  
   Same mechanism is available in 0.94
  
   Take a look
   at
  
 src/main/java/org/apache/hadoop/hbase/coprocessor/BaseRegionObserver.java
   where you would find the following methods (and more):
  
    public void preCompactSelection(final ObserverContext<RegionCoprocessorEnvironment> c,
        final Store store, final List<StoreFile> candidates, final CompactionRequest request)

    public InternalScanner preCompact(ObserverContext<RegionCoprocessorEnvironment> e,
        final Store store, final InternalScanner scanner) throws IOException {
  
   Cheers
  
   On Tue, Jun 4, 2013 at 8:14 PM, ramkrishna vasudevan 
   ramkrishna.s.vasude...@gmail.com wrote:
  
Does Minor compaction remove HFiles in which all entries are out of
   TTL or does only Major compaction do that
Yes it applies for Minor compactions.
Is there a way of configuring major compaction to compact only
 files
   older than a certain time or to compress all the files except the
   latest
   few?
In the latest trunk version the compaction algo itself can be
 plugged.
 There are some coprocessor hooks that gives control on the scanner
  that
gets created for compaction with which we can control the KVs being
selected. But i am not very sure if we can control the files getting
selected for compaction in the older verisons.
 The above excerpt seems to imply to me that the search for key
  inside
   a
block
is linear and I feel I must be reading it wrong. I would expect the
  scan
   to
be a binary search.
Once the data block is identified for a key, we seek to the beginning
  of
the block and then do a linear search until we reach the exact key
 that
   we
are looking out for.  Because internally the data (KVs) are stored as
   byte
buffers per block and it follows this pattern
  <keylength><valuelength><keybytearray><valuebytearray>
Is there a way to warm up the bloom filter and block index cache
 for
   a table?
You always want the bloom and block index to be in cache?
   
   
On Wed, Jun 5, 2013 at 7:45 AM, Pankaj Gupta pan...@brightroll.com
wrote:
   
 Hi,

 I have a few small questions regarding HBase. I've searched the
 forum
   but
 couldn't find clear answers hence asking them here:


1. Does Minor compaction remove HFiles in which all entries are
  out
   of
TTL or does only Major compaction do that? I found this jira:
https://issues.apache.org/jira/browse/HBASE-5199 but I dont'
 know
   if
 the
compaction being talked about there is minor or major.
2. Is there a way of configuring major compaction to compact
 only
files
older than a certain time or to compress all the files except
 the
latest
few? We basically want to use the time based filtering
  optimization
   in
HBase to get the latest additions to the table and since major
 compaction
bunches everything into one file, it would defeat the
  optimization.
3. Is there a way to warm up the bloom filter and block index
  cache
for
a table? This is for a case where I always want the bloom
 filters
   and
 index
to be all in memory, but not the



Re: Scan + Gets are disk bound

2013-06-04 Thread Anoop John
When you set a time range on a Scan, some files can be skipped based on the
min/max timestamps kept in each file. That said, once you major compact,
everything is in a single file, so a time-range scan won't get much advantage
from that skipping.
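
For the per-row duplicate check itself, something along these lines (a minimal
sketch, and the helper name is made up) lets the time range plus the ROW bloom
filter do the work on the server side:

import java.io.IOException;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;

public class EarlierVersionCheck {
  // Ask whether any version of the row exists strictly before inputTs. With a
  // ROW bloom filter, a row that was never written is usually answered without
  // reading data blocks, and files whose timestamps fall outside the range can
  // be skipped entirely.
  public static boolean hasEarlierVersion(HTable table, byte[] row, byte[] family,
      long inputTs) throws IOException {
    Get get = new Get(row);
    get.addFamily(family);
    get.setTimeRange(0L, inputTs); // [0, inputTs): only versions older than the input
    return table.exists(get);      // existence only; no cell data is shipped back
  }
}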



-Anoop-

On Wed, Jun 5, 2013 at 10:11 AM, Rahul Ravindran rahu...@yahoo.com wrote:

 Our row-keys do not contain time. By time-based scans, I mean, an MR over
 the Hbase table where the scan object has no startRow or endRow but has a
 startTime and endTime.

 Our row key format is MD5 of UUID+UUID, so, we expect good distribution.
 We have pre-split initially to prevent any initial hotspotting.
 ~Rahul.


 
  From: anil gupta anilgupt...@gmail.com
 To: user@hbase.apache.org; Rahul Ravindran rahu...@yahoo.com
 Sent: Tuesday, June 4, 2013 9:31 PM
 Subject: Re: Scan + Gets are disk bound








 On Tue, Jun 4, 2013 at 11:48 AM, Rahul Ravindran rahu...@yahoo.com
 wrote:

 Hi,
 
 We are relatively new to Hbase, and we are hitting a roadblock on our
 scan performance. I searched through the email archives and applied a bunch
 of the recommendations there, but they did not improve much. So, I am
 hoping I am missing something which you could guide me towards. Thanks in
 advance.
 
 We are currently writing data and reading in an almost continuous mode
 (stream of data written into an HBase table and then we run a time-based MR
 on top of this Table). We currently were backed up and about 1.5 TB of data
 was loaded into the table and we began performing time-based scan MRs in 10
 minute time intervals(startTime and endTime interval is 10 minutes). Most
 of the 10 minute interval had about 100 GB of data to process.
 
 Our workflow was to primarily eliminate duplicates from this table. We
 have  maxVersions = 5 for the table. We use TableInputFormat to perform the
 time-based scan to ensure data locality. In the mapper, we check if there
 exists a previous version of the row in a time period earlier to the
 timestamp of the input row. If not, we emit that row.
 
 We looked at https://issues.apache.org/jira/browse/HBASE-4683 and hence
 turned off block cache for this table with the expectation that the block
 index and bloom filter will be cached in the block cache. We expect
 duplicates to be rare and hence hope for most of these checks to be
 fulfilled by the bloom filter. Unfortunately, we notice very slow
 performance on account of being disk bound. Looking at jstack, we notice
 that most of the time, we appear to be hitting disk for the block index. We
 performed a major compaction and retried and performance improved some, but
 not by much. We are processing data at about 2 MB per second.
 
   We are using CDH 4.2.1 HBase 0.94.2 and HDFS 2.0.0 running with 8
 datanodes/regionservers(each with 32 cores, 4x1TB disks and 60 GB RAM).
 Anil: You dont have the right balance between disk,cpu and ram. You have
 too much of CPU, RAM but very less NUMBER of disks. Usually, its better to
 have a Disk/Cpu_core ratio near 0.6-0.8. Your's is around 0.13. This seems
 to be the biggest reason of your problem.

 HBase is running with 30 GB Heap size, memstore values being capped at 3
 GB and flush thresholds being 0.15 and 0.2. Blockcache is at 0.5 of total
 heap size(15 GB). We are using SNAPPY for our tables.
 
 
 A couple of questions:
 * Is the performance of the time-based scan bad after a major
 compaction?
 
 Anil: In general, TimeBased(i am assuming you have built your rowkey on
 timestamp) scans are not good for HBase because of region hot-spotting.
 Have you tried setting the ScannerCaching to a higher number?


 * What can we do to help alleviate being disk bound? The typical
 answer of adding more RAM does not seem to have helped, or we are missing
 some other config
 
 Anil: Try adding more disks to your machines.


 
 
 Below are some of the metrics from a Regionserver webUI:
 
 requestsPerSecond=5895, numberOfOnlineRegions=60, numberOfStores=60,
 numberOfStorefiles=209, storefileIndexSizeMB=6, rootIndexSizeKB=7131,
 totalStaticIndexSizeKB=415995, totalStaticBloomSizeKB=2514675,
 memstoreSizeMB=0, mbInMemoryWithoutWAL=0, numberOfPutsWithoutWAL=0,
 readRequestsCount=30589690, writeRequestsCount=0, compactionQueueSize=0,
 flushQueueSize=0, usedHeapMB=2688, maxHeapMB=30672,
 blockCacheSizeMB=1604.86, blockCacheFreeMB=13731.24, blockCacheCount=11817,
 blockCacheHitCount=2759, blockCacheMissCount=25373411,
 blockCacheEvictedCount=7112, blockCacheHitRatio=52%,
 blockCacheHitCachingRatio=72%, hdfsBlocksLocalityIndex=91,
 slowHLogAppendCount=0, fsReadLatencyHistogramMean=15409428.56,
 fsReadLatencyHistogramCount=1559927, fsReadLatencyHistogramMedian=230609.5,
 fsReadLatencyHistogram75th=280094.75, fsReadLatencyHistogram95th=9574280.4,
 fsReadLatencyHistogram99th=100981301.2,
 fsReadLatencyHistogram999th=511591146.03,
  fsPreadLatencyHistogramMean=3895616.6,
 fsPreadLatencyHistogramCount=42, fsPreadLatencyHistogramMedian=954552,
 

Re: Scan + Gets are disk bound

2013-06-04 Thread Asaf Mesika
On Tuesday, June 4, 2013, Rahul Ravindran wrote:

 Hi,

 We are relatively new to Hbase, and we are hitting a roadblock on our scan
 performance. I searched through the email archives and applied a bunch of
 the recommendations there, but they did not improve much. So, I am hoping I
 am missing something which you could guide me towards. Thanks in advance.

 We are currently writing data and reading in an almost continuous mode
 (stream of data written into an HBase table and then we run a time-based MR
 on top of this Table). We currently were backed up and about 1.5 TB of data
 was loaded into the table and we began performing time-based scan MRs in 10
 minute time intervals(startTime and endTime interval is 10 minutes). Most
 of the 10 minute interval had about 100 GB of data to process.

 Our workflow was to primarily eliminate duplicates from this table. We
 have  maxVersions = 5 for the table. We use TableInputFormat to perform the
 time-based scan to ensure data locality. In the mapper, we check if there
 exists a previous version of the row in a time period earlier to the
 timestamp of the input row. If not, we emit that row.

If I understand correctly: for a row key R, column family F and column
qualifier C, if you have two values with timestamps 13:00 and 13:02, you want
to remove the value associated with 13:02.

The best way to do this is to write a simple RegionObserver coprocessor which
hooks into the compaction process (preCompact, for instance). In there, for any
given R, F, C, simply emit only the earliest-timestamp value (the last one,
since timestamps are ordered descending), and that's it.
It's a very effective way, since you are riding on top of an existing process
which reads the values anyway, so you are not paying the price of reading the
data again in your MR job.
In between major compactions you can also implement the scanner hooks
(preScannerOpen, for instance) in the region observer, so you'll pick up only
the earliest-timestamp value, achieving the same result for your clients even
though you haven't physically removed the newer values yet.

I've implemented this for delayed aggregation of counters, and it works great
in production.
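
To make that concrete, here is a rough sketch against the 0.94-era coprocessor
API (the class name is made up; it ignores delete markers, TTL handling and the
batch-limit edge case, and the exact set of InternalScanner next() overloads
has shifted between releases, so treat it as a starting point rather than a
drop-in):

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.regionserver.InternalScanner;
import org.apache.hadoop.hbase.regionserver.Store;

// During compaction, keep only the earliest version of each
// (row, family, qualifier) and drop the newer duplicates.
public class KeepEarliestVersionObserver extends BaseRegionObserver {

  @Override
  public InternalScanner preCompact(ObserverContext<RegionCoprocessorEnvironment> e,
      Store store, final InternalScanner scanner) {
    return new InternalScanner() {
      public boolean next(List<KeyValue> results) throws IOException {
        return next(results, -1, null);
      }
      public boolean next(List<KeyValue> results, String metric) throws IOException {
        return next(results, -1, metric);
      }
      public boolean next(List<KeyValue> results, int limit) throws IOException {
        return next(results, limit, null);
      }
      public boolean next(List<KeyValue> results, int limit, String metric)
          throws IOException {
        List<KeyValue> raw = new ArrayList<KeyValue>();
        boolean more = scanner.next(raw, limit, metric);
        // KVs arrive sorted by family/qualifier ascending, then timestamp
        // descending, so the last KV of each qualifier run is the earliest.
        for (int i = 0; i < raw.size(); i++) {
          KeyValue kv = raw.get(i);
          boolean lastOfColumn = (i == raw.size() - 1)
              || !raw.get(i + 1).matchingColumn(kv.getFamily(), kv.getQualifier());
          if (lastOfColumn) {
            results.add(kv);
          }
        }
        return more;
      }
      public void close() throws IOException {
        scanner.close();
      }
    };
  }
}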




 We looked at https://issues.apache.org/jira/browse/HBASE-4683 and hence
 turned off block cache for this table with the expectation that the block
 index and bloom filter will be cached in the block cache. We expect
 duplicates to be rare and hence hope for most of these checks to be
 fulfilled by the bloom filter. Unfortunately, we notice very slow
 performance on account of being disk bound. Looking at jstack, we notice
 that most of the time, we appear to be hitting disk for the block index. We
 performed a major compaction and retried and performance improved some, but
 not by much. We are processing data at about 2 MB per second.

   We are using CDH 4.2.1 HBase 0.94.2 and HDFS 2.0.0 running with 8
 datanodes/regionservers(each with 32 cores, 4x1TB disks and 60 GB RAM).
 HBase is running with 30 GB Heap size, memstore values being capped at 3 GB
 and flush thresholds being 0.15 and 0.2. Blockcache is at 0.5 of total heap
 size(15 GB). We are using SNAPPY for our tables.


 A couple of questions:
 * Is the performance of the time-based scan bad after a major
 compaction?

 * What can we do to help alleviate being disk bound? The typical
 answer of adding more RAM does not seem to have helped, or we are missing
 some other config



 Below are some of the metrics from a Regionserver webUI:

 requestsPerSecond=5895, numberOfOnlineRegions=60, numberOfStores=60,
 numberOfStorefiles=209, storefileIndexSizeMB=6, rootIndexSizeKB=7131,
 totalStaticIndexSizeKB=415995, totalStaticBloomSizeKB=2514675,
 memstoreSizeMB=0, mbInMemoryWithoutWAL=0, numberOfPutsWithoutWAL=0,
 readRequestsCount=30589690, writeRequestsCount=0, compactionQueueSize=0,
 flushQueueSize=0, usedHeapMB=2688, maxHeapMB=30672,
 blockCacheSizeMB=1604.86, blockCacheFreeMB=13731.24, blockCacheCount=11817,
 blockCacheHitCount=2759, blockCacheMissCount=25373411,
 blockCacheEvictedCount=7112, blockCacheHitRatio=52%,
 blockCacheHitCachingRatio=72%, hdfsBlocksLocalityIndex=91,
 slowHLogAppendCount=0, fsReadLatencyHistogramMean=15409428.56,
 fsReadLatencyHistogramCount=1559927, fsReadLatencyHistogramMedian=230609.5,
 fsReadLatencyHistogram75th=280094.75, fsReadLatencyHistogram95th=9574280.4,
 fsReadLatencyHistogram99th=100981301.2,
 fsReadLatencyHistogram999th=511591146.03,
  fsPreadLatencyHistogramMean=3895616.6,
 fsPreadLatencyHistogramCount=42, fsPreadLatencyHistogramMedian=954552,
 fsPreadLatencyHistogram75th=8723662.5,
 fsPreadLatencyHistogram95th=11159637.65,
 fsPreadLatencyHistogram99th=37763281.57,
 fsPreadLatencyHistogram999th=273192813.91,
 fsWriteLatencyHistogramMean=6124343.91,
 fsWriteLatencyHistogramCount=114, fsWriteLatencyHistogramMedian=374379,
 fsWriteLatencyHistogram75th=431395.75,
 fsWriteLatencyHistogram95th=576853.8,
 fsWriteLatencyHistogram99th=1034159.75,
 fsWriteLatencyHistogram999th=5687910.29



Re: Questions about HBase

2013-06-04 Thread Asaf Mesika
When you do the first read of this region, wouldn't this load all bloom
filters?



On Wed, Jun 5, 2013 at 8:43 AM, ramkrishna vasudevan 
ramkrishna.s.vasude...@gmail.com wrote:

 for the question whether you will be able to do a warm up for the bloom and
 block cache i don't think it is possible now.

 Regards
 Ram


 On Wed, Jun 5, 2013 at 10:57 AM, Asaf Mesika asaf.mes...@gmail.com
 wrote:

  If you will read HFile v2 document on HBase site you will understand
  completely how the search for a record works and why there is linear
 search
  in the block but binary search to get to the right block.
  Also bear in mind the amount of keys in a blocks is not big since a block
  in HFile by default is 65k, thus from a 10GB HFile you are only fully
  scanning 65k out of it.
 
  On Wednesday, June 5, 2013, Pankaj Gupta wrote:
 
   Thanks for the replies. I'll take a look at src/main/java/org/apache/
   hadoop/hbase/coprocessor/BaseRegionObserver.java.
  
   @ramkrishna: I do want to have bloom filter and block index all the
 time.
   For good read performance they're critical in my workflow. The worry is
   that when HBase is restarted it will take a long time for them to get
   populated again and performance will suffer. If there was a way of
  loading
   them quickly and warm up the table then we'll be able to restart HBase
   without causing slow down in processing.
  
  
   On Tue, Jun 4, 2013 at 9:29 PM, Ted Yu yuzhih...@gmail.com wrote:
  
bq. But i am not very sure if we can control the files getting
 selected
   for
compaction in the older verisons.
   
Same mechanism is available in 0.94
   
Take a look
at
   
  src/main/java/org/apache/hadoop/hbase/coprocessor/BaseRegionObserver.java
where you would find the following methods (and more):
   
    public void preCompactSelection(final ObserverContext<RegionCoprocessorEnvironment> c,
        final Store store, final List<StoreFile> candidates, final CompactionRequest request)

    public InternalScanner preCompact(ObserverContext<RegionCoprocessorEnvironment> e,
        final Store store, final InternalScanner scanner) throws IOException {
   
Cheers
   
On Tue, Jun 4, 2013 at 8:14 PM, ramkrishna vasudevan 
ramkrishna.s.vasude...@gmail.com wrote:
   
 Does Minor compaction remove HFiles in which all entries are out
 of
TTL or does only Major compaction do that
 Yes it applies for Minor compactions.
 Is there a way of configuring major compaction to compact only
  files
older than a certain time or to compress all the files except
 the
latest
few?
 In the latest trunk version the compaction algo itself can be
  plugged.
  There are some coprocessor hooks that gives control on the scanner
   that
 gets created for compaction with which we can control the KVs being
 selected. But i am not very sure if we can control the files
 getting
 selected for compaction in the older verisons.
  The above excerpt seems to imply to me that the search for key
   inside
a
 block
 is linear and I feel I must be reading it wrong. I would expect the
   scan
to
 be a binary search.
 Once the data block is identified for a key, we seek to the
 beginning
   of
 the block and then do a linear search until we reach the exact key
  that
we
 are looking out for.  Because internally the data (KVs) are stored
 as
byte
 buffers per block and it follows this pattern
  <keylength><valuelength><keybytearray><valuebytearray>
 Is there a way to warm up the bloom filter and block index cache
  for
a table?
 You always want the bloom and block index to be in cache?


 On Wed, Jun 5, 2013 at 7:45 AM, Pankaj Gupta 
 pan...@brightroll.com
 wrote:

  Hi,
 
  I have a few small questions regarding HBase. I've searched the
  forum
but
  couldn't find clear answers hence asking them here:
 
 
 1. Does Minor compaction remove HFiles in which all entries
 are
   out
of
 TTL or does only Major compaction do that? I found this jira:
 https://issues.apache.org/jira/browse/HBASE-5199 but I dont'
  know
if
  the
 compaction being talked about there is minor or major.
 2. Is there a way of configuring major compaction to compact
  only
 files
 older than a certain time or to compress all the files except
  the
 latest
 few? We basically want to use the time based filtering
   optimization
in
 HBase to get the latest additions to the table and since major
  compaction
 bunches everything into one file, it would defeat the
   optimization.
 3. Is there a way to warm up the bloom filter and block index
   cache
 for
 a table? This is for a case where I always want the bloom
  filters
and
  index
 to be all in memory, but not the
 



Re: Scan + Gets are disk bound

2013-06-04 Thread Rahul Ravindran
Thanks for that confirmation. This is what we hypothesized as well.

So, if we depend on time-range scans, do we need to avoid major compactions
entirely and rely only on minor compactions? Is there any downside to that? We
do have a TTL set on all the rows in the table.
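
From what we have read, the usual knob for stopping the periodic automatic
major compactions is hbase.hregion.majorcompaction: setting it to 0 disables
the time trigger, while explicit major_compact calls and size-driven minor
compactions still run (and a minor compaction that happens to include every
store file is still promoted to a major one). A minimal sketch; in practice
this goes into hbase-site.xml on the region servers rather than into client
code:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class DisablePeriodicMajorCompaction {
  // Illustrative only: this property must be set in the region servers'
  // hbase-site.xml; putting it in a client-side Configuration has no effect
  // on the servers.
  public static Configuration build() {
    Configuration conf = HBaseConfiguration.create();
    conf.setLong("hbase.hregion.majorcompaction", 0L); // default is 86400000 (24h)
    return conf;
  }
}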
~Rahul.



 From: Anoop John anoop.hb...@gmail.com
To: user@hbase.apache.org; Rahul Ravindran rahu...@yahoo.com 
Cc: anil gupta anilgupt...@gmail.com 
Sent: Tuesday, June 4, 2013 10:44 PM
Subject: Re: Scan + Gets are disk bound
 

When you set time range on Scan, some files can get skipped based on the
max min ts values in that file. Said this, when u do major compact and do
scan based on time range, dont think u will get some advantage.



-Anoop-

On Wed, Jun 5, 2013 at 10:11 AM, Rahul Ravindran rahu...@yahoo.com wrote:

 Our row-keys do not contain time. By time-based scans, I mean, an MR over
 the Hbase table where the scan object has no startRow or endRow but has a
 startTime and endTime.

 Our row key format is MD5 of UUID+UUID, so, we expect good distribution.
 We have pre-split initially to prevent any initial hotspotting.
 ~Rahul.


 
  From: anil gupta anilgupt...@gmail.com
 To: user@hbase.apache.org; Rahul Ravindran rahu...@yahoo.com
 Sent: Tuesday, June 4, 2013 9:31 PM
 Subject: Re: Scan + Gets are disk bound








 On Tue, Jun 4, 2013 at 11:48 AM, Rahul Ravindran rahu...@yahoo.com
 wrote:

 Hi,
 
 We are relatively new to Hbase, and we are hitting a roadblock on our
 scan performance. I searched through the email archives and applied a bunch
 of the recommendations there, but they did not improve much. So, I am
 hoping I am missing something which you could guide me towards. Thanks in
 advance.
 
 We are currently writing data and reading in an almost continuous mode
 (stream of data written into an HBase table and then we run a time-based MR
 on top of this Table). We currently were backed up and about 1.5 TB of data
 was loaded into the table and we began performing time-based scan MRs in 10
 minute time intervals(startTime and endTime interval is 10 minutes). Most
 of the 10 minute interval had about 100 GB of data to process.
 
 Our workflow was to primarily eliminate duplicates from this table. We
 have  maxVersions = 5 for the table. We use TableInputFormat to perform the
 time-based scan to ensure data locality. In the mapper, we check if there
 exists a previous version of the row in a time period earlier to the
 timestamp of the input row. If not, we emit that row.
 
 We looked at https://issues.apache.org/jira/browse/HBASE-4683 and hence
 turned off block cache for this table with the expectation that the block
 index and bloom filter will be cached in the block cache. We expect
 duplicates to be rare and hence hope for most of these checks to be
 fulfilled by the bloom filter. Unfortunately, we notice very slow
 performance on account of being disk bound. Looking at jstack, we notice
 that most of the time, we appear to be hitting disk for the block index. We
 performed a major compaction and retried and performance improved some, but
 not by much. We are processing data at about 2 MB per second.
 
   We are using CDH 4.2.1 HBase 0.94.2 and HDFS 2.0.0 running with 8
 datanodes/regionservers(each with 32 cores, 4x1TB disks and 60 GB RAM).
 Anil: You dont have the right balance between disk,cpu and ram. You have
 too much of CPU, RAM but very less NUMBER of disks. Usually, its better to
 have a Disk/Cpu_core ratio near 0.6-0.8. Your's is around 0.13. This seems
 to be the biggest reason of your problem.

 HBase is running with 30 GB Heap size, memstore values being capped at 3
 GB and flush thresholds being 0.15 and 0.2. Blockcache is at 0.5 of total
 heap size(15 GB). We are using SNAPPY for our tables.
 
 
 A couple of questions:
         * Is the performance of the time-based scan bad after a major
 compaction?
 
 Anil: In general, TimeBased(i am assuming you have built your rowkey on
 timestamp) scans are not good for HBase because of region hot-spotting.
 Have you tried setting the ScannerCaching to a higher number?


         * What can we do to help alleviate being disk bound? The typical
 answer of adding more RAM does not seem to have helped, or we are missing
 some other config
 
 Anil: Try adding more disks to your machines.


 
 
 Below are some of the metrics from a Regionserver webUI:
 
 requestsPerSecond=5895, numberOfOnlineRegions=60, numberOfStores=60,
 numberOfStorefiles=209, storefileIndexSizeMB=6, rootIndexSizeKB=7131,
 totalStaticIndexSizeKB=415995, totalStaticBloomSizeKB=2514675,
 memstoreSizeMB=0, mbInMemoryWithoutWAL=0, numberOfPutsWithoutWAL=0,
 readRequestsCount=30589690, writeRequestsCount=0, compactionQueueSize=0,
 flushQueueSize=0, usedHeapMB=2688, maxHeapMB=30672,
 blockCacheSizeMB=1604.86, blockCacheFreeMB=13731.24, blockCacheCount=11817,
 blockCacheHitCount=2759, blockCacheMissCount=25373411,