Re: Hbase random read performance

2013-07-08 Thread Ted Yu
Moving to HBase user mailing list. 

Can you upgrade to newer release such as 0.94.8 ?

Cheers

On Jul 8, 2013, at 4:36 AM, Boris Emelyanov emelya...@post.km.ru wrote:

 I'm trying to configure hbase for fully random read performance, my cluster 
 parameters are:
 
 9 servers as slaves, each has two 1TB HDD as hadoop volumes;
 data:  800 millions 6-10 KB objects in hbase;
 
 HBase Version - 0.90.6-cdh3u5
 HBASE_HEAPSIZE=12288;
 hfile.block.cache.size = 0.4
 hfile.min.blocksize.size = 16384
 hbase.regionserver.handler.count = 100
 
 I used several recommended tuning solutions, such as
 * tunrning on bloom filters;
 * decreasing hbase block size to 16384.
 
 However, read performance is still poor, about 800-1000rps.
 
 What would you recommend to solve the problem?
 
 -- 
 Best regards,
 
 Boris.
 


Re: optimizing block cache requests + eviction

2013-07-08 Thread Ted Yu
For suggestion #3 below, take a look at:

HBASE-7509 Enable RS to query a secondary datanode in parallel, if the
primary takes too long

Cheers

On Mon, Jul 8, 2013 at 3:04 AM, Viral Bajaria viral.baja...@gmail.comwrote:

 Hi,

 TL;DR;
 Trying to make a case for making the block eviction strategy smart and to
 not evict remote blocks more frequently and make the requests more smarter.

 The question here comes after I debugged the issue that I was having with
 random region servers hitting high load averages. I initially thought the
 problem was hardware related i.e. bad disk or network since the wait I/O
 was too high but it was a combination of things.

 I figured with SCR (short circuit read) ON the datanode should almost never
 show high amount of block requests from the local regionservers. So my
 starting point for debugging was the datanode since it was doing a ton of
 I/O. The clienttrace logs helped me figure out which RS nodes were making
 block requests. I hacked up a script to report which blocks are being
 requested and how many times per minute. I found that some blocks were
 being requested 10+ times in a minute and over 2000 times in an hour from
 the same regionserver. This was causing the server to do 40+MB/s on reads
 alone. That was on the higher side, the average was closer to 100 or so per
 hour.

 Now why did I end up in such a situation. It happened due to the fact that
 I added servers to the cluster and rebalanced the cluster. At the same time
 I added some drives and also removed the offending server in my setup. This
 caused some of the data to not be co-located with the regionservers. Given
 that major_compaction was disabled and it would not have run for a while
 (atleast on some tables) these block requests would not go away. One of my
 regionservers was totally overwhelmed. I made the situation worse when I
 removed the server that was under heavy load with the assumption that it's
 a hardware problem with the box without doing a deep dive (doh!). Given
 that regionservers will be added in the future, I expect block locality to
 go down till major_compaction runs. Also nodes can go down and cause this
 problem. So I started thinking of probable solutions, but first some
 observations.

 *Observations/Comments*
 - The surprising part was the regionservers were trying to make so many
 requests for the same block in the same minute (let alone hour). Could this
 happen because the original request took a few seconds and so the
 regionserver re-requested ? I didn't see any fetch errors in the
 regionserver logs for blocks.
 - Even more strange; my heap size was at 11G and the time when this was
 happening, the used heap was at 2-4G. I would have expected the heap to
 grow higher than that since the blockCache should be using atleast 40% of
 the available heap space.
 - Another strange thing that I observed was, the block was being requested
 from the same datanode every single time.

 *Possible Solution/Changes*
 - Would it make sense to give remote blocks higher priority over the local
 blocks that can be read via SCR and not let them get evicted if there is a
 tie in which block to evict ?
 - Should we throttle the number of outgoing requests for a block ? I am not
 sure if my firewall caused some issue but I wouldn't expect multiple block
 fetch requests in the same minute. I did see a few RST packets getting
 dropped at the firewall but I wasn't able to trace the problem was due to
 this.
 - We have 3 replicas available, shouldn't we request from the other
 datanode if one might take a lot of time ? The amount of time it took to
 read a block went up when the box was under heavy load, yet the re-requests
 were going to the same one. Is this something that is available on the
 DFSClient and can we exploit it ?
 - Is it possible to migrate a region to a server which has higher number of
 blocks available for it ? We don't need to make this automatic, but we
 could provide a command that could be invoked manually to assign a region
 to a specific regionserver. Thoughts ?

 Thanks,
 Viral



Using separator/delimiter in HBase rowkey?

2013-07-08 Thread Jason Huang
Hello,

I am trying to get some advice on pros/cons of using separator/delimiter as
part of HBase row key.

Currently one of our user activity tables has a rowkey design of
UserID^TimeStamp with a separator of ^. (UserID is a string that won't
include '^').

This is designed for the two common use cases in our system:
(1) If we come from a context where the UserID is known, we can do a scan
easily for all the user activities with a startRowKey and stopRowKey.
(2) If we come from a external networked table where the row key of this
user activity table is stored and can be retrieved as activityRowKey, then
we can use the following code to parse out the UserID and do the same scan
as in (1):

String activityRowKeyStr = Bytes.toString(activityRowKey);
String userId =
activityRowKeyStr.subString(activityRowKeyStr.indexOf(^)+1)

Then I can set startRowKey and stopRowKey for the scan based on userId.
Here we get benefit of having the User ID as part of the row key with the
separator (comparing to another solution that stores the userID as one of
the columns in the user activity table).

The reason I pick a separator after UserID is that sometimes we may not get
a fixed length string of the UserID value. At one point I actually thought
of using MD5 to hash the UserID and make it a fixed length, however, the
possibility of collision and possible overhead of applying the hash
function makes me pick the separator ^.

My question:
(1) I kind of make the argument that using a separator is kind of better
than using a MD5 hash value. Does that seem reasonable? Could you comments
on other pros and cons that I might miss (as the bases for my argument)?

(2) On using a separator/delimiter, besides the requirements that this
separator/delimiter shouldn't appear elsewhere in the rowkey, are there any
other requirements? Are there any special separator/delimiters that are
better/worse than the average ones?

thanks!

Jason


Re: Bulk loading HFiles via LoadIncrementalHFiles fails at a region that is being compacted, a bug?

2013-07-08 Thread Stanislav Barton
Hello Michael,

looking in the code, it seems to me that the 60s is hardcoded, however
it retries for, on default, 10 times, so in total 10 minutes wait
time, I upped that to 20 times, so now it is 20 minutes for me, but
still, we have some pretty big regions whose compaction (which was the
case in particular) can take more than 40 minutes. I have split the
big regions to alleviate this, so getting a thread dump now will be
difficult (this is in production so no problems is the point). Anyway,
looking on the code, for me its hard to figure out which actions will
block the lock from succeeding on the region at the place I indicated,
so was hoping for an answer from an expert. If the compaction blocks
the lock, it might be that at unit testing the compactions are faster
than 10 minutes so the problem never exhibits. Is this the case?

Stan


Re: Using separator/delimiter in HBase rowkey?

2013-07-08 Thread Mike Axiak
Hello Jason,

Have you considered the following rowkey?

  murmur_128(userId) + timestamp + userId ?

This handles both of your cases as (1) murmur 128 is much faster than
md5 so will have very low overhead and (2) the userid at the end of
the key will ensure that no murmur collisions will cause issues. This
key also handle incrementing userIds well because close userIds will
likely be in separate regions.

Cheers,
Mike

On Mon, Jul 8, 2013 at 10:19 AM, Jason Huang jason.hu...@icare.com wrote:
 Hello,

 I am trying to get some advice on pros/cons of using separator/delimiter as
 part of HBase row key.

 Currently one of our user activity tables has a rowkey design of
 UserID^TimeStamp with a separator of ^. (UserID is a string that won't
 include '^').

 This is designed for the two common use cases in our system:
 (1) If we come from a context where the UserID is known, we can do a scan
 easily for all the user activities with a startRowKey and stopRowKey.
 (2) If we come from a external networked table where the row key of this
 user activity table is stored and can be retrieved as activityRowKey, then
 we can use the following code to parse out the UserID and do the same scan
 as in (1):

 String activityRowKeyStr = Bytes.toString(activityRowKey);
 String userId =
 activityRowKeyStr.subString(activityRowKeyStr.indexOf(^)+1)

 Then I can set startRowKey and stopRowKey for the scan based on userId.
 Here we get benefit of having the User ID as part of the row key with the
 separator (comparing to another solution that stores the userID as one of
 the columns in the user activity table).

 The reason I pick a separator after UserID is that sometimes we may not get
 a fixed length string of the UserID value. At one point I actually thought
 of using MD5 to hash the UserID and make it a fixed length, however, the
 possibility of collision and possible overhead of applying the hash
 function makes me pick the separator ^.

 My question:
 (1) I kind of make the argument that using a separator is kind of better
 than using a MD5 hash value. Does that seem reasonable? Could you comments
 on other pros and cons that I might miss (as the bases for my argument)?

 (2) On using a separator/delimiter, besides the requirements that this
 separator/delimiter shouldn't appear elsewhere in the rowkey, are there any
 other requirements? Are there any special separator/delimiters that are
 better/worse than the average ones?

 thanks!

 Jason


Re: Using separator/delimiter in HBase rowkey?

2013-07-08 Thread Shahab Yunus
Not saying this is a solution or better in anyway but just more food for
thought. Is there any maximum size limit for UserIds? You can pad also for
Users Ids of smaller length. You are using more space in this way though.
It can help in sorting as well.

Regards,
Shahab


On Mon, Jul 8, 2013 at 10:19 AM, Jason Huang jason.hu...@icare.com wrote:

 Hello,

 I am trying to get some advice on pros/cons of using separator/delimiter as
 part of HBase row key.

 Currently one of our user activity tables has a rowkey design of
 UserID^TimeStamp with a separator of ^. (UserID is a string that won't
 include '^').

 This is designed for the two common use cases in our system:
 (1) If we come from a context where the UserID is known, we can do a scan
 easily for all the user activities with a startRowKey and stopRowKey.
 (2) If we come from a external networked table where the row key of this
 user activity table is stored and can be retrieved as activityRowKey, then
 we can use the following code to parse out the UserID and do the same scan
 as in (1):

 String activityRowKeyStr = Bytes.toString(activityRowKey);
 String userId =
 activityRowKeyStr.subString(activityRowKeyStr.indexOf(^)+1)

 Then I can set startRowKey and stopRowKey for the scan based on userId.
 Here we get benefit of having the User ID as part of the row key with the
 separator (comparing to another solution that stores the userID as one of
 the columns in the user activity table).

 The reason I pick a separator after UserID is that sometimes we may not get
 a fixed length string of the UserID value. At one point I actually thought
 of using MD5 to hash the UserID and make it a fixed length, however, the
 possibility of collision and possible overhead of applying the hash
 function makes me pick the separator ^.

 My question:
 (1) I kind of make the argument that using a separator is kind of better
 than using a MD5 hash value. Does that seem reasonable? Could you comments
 on other pros and cons that I might miss (as the bases for my argument)?

 (2) On using a separator/delimiter, besides the requirements that this
 separator/delimiter shouldn't appear elsewhere in the rowkey, are there any
 other requirements? Are there any special separator/delimiters that are
 better/worse than the average ones?

 thanks!

 Jason



Re: When to expand vertically vs. horizontally in Hbase

2013-07-08 Thread Michael Segel
Ian, 

You still want to stick to your relational modeling.  :-(

You need to play around more with hierarchical models to get a better 
appreciation. 

If you model as if you're working with a RDBMS then you will end up with a poor 
HBase table design. 

In ERD models, you don't have the concept of a weak relationship. 
The weak relationship is that the model has no relationship between the 
entities. Its the application that manages that. 

Imagine a reference or look up table that in the model has no association. 
Using our example of an Order Entry system, its the application that hits the 
customer lookup table to capture relevant information for the order.  That's 
why I refer to it as a weak association. 



On Jul 5, 2013, at 6:00 PM, Ian Varley ivar...@salesforce.com wrote:

 Sure. Maybe it's useful to talk about the functional aspect of relationships 
 in models. In an RDBMS, explicit relationship play a couple roles:
 
 - foreign key constraints: don't allow a tuple in relation A to point to a 
 row in relation B that doesn't exist
 - join optimization - knowledge of how two relations are logically connected 
 can help perform joins in a more optimal way
 
 HBase, of course, provides neither of these features out of the box, so there 
 is no difference between an implied (weakly coupled, to use your term) 
 relationship and something stronger. 
 
 Where it gets interesting is in the kind of denormalization you're talking 
 about, where information that properly belongs to one entity is copied into 
 another one for efficiency's sake, or to get some kind of atomicity 
 protection. Your scenario below is doing this (duplicating customer info in 
 the order records). 
 
 To be fair, relational DBs also force this kind of behavior sometimes, again 
 for efficiency reasons (we've all done it). HBase just starts there. :)
 
 Ian
 
 On Jul 5, 2013, at 4:22 PM, Michael Segel michael_se...@hotmail.com wrote:
 
 An entity is an entity. 
 When you couple them you are saying that there's a relationship to them in 
 the model. 
 
 What I am saying is that you can have an HBase model which is not a single 
 table, however when you look at your use case, you are querying data from a 
 single table at a time. 
 
 Going back to the order entry system. You may have a customer table which 
 maintains all of the information about your customer yet you will also 
 duplicate portions of the data in to the order system.  You still have other 
 entities such as your orders, pick slips, shipping and invoices. There won't 
 be a hard or strong relationship between the customer table and the order 
 table. 
 
 When you go to your ERD tool, you wouldn't show a strong coupling of the 
 data. 
 
 Does that make sense? 
 
 On Jul 5, 2013, at 1:56 PM, Ian Varley ivar...@salesforce.com wrote:
 
 Mike, what do you mean by you can have entities, except that they are not 
 coupled? You mean, they have no relationship to each other? Or the 
 relationship is defined elsewhere (e.g. application code)? The concept of 
 coupling seems a little overloaded and not as concise here as 
 relationship. Two tuples in a database can have a wide number of 
 relationships to each other; the kinds of relationships that are actively 
 supported differs between a traditional RDBMS and HBase, and proper HBase 
 design requires understand these limitations precisely.
 
 I'm not trying to be an 
 ERhttp://en.wikipedia.org/wiki/Entity%E2%80%93relationship_model 
 apologist, there are a lot of ways in which it sucks. :) But if we want to 
 evolve, we can't just pretend there's no history here to build on.
 
 Ian
 
 On Jul 5, 2013, at 1:41 PM, Michael Segel wrote:
 
 LOL...
 
 Ian wrote:
 But, something just occurred to me: just because your physical 
 implementation (HBase) doesn't support normalized entities and 
 relationships doesn't mean your *problem* doesn't have entities and 
 relationships. :) An Author is one entity, a Title is another, and a Genre 
 is a third. Understanding how they interact is a prerequisite for 
 translating into a physical model that works well in HBase. (ERD modeling 
 is not categorically the only way to understand that, but I've yet to hear 
 a credible alternative that doesn't boil down to either ERD or do it in 
 your head).
 
 
 You can have entities, except that they are not coupled.
 
 If you have a common key, then you may have a use for column families, it 
 just depends on your data and how you access your data.
 
 Its not rocket science, but its a non-trivial matter. Not doing it right 
 may mean that you are not going to get the most out of your system.
 
 
 On Jul 5, 2013, at 1:26 PM, Ian Varley 
 ivar...@salesforce.commailto:ivar...@salesforce.com wrote:
 
 But, something just occurred to me: just because your physical 
 implementation (HBase) doesn't support normalized entities and 
 relationships doesn't mean your *problem* doesn't have entities and 
 relationships. :) An Author is one entity, a Title is 

Re: Using separator/delimiter in HBase rowkey?

2013-07-08 Thread Michael Segel
Is murmur part of the standard java libraries? 

If not, you end up having to do a bit more maintenance of your cluster and 
that's going to be part of your tradeoff. 

On Jul 8, 2013, at 10:14 AM, Mike Axiak m...@axiak.net wrote:

 Hello Jason,
 
 Have you considered the following rowkey?
 
  murmur_128(userId) + timestamp + userId ?
 
 This handles both of your cases as (1) murmur 128 is much faster than
 md5 so will have very low overhead and (2) the userid at the end of
 the key will ensure that no murmur collisions will cause issues. This
 key also handle incrementing userIds well because close userIds will
 likely be in separate regions.
 
 Cheers,
 Mike
 
 On Mon, Jul 8, 2013 at 10:19 AM, Jason Huang jason.hu...@icare.com wrote:
 Hello,
 
 I am trying to get some advice on pros/cons of using separator/delimiter as
 part of HBase row key.
 
 Currently one of our user activity tables has a rowkey design of
 UserID^TimeStamp with a separator of ^. (UserID is a string that won't
 include '^').
 
 This is designed for the two common use cases in our system:
 (1) If we come from a context where the UserID is known, we can do a scan
 easily for all the user activities with a startRowKey and stopRowKey.
 (2) If we come from a external networked table where the row key of this
 user activity table is stored and can be retrieved as activityRowKey, then
 we can use the following code to parse out the UserID and do the same scan
 as in (1):
 
String activityRowKeyStr = Bytes.toString(activityRowKey);
String userId =
 activityRowKeyStr.subString(activityRowKeyStr.indexOf(^)+1)
 
 Then I can set startRowKey and stopRowKey for the scan based on userId.
 Here we get benefit of having the User ID as part of the row key with the
 separator (comparing to another solution that stores the userID as one of
 the columns in the user activity table).
 
 The reason I pick a separator after UserID is that sometimes we may not get
 a fixed length string of the UserID value. At one point I actually thought
 of using MD5 to hash the UserID and make it a fixed length, however, the
 possibility of collision and possible overhead of applying the hash
 function makes me pick the separator ^.
 
 My question:
 (1) I kind of make the argument that using a separator is kind of better
 than using a MD5 hash value. Does that seem reasonable? Could you comments
 on other pros and cons that I might miss (as the bases for my argument)?
 
 (2) On using a separator/delimiter, besides the requirements that this
 separator/delimiter shouldn't appear elsewhere in the rowkey, are there any
 other requirements? Are there any special separator/delimiters that are
 better/worse than the average ones?
 
 thanks!
 
 Jason
 



Re: Using separator/delimiter in HBase rowkey?

2013-07-08 Thread Mike Axiak
On Mon, Jul 8, 2013 at 11:29 AM, Michael Segel
michael_se...@hotmail.com wrote:
 If not, you end up having to do a bit more maintenance of your cluster and 
 that's going to be part of your tradeoff.

How so?

-Mike


Re: Using separator/delimiter in HBase rowkey?

2013-07-08 Thread Ted Yu
In 0.94, we have src/main/java/org/apache/hadoop/hbase/util/MurmurHash.java

For hadoop 1, there is src/core/org/apache/hadoop/util/hash/MurmurHash.java

Cheers

On Mon, Jul 8, 2013 at 8:29 AM, Michael Segel michael_se...@hotmail.comwrote:

 Is murmur part of the standard java libraries?

 If not, you end up having to do a bit more maintenance of your cluster and
 that's going to be part of your tradeoff.

 On Jul 8, 2013, at 10:14 AM, Mike Axiak m...@axiak.net wrote:

  Hello Jason,
 
  Have you considered the following rowkey?
 
   murmur_128(userId) + timestamp + userId ?
 
  This handles both of your cases as (1) murmur 128 is much faster than
  md5 so will have very low overhead and (2) the userid at the end of
  the key will ensure that no murmur collisions will cause issues. This
  key also handle incrementing userIds well because close userIds will
  likely be in separate regions.
 
  Cheers,
  Mike
 
  On Mon, Jul 8, 2013 at 10:19 AM, Jason Huang jason.hu...@icare.com
 wrote:
  Hello,
 
  I am trying to get some advice on pros/cons of using
 separator/delimiter as
  part of HBase row key.
 
  Currently one of our user activity tables has a rowkey design of
  UserID^TimeStamp with a separator of ^. (UserID is a string that
 won't
  include '^').
 
  This is designed for the two common use cases in our system:
  (1) If we come from a context where the UserID is known, we can do a
 scan
  easily for all the user activities with a startRowKey and stopRowKey.
  (2) If we come from a external networked table where the row key of this
  user activity table is stored and can be retrieved as activityRowKey,
 then
  we can use the following code to parse out the UserID and do the same
 scan
  as in (1):
 
 String activityRowKeyStr = Bytes.toString(activityRowKey);
 String userId =
  activityRowKeyStr.subString(activityRowKeyStr.indexOf(^)+1)
 
  Then I can set startRowKey and stopRowKey for the scan based on userId.
  Here we get benefit of having the User ID as part of the row key with
 the
  separator (comparing to another solution that stores the userID as one
 of
  the columns in the user activity table).
 
  The reason I pick a separator after UserID is that sometimes we may not
 get
  a fixed length string of the UserID value. At one point I actually
 thought
  of using MD5 to hash the UserID and make it a fixed length, however, the
  possibility of collision and possible overhead of applying the hash
  function makes me pick the separator ^.
 
  My question:
  (1) I kind of make the argument that using a separator is kind of better
  than using a MD5 hash value. Does that seem reasonable? Could you
 comments
  on other pros and cons that I might miss (as the bases for my argument)?
 
  (2) On using a separator/delimiter, besides the requirements that this
  separator/delimiter shouldn't appear elsewhere in the rowkey, are there
 any
  other requirements? Are there any special separator/delimiters that are
  better/worse than the average ones?
 
  thanks!
 
  Jason
 




Re: optimizing block cache requests + eviction

2013-07-08 Thread Andrew Purtell
 Would it make sense to give remote blocks higher priority over the local
blocks that can be read via SCR and not let them get evicted if there is a
tie in which block to evict ?

That sounds like a reasonable idea.  As are the others.

But first, could this be a bug?

What version of HBase? Were you able to take and save stack traces from the
offending RegionServers while they were engaged in this behavior? (If not,
maybe next time?) What were the HBase clients doing at the time that might
correlate? Are there a set of steps that can be distilled that tend
to trigger what you are seeing?


On Monday, July 8, 2013, Viral Bajaria wrote:

 Hi,

 TL;DR;
 Trying to make a case for making the block eviction strategy smart and to
 not evict remote blocks more frequently and make the requests more smarter.

 The question here comes after I debugged the issue that I was having with
 random region servers hitting high load averages. I initially thought the
 problem was hardware related i.e. bad disk or network since the wait I/O
 was too high but it was a combination of things.

 I figured with SCR (short circuit read) ON the datanode should almost never
 show high amount of block requests from the local regionservers. So my
 starting point for debugging was the datanode since it was doing a ton of
 I/O. The clienttrace logs helped me figure out which RS nodes were making
 block requests. I hacked up a script to report which blocks are being
 requested and how many times per minute. I found that some blocks were
 being requested 10+ times in a minute and over 2000 times in an hour from
 the same regionserver. This was causing the server to do 40+MB/s on reads
 alone. That was on the higher side, the average was closer to 100 or so per
 hour.

 Now why did I end up in such a situation. It happened due to the fact that
 I added servers to the cluster and rebalanced the cluster. At the same time
 I added some drives and also removed the offending server in my setup. This
 caused some of the data to not be co-located with the regionservers. Given
 that major_compaction was disabled and it would not have run for a while
 (atleast on some tables) these block requests would not go away. One of my
 regionservers was totally overwhelmed. I made the situation worse when I
 removed the server that was under heavy load with the assumption that it's
 a hardware problem with the box without doing a deep dive (doh!). Given
 that regionservers will be added in the future, I expect block locality to
 go down till major_compaction runs. Also nodes can go down and cause this
 problem. So I started thinking of probable solutions, but first some
 observations.

 *Observations/Comments*
 - The surprising part was the regionservers were trying to make so many
 requests for the same block in the same minute (let alone hour). Could this
 happen because the original request took a few seconds and so the
 regionserver re-requested ? I didn't see any fetch errors in the
 regionserver logs for blocks.
 - Even more strange; my heap size was at 11G and the time when this was
 happening, the used heap was at 2-4G. I would have expected the heap to
 grow higher than that since the blockCache should be using atleast 40% of
 the available heap space.
 - Another strange thing that I observed was, the block was being requested
 from the same datanode every single time.

 *Possible Solution/Changes*
 - Would it make sense to give remote blocks higher priority over the local
 blocks that can be read via SCR and not let them get evicted if there is a
 tie in which block to evict ?
 - Should we throttle the number of outgoing requests for a block ? I am not
 sure if my firewall caused some issue but I wouldn't expect multiple block
 fetch requests in the same minute. I did see a few RST packets getting
 dropped at the firewall but I wasn't able to trace the problem was due to
 this.
 - We have 3 replicas available, shouldn't we request from the other
 datanode if one might take a lot of time ? The amount of time it took to
 read a block went up when the box was under heavy load, yet the re-requests
 were going to the same one. Is this something that is available on the
 DFSClient and can we exploit it ?
 - Is it possible to migrate a region to a server which has higher number of
 blocks available for it ? We don't need to make this automatic, but we
 could provide a command that could be invoked manually to assign a region
 to a specific regionserver. Thoughts ?

 Thanks,
 Viral



-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)


Re: Using separator/delimiter in HBase rowkey?

2013-07-08 Thread Michael Segel
You will need to put the jar into either every app that runs, or you will need 
to put it on every node. Every upgrade, you will need to make sure its still in 
your class path. 

More work for the admins. So how much faster is it over MD5? MD5 and SHA-1 are 
part of the Java libraries that ship w Sun/Oracle so you have them already 
installed and in your class path. 

Just saying... ;-) 

On Jul 8, 2013, at 10:36 AM, Mike Axiak m...@axiak.net wrote:

 On Mon, Jul 8, 2013 at 11:29 AM, Michael Segel
 michael_se...@hotmail.com wrote:
 If not, you end up having to do a bit more maintenance of your cluster and 
 that's going to be part of your tradeoff.
 
 How so?
 
 -Mike
 



Re: Using separator/delimiter in HBase rowkey?

2013-07-08 Thread Ted Yu
http://jsperf.com/murmur3-performance might be related.

On Mon, Jul 8, 2013 at 8:54 AM, Michael Segel michael_se...@hotmail.comwrote:

 You will need to put the jar into either every app that runs, or you will
 need to put it on every node. Every upgrade, you will need to make sure its
 still in your class path.

 More work for the admins. So how much faster is it over MD5? MD5 and SHA-1
 are part of the Java libraries that ship w Sun/Oracle so you have them
 already installed and in your class path.

 Just saying... ;-)

 On Jul 8, 2013, at 10:36 AM, Mike Axiak m...@axiak.net wrote:

  On Mon, Jul 8, 2013 at 11:29 AM, Michael Segel
  michael_se...@hotmail.com wrote:
  If not, you end up having to do a bit more maintenance of your cluster
 and that's going to be part of your tradeoff.
 
  How so?
 
  -Mike
 




Re: Using separator/delimiter in HBase rowkey?

2013-07-08 Thread Mike Axiak
I just don't understand.. every creation/interpretation of the key is
going to require code to be used. The murmur implementation is with
that code. How is there any extra burden?

On Mon, Jul 8, 2013 at 11:54 AM, Michael Segel
michael_se...@hotmail.com wrote:
 You will need to put the jar into either every app that runs, or you will 
 need to put it on every node. Every upgrade, you will need to make sure its 
 still in your class path.

 More work for the admins. So how much faster is it over MD5? MD5 and SHA-1 
 are part of the Java libraries that ship w Sun/Oracle so you have them 
 already installed and in your class path.

 Just saying... ;-)

 On Jul 8, 2013, at 10:36 AM, Mike Axiak m...@axiak.net wrote:

 On Mon, Jul 8, 2013 at 11:29 AM, Michael Segel
 michael_se...@hotmail.com wrote:
 If not, you end up having to do a bit more maintenance of your cluster and 
 that's going to be part of your tradeoff.

 How so?

 -Mike




Re: Using separator/delimiter in HBase rowkey?

2013-07-08 Thread Michael Segel
Where is murmur? 

In your app? 
So then every app that wants to fetch that row must now use murmur. 

Added to Hadoop/HBase? 
Then when you do upgrades you have to make sure that the package is still in 
your class path. Note that different vendor's release management will mean YMMV 
as to what happens to your class paths and set up or if the jar gets blown out 
of the directory. 


Is the added cost in maintenance worth it? 

I don't know but I seriously doubt it. 


On Jul 8, 2013, at 11:00 AM, Mike Axiak m...@axiak.net wrote:

 I just don't understand.. every creation/interpretation of the key is
 going to require code to be used. The murmur implementation is with
 that code. How is there any extra burden?
 
 On Mon, Jul 8, 2013 at 11:54 AM, Michael Segel
 michael_se...@hotmail.com wrote:
 You will need to put the jar into either every app that runs, or you will 
 need to put it on every node. Every upgrade, you will need to make sure its 
 still in your class path.
 
 More work for the admins. So how much faster is it over MD5? MD5 and SHA-1 
 are part of the Java libraries that ship w Sun/Oracle so you have them 
 already installed and in your class path.
 
 Just saying... ;-)
 
 On Jul 8, 2013, at 10:36 AM, Mike Axiak m...@axiak.net wrote:
 
 On Mon, Jul 8, 2013 at 11:29 AM, Michael Segel
 michael_se...@hotmail.com wrote:
 If not, you end up having to do a bit more maintenance of your cluster and 
 that's going to be part of your tradeoff.
 
 How so?
 
 -Mike
 
 
 



RE: optimizing block cache requests + eviction

2013-07-08 Thread Vladimir Rodionov
Viral,

From what you described here I can conclude that either:

1. your data access pattern is random and uniform (no data locality at all)
2. or you trash block cache with scan operations with block cache enabled.

or both.

but the idea of treating local and remote blocks differently is very good. +1.

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodio...@carrieriq.com


From: Viral Bajaria [viral.baja...@gmail.com]
Sent: Monday, July 08, 2013 3:04 AM
To: user@hbase.apache.org
Subject: optimizing block cache requests + eviction

Hi,

TL;DR;
Trying to make a case for making the block eviction strategy smart and to
not evict remote blocks more frequently and make the requests more smarter.

The question here comes after I debugged the issue that I was having with
random region servers hitting high load averages. I initially thought the
problem was hardware related i.e. bad disk or network since the wait I/O
was too high but it was a combination of things.

I figured with SCR (short circuit read) ON the datanode should almost never
show high amount of block requests from the local regionservers. So my
starting point for debugging was the datanode since it was doing a ton of
I/O. The clienttrace logs helped me figure out which RS nodes were making
block requests. I hacked up a script to report which blocks are being
requested and how many times per minute. I found that some blocks were
being requested 10+ times in a minute and over 2000 times in an hour from
the same regionserver. This was causing the server to do 40+MB/s on reads
alone. That was on the higher side, the average was closer to 100 or so per
hour.

Now why did I end up in such a situation. It happened due to the fact that
I added servers to the cluster and rebalanced the cluster. At the same time
I added some drives and also removed the offending server in my setup. This
caused some of the data to not be co-located with the regionservers. Given
that major_compaction was disabled and it would not have run for a while
(atleast on some tables) these block requests would not go away. One of my
regionservers was totally overwhelmed. I made the situation worse when I
removed the server that was under heavy load with the assumption that it's
a hardware problem with the box without doing a deep dive (doh!). Given
that regionservers will be added in the future, I expect block locality to
go down till major_compaction runs. Also nodes can go down and cause this
problem. So I started thinking of probable solutions, but first some
observations.

*Observations/Comments*
- The surprising part was the regionservers were trying to make so many
requests for the same block in the same minute (let alone hour). Could this
happen because the original request took a few seconds and so the
regionserver re-requested ? I didn't see any fetch errors in the
regionserver logs for blocks.
- Even more strange; my heap size was at 11G and the time when this was
happening, the used heap was at 2-4G. I would have expected the heap to
grow higher than that since the blockCache should be using atleast 40% of
the available heap space.
- Another strange thing that I observed was, the block was being requested
from the same datanode every single time.

*Possible Solution/Changes*
- Would it make sense to give remote blocks higher priority over the local
blocks that can be read via SCR and not let them get evicted if there is a
tie in which block to evict ?
- Should we throttle the number of outgoing requests for a block ? I am not
sure if my firewall caused some issue but I wouldn't expect multiple block
fetch requests in the same minute. I did see a few RST packets getting
dropped at the firewall but I wasn't able to trace the problem was due to
this.
- We have 3 replicas available, shouldn't we request from the other
datanode if one might take a lot of time ? The amount of time it took to
read a block went up when the box was under heavy load, yet the re-requests
were going to the same one. Is this something that is available on the
DFSClient and can we exploit it ?
- Is it possible to migrate a region to a server which has higher number of
blocks available for it ? We don't need to make this automatic, but we
could provide a command that could be invoked manually to assign a region
to a specific regionserver. Thoughts ?

Thanks,
Viral

Confidentiality Notice:  The information contained in this message, including 
any attachments hereto, may be confidential and is intended to be read only by 
the individual or entity to whom this message is addressed. If the reader of 
this message is not the intended recipient or an agent or designee of the 
intended recipient, please note that any review, use, disclosure or 
distribution of this message or its attachments, in any form, is strictly 
prohibited.  If you have received this message in error, please immediately 
notify the sender and/or 

GC recommendations for large Region Server heaps

2013-07-08 Thread Suraj Varma
Hello:
We have an HBase cluster with region servers running on 8GB heap size with
a 0.6 block cache (it is a read heavy cluster, with bursty write traffic
via MR jobs). (version: hbase-0.94.6.1)

During HBaseCon, while speaking to a few attendees, I heard some folks were
running region servers as high as 24GB and some others in the 16GB range.

So - question: Are there any special GC recommendations (tuning parameters,
flags, etc) that folks who run at these large heaps can recommend while
moving up from an 8GB heap? i.e. for 16GB and for 24GB RS heaps ... ?

I'm especially concerned about long pauses causing zk session timeouts and
consequent RS shutdowns. Our boxes do have a lot of RAM and we are
exploring how we can use more of it for the cluster while maintaining
overall stability.

Also - if there are clusters running multiple region servers per host, I'd
be very interested to know what RS heap sizes those are being run at ...
and whether this was chosen as an alternative to running a single RS with
large heap.

(I know I'll have to test the GC stuff out on my cluster and for my
workloads anyway ... but just trying to get a feel of what sort of tuning
options had to be used to have a stable HBase cluster with 16 or 24GB RS
heaps).

Thanks in advance,
--Suraj


Re: GC recommendations for large Region Server heaps

2013-07-08 Thread Kevin O'dell
Hey Suraj,

  I would recommend turning on MSlab and using the following settings:

-XX:+UseParNewGC -XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled



On Mon, Jul 8, 2013 at 2:09 PM, Suraj Varma svarma...@gmail.com wrote:

 Hello:
 We have an HBase cluster with region servers running on 8GB heap size with
 a 0.6 block cache (it is a read heavy cluster, with bursty write traffic
 via MR jobs). (version: hbase-0.94.6.1)

 During HBaseCon, while speaking to a few attendees, I heard some folks were
 running region servers as high as 24GB and some others in the 16GB range.

 So - question: Are there any special GC recommendations (tuning parameters,
 flags, etc) that folks who run at these large heaps can recommend while
 moving up from an 8GB heap? i.e. for 16GB and for 24GB RS heaps ... ?

 I'm especially concerned about long pauses causing zk session timeouts and
 consequent RS shutdowns. Our boxes do have a lot of RAM and we are
 exploring how we can use more of it for the cluster while maintaining
 overall stability.

 Also - if there are clusters running multiple region servers per host, I'd
 be very interested to know what RS heap sizes those are being run at ...
 and whether this was chosen as an alternative to running a single RS with
 large heap.

 (I know I'll have to test the GC stuff out on my cluster and for my
 workloads anyway ... but just trying to get a feel of what sort of tuning
 options had to be used to have a stable HBase cluster with 16 or 24GB RS
 heaps).

 Thanks in advance,
 --Suraj




-- 
Kevin O'Dell
Systems Engineer, Cloudera


RE: GC recommendations for large Region Server heaps

2013-07-08 Thread Vladimir Rodionov
I'm especially concerned about long pauses causing zk session timeouts and
consequent RS shutdowns. Our boxes do have a lot of RAM and we are
exploring how we can use more of it for the cluster while maintaining
overall stability.

Suraj, what is the maximum GC pause you have observed in your cluster so far?
I do not think s-t-w GC pause on 8GB heap will lasts longer than 5-8 secs.

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodio...@carrieriq.com


From: Suraj Varma [svarma...@gmail.com]
Sent: Monday, July 08, 2013 11:09 AM
To: user@hbase.apache.org
Subject: GC recommendations for large Region Server heaps

Hello:
We have an HBase cluster with region servers running on 8GB heap size with
a 0.6 block cache (it is a read heavy cluster, with bursty write traffic
via MR jobs). (version: hbase-0.94.6.1)

During HBaseCon, while speaking to a few attendees, I heard some folks were
running region servers as high as 24GB and some others in the 16GB range.

So - question: Are there any special GC recommendations (tuning parameters,
flags, etc) that folks who run at these large heaps can recommend while
moving up from an 8GB heap? i.e. for 16GB and for 24GB RS heaps ... ?

I'm especially concerned about long pauses causing zk session timeouts and
consequent RS shutdowns. Our boxes do have a lot of RAM and we are
exploring how we can use more of it for the cluster while maintaining
overall stability.

Also - if there are clusters running multiple region servers per host, I'd
be very interested to know what RS heap sizes those are being run at ...
and whether this was chosen as an alternative to running a single RS with
large heap.

(I know I'll have to test the GC stuff out on my cluster and for my
workloads anyway ... but just trying to get a feel of what sort of tuning
options had to be used to have a stable HBase cluster with 16 or 24GB RS
heaps).

Thanks in advance,
--Suraj

Confidentiality Notice:  The information contained in this message, including 
any attachments hereto, may be confidential and is intended to be read only by 
the individual or entity to whom this message is addressed. If the reader of 
this message is not the intended recipient or an agent or designee of the 
intended recipient, please note that any review, use, disclosure or 
distribution of this message or its attachments, in any form, is strictly 
prohibited.  If you have received this message in error, please immediately 
notify the sender and/or notificati...@carrieriq.com and delete or destroy any 
copy of this message and its attachments.


Re: optimizing block cache requests + eviction

2013-07-08 Thread Viral Bajaria
Thanks guys for going through that never-ending email! I will create the
JIRA for block cache eviction and the regionserver assignment command. Ted
already pointed to the JIRA which tries to go a different datanode if the
primary is busy (I will add comments to that one).

To answer Andrews' questions:

- I am using HBase 0.94.4
- I tried taking a stack trace using jstack but after the dump it crashed
the regionserver. I also did not take the dump on the offending
regionserver, rather took it on the regionservers that were making the
block count. I will take a stack trace on the offending server. Is there
any other tool besides jstack ? I don't want to crash my regionserver.
- The HBase clients workload is fairly random and I write to a table every
4-5 seconds. I have varying workloads for different tables. But I do a lot
of batching on the client side and group similar rowkeys together before
doing a GET/PUT. For example: best case I end up doing ~100 puts every
second to a region or in the worst case it's ~5K puts every second. But
again since the workload is fairly random. Currently the clients for the
table which had the most amount of data has been disabled and yet I see the
heavy loads.

To answer Vladimir's points:
- Data access pattern definitely turns out to be uniform over a period of
time.
- I just did a sweep of my code base and found that there are a few places
where Scanner are using block cache. I will disable that and see how it
goes.

Thanks,
Viral


Re: optimizing block cache requests + eviction

2013-07-08 Thread Viral Bajaria
I was able to reproduce the same regionserver asking for the same local
block over 300 times within the same 2 minute window by running one of my
heavy workloads.

Let me try and gather some stack dumps. I agree that jstack crashing the
jvm is concerning but there is nothing in the errors to know why it
happened. I will keep that conversation out of here.

As an addendum, I am using asynchbase as my client. Not sure if the arrival
of multiple requests for rowkeys that could be in the same non-cached block
causes hbase to queue up a non-cached block read via SCR and since the box
is under load, it queues up multiple of these and makes the problem worse.

Thanks,
Viral

On Mon, Jul 8, 2013 at 3:53 PM, Andrew Purtell apurt...@apache.org wrote:

 but unless the behavior you see is the _same_ regionserver asking for the
 _same_ block many times consecutively, it's probably workload related.



Re: optimizing block cache requests + eviction

2013-07-08 Thread Jean-Daniel Cryans
Do you know if it's a data or meta block?

J-D

On Mon, Jul 8, 2013 at 4:28 PM, Viral Bajaria viral.baja...@gmail.com wrote:
 I was able to reproduce the same regionserver asking for the same local
 block over 300 times within the same 2 minute window by running one of my
 heavy workloads.

 Let me try and gather some stack dumps. I agree that jstack crashing the
 jvm is concerning but there is nothing in the errors to know why it
 happened. I will keep that conversation out of here.

 As an addendum, I am using asynchbase as my client. Not sure if the arrival
 of multiple requests for rowkeys that could be in the same non-cached block
 causes hbase to queue up a non-cached block read via SCR and since the box
 is under load, it queues up multiple of these and makes the problem worse.

 Thanks,
 Viral

 On Mon, Jul 8, 2013 at 3:53 PM, Andrew Purtell apurt...@apache.org wrote:

 but unless the behavior you see is the _same_ regionserver asking for the
 _same_ block many times consecutively, it's probably workload related.



Re: optimizing block cache requests + eviction

2013-07-08 Thread Viral Bajaria
Good question. When I looked at the logs, it's not clear from it whether
it's reading a meta or data block. Is there any kind of log line that
indicates that ? Given that it's saying that it's ready from a startOffset
I would assume this is a data block.

A question that comes to mind, is this read doing a seek to that position
directly or is it going to cache the block ? Looks like it is not caching
the block if it's reading directly from a given offset. Or am I wrong ?

Following is a sample line that I used while debugging:
2013-07-08 22:58:55,221 DEBUG org.apache.hadoop.hdfs.DFSClient: New
BlockReaderLocal for file
/mnt/data/current/subdir34/subdir26/blk_-448970697931783518 of size
67108864 startOffset 13006577 length 54102287 short circuit checksum true

On Mon, Jul 8, 2013 at 4:37 PM, Jean-Daniel Cryans jdcry...@apache.orgwrote:

 Do you know if it's a data or meta block?


Re: optimizing block cache requests + eviction

2013-07-08 Thread Varun Sharma
FYI, if u disable your block cache - you will ask for Index blocks for
every single request. So such a high rate of request is plausible for Index
blocks even when your requests are totally random on your data.

Varun


On Mon, Jul 8, 2013 at 4:45 PM, Viral Bajaria viral.baja...@gmail.comwrote:

 Good question. When I looked at the logs, it's not clear from it whether
 it's reading a meta or data block. Is there any kind of log line that
 indicates that ? Given that it's saying that it's ready from a startOffset
 I would assume this is a data block.

 A question that comes to mind, is this read doing a seek to that position
 directly or is it going to cache the block ? Looks like it is not caching
 the block if it's reading directly from a given offset. Or am I wrong ?

 Following is a sample line that I used while debugging:
 2013-07-08 22:58:55,221 DEBUG org.apache.hadoop.hdfs.DFSClient: New
 BlockReaderLocal for file
 /mnt/data/current/subdir34/subdir26/blk_-448970697931783518 of size
 67108864 startOffset 13006577 length 54102287 short circuit checksum true

 On Mon, Jul 8, 2013 at 4:37 PM, Jean-Daniel Cryans jdcry...@apache.org
 wrote:

  Do you know if it's a data or meta block?



Re: optimizing block cache requests + eviction

2013-07-08 Thread Jean-Daniel Cryans
meta blocks are at the end:
http://hbase.apache.org/book.html#d2617e12979, a way to tell would be
by logging from the HBase side but then I guess it's hard to reconcile
with which file we're actually reading from...

Regarding your second question, you are asking if we block HDFS
blocks? We don't, since we don't even know about HDFS blocks. The
BlockReader seeks into the file and return whatever data is asked.

J-D

On Mon, Jul 8, 2013 at 4:45 PM, Viral Bajaria viral.baja...@gmail.com wrote:
 Good question. When I looked at the logs, it's not clear from it whether
 it's reading a meta or data block. Is there any kind of log line that
 indicates that ? Given that it's saying that it's ready from a startOffset
 I would assume this is a data block.

 A question that comes to mind, is this read doing a seek to that position
 directly or is it going to cache the block ? Looks like it is not caching
 the block if it's reading directly from a given offset. Or am I wrong ?

 Following is a sample line that I used while debugging:
 2013-07-08 22:58:55,221 DEBUG org.apache.hadoop.hdfs.DFSClient: New
 BlockReaderLocal for file
 /mnt/data/current/subdir34/subdir26/blk_-448970697931783518 of size
 67108864 startOffset 13006577 length 54102287 short circuit checksum true

 On Mon, Jul 8, 2013 at 4:37 PM, Jean-Daniel Cryans jdcry...@apache.orgwrote:

 Do you know if it's a data or meta block?


Re: optimizing block cache requests + eviction

2013-07-08 Thread Viral Bajaria
We haven't disable block cache. So I doubt that's the problem.

On Mon, Jul 8, 2013 at 4:50 PM, Varun Sharma va...@pinterest.com wrote:

 FYI, if u disable your block cache - you will ask for Index blocks for
 every single request. So such a high rate of request is plausible for Index
 blocks even when your requests are totally random on your data.

 Varun



Re: Using separator/delimiter in HBase rowkey?

2013-07-08 Thread Jason Huang
thanks for all these valuable comments.

Jason

On Mon, Jul 8, 2013 at 12:25 PM, Michael Segel michael_se...@hotmail.comwrote:

 Where is murmur?

 In your app?
 So then every app that wants to fetch that row must now use murmur.

 Added to Hadoop/HBase?
 Then when you do upgrades you have to make sure that the package is still
 in your class path. Note that different vendor's release management will
 mean YMMV as to what happens to your class paths and set up or if the jar
 gets blown out of the directory.


 Is the added cost in maintenance worth it?

 I don't know but I seriously doubt it.


 On Jul 8, 2013, at 11:00 AM, Mike Axiak m...@axiak.net wrote:

  I just don't understand.. every creation/interpretation of the key is
  going to require code to be used. The murmur implementation is with
  that code. How is there any extra burden?
 
  On Mon, Jul 8, 2013 at 11:54 AM, Michael Segel
  michael_se...@hotmail.com wrote:
  You will need to put the jar into either every app that runs, or you
 will need to put it on every node. Every upgrade, you will need to make
 sure its still in your class path.
 
  More work for the admins. So how much faster is it over MD5? MD5 and
 SHA-1 are part of the Java libraries that ship w Sun/Oracle so you have
 them already installed and in your class path.
 
  Just saying... ;-)
 
  On Jul 8, 2013, at 10:36 AM, Mike Axiak m...@axiak.net wrote:
 
  On Mon, Jul 8, 2013 at 11:29 AM, Michael Segel
  michael_se...@hotmail.com wrote:
  If not, you end up having to do a bit more maintenance of your
 cluster and that's going to be part of your tradeoff.
 
  How so?
 
  -Mike
 
 
 




Re: GC recommendations for large Region Server heaps

2013-07-08 Thread Otis Gospodnetic
Hi,

Check http://blog.sematext.com/2013/06/24/g1-cms-java-garbage-collector/

Those graphs show RegionServer before and after switch to G1.  The
dashboard screenshot further below shows CMS (top row) vs. G1 (bottom
row).  After those tests we ended up switching to G1 across the whole
cluster and haven't had issues or major pauses since knock on
keyboard.

Otis
--
Solr  ElasticSearch Support -- http://sematext.com/
Performance Monitoring -- http://sematext.com/spm



On Mon, Jul 8, 2013 at 2:56 PM, Stack st...@duboce.net wrote:
 On Mon, Jul 8, 2013 at 11:09 AM, Suraj Varma svarma...@gmail.com wrote:

 Hello:
 We have an HBase cluster with region servers running on 8GB heap size with
 a 0.6 block cache (it is a read heavy cluster, with bursty write traffic
 via MR jobs). (version: hbase-0.94.6.1)

 During HBaseCon, while speaking to a few attendees, I heard some folks were
 running region servers as high as 24GB and some others in the 16GB range.

 So - question: Are there any special GC recommendations (tuning parameters,
 flags, etc) that folks who run at these large heaps can recommend while
 moving up from an 8GB heap? i.e. for 16GB and for 24GB RS heaps ... ?

 I'm especially concerned about long pauses causing zk session timeouts and
 consequent RS shutdowns. Our boxes do have a lot of RAM and we are
 exploring how we can use more of it for the cluster while maintaining
 overall stability.

 Also - if there are clusters running multiple region servers per host, I'd
 be very interested to know what RS heap sizes those are being run at ...
 and whether this was chosen as an alternative to running a single RS with
 large heap.

 (I know I'll have to test the GC stuff out on my cluster and for my
 workloads anyway ... but just trying to get a feel of what sort of tuning
 options had to be used to have a stable HBase cluster with 16 or 24GB RS
 heaps).



 You hit full GC in this 8G heap Suraj?  Can you try running one server at
 24G to see how it does (with GC logging enabled so you can watch it over
 time)?  On one hand, more heap may make it so you avoid full GC -- if you
 are hitting them now at 8G -- because application has more head room.  On
 other hand, yes, if a full GC hits, it will be gone for proportionally
 longer than for your 8G heap.

 St.Ack


Re: GC recommendations for large Region Server heaps

2013-07-08 Thread Azuryy Yu
This is my HBASE GC options of CMS, it does work well.

XX:+DisableExplicitGC -XX:+UseCompressedOops -XX:PermSize=160m
-XX:MaxPermSize=160m -XX:GCTimeRatio=19 -XX:SoftRefLRUPolicyMSPerMB=0
-XX:SurvivorRatio=2 -XX:MaxTenuringThreshold=1 -XX:+UseFastAccessorMethods
-XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled
-XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSCompactAtFullCollection
-XX:CMSFullGCsBeforeCompaction=0 -XX:+CMSClassUnloadingEnabled
-XX:CMSMaxAbortablePrecleanTime=300 -XX:+CMSScavengeBeforeRemark



On Tue, Jul 9, 2013 at 1:12 PM, Otis Gospodnetic otis.gospodne...@gmail.com
 wrote:

 Hi,

 Check http://blog.sematext.com/2013/06/24/g1-cms-java-garbage-collector/

 Those graphs show RegionServer before and after switch to G1.  The
 dashboard screenshot further below shows CMS (top row) vs. G1 (bottom
 row).  After those tests we ended up switching to G1 across the whole
 cluster and haven't had issues or major pauses since knock on
 keyboard.

 Otis
 --
 Solr  ElasticSearch Support -- http://sematext.com/
 Performance Monitoring -- http://sematext.com/spm



 On Mon, Jul 8, 2013 at 2:56 PM, Stack st...@duboce.net wrote:
  On Mon, Jul 8, 2013 at 11:09 AM, Suraj Varma svarma...@gmail.com
 wrote:
 
  Hello:
  We have an HBase cluster with region servers running on 8GB heap size
 with
  a 0.6 block cache (it is a read heavy cluster, with bursty write traffic
  via MR jobs). (version: hbase-0.94.6.1)
 
  During HBaseCon, while speaking to a few attendees, I heard some folks
 were
  running region servers as high as 24GB and some others in the 16GB
 range.
 
  So - question: Are there any special GC recommendations (tuning
 parameters,
  flags, etc) that folks who run at these large heaps can recommend while
  moving up from an 8GB heap? i.e. for 16GB and for 24GB RS heaps ... ?
 
  I'm especially concerned about long pauses causing zk session timeouts
 and
  consequent RS shutdowns. Our boxes do have a lot of RAM and we are
  exploring how we can use more of it for the cluster while maintaining
  overall stability.
 
  Also - if there are clusters running multiple region servers per host,
 I'd
  be very interested to know what RS heap sizes those are being run at ...
  and whether this was chosen as an alternative to running a single RS
 with
  large heap.
 
  (I know I'll have to test the GC stuff out on my cluster and for my
  workloads anyway ... but just trying to get a feel of what sort of
 tuning
  options had to be used to have a stable HBase cluster with 16 or 24GB RS
  heaps).
 
 
 
  You hit full GC in this 8G heap Suraj?  Can you try running one server at
  24G to see how it does (with GC logging enabled so you can watch it over
  time)?  On one hand, more heap may make it so you avoid full GC -- if you
  are hitting them now at 8G -- because application has more head room.  On
  other hand, yes, if a full GC hits, it will be gone for proportionally
  longer than for your 8G heap.
 
  St.Ack