RE: Reply: HBase random read performance

2013-04-16 Thread Liu, Raymond
So what is lacking here? Should the actions also be parallelized inside the RS,
per region, instead of just in parallel at the RS level?
Seems this will be rather difficult to implement, and for Get, it might not be
worth it?

 
 I looked
 at src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
 in
 0.94
 
 In processBatchCallback(), starting line 1538,
 
 // step 1: break up into regionserver-sized chunks and build the data structs
 Map<HRegionLocation, MultiAction<R>> actionsByServer =
   new HashMap<HRegionLocation, MultiAction<R>>();
 for (int i = 0; i < workingList.size(); i++) {
 
 So we do group individual action by server.
 
 FYI
 
 On Mon, Apr 15, 2013 at 6:30 AM, Ted Yu yuzhih...@gmail.com wrote:
 
  Doug made a good point.
 
  Take a look at the performance gain for parallel scan (bottom chart
  compared to top chart):
  https://issues.apache.org/jira/secure/attachment/12578083/FDencode.png
 
  See
 
 https://issues.apache.org/jira/browse/HBASE-8316?focusedCommentId=13628300&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13628300
 for explanation of the two methods.
 
  Cheers
 
  On Mon, Apr 15, 2013 at 6:21 AM, Doug Meil
 doug.m...@explorysmedical.comwrote:
 
 
  Hi there, regarding this...
 
   We are passing 10,000 random row-keys as input, while HBase is
   taking around
   17 secs to return 10,000 records.
 
 
  ….  Given that you are generating 10,000 random keys, your multi-get
  is very likely hitting all 5 nodes of your cluster.
 
 
  Historically, multi-Get used to first sort the requests by RS and
  then
  *serially* go the RS to process the multi-Get.  I'm not sure of the
  current (0.94.x) behavior if it multi-threads or not.
 
  One thing you might want to consider is confirming that client
  behavior, and if it's not multi-threading then perform a test that
  does the same RS sorting via...
 
 
  http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable
  .html#
  getRegionLocation%28byte[http://hbase.apache.org/apidocs/org/apache/
  hadoop/hbase/client/HTable.html#getRegionLocation%28byte[
  ]%29
 
  …. and then spin up your own threads (one per target RS) and see what
  happens.
 
 
 
  On 4/15/13 9:04 AM, Ankit Jain ankitjainc...@gmail.com wrote:
 
  Hi Liang,
  
  Thanks Liang for reply..
  
  Ans1:
  I tried using an HFile block size of 32 KB with the bloom filter enabled.
  The random read performance is 10,000 records in 23 secs.
  
  Ans2:
  We are retrieving all the 10,000 rows in one call.
  
  Ans3:
  Disk detail:
  Model Number:   ST2000DM001-1CH164
  Serial Number:  Z1E276YF
  
  Please suggest some more optimization
  
  Thanks,
  Ankit Jain
  
  On Mon, Apr 15, 2013 at 5:11 PM, 谢良 xieli...@xiaomi.com wrote:
  
   First, it's probably helpless to set block size to 4KB, please
   refer to the beginning of HFile.java:
  
Smaller blocks are good
* for random access, but require more memory to hold the block
  index, and  may
* be slower to create (because we must flush the compressor
  stream at the
* conclusion of each data block, which leads to an FS I/O flush).
   Further, due
* to the internal caching in Compression codec, the smallest
  possible  block
* size would be around 20KB-30KB.
  
   Second, is it a single-thread test client or multi-threads? we
   couldn't expect too much if the requests are one by one.
  
   Third, could you provide more info about  your DN disk numbers and
   IO utils ?
  
   Thanks,
   Liang
   
   From: Ankit Jain [ankitjainc...@gmail.com]
   Sent: April 15, 2013 18:53
   To: user@hbase.apache.org
   Subject: Re: HBase random read performance
  
   Hi Anoop,
  
   Thanks for the reply..
  
   I tried setting the HFile block size to 4KB and also enabled the bloom
   filter (ROW). The maximum read performance that I was able to
   achieve is
   10,000 records in 14 secs (size of record is 1.6KB).
  
   Please suggest some tuning..
  
   Thanks,
   Ankit Jain
  
  
  
   On Mon, Apr 15, 2013 at 4:12 PM, Rishabh Agrawal 
   rishabh.agra...@impetus.co.in wrote:
  
Interesting. Can you explain why this happens?
   
-Original Message-
From: Anoop Sam John [mailto:anoo...@huawei.com]
Sent: Monday, April 15, 2013 3:47 PM
To: user@hbase.apache.org
Subject: RE: HBase random read performance
   
Ankit
 I guess you might be having default HFile block
size which is 64KB.
For random gets a lower value will be better. Try with something like
8KB and check the latency?
   
Ya, of course blooms can help (if major compaction was not done at the
time of testing)
   
-Anoop-

From: Ankit Jain [ankitjainc...@gmail.com]
Sent: Saturday, April 13, 2013 11:01 AM
To: user@hbase.apache.org
Subject: HBase random read performance
   
Hi All,
   
 We are using HBase 0.94.5 and Hadoop 1.0.4.

Re: Reply: HBase random read performance

2013-04-16 Thread Nicolas Liochon
I think there is something in the middle that could be done. It was
discussed here a while ago, but without any JIRA created.  See thread:
http://mail-archives.apache.org/mod_mbox/hbase-user/201302.mbox/%3CCAKxWWm19OC+dePTK60bMmcecv=7tc+3t4-bq6fdqeppix_e...@mail.gmail.com%3E

If someone can spend some time on it, I can create the JIRA...

Nicolas


On Tue, Apr 16, 2013 at 9:49 AM, Liu, Raymond raymond@intel.com wrote:

 So what is lacking here? Should the actions also be parallelized inside the RS,
 per region, instead of just in parallel at the RS level?
 Seems this will be rather difficult to implement, and for Get, it might not
 be worth it?

 
  I looked
  at src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
  in
  0.94
 
  In processBatchCallback(), starting line 1538,
 
  // step 1: break up into regionserver-sized chunks and build the data structs
  Map<HRegionLocation, MultiAction<R>> actionsByServer =
    new HashMap<HRegionLocation, MultiAction<R>>();
  for (int i = 0; i < workingList.size(); i++) {
 
  So we do group individual action by server.
 
  FYI
 
  On Mon, Apr 15, 2013 at 6:30 AM, Ted Yu yuzhih...@gmail.com wrote:
 
   Doug made a good point.
  
   Take a look at the performance gain for parallel scan (bottom chart
   compared to top chart):
   https://issues.apache.org/jira/secure/attachment/12578083/FDencode.png
  
   See
  
  https://issues.apache.org/jira/browse/HBASE-8316?focusedCommentId=13628300&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13628300
  for explanation of the two methods.
  
   Cheers
  
   On Mon, Apr 15, 2013 at 6:21 AM, Doug Meil
  doug.m...@explorysmedical.comwrote:
  
  
   Hi there, regarding this...
  
We are passing 10,000 random row-keys as input, while HBase is
taking around
17 secs to return 10,000 records.
  
  
   ….  Given that you are generating 10,000 random keys, your multi-get
   is very likely hitting all 5 nodes of your cluster.
  
  
   Historically, multi-Get used to first sort the requests by RS and
   then
   *serially* go the RS to process the multi-Get.  I'm not sure of the
   current (0.94.x) behavior if it multi-threads or not.
  
   One thing you might want to consider is confirming that client
   behavior, and if it's not multi-threading then perform a test that
   does the same RS sorting via...
  
  
   http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#getRegionLocation%28byte[]%29
  
   …. and then spin up your own threads (one per target RS) and see what
   happens.
  
  
  
   On 4/15/13 9:04 AM, Ankit Jain ankitjainc...@gmail.com wrote:
  
   Hi Liang,
   
   Thanks Liang for reply..
   
   Ans1:
   I tried using an HFile block size of 32 KB with the bloom filter enabled.
   The random read performance is 10,000 records in 23 secs.
   
   Ans2:
   We are retrieving all the 10,000 rows in one call.
   
   Ans3:
   Disk detail:
   Model Number:   ST2000DM001-1CH164
   Serial Number:  Z1E276YF
   
   Please suggest some more optimization
   
   Thanks,
   Ankit Jain
   
   On Mon, Apr 15, 2013 at 5:11 PM, 谢良 xieli...@xiaomi.com wrote:
   
First, it's probably helpless to set block size to 4KB, please
refer to the beginning of HFile.java:
   
 Smaller blocks are good
 * for random access, but require more memory to hold the block
   index, and  may
 * be slower to create (because we must flush the compressor
   stream at the
 * conclusion of each data block, which leads to an FS I/O flush).
Further, due
 * to the internal caching in Compression codec, the smallest
   possible  block
 * size would be around 20KB-30KB.
   
Second, is it a single-thread test client or multi-threads? we
couldn't expect too much if the requests are one by one.
   
Third, could you provide more info about  your DN disk numbers and
IO utils ?
   
Thanks,
Liang

From: Ankit Jain [ankitjainc...@gmail.com]
Sent: April 15, 2013 18:53
To: user@hbase.apache.org
Subject: Re: HBase random read performance
   
Hi Anoop,
   
Thanks for the reply..
   
I tried setting the HFile block size to 4KB and also enabled the bloom
filter (ROW). The maximum read performance that I was able to
achieve is
10,000 records in 14 secs (size of record is 1.6KB).
   
Please suggest some tuning..
   
Thanks,
Ankit Jain
   
   
   
On Mon, Apr 15, 2013 at 4:12 PM, Rishabh Agrawal 
rishabh.agra...@impetus.co.in wrote:
   
 Interesting. Can you explain why this happens?

 -Original Message-
 From: Anoop Sam John [mailto:anoo...@huawei.com]
 Sent: Monday, April 15, 2013 3:47 PM
 To: user@hbase.apache.org
 Subject: RE: HBase random read performance

 Ankit
  I 

Re: Reply: HBase random read performance

2013-04-16 Thread Jean-Marc Spaggiari
Hi Nicolas,

I think it might be good to create a JIRA for that anyway, since it seems that
some users are expecting this behaviour.

My 2¢ ;)

JM

2013/4/16 Nicolas Liochon nkey...@gmail.com

 I think there is something in the middle that could be done. It was
 discussed here a while ago, but without any JIRA created.  See thread:

 http://mail-archives.apache.org/mod_mbox/hbase-user/201302.mbox/%3CCAKxWWm19OC+dePTK60bMmcecv=7tc+3t4-bq6fdqeppix_e...@mail.gmail.com%3E

 If someone can spend some time on it, I can create the JIRA...

 Nicolas


 On Tue, Apr 16, 2013 at 9:49 AM, Liu, Raymond raymond@intel.com
 wrote:

  So what is lacking here? Should the actions also be parallelized inside the RS,
  per region, instead of just in parallel at the RS level?
  Seems this will be rather difficult to implement, and for Get, it might not
  be worth it?
 
  
   I looked
   at src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
   in
   0.94
  
   In processBatchCallback(), starting line 1538,
  
   // step 1: break up into regionserver-sized chunks and build the data structs
   Map<HRegionLocation, MultiAction<R>> actionsByServer =
     new HashMap<HRegionLocation, MultiAction<R>>();
   for (int i = 0; i < workingList.size(); i++) {
  
   So we do group individual action by server.
  
   FYI
  
   On Mon, Apr 15, 2013 at 6:30 AM, Ted Yu yuzhih...@gmail.com wrote:
  
Doug made a good point.
   
Take a look at the performance gain for parallel scan (bottom chart
compared to top chart):
   
 https://issues.apache.org/jira/secure/attachment/12578083/FDencode.png
   
See
   
   https://issues.apache.org/jira/browse/HBASE-8316?focusedCommentId=13628300&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13628300
   for explanation of the two methods.
   
Cheers
   
On Mon, Apr 15, 2013 at 6:21 AM, Doug Meil
   doug.m...@explorysmedical.comwrote:
   
   
Hi there, regarding this...
   
 We are passing 10,000 random row-keys as input, while HBase is
 taking around
 17 secs to return 10,000 records.
   
   
….  Given that you are generating 10,000 random keys, your multi-get
is very likely hitting all 5 nodes of your cluster.
   
   
Historically, multi-Get used to first sort the requests by RS and
then
*serially* go the RS to process the multi-Get.  I'm not sure of the
current (0.94.x) behavior if it multi-threads or not.
   
One thing you might want to consider is confirming that client
behavior, and if it's not multi-threading then perform a test that
does the same RS sorting via...
   
   
   
 http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#getRegionLocation%28byte[]%29
   
…. and then spin up your own threads (one per target RS) and see
 what
happens.
   
   
   
On 4/15/13 9:04 AM, Ankit Jain ankitjainc...@gmail.com wrote:
   
Hi Liang,

Thanks Liang for reply..

Ans1:
I tried using an HFile block size of 32 KB with the bloom filter enabled.
The random read performance is 10,000 records in 23 secs.

Ans2:
We are retrieving all the 10,000 rows in one call.

Ans3:
Disk detail:
Model Number:   ST2000DM001-1CH164
Serial Number:  Z1E276YF

Please suggest some more optimization

Thanks,
Ankit Jain

On Mon, Apr 15, 2013 at 5:11 PM, 谢良 xieli...@xiaomi.com wrote:

 First, it's probably helpless to set block size to 4KB, please
 refer to the beginning of HFile.java:

  Smaller blocks are good
  * for random access, but require more memory to hold the block
index, and  may
  * be slower to create (because we must flush the compressor
stream at the
  * conclusion of each data block, which leads to an FS I/O
 flush).
 Further, due
  * to the internal caching in Compression codec, the smallest
possible  block
  * size would be around 20KB-30KB.

 Second, is it a single-thread test client or multi-threads? we
 couldn't expect too much if the requests are one by one.

 Third, could you provide more info about  your DN disk numbers
 and
 IO utils ?

 Thanks,
 Liang
 
 From: Ankit Jain [ankitjainc...@gmail.com]
 Sent: April 15, 2013 18:53
 To: user@hbase.apache.org
 Subject: Re: HBase random read performance

 Hi Anoop,

 Thanks for the reply..

 I tried setting the HFile block size to 4KB and also enabled the bloom
 filter (ROW). The maximum read performance that I was able to
 achieve is
 10,000 records in 14 secs (size of record is 1.6KB).

 Please suggest some tuning..

 Thanks,
 Ankit Jain



 On Mon, Apr 15, 2013 at 4:12 PM, Rishabh 

Re: Reply: HBase random read performance

2013-04-16 Thread lars hofhansl
This is fundamentally different, though. A scanner by default scans all regions
serially, because it promises to return all rows in sort order.
A multi-get is already parallelized across regions (and hence across region
servers).


Before we do a lot of work here we should first make sure that nothing else is
wrong with the OP's setup.
17s for 10,000 is not right.


Ankit, what does the IO look like across the machines in the cluster while this 
is happening?

Since you pick 10,000 rows at random, is your expectation that the entire set of rows
will fit into the block cache? Is that the case?

-- Lars




 From: Ted Yu yuzhih...@gmail.com
To: user@hbase.apache.org 
Sent: Monday, April 15, 2013 10:03 AM
Subject: Re: Reply: HBase random read performance
 

This is a related JIRA which should provide noticeable speed up:

HBASE-1935 Scan in parallel

Cheers

On Mon, Apr 15, 2013 at 7:13 AM, Ted Yu yuzhih...@gmail.com wrote:

 I looked
 at src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java in
 0.94

 In processBatchCallback(), starting line 1538,

         // step 1: break up into regionserver-sized chunks and build the data structs
         Map<HRegionLocation, MultiAction<R>> actionsByServer =
           new HashMap<HRegionLocation, MultiAction<R>>();
         for (int i = 0; i < workingList.size(); i++) {

 So we do group individual action by server.

 FYI

 On Mon, Apr 15, 2013 at 6:30 AM, Ted Yu yuzhih...@gmail.com wrote:

 Doug made a good point.

 Take a look at the performance gain for parallel scan (bottom chart
 compared to top chart):
 https://issues.apache.org/jira/secure/attachment/12578083/FDencode.png

 See
 https://issues.apache.org/jira/browse/HBASE-8316?focusedCommentId=13628300&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13628300
 for explanation of the two methods.

 Cheers

 On Mon, Apr 15, 2013 at 6:21 AM, Doug Meil doug.m...@explorysmedical.com
  wrote:


 Hi there, regarding this...

  We are passing 10,000 random row-keys as input, while HBase is taking
 around
  17 secs to return 10,000 records.


 ….  Given that you are generating 10,000 random keys, your multi-get is
 very likely hitting all 5 nodes of your cluster.


 Historically, multi-Get used to first sort the requests by RS and then
 *serially* go the RS to process the multi-Get.  I'm not sure of the
 current (0.94.x) behavior if it multi-threads or not.

 One thing you might want to consider is confirming that client behavior,
 and if it's not multi-threading then perform a test that does the same RS
 sorting via...


 http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#getRegionLocation%28byte[]%29

 …. and then spin up your own threads (one per target RS) and see what
 happens.



 On 4/15/13 9:04 AM, Ankit Jain ankitjainc...@gmail.com wrote:

 Hi Liang,
 
 Thanks Liang for reply..
 
 Ans1:
 I tried using an HFile block size of 32 KB with the bloom filter enabled.
 The random read performance is 10,000 records in 23 secs.
 
 Ans2:
 We are retrieving all the 10,000 rows in one call.
 
 Ans3:
 Disk detail:
 Model Number:       ST2000DM001-1CH164
 Serial Number:      Z1E276YF
 
 Please suggest some more optimization
 
 Thanks,
 Ankit Jain
 
 On Mon, Apr 15, 2013 at 5:11 PM, 谢良 xieli...@xiaomi.com wrote:
 
  First, it's probably helpless to set block size to 4KB, please refer
 to
  the beginning of HFile.java:
 
   Smaller blocks are good
   * for random access, but require more memory to hold the block index,
 and
  may
   * be slower to create (because we must flush the compressor stream at
 the
   * conclusion of each data block, which leads to an FS I/O flush).
  Further, due
   * to the internal caching in Compression codec, the smallest possible
  block
   * size would be around 20KB-30KB.
 
  Second, is it a single-thread test client or multi-threads? we
 couldn't
  expect too much if the requests are one by one.
 
  Third, could you provide more info about  your DN disk numbers and IO
  utils ?
 
  Thanks,
  Liang
  
  From: Ankit Jain [ankitjainc...@gmail.com]
  Sent: April 15, 2013 18:53
  To: user@hbase.apache.org
  Subject: Re: HBase random read performance
 
  Hi Anoop,
 
  Thanks for the reply..
 
  I tried setting the HFile block size to 4KB and also enabled the bloom
  filter (ROW). The maximum read performance that I was able to achieve
 is
  10,000 records in 14 secs (size of record is 1.6KB).
 
  Please suggest some tuning..
 
  Thanks,
  Ankit Jain
 
 
 
  On Mon, Apr 15, 2013 at 4:12 PM, Rishabh Agrawal 
  rishabh.agra...@impetus.co.in wrote:
 
   Interesting. Can you explain why this happens?
  
   -Original Message-
   From: Anoop Sam John [mailto:anoo...@huawei.com]
   Sent: Monday, April 15, 2013 3:47 PM
   To: user@hbase.apache.org
   Subject: RE: HBase random read performance
  
   Ankit

Re: Reply: HBase random read performance

2013-04-15 Thread Ankit Jain
Hi Liang,

Thanks Liang for reply..

Ans1:
I tried using an HFile block size of 32 KB with the bloom filter enabled. The
random read performance is 10,000 records in 23 secs.

Ans2:
We are retrieving all the 10,000 rows in one call.

Ans3:
Disk detail:
Model Number:   ST2000DM001-1CH164
Serial Number:  Z1E276YF

Please suggest some more optimization

Thanks,
Ankit Jain

On Mon, Apr 15, 2013 at 5:11 PM, 谢良 xieli...@xiaomi.com wrote:

 First, it's probably helpless to set block size to 4KB, please refer to
 the beginning of HFile.java:

  Smaller blocks are good
  * for random access, but require more memory to hold the block index, and
 may
  * be slower to create (because we must flush the compressor stream at the
  * conclusion of each data block, which leads to an FS I/O flush).
 Further, due
  * to the internal caching in Compression codec, the smallest possible
 block
  * size would be around 20KB-30KB.

 Second, is it a single-thread test client or multi-threads? we couldn't
 expect too much if the requests are one by one.

 Third, could you provide more info about  your DN disk numbers and IO
 utils ?

 Thanks,
 Liang
 
 From: Ankit Jain [ankitjainc...@gmail.com]
 Sent: April 15, 2013 18:53
 To: user@hbase.apache.org
 Subject: Re: HBase random read performance

 Hi Anoop,

 Thanks for the reply..

 I tried setting the HFile block size to 4KB and also enabled the bloom
 filter (ROW). The maximum read performance that I was able to achieve is
 10,000 records in 14 secs (size of record is 1.6KB).

 Please suggest some tuning..

 Thanks,
 Ankit Jain



 On Mon, Apr 15, 2013 at 4:12 PM, Rishabh Agrawal 
 rishabh.agra...@impetus.co.in wrote:

  Interesting. Can you explain why this happens?
 
  -Original Message-
  From: Anoop Sam John [mailto:anoo...@huawei.com]
  Sent: Monday, April 15, 2013 3:47 PM
  To: user@hbase.apache.org
  Subject: RE: HBase random read performance
 
  Ankit
   I guess you might be having default HFile block size
  which is 64KB.
  For random gets a lower value will be better. Try with something like
 8KB
  and check the latency?
 
  Ya, of course blooms can help (if major compaction was not done at the time
  of testing)
 
  -Anoop-
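
  [As a concrete illustration of the block-size and bloom-filter advice in this
  sub-thread, here is a hedged sketch of creating a table tuned for random Gets
  with the 0.94 admin API. The table and family names, the split keys, and the
  32 KB block size (the value the OP tried; Anoop suggests 8 KB, while the
  HFile.java comment quoted elsewhere in the thread puts the practical floor
  around 20-30 KB) are illustrative choices, not settings confirmed by the thread.

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.HColumnDescriptor;
  import org.apache.hadoop.hbase.HTableDescriptor;
  import org.apache.hadoop.hbase.client.HBaseAdmin;
  import org.apache.hadoop.hbase.regionserver.StoreFile;
  import org.apache.hadoop.hbase.util.Bytes;

  public class CreateRandomReadTable {
    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      HBaseAdmin admin = new HBaseAdmin(conf);

      HTableDescriptor desc = new HTableDescriptor("t1");   // illustrative table name
      HColumnDescriptor cf = new HColumnDescriptor("cf");   // illustrative family name
      cf.setBlocksize(32 * 1024);                           // default is 64 KB
      cf.setBloomFilterType(StoreFile.BloomType.ROW);       // row-level bloom filter
      desc.addFamily(cf);

      // pre-split into 16 regions, as the OP did (split key range is illustrative)
      admin.createTable(desc, Bytes.toBytes("row-00000000"),
          Bytes.toBytes("row-99999999"), 16);
      admin.close();
    }
  }
  ]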
  
  From: Ankit Jain [ankitjainc...@gmail.com]
  Sent: Saturday, April 13, 2013 11:01 AM
  To: user@hbase.apache.org
  Subject: HBase random read performance
 
  Hi All,
 
  We are using HBase 0.94.5 and Hadoop 1.0.4.
 
  We have HBase cluster of 5 nodes(5 regionservers and 1 master node). Each
  regionserver has 8 GB RAM.
 
  We have loaded 25 million records in the HBase table; regions are pre-split
  into 16 regions and all the regions are equally loaded.
 
  We are getting very low random read performance while performing multi
 get
  from HBase.
 
  We are passing 10,000 random row-keys as input, while HBase is taking
 around
  17 secs to return 10,000 records.
 
  Please suggest some tuning to increase HBase read performance.
 
  Thanks,
  Ankit Jain
  iLabs
 
 
 
  --
  Thanks,
  Ankit Jain
 
  
 
 
 
 
 
 
 



 --
 Thanks,
 Ankit Jain




-- 
Thanks,
Ankit Jain


Re: Reply: HBase random read performance

2013-04-15 Thread Doug Meil

Hi there, regarding this...

 We are passing 10,000 random row-keys as input, while HBase is taking
around
 17 secs to return 10,000 records.


….  Given that you are generating 10,000 random keys, your multi-get is
very likely hitting all 5 nodes of your cluster.


Historically, multi-Get used to first sort the requests by RS and then
*serially* go the RS to process the multi-Get.  I'm not sure of the
current (0.94.x) behavior if it multi-threads or not.

One thing you might want to consider is confirming that client behavior,
and if it's not multi-threading then perform a test that does the same RS
sorting via...

http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#
getRegionLocation%28byte[]%29

…. and then spin up your own threads (one per target RS) and see what
happens.
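
[A minimal sketch of the client-side test Doug describes here, assuming the
0.94-era HTable API: bucket the Gets by hosting region server via
HTable.getRegionLocation(), then issue one batched get per server on its own
thread. The class name and structure below are illustrative, not code from
this thread.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HRegionLocation;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;

public class ParallelMultiGet {

  // Groups row keys by the region server currently hosting them, then runs one
  // multi-get per server on its own thread. Results come back in no particular order.
  public static List<Result> get(final Configuration conf, final byte[] tableName,
      List<byte[]> rowKeys) throws Exception {

    // step 1: bucket row keys by hosting region server (hostname:port)
    Map<String, List<Get>> getsByServer = new HashMap<String, List<Get>>();
    HTable locator = new HTable(conf, tableName);
    try {
      for (byte[] row : rowKeys) {
        HRegionLocation loc = locator.getRegionLocation(row);
        String server = loc.getHostname() + ":" + loc.getPort();
        List<Get> gets = getsByServer.get(server);
        if (gets == null) {
          gets = new ArrayList<Get>();
          getsByServer.put(server, gets);
        }
        gets.add(new Get(row));
      }
    } finally {
      locator.close();
    }

    // step 2: one thread per target region server, each issuing a single batch get
    ExecutorService pool = Executors.newFixedThreadPool(getsByServer.size());
    try {
      List<Future<Result[]>> futures = new ArrayList<Future<Result[]>>();
      for (final List<Get> gets : getsByServer.values()) {
        futures.add(pool.submit(new Callable<Result[]>() {
          public Result[] call() throws Exception {
            HTable table = new HTable(conf, tableName); // HTable is not thread-safe
            try {
              return table.get(gets);
            } finally {
              table.close();
            }
          }
        }));
      }
      List<Result> results = new ArrayList<Result>();
      for (Future<Result[]> f : futures) {
        for (Result r : f.get()) {
          results.add(r);
        }
      }
      return results;
    } finally {
      pool.shutdown();
    }
  }
}

Comparing this against a single HTable.get(List<Get>) over the same keys would
show whether the stock client is already parallelizing per server.]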



On 4/15/13 9:04 AM, Ankit Jain ankitjainc...@gmail.com wrote:

Hi Liang,

Thanks Liang for reply..

Ans1:
I tried using an HFile block size of 32 KB with the bloom filter enabled.
The random read performance is 10,000 records in 23 secs.

Ans2:
We are retrieving all the 10,000 rows in one call.

Ans3:
Disk detail:
Model Number:   ST2000DM001-1CH164
Serial Number:  Z1E276YF

Please suggest some more optimization

Thanks,
Ankit Jain

On Mon, Apr 15, 2013 at 5:11 PM, 谢良 xieli...@xiaomi.com wrote:

 First, it's probably helpless to set block size to 4KB, please refer to
 the beginning of HFile.java:

  Smaller blocks are good
  * for random access, but require more memory to hold the block index,
and
 may
  * be slower to create (because we must flush the compressor stream at
the
  * conclusion of each data block, which leads to an FS I/O flush).
 Further, due
  * to the internal caching in Compression codec, the smallest possible
 block
  * size would be around 20KB-30KB.

 Second, is it a single-thread test client or multi-threads? we couldn't
 expect too much if the requests are one by one.

 Third, could you provide more info about  your DN disk numbers and IO
 utils ?

 Thanks,
 Liang
 
 From: Ankit Jain [ankitjainc...@gmail.com]
 Sent: April 15, 2013 18:53
 To: user@hbase.apache.org
 Subject: Re: HBase random read performance

 Hi Anoop,

 Thanks for the reply..

 I tried setting the HFile block size to 4KB and also enabled the bloom
 filter (ROW). The maximum read performance that I was able to achieve is
 10,000 records in 14 secs (size of record is 1.6KB).

 Please suggest some tuning..

 Thanks,
 Ankit Jain



 On Mon, Apr 15, 2013 at 4:12 PM, Rishabh Agrawal 
 rishabh.agra...@impetus.co.in wrote:

  Interesting. Can you explain why this happens?
 
  -Original Message-
  From: Anoop Sam John [mailto:anoo...@huawei.com]
  Sent: Monday, April 15, 2013 3:47 PM
  To: user@hbase.apache.org
  Subject: RE: HBase random read performance
 
  Ankit
   I guess you might be having default HFile block size
  which is 64KB.
  For random gets a lower value will be better. Try with something like
 8KB
  and check the latency?
 
  Ya, of course blooms can help (if major compaction was not done at the
  time of testing)
 
  -Anoop-
  
  From: Ankit Jain [ankitjainc...@gmail.com]
  Sent: Saturday, April 13, 2013 11:01 AM
  To: user@hbase.apache.org
  Subject: HBase random read performance
 
  Hi All,
 
  We are using HBase 0.94.5 and Hadoop 1.0.4.
 
  We have HBase cluster of 5 nodes(5 regionservers and 1 master node).
Each
  regionserver has 8 GB RAM.
 
  We have loaded 25 million records in the HBase table; regions are
pre-split
  into 16 regions and all the regions are equally loaded.
 
  We are getting very low random read performance while performing multi
 get
  from HBase.
 
  We are passing 10,000 random row-keys as input, while HBase is taking
 around
  17 secs to return 10,000 records.
 
  Please suggest some tuning to increase HBase read performance.
 
  Thanks,
  Ankit Jain
  iLabs
 
 
 
  --
  Thanks,
  Ankit Jain
 
  
 
 
 
 
 
 
 



 --
 Thanks,
 Ankit Jain




-- 
Thanks,
Ankit Jain



Re: Reply: HBase random read performance

2013-04-15 Thread Ted Yu
Doug made a good point.

Take a look at the performance gain for parallel scan (bottom chart
compared to top chart):
https://issues.apache.org/jira/secure/attachment/12578083/FDencode.png

See
https://issues.apache.org/jira/browse/HBASE-8316?focusedCommentId=13628300&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13628300
for explanation of the two methods.

Cheers

On Mon, Apr 15, 2013 at 6:21 AM, Doug Meil doug.m...@explorysmedical.comwrote:


 Hi there, regarding this...

  We are passing 10,000 random row-keys as input, while HBase is taking
 around
  17 secs to return 10,000 records.


 ….  Given that you are generating 10,000 random keys, your multi-get is
 very likely hitting all 5 nodes of your cluster.


 Historically, multi-Get used to first sort the requests by RS and then
 *serially* go the RS to process the multi-Get.  I'm not sure of the
 current (0.94.x) behavior if it multi-threads or not.

 One thing you might want to consider is confirming that client behavior,
 and if it's not multi-threading then perform a test that does the same RS
 sorting via...

 http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#
 getRegionLocation%28byte[]%29

 …. and then spin up your own threads (one per target RS) and see what
 happens.



 On 4/15/13 9:04 AM, Ankit Jain ankitjainc...@gmail.com wrote:

 Hi Liang,
 
 Thanks Liang for reply..
 
 Ans1:
 I tried using an HFile block size of 32 KB with the bloom filter enabled.
 The random read performance is 10,000 records in 23 secs.
 
 Ans2:
 We are retrieving all the 10,000 rows in one call.
 
 Ans3:
 Disk detail:
 Model Number:   ST2000DM001-1CH164
 Serial Number:  Z1E276YF
 
 Please suggest some more optimization
 
 Thanks,
 Ankit Jain
 
 On Mon, Apr 15, 2013 at 5:11 PM, 谢良 xieli...@xiaomi.com wrote:
 
  First, it's probably helpless to set block size to 4KB, please refer to
  the beginning of HFile.java:
 
   Smaller blocks are good
   * for random access, but require more memory to hold the block index,
 and
  may
   * be slower to create (because we must flush the compressor stream at
 the
   * conclusion of each data block, which leads to an FS I/O flush).
  Further, due
   * to the internal caching in Compression codec, the smallest possible
  block
   * size would be around 20KB-30KB.
 
  Second, is it a single-thread test client or multi-threads? we couldn't
  expect too much if the requests are one by one.
 
  Third, could you provide more info about  your DN disk numbers and IO
  utils ?
 
  Thanks,
  Liang
  
  From: Ankit Jain [ankitjainc...@gmail.com]
  Sent: April 15, 2013 18:53
  To: user@hbase.apache.org
  Subject: Re: HBase random read performance
 
  Hi Anoop,
 
  Thanks for the reply..
 
  I tried setting the HFile block size to 4KB and also enabled the bloom
  filter (ROW). The maximum read performance that I was able to achieve is
  10,000 records in 14 secs (size of record is 1.6KB).
 
  Please suggest some tuning..
 
  Thanks,
  Ankit Jain
 
 
 
  On Mon, Apr 15, 2013 at 4:12 PM, Rishabh Agrawal 
  rishabh.agra...@impetus.co.in wrote:
 
   Interesting. Can you explain why this happens?
  
   -Original Message-
   From: Anoop Sam John [mailto:anoo...@huawei.com]
   Sent: Monday, April 15, 2013 3:47 PM
   To: user@hbase.apache.org
   Subject: RE: HBase random read performance
  
   Ankit
I guess you might be having default HFile block size
   which is 64KB.
   For random gets a lower value will be better. Try with something like
  8KB
   and check the latency?
  
   Ya, of course blooms can help (if major compaction was not done at the
 time
   of testing)
  
   -Anoop-
   
   From: Ankit Jain [ankitjainc...@gmail.com]
   Sent: Saturday, April 13, 2013 11:01 AM
   To: user@hbase.apache.org
   Subject: HBase random read performance
  
   Hi All,
  
   We are using HBase 0.94.5 and Hadoop 1.0.4.
  
   We have HBase cluster of 5 nodes(5 regionservers and 1 master node).
 Each
   regionserver has 8 GB RAM.
  
   We have loaded 25 million records in the HBase table; regions are
 pre-split
   into 16 regions and all the regions are equally loaded.
  
   We are getting very low random read performance while performing multi
  get
   from HBase.
  
   We are passing 10,000 random row-keys as input, while HBase is taking
  around
   17 secs to return 10,000 records.
  
   Please suggest some tuning to increase HBase read performance.
  
   Thanks,
   Ankit Jain
   iLabs
  
  
  
   --
   Thanks,
   Ankit Jain
  
   
  
  
  
  
  
  

Re: Reply: HBase random read performance

2013-04-15 Thread Ted Yu
I looked
at src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java in
0.94

In processBatchCallback(), starting line 1538,

// step 1: break up into regionserver-sized chunks and build the data structs
Map<HRegionLocation, MultiAction<R>> actionsByServer =
  new HashMap<HRegionLocation, MultiAction<R>>();
for (int i = 0; i < workingList.size(); i++) {

So we do group individual action by server.

FYI
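
[For context, the batching code above is what a plain client-side multi-get
goes through. A minimal sketch of such a call follows; the table name, column
family, and key format are hypothetical, not from the thread.

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class MultiGetBenchmark {
  public static void main(String[] args) throws Exception {
    HTable table = new HTable(HBaseConfiguration.create(), "t1");
    Random rnd = new Random();

    // build 10,000 random Gets against a hypothetical key space of 25 million rows
    List<Get> gets = new ArrayList<Get>(10000);
    for (int i = 0; i < 10000; i++) {
      byte[] row = Bytes.toBytes(String.format("row-%08d", rnd.nextInt(25000000)));
      Get g = new Get(row);
      g.addFamily(Bytes.toBytes("cf"));
      gets.add(g);
    }

    long start = System.currentTimeMillis();
    Result[] results = table.get(gets);   // routed through the batching code shown above
    long elapsed = System.currentTimeMillis() - start;
    System.out.println(results.length + " results in " + elapsed + " ms");
    table.close();
  }
}
]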

On Mon, Apr 15, 2013 at 6:30 AM, Ted Yu yuzhih...@gmail.com wrote:

 Doug made a good point.

 Take a look at the performance gain for parallel scan (bottom chart
 compared to top chart):
 https://issues.apache.org/jira/secure/attachment/12578083/FDencode.png

 See
 https://issues.apache.org/jira/browse/HBASE-8316?focusedCommentId=13628300&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13628300
 for explanation of the two methods.

 Cheers

 On Mon, Apr 15, 2013 at 6:21 AM, Doug Meil 
 doug.m...@explorysmedical.comwrote:


 Hi there, regarding this...

  We are passing 10,000 random row-keys as input, while HBase is taking
 around
  17 secs to return 10,000 records.


 ….  Given that you are generating 10,000 random keys, your multi-get is
 very likely hitting all 5 nodes of your cluster.


 Historically, multi-Get used to first sort the requests by RS and then
 *serially* go the RS to process the multi-Get.  I'm not sure of the
 current (0.94.x) behavior if it multi-threads or not.

 One thing you might want to consider is confirming that client behavior,
 and if it's not multi-threading then perform a test that does the same RS
 sorting via...


 http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#getRegionLocation%28byte[]%29

 …. and then spin up your own threads (one per target RS) and see what
 happens.



 On 4/15/13 9:04 AM, Ankit Jain ankitjainc...@gmail.com wrote:

 Hi Liang,
 
 Thanks Liang for reply..
 
 Ans1:
 I tried using an HFile block size of 32 KB with the bloom filter enabled.
 The random read performance is 10,000 records in 23 secs.
 
 Ans2:
 We are retrieving all the 10,000 rows in one call.
 
 Ans3:
 Disk detail:
 Model Number:   ST2000DM001-1CH164
 Serial Number:  Z1E276YF
 
 Please suggest some more optimization
 
 Thanks,
 Ankit Jain
 
 On Mon, Apr 15, 2013 at 5:11 PM, 谢良 xieli...@xiaomi.com wrote:
 
  First, it's probably helpless to set block size to 4KB, please refer to
  the beginning of HFile.java:
 
   Smaller blocks are good
   * for random access, but require more memory to hold the block index,
 and
  may
   * be slower to create (because we must flush the compressor stream at
 the
   * conclusion of each data block, which leads to an FS I/O flush).
  Further, due
   * to the internal caching in Compression codec, the smallest possible
  block
   * size would be around 20KB-30KB.
 
  Second, is it a single-thread test client or multi-threads? we couldn't
  expect too much if the requests are one by one.
 
  Third, could you provide more info about  your DN disk numbers and IO
  utils ?
 
  Thanks,
  Liang
  
  From: Ankit Jain [ankitjainc...@gmail.com]
  Sent: April 15, 2013 18:53
  To: user@hbase.apache.org
  Subject: Re: HBase random read performance
 
  Hi Anoop,
 
  Thanks for the reply..
 
  I tried setting the HFile block size to 4KB and also enabled the bloom
  filter (ROW). The maximum read performance that I was able to achieve is
  10,000 records in 14 secs (size of record is 1.6KB).
 
  Please suggest some tuning..
 
  Thanks,
  Ankit Jain
 
 
 
  On Mon, Apr 15, 2013 at 4:12 PM, Rishabh Agrawal 
  rishabh.agra...@impetus.co.in wrote:
 
   Interesting. Can you explain why this happens?
  
   -Original Message-
   From: Anoop Sam John [mailto:anoo...@huawei.com]
   Sent: Monday, April 15, 2013 3:47 PM
   To: user@hbase.apache.org
   Subject: RE: HBase random read performance
  
   Ankit
I guess you might be having default HFile block size
   which is 64KB.
   For random gets a lower value will be better. Try with something like
  8KB
   and check the latency?
  
   Ya, of course blooms can help (if major compaction was not done at the
 time
   of testing)
  
   -Anoop-
   
   From: Ankit Jain [ankitjainc...@gmail.com]
   Sent: Saturday, April 13, 2013 11:01 AM
   To: user@hbase.apache.org
   Subject: HBase random read performance
  
   Hi All,
  
   We are using HBase 0.94.5 and Hadoop 1.0.4.
  
   We have HBase cluster of 5 nodes(5 regionservers and 1 master node).
 Each
   regionserver has 8 GB RAM.
  
   We have loaded 25 million records in the HBase table; regions are
 pre-split
   into 16 regions and all the regions are equally loaded.
  
   We are getting very low random read performance while performing
 multi
  get
   from HBase.
  
   We are passing 10,000 random row-keys as 

Re: Reply: HBase random read performance

2013-04-15 Thread Ted Yu
This is a related JIRA which should provide noticeable speed up:

HBASE-1935 Scan in parallel

Cheers
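
[For reference, a rough client-side sketch in the spirit of HBASE-1935 (not the
JIRA's actual patch): open one scanner per region via HTable.getStartEndKeys()
and drain them concurrently. The table name and the simple row counting are
illustrative; the 0.94-era client API is assumed.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Pair;

public class ParallelScan {
  public static void main(String[] args) throws Exception {
    final Configuration conf = HBaseConfiguration.create();
    final String tableName = "t1";                 // illustrative table name
    HTable table = new HTable(conf, tableName);
    // one [start, stop) key pair per region
    Pair<byte[][], byte[][]> keys = table.getStartEndKeys();
    table.close();

    int regions = keys.getFirst().length;
    ExecutorService pool = Executors.newFixedThreadPool(regions);
    List<Future<Long>> counts = new ArrayList<Future<Long>>();
    for (int i = 0; i < regions; i++) {
      final byte[] start = keys.getFirst()[i];
      final byte[] stop = keys.getSecond()[i];
      counts.add(pool.submit(new Callable<Long>() {
        public Long call() throws Exception {
          HTable t = new HTable(conf, tableName);  // one HTable per thread
          ResultScanner scanner = t.getScanner(new Scan(start, stop));
          long rows = 0;
          try {
            for (Result r : scanner) {
              rows++;                              // rows within a region stay sorted
            }
          } finally {
            scanner.close();
            t.close();
          }
          return rows;
        }
      }));
    }

    long total = 0;
    for (Future<Long> f : counts) {
      total += f.get();
    }
    pool.shutdown();
    System.out.println("scanned " + total + " rows across " + regions + " regions");
  }
}

Note that this trades the scanner's global sort order for throughput; rows stay
ordered only within each region's slice.]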

On Mon, Apr 15, 2013 at 7:13 AM, Ted Yu yuzhih...@gmail.com wrote:

 I looked
 at src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java in
 0.94

 In processBatchCallback(), starting line 1538,

 // step 1: break up into regionserver-sized chunks and build the data structs
 Map<HRegionLocation, MultiAction<R>> actionsByServer =
   new HashMap<HRegionLocation, MultiAction<R>>();
 for (int i = 0; i < workingList.size(); i++) {

 So we do group individual action by server.

 FYI

 On Mon, Apr 15, 2013 at 6:30 AM, Ted Yu yuzhih...@gmail.com wrote:

 Doug made a good point.

 Take a look at the performance gain for parallel scan (bottom chart
 compared to top chart):
 https://issues.apache.org/jira/secure/attachment/12578083/FDencode.png

 See
 https://issues.apache.org/jira/browse/HBASE-8316?focusedCommentId=13628300&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13628300
 for explanation of the two methods.

 Cheers

 On Mon, Apr 15, 2013 at 6:21 AM, Doug Meil doug.m...@explorysmedical.com
  wrote:


 Hi there, regarding this...

  We are passing 10,000 random row-keys as input, while HBase is taking
 around
  17 secs to return 10,000 records.


 ….  Given that you are generating 10,000 random keys, your multi-get is
 very likely hitting all 5 nodes of your cluster.


 Historically, multi-Get used to first sort the requests by RS and then
 *serially* go the RS to process the multi-Get.  I'm not sure of the
 current (0.94.x) behavior if it multi-threads or not.

 One thing you might want to consider is confirming that client behavior,
 and if it's not multi-threading then perform a test that does the same RS
 sorting via...


 http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#getRegionLocation%28byte[]%29

 …. and then spin up your own threads (one per target RS) and see what
 happens.



 On 4/15/13 9:04 AM, Ankit Jain ankitjainc...@gmail.com wrote:

 Hi Liang,
 
 Thanks Liang for reply..
 
 Ans1:
 I tried using an HFile block size of 32 KB with the bloom filter enabled.
 The random read performance is 10,000 records in 23 secs.
 
 Ans2:
 We are retrieving all the 10,000 rows in one call.
 
 Ans3:
 Disk detail:
 Model Number:   ST2000DM001-1CH164
 Serial Number:  Z1E276YF
 
 Please suggest some more optimization
 
 Thanks,
 Ankit Jain
 
 On Mon, Apr 15, 2013 at 5:11 PM, 谢良 xieli...@xiaomi.com wrote:
 
  First, it's probably helpless to set block size to 4KB, please refer
 to
  the beginning of HFile.java:
 
   Smaller blocks are good
   * for random access, but require more memory to hold the block index,
 and
  may
   * be slower to create (because we must flush the compressor stream at
 the
   * conclusion of each data block, which leads to an FS I/O flush).
  Further, due
   * to the internal caching in Compression codec, the smallest possible
  block
   * size would be around 20KB-30KB.
 
  Second, is it a single-thread test client or multi-threads? we
 couldn't
  expect too much if the requests are one by one.
 
  Third, could you provide more info about  your DN disk numbers and IO
  utils ?
 
  Thanks,
  Liang
  
  From: Ankit Jain [ankitjainc...@gmail.com]
  Sent: April 15, 2013 18:53
  To: user@hbase.apache.org
  Subject: Re: HBase random read performance
 
  Hi Anoop,
 
  Thanks for the reply..
 
  I tried setting the HFile block size to 4KB and also enabled the bloom
  filter (ROW). The maximum read performance that I was able to achieve
 is
  10,000 records in 14 secs (size of record is 1.6KB).
 
  Please suggest some tuning..
 
  Thanks,
  Ankit Jain
 
 
 
  On Mon, Apr 15, 2013 at 4:12 PM, Rishabh Agrawal 
  rishabh.agra...@impetus.co.in wrote:
 
   Interesting. Can you explain why this happens?
  
   -Original Message-
   From: Anoop Sam John [mailto:anoo...@huawei.com]
   Sent: Monday, April 15, 2013 3:47 PM
   To: user@hbase.apache.org
   Subject: RE: HBase random read performance
  
   Ankit
I guess you might be having default HFile block
 size
   which is 64KB.
   For random gets a lower value will be better. Try with something
 like
  8KB
   and check the latency?
  
   Ya, of course blooms can help (if major compaction was not done at the
 time
   of testing)
  
   -Anoop-
   
   From: Ankit Jain [ankitjainc...@gmail.com]
   Sent: Saturday, April 13, 2013 11:01 AM
   To: user@hbase.apache.org
   Subject: HBase random read performance
  
   Hi All,
  
   We are using HBase 0.94.5 and Hadoop 1.0.4.
  
   We have HBase cluster of 5 nodes(5 regionservers and 1 master node).
 Each
   regionserver has 8 GB RAM.
  
   We have loaded 25 million records in the HBase table; regions are
 pre-split
   into 16 regions