[
https://issues.apache.org/jira/browse/HDFS-15409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17141366#comment-17141366
]
Danil Lipovoy edited comment on HDFS-15409 at 6/21/20, 9:38 AM:
----------------------------------------------------------------
I think you are absolutely right that the real blockId is completely
unpredictable. On the other hand, we don't need to predict it, right? As I
understand it, we only need to know how it is distributed. If the
distribution is even, all is OK.
So I ran 2 tests:
1. Logged the last digit of idx:
{code:java}
public ShortCircuitCache getShortCircuitCache(long idx) {
  LOG.info("Last digit: " + idx % 10);
  return shortCircuitCache[(int) (idx % clientShortCircuitNum)];
}
{code}
Then I read an HBase table and collected the distribution:
cat /var/log/hbase/hbase-cmf-hbase-REGIONSERVER-home.com.log.out | grep "Last digit" | awk '{print $7}' | sort | uniq -c | sort -nr | awk '{printf "%-8s%s\n", $2, $1}' | sort
0 157128
1 171082
2 171019
3 171143
4 171421
5 170665
6 171525
7 167854
8 167641
9 157015
The difference between the least-used and most-used slots is about 9%.
2. Added a CRC32 hash:
{code:java}
public ShortCircuitCache getShortCircuitCache(Long idx) {
  CRC32 crc = new CRC32();
  crc.reset();
  // Note: byteValue() narrows idx to its low 8 bits, so only one byte is hashed.
  crc.update(idx.byteValue());
  idx = crc.getValue();
  LOG.info("Last crc digit: " + idx % 10);
  return shortCircuitCache[(int) (idx % clientShortCircuitNum)];
}
{code}
I ran the same test and checked the distribution:
cat /var/log/hbase/hbase-cmf-hbase-REGIONSERVER-home.com.log.out | grep "Last crc digit" | awk '{print $8}' | sort | uniq -c | sort -nr | awk '{printf "%-8s%s\n", $2, $1}' | sort
0 140883
1 212124
2 152218
3 152024
4 141270
5 157903
6 182202
7 152417
8 209427
9 152268
The difference between the least-used and most-used slots is about 33%.
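For reference, the two percentages can be rechecked directly from the counts listed above. This is only a sketch: the class and method names are made up for illustration, and because "difference" could be measured against either the fullest or the emptiest slot, it prints both.

```java
import java.util.Arrays;

public class SlotSpread {
    // Gap between the fullest and emptiest slot, as a percentage of the fullest slot.
    static double spreadVsMax(long[] counts) {
        long min = Arrays.stream(counts).min().getAsLong();
        long max = Arrays.stream(counts).max().getAsLong();
        return 100.0 * (max - min) / max;
    }

    // The same gap, as a percentage of the emptiest slot.
    static double spreadVsMin(long[] counts) {
        long min = Arrays.stream(counts).min().getAsLong();
        long max = Arrays.stream(counts).max().getAsLong();
        return 100.0 * (max - min) / min;
    }

    public static void main(String[] args) {
        // Counts per last digit 0..9, copied from the two test runs above.
        long[] lastDigit = {157128, 171082, 171019, 171143, 171421,
                            170665, 171525, 167854, 167641, 157015};
        long[] crcDigit = {140883, 212124, 152218, 152024, 141270,
                           157903, 182202, 152417, 209427, 152268};

        System.out.printf("last digit: %.1f%% of max, %.1f%% of min%n",
                spreadVsMax(lastDigit), spreadVsMin(lastDigit));
        System.out.printf("crc digit:  %.1f%% of max, %.1f%% of min%n",
                spreadVsMax(crcDigit), spreadVsMin(crcDigit));
    }
}
```

Whichever definition is used, the CRC32 run is clearly the less even of the two.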
Any ideas?
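One thing that may be worth checking in the CRC32 test: idx.byteValue() narrows the Long to its low 8 bits, and CRC32.update(int) consumes a single byte, so the hash can take at most 256 distinct values no matter what the blockId is. That could account for at least part of the skew. The sketch below compares that call with a variant that feeds all 8 bytes of the id into the CRC. The class name, the method names, and the synthetic range of ids are assumptions for illustration only, not code from the patch, and the sketch says nothing about how real blockIds would distribute.

```java
import java.nio.ByteBuffer;
import java.util.HashSet;
import java.util.Set;
import java.util.zip.CRC32;

public class Crc32BlockIdSketch {
    // Same hashing as in the test above: only the low 8 bits of the id reach the CRC.
    static long crcOfLowByte(long blockId) {
        CRC32 crc = new CRC32();
        crc.update(Long.valueOf(blockId).byteValue());
        return crc.getValue();
    }

    // Variant that feeds all 8 bytes of the id (big-endian) into the CRC.
    static long crcOfAllBytes(long blockId) {
        CRC32 crc = new CRC32();
        crc.update(ByteBuffer.allocate(Long.BYTES).putLong(blockId).array());
        return crc.getValue();
    }

    public static void main(String[] args) {
        Set<Long> lowByte = new HashSet<>();
        Set<Long> allBytes = new HashSet<>();
        long base = 1073741825L; // hypothetical starting id, not taken from a real cluster
        for (long id = base; id < base + 100_000; id++) {
            lowByte.add(crcOfLowByte(id));
            allBytes.add(crcOfAllBytes(id));
        }
        System.out.println("distinct CRCs, low byte only: " + lowByte.size());
        System.out.println("distinct CRCs, all 8 bytes:   " + allBytes.size());
    }
}
```

If the full-width variant were tried in getShortCircuitCache, the same log-and-count test could show whether the distribution evens out.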
> Optimization Strategy for choosing ShortCircuitCache
> -----------------------------------------------------
>
> Key: HDFS-15409
> URL: https://issues.apache.org/jira/browse/HDFS-15409
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Lisheng Sun
> Priority: Major
>
> When clientShortCircuitNum is 10, the probability of falling into each
> ShortCircuitCache is the same, while for other values of
> clientShortCircuitNum the probabilities differ.
> For example, if clientShortCircuitNum is 3 and many SSR blockids are
> ***1, ***4, ***7, they will all fall into the same ShortCircuitCache.
> Since blockids in a real environment are completely unpredictable, I think we
> need to design a strategy for allocating blocks to a specific ShortCircuitCache.
> This should improve performance even more.