[
https://issues.apache.org/jira/browse/HDFS-8501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Walter Su updated HDFS-8501:
----------------------------
Description:
Erasure Coding: Improve memory efficiency of BlockInfoStriped
Assume we have a BlockInfoStriped:
{noformat}
triplets[] = {s0, s1, s2, s3}
indices[] = {0, 1, 2, 3}
{noformat}
Here {{s0}} means {{storage_0}}.
When we run the balancer/mover to relocate the replica on s2, the block first becomes:
{noformat}
triplets[] = {s0, s1, s2, s3, s4}
indices[] = {0, 1, 2, 3, 2}
{noformat}
Then the replica on s2 is removed, and it finally becomes:
{noformat}
triplets[] = {s0, s1, null, s3, s4}
indices[] = {0, 1, -1, 3, 2}
{noformat}
The worst case is:
{noformat}
triplets[] = {null, null, null, null, s4, s5, s6, s7}
indices[] = {-1, -1, -1, -1, 0, 1, 2, 3}
{noformat}
We should follow {{BlockInfoContiguous.removeStorage(..)}}: when a storage
is removed, move the last item forward into the vacated slot.
With this improvement, the worst case becomes:
{noformat}
triplets[] = {s4, s5, s6, s7, null}
indices[] = {0, 1, 2, 3, -1}
{noformat}
We have only one empty slot.
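The move-last-item-forward idea can be sketched as follows. This is only a minimal illustration, not the actual HDFS code: the class {{StripedSlots}}, its {{add}}/{{remove}} methods, and the use of plain strings as storages are hypothetical stand-ins for {{BlockInfoStriped}} and its triplets.

```java
import java.util.Arrays;

// Hypothetical, simplified stand-in for BlockInfoStriped's bookkeeping.
class StripedSlots {
    private String[] storages; // plays the role of triplets[]; null marks an empty slot
    private int[] indices;     // block index held by each slot; -1 marks an empty slot

    StripedSlots(String[] storages, int[] indices) {
        this.storages = storages.clone();
        this.indices = indices.clone();
    }

    // Put the new replica into the first empty slot, growing by one if none exists.
    void add(String storage, int blockIndex) {
        int slot = 0;
        while (slot < storages.length && storages[slot] != null) {
            slot++;
        }
        if (slot == storages.length) {
            storages = Arrays.copyOf(storages, slot + 1);
            indices = Arrays.copyOf(indices, slot + 1);
        }
        storages[slot] = storage;
        indices[slot] = blockIndex;
    }

    // Remove a storage by moving the last occupied item into the vacated slot,
    // so that empty slots only ever accumulate at the tail.
    boolean remove(String storage) {
        int slot = -1;
        int last = -1;
        for (int i = 0; i < storages.length; i++) {
            if (storages[i] == null) {
                continue;
            }
            last = i;
            if (storages[i].equals(storage)) {
                slot = i;
            }
        }
        if (slot < 0) {
            return false; // storage not present
        }
        storages[slot] = storages[last];
        indices[slot] = indices[last];
        storages[last] = null;
        indices[last] = -1;
        return true;
    }

    @Override
    public String toString() {
        return "triplets[] = " + Arrays.toString(storages)
            + ", indices[] = " + Arrays.toString(indices);
    }
}
```

Under these assumptions, relocating s0..s3 to s4..s7 one replica at a time (add the new replica, then remove the old one) leaves the storages packed at the front with a single trailing empty slot, as in the example above.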
Notes:
Assume we first copy to 4 new storages and then delete 4. Even with the improvement, the
worst case could be:
{noformat}
triplets[] = {s4, s5, s6, s7, null, null, null, null}
indices[] = {0, 1, 2, 3, -1, -1, -1, -1}
{noformat}
But the Balancer uses {{delHint}}, so adding one replica always deletes one.
This case therefore won't happen for either striped or contiguous blocks.
*idx_i must be moved to slot_i.* Then slot_i will always hold idx_i, which
allows further improvement in HDFS-8032.
> Erasure Coding: Improve memory efficiency of BlockInfoStriped
> -------------------------------------------------------------
>
> Key: HDFS-8501
> URL: https://issues.apache.org/jira/browse/HDFS-8501
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Walter Su
> Assignee: Walter Su
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)