[
https://issues.apache.org/jira/browse/HDFS-8481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564053#comment-14564053
]
Walter Su commented on HDFS-8481:
---------------------------------
This is the user's logic when calling pread. The {{buf}} is reused until the entire
file has been read.
{code}
byte[] buf = new byte[4096];
int readLen;
while ((readLen = in.read(buf)) != -1) {
  // process readLen bytes from buf
}
{code}
Assume we have a 768MB file (128MB * 6), which contains exactly 1 block group. We
lost one block, so we have to decode until the entire 768MB of data has been read.
{code}
byte[][] decodeInputs = new byte[dataBlkNum + parityBlkNum]
    [(int) alignedStripe.getSpanInBlock()];
{code}
For every {{alignedStripe}} being read, we allocate a new {{decodeInputs}}. Every
time the user calls pread, we create multiple new {{alignedStripe}}s. Every time
the user calls stateful read, we create 1~3 new {{alignedStripe}}s.
This means that by the time the entire 768MB of data has been read, we have allocated
128MB*9 of byte[][] {{decodeInputs}} garbage waiting for GC.
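To make the arithmetic concrete, here is a minimal sketch of the total allocation (the RS(6,3) constants and 128MB block size are assumptions for illustration; the real values come from the file's erasure coding policy):
{code}
public class DecodeInputsGarbage {
  public static void main(String[] args) {
    final long blockSize = 128L * 1024 * 1024; // 128MB per block (assumed)
    final int dataBlkNum = 6;                  // RS(6,3) schema (assumed)
    final int parityBlkNum = 3;

    // Each alignedStripe allocates (dataBlkNum + parityBlkNum) buffers of
    // spanInBlock bytes. Summed over all stripes in the block group, the
    // spans cover one full block, so the total allocation is:
    long totalGarbage = (dataBlkNum + parityBlkNum) * blockSize;

    System.out.println("decodeInputs garbage: "
        + totalGarbage / (1024 * 1024) + "MB"); // prints 1152MB = 128MB * 9
  }
}
{code}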
We cannot depend on {{DFSStripedInputStream}} to keep the {{decodeInputs}} object
and reuse it, because every {{SpanInBlock}} is different.
I'm not sure if I've made this clear. If so, it's an issue, right? (Not related to
this jira.)
bq. we need more abstraction than the util.
I'm +1 for this idea. I think we can resolve the {{decodeInputs}} issue in that
abstraction.
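As a rough illustration of what that abstraction might do (just a sketch under assumptions, not the actual HDFS-7285 API): allocate the buffers once at the maximum span and hand out {{ByteBuffer}} views per stripe, so a varying {{SpanInBlock}} no longer forces a fresh allocation.
{code}
import java.nio.ByteBuffer;

// Hypothetical helper; the class name and API are illustrative only.
class ReusableDecodeInputs {
  private final ByteBuffer[] buffers;

  ReusableDecodeInputs(int numBlocks, int maxSpanInBlock) {
    buffers = new ByteBuffer[numBlocks];
    for (int i = 0; i < numBlocks; i++) {
      // Allocate once at the maximum span; reuse for every stripe.
      buffers[i] = ByteBuffer.allocate(maxSpanInBlock);
    }
  }

  /** Return the shared buffers, limited to this stripe's span. */
  ByteBuffer[] forStripe(int spanInBlock) {
    for (ByteBuffer buf : buffers) {
      buf.clear();              // reset position/limit from the last stripe
      buf.limit(spanInBlock);   // expose only this stripe's span
    }
    return buffers;
  }
}
{code}
The trade-off would be pinning numBlocks * maxSpanInBlock bytes for the lifetime of the stream instead of producing per-stripe garbage.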
> Erasure coding: remove workarounds in client side stripped blocks recovering
> ----------------------------------------------------------------------------
>
> Key: HDFS-8481
> URL: https://issues.apache.org/jira/browse/HDFS-8481
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Zhe Zhang
> Assignee: Zhe Zhang
> Attachments: HDFS-8481-HDFS-7285.00.patch,
> HDFS-8481-HDFS-7285.01.patch, HDFS-8481-HDFS-7285.02.patch
>
>
> After HADOOP-11847 and related fixes, we should be able to properly calculate
> decoded contents.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)