[
https://issues.apache.org/jira/browse/HADOOP-3514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12623918#action_12623918
]
Jothi Padmanabhan commented on HADOOP-3514:
-------------------------------------------
bq. There is a class DataChecksum in org.apache.hadoop.util. We probably should
use it here.
DataChecksum is intended for the typical use case of having one checksum per
chunk (bytesPerSum). In IFile, the intent is to have one checksum per file.
The variable 'bytesPerSum' in DataChecksum, as the name indicates, is the number
of bytes over which a checksum is calculated. However, it is primarily up to the
user of the DataChecksum class to use this appropriately: inside
DataChecksum.java, bytesPerSum is used only in the constructor and while
generating the header. Since IFile does not care about DataChecksum.header, we
could still use DataChecksum from inside IFile by passing an arbitrary value for
bytesPerSum in the constructor. Note that we do not know the length of the file
a priori, so we are constrained to pass a dummy value. There is one
modification needed in DataChecksum.java, though: we need to remove the
following assert in the update function. There is already a comment saying that
it can be removed. Is it OK to remove this assert?
{code}
// Can be removed.
assert inSum <= bytesPerChecksum : "DataChecksum.update() : inSum " +
    inSum + " > " + " bytesPerChecksum " + bytesPerChecksum ;
{code}
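To illustrate why the assert gets in the way, here is a minimal sketch of the one-checksum-per-file pattern. It uses java.util.zip.CRC32 and a hypothetical class named WholeFileSum as a stand-in; it is not Hadoop's DataChecksum, and the dummy chunk size of 512 is an arbitrary assumption. The running byte count passes the dummy bytesPerChecksum almost immediately, which is exactly the condition the assert rejects, while the checksum itself remains valid for the whole stream.

```java
import java.util.zip.CRC32;
import java.util.zip.Checksum;

// Hypothetical stand-in for the pattern discussed above; NOT Hadoop's DataChecksum.
class WholeFileSum {
    private final Checksum summer = new CRC32();
    private final int bytesPerChecksum; // dummy value: file length is unknown up front
    private long inSum = 0;             // bytes fed into the checksum so far

    WholeFileSum(int bytesPerChecksum) {
        this.bytesPerChecksum = bytesPerChecksum;
    }

    void update(byte[] b, int off, int len) {
        if (len > 0) {
            summer.update(b, off, len);
            inSum += len;
        }
        // No "inSum <= bytesPerChecksum" check here: with one checksum per
        // file, inSum legitimately grows past the dummy chunk size.
    }

    long getValue() { return summer.getValue(); }
    long getNumBytesInSum() { return inSum; }
    int getBytesPerChecksum() { return bytesPerChecksum; }

    public static void main(String[] args) {
        byte[] data = new byte[4096];
        for (int i = 0; i < data.length; i++) data[i] = (byte) i;

        // Feed the data in several pieces, as a writer would.
        WholeFileSum sum = new WholeFileSum(512);
        for (int off = 0; off < data.length; off += 1000) {
            sum.update(data, off, Math.min(1000, data.length - off));
        }

        // Reference: the same bytes checksummed in a single call.
        CRC32 oneShot = new CRC32();
        oneShot.update(data, 0, data.length);

        System.out.println("matches one-shot CRC: " + (sum.getValue() == oneShot.getValue()));
        System.out.println("inSum exceeds dummy bytesPerChecksum: "
            + (sum.getNumBytesInSum() > sum.getBytesPerChecksum()));
    }
}
```

Run with assertions enabled (java -ea), a version of update() that kept the original assert would fail on the first call in the loop, since 1000 is already greater than 512.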
bq. It may be better to have ChecksumOutputStream extending FilterOutputStream,
instead of OutputStream
Yes, I will make this change.
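For reference, a rough sketch of what a FilterOutputStream-based version could look like. The class name ChecksumOutputStreamSketch, the finish() method, and the 8-byte trailer layout are all assumptions made for illustration; this is not the code from the attached patches. Extending FilterOutputStream means the wrapped stream is held in the inherited 'out' field and flush() is inherited, so only the write and close paths need overriding.

```java
import java.io.DataOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.zip.CRC32;
import java.util.zip.Checksum;

// Hypothetical sketch; not the class from the attached patches.
class ChecksumOutputStreamSketch extends FilterOutputStream {
    private final Checksum sum = new CRC32();
    private boolean finished = false;

    ChecksumOutputStreamSketch(OutputStream out) {
        super(out);
    }

    @Override
    public void write(int b) throws IOException {
        sum.update(b);
        out.write(b);
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        // Override the bulk write: FilterOutputStream's default would
        // forward one byte at a time through write(int).
        sum.update(b, off, len);
        out.write(b, off, len);
    }

    // Appends the checksum as an 8-byte trailer (assumed layout), at most once.
    // The trailer is written to the wrapped stream directly, so it is not
    // itself included in the checksum.
    void finish() throws IOException {
        if (finished) {
            return;
        }
        finished = true;
        new DataOutputStream(out).writeLong(sum.getValue());
        out.flush();
    }

    @Override
    public void close() throws IOException {
        finish();
        super.close();
    }
}
```

A reader of such a file would checksum everything except the last 8 bytes and compare the result with the trailer, so no separate crc file and no extra seek would be needed.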
> Reduce seeks during shuffle, by inline crcs
> -------------------------------------------
>
> Key: HADOOP-3514
> URL: https://issues.apache.org/jira/browse/HADOOP-3514
> Project: Hadoop Core
> Issue Type: Improvement
> Components: mapred
> Affects Versions: 0.18.0
> Reporter: Devaraj Das
> Assignee: Jothi Padmanabhan
> Fix For: 0.19.0
>
> Attachments: hadoop-3514-v1.patch, hadoop-3514-v2.patch,
> hadoop-3514-v3.patch, hadoop-3514-v4.patch, hadoop-3514-v5.patch,
> hadoop-3514-v6.patch, hadoop-3514-v7.patch, hadoop-3514-v8.patch,
> hadoop-3514.patch
>
>
> The number of seeks can be reduced by half in the iFile if we move the crc
> into the iFile rather than having a separate file.