[
https://issues.apache.org/jira/browse/MAPREDUCE-2023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900652#action_12900652
]
Hong Tang commented on MAPREDUCE-2023:
--------------------------------------
The problem is caused by the following code segment:
{code}
public static class ReadMapper extends IOStatMapper<Long> {

  public ReadMapper() {
  }

  public Long doIO(Reporter reporter,
                   String name,
                   long totalSize // in bytes
                   ) throws IOException {
    // open file
    DataInputStream in = fs.open(new Path(getDataDir(getConf()), name));
    long actualSize = 0;
    try {
      for(int curSize = bufferSize;
          curSize == bufferSize && actualSize < totalSize;) { // <-- HERE
        curSize = in.read(buffer, 0, bufferSize);
        if(curSize < 0) break;
        actualSize += curSize;
        reporter.setStatus("reading " + name + "@" +
                           actualSize + "/" + totalSize
                           + " ::host = " + hostName);
      }
    } finally {
      in.close();
    }
    return Long.valueOf(actualSize);
  }
}
{code}
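The for-loop exits as soon as a read returns fewer bytes than bufferSize, even
when actualSize is still below totalSize. A short read does not mean end of
file, so the mapper can stop before it has read the requested amount. The fix
is simple: drop the curSize == bufferSize condition and rely on the existing
curSize < 0 check to detect end of stream: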
{code}
for(int curSize = bufferSize; actualSize < totalSize;) {
{code}
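To illustrate the difference outside of Hadoop, here is a minimal standalone sketch. It is not the TestDFSIO code: ShortReadDemo, ChunkedStream, readOriginal and readFixed are hypothetical names, and ChunkedStream is an assumed stand-in for any stream that may return fewer bytes than requested per read() call.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

class ShortReadDemo {

  /** Hypothetical stream that returns at most maxChunk bytes per read call. */
  static class ChunkedStream extends InputStream {
    private final InputStream in;
    private final int maxChunk;

    ChunkedStream(byte[] data, int maxChunk) {
      this.in = new ByteArrayInputStream(data);
      this.maxChunk = maxChunk;
    }

    @Override
    public int read() throws IOException {
      return in.read();
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
      // Deliberately cap the request size to force short reads.
      return in.read(b, off, Math.min(len, maxChunk));
    }
  }

  /** Loop with the original condition: stops after the first short read. */
  static long readOriginal(InputStream in, byte[] buffer, long totalSize)
      throws IOException {
    long actualSize = 0;
    for (int curSize = buffer.length;
         curSize == buffer.length && actualSize < totalSize;) {
      curSize = in.read(buffer, 0, buffer.length);
      if (curSize < 0) break;
      actualSize += curSize;
    }
    return actualSize;
  }

  /** Loop with the proposed condition: only size and end of stream stop it. */
  static long readFixed(InputStream in, byte[] buffer, long totalSize)
      throws IOException {
    long actualSize = 0;
    while (actualSize < totalSize) {
      int curSize = in.read(buffer, 0, buffer.length);
      if (curSize < 0) break; // end of stream
      actualSize += curSize;
    }
    return actualSize;
  }

  public static void main(String[] args) throws IOException {
    byte[] data = new byte[10_000];
    byte[] buffer = new byte[1_000];
    long original = readOriginal(new ChunkedStream(data, 700), buffer, data.length);
    long fixed = readFixed(new ChunkedStream(data, 700), buffer, data.length);
    System.out.println("original: " + original + ", fixed: " + fixed);
  }
}
```

With a 1,000-byte buffer and a stream capped at 700 bytes per read, this prints "original: 700, fixed: 10000": the original condition gives up after the first short read, while the fixed loop keeps reading until totalSize is reached or the stream ends.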
> TestDFSIO read test may not read specified bytes.
> -------------------------------------------------
>
> Key: MAPREDUCE-2023
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2023
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: benchmarks
> Reporter: Hong Tang
>
> TestDFSIO's read test may read fewer bytes than specified when reading large
> files.