[
https://issues.apache.org/jira/browse/HADOOP-3205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12785171#action_12785171
]
Todd Lipcon commented on HADOOP-3205:
-------------------------------------
I did some further investigation on this:
- I ran the 1GB cat test with hprof=cpu=times to get accurate invocation
counts for the various read calls. Increasing MAX_CHUNKS from 1 to 16 does
exactly what's expected and reduces the number of calls to readChunks (and thus
the underlying input stream reads, etc.) by exactly a factor of 16, and the same
holds for 128. Beyond the call counts, though, hprof shows no noticeable
differences, because System.arraycopy doesn't get accounted by hprof in this
mode, for whatever reason.
- I imported a copy of the BufferedInputStream source and made
BufferedFSInputStream extend from it rather than from the java.io one, and added
a System.err printout right before the System.arraycopy inside read1(). When I
changed MAX_CHUNKS from 127 to 128, I verified that it correctly avoided these
copies and read directly into the user buffer. So the JIRA's goal of eliminating
a copy was indeed accomplished (a sketch of the logic involved follows below).
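For reference, here is a minimal, self-contained sketch of the buffering
decision in play. It is not the JDK's read1() source, just an illustration of
why a 128-chunk request (128 x 512 = 65536 bytes) can bypass the internal
buffer while a 127-chunk request (65024 bytes) cannot, assuming
io.file.buffer.size, and with it the internal buffer, is 64KB in this test:

{code}
import java.io.IOException;
import java.io.InputStream;

/**
 * Minimal sketch of the buffered-read decision; this is NOT the JDK's
 * read1() source, just an illustration of the two paths a request can take.
 */
class CopyAvoidanceSketch {
  private final InputStream in;   // the underlying stream
  private final byte[] buf;       // internal buffer, io.file.buffer.size bytes
  private int pos = 0;            // next unread byte in buf
  private int count = 0;          // number of valid bytes in buf

  CopyAvoidanceSketch(InputStream in, int bufferSize) {
    this.in = in;
    this.buf = new byte[bufferSize];
  }

  int read(byte[] b, int off, int len) throws IOException {
    int avail = count - pos;
    if (avail <= 0) {
      // Request at least as big as the internal buffer: hand it straight to
      // the underlying stream, no staging and no arraycopy. (The real read1()
      // also requires that mark() is not set.) This is the path a 128-chunk
      // (65536-byte) request takes when buf is 64KB.
      if (len >= buf.length) {
        return in.read(b, off, len);
      }
      // Otherwise refill the internal buffer first.
      pos = 0;
      count = 0;
      int n = in.read(buf, 0, buf.length);
      if (n <= 0) {
        return -1;
      }
      count = n;
      avail = n;
    }
    // The extra copy that a 127-chunk (65024-byte) request still pays for.
    int cnt = Math.min(avail, len);
    System.arraycopy(buf, pos, b, off, cnt);
    pos += cnt;
    return cnt;
  }
}
{code}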
Now the confusing part: eliminating this copy does nothing in terms of
performance. Comparing MAX_CHUNKS=127 to MAX_CHUNKS=128 shows no statistically
significant difference in the speed of catting 1G from RAM.
So my best theory right now on why the patch is faster is that it's simply
making fewer function calls, each of which does more work in longer loops. This
is better for loop unrolling, instruction cache locality, and avoiding function
call overhead. Perhaps it inspires the JIT to work harder as well - who knows
what black magic lurks there :)
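As a rough, hypothetical way to sanity-check the fewer-calls theory (this is
not part of the patch, and the class and sizes below are made up for
illustration), one could time many small reads against a few large ones over
the same in-memory data. ByteArrayInputStream copies the same total number of
bytes either way, so any gap is essentially per-call and per-iteration
overhead:

{code}
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

/**
 * Hypothetical micro-benchmark, not part of the patch: read the same in-memory
 * data twice, once in 512-byte requests and once in 64KB requests. The total
 * bytes copied are identical, so the difference is mostly call overhead.
 */
public class ReadSizeBench {
  private static final int TOTAL = 256 * 1024 * 1024; // 256MB stand-in for 1GB

  public static void main(String[] args) throws IOException {
    byte[] data = new byte[TOTAL];
    System.out.println("512-byte reads: " + time(data, 512) + " ms");
    System.out.println("64KB reads:     " + time(data, 64 * 1024) + " ms");
  }

  private static long time(byte[] data, int readSize) throws IOException {
    InputStream in = new ByteArrayInputStream(data);
    byte[] buf = new byte[readSize];
    long start = System.nanoTime();
    while (in.read(buf, 0, readSize) != -1) {
      // discard the data; we only care about per-call overhead
    }
    return (System.nanoTime() - start) / 1_000_000;
  }
}
{code}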
I think at this point I've sufficiently investigated this, unless anyone has
questions. I'll make the changes that Eli suggested and upload a new patch.
> Read multiple chunks directly from FSInputChecker subclass into user buffers
> ----------------------------------------------------------------------------
>
> Key: HADOOP-3205
> URL: https://issues.apache.org/jira/browse/HADOOP-3205
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs
> Reporter: Raghu Angadi
> Assignee: Todd Lipcon
> Attachments: hadoop-3205.txt, hadoop-3205.txt, hadoop-3205.txt,
> hadoop-3205.txt
>
>
> Implementations of FSInputChecker and FSOutputSummer like DFS do not have
> access to the full user buffer. At any time DFS can access only up to 512 bytes
> even though the user usually reads with a much larger buffer (often controlled
> by io.file.buffer.size). This forces implementations to double buffer data if
> they want to read or write larger chunks of data from the underlying storage.
> We could separate the changes for FSInputChecker and FSOutputSummer into two
> separate jiras.
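To make the double buffering described above concrete: with bytesPerChecksum at
512, readChunk() can hand back at most one checksum chunk per call, so an
implementation has to stage a larger read in its own buffer and copy it out a
chunk at a time, roughly like this sketch (names and buffer sizes are
illustrative, not the actual DFS code):

{code}
import java.io.IOException;
import java.io.InputStream;

/**
 * Illustrative sketch (names are made up, this is not the DFS code) of the
 * double buffering the issue describes: when the checker accepts only one
 * 512-byte chunk per readChunk() call, the implementation reads a large block
 * from storage into its own buffer and copies it out a chunk at a time.
 */
class DoubleBufferingSketch {
  private static final int BYTES_PER_CHECKSUM = 512;

  private final InputStream storage;                   // stand-in for the real data source
  private final byte[] staging = new byte[64 * 1024];  // io.file.buffer.size-ish
  private int stagingPos = 0;
  private int stagingLen = 0;

  DoubleBufferingSketch(InputStream storage) {
    this.storage = storage;
  }

  /** Returns at most one checksum chunk, refilling the staging buffer as needed. */
  int readChunk(byte[] chunk, int off) throws IOException {
    if (stagingPos >= stagingLen) {
      stagingLen = storage.read(staging, 0, staging.length); // big read from storage
      stagingPos = 0;
      if (stagingLen <= 0) {
        return -1;
      }
    }
    int n = Math.min(BYTES_PER_CHECKSUM, stagingLen - stagingPos);
    System.arraycopy(staging, stagingPos, chunk, off, n);     // the extra copy
    stagingPos += n;
    return n;
  }
}
{code}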