[
https://issues.apache.org/jira/browse/HADOOP-3205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12772878#action_12772878
]
Todd Lipcon commented on HADOOP-3205:
-------------------------------------
Been looking at this ticket tonight. I'm not sure exactly what you're getting
at. As I understand it, the wrapping looks something like:
User Reader -> FSInputChecker -> FSInputChecker subclass -> BufferedInputStream
-> Underlying source
e.g.:
{noformat}
java.io.FileInputStream.readBytes(FileInputStream.java:Unknown line)
java.io.FileInputStream.read(FileInputStream.java:199)
org.apache.hadoop.fs.RawLocalFileSystem$TrackingFileInputStream.read(RawLocalFileSystem.java:90)
org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileInputStream.read(RawLocalFileSystem.java:143)
java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
java.io.BufferedInputStream.read(BufferedInputStream.java:317)
java.io.DataInputStream.read(DataInputStream.java:132)
org.apache.hadoop.fs.FSInputChecker.readFully(FSInputChecker.java:385)
org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.readChunk(ChecksumFileSystem.java:224)
org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:238)
org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:190)
org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:158)
java.io.DataInputStream.read(DataInputStream.java:83)
org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:72)
{noformat}
The user's buffer size passed in to fs.open(...) controls the size of the
BufferedInputStream that wraps the underlying input stream (i.e., raw file or
socket). The FSInputChecker does indeed call read() on that BufferedInputStream
once for every 512 bytes (directly into the user buffer), but in my profiling
this doesn't seem to be a CPU hog, since it only results in one syscall to the
underlying stream for every io.file.buffer.size bytes.
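To illustrate the buffering behaviour described above, here is a minimal sketch in plain Java. It is not Hadoop code: CountingInputStream and countUnderlyingReads are hypothetical names, a ByteArrayInputStream stands in for the raw file or socket, and a counted bulk read() stands in for a syscall. It reads 512 bytes at a time through a BufferedInputStream and reports how many reads actually reach the wrapped stream.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class BufferedReadCount {
    /** Hypothetical helper: counts bulk read() calls that reach the wrapped stream. */
    static class CountingInputStream extends FilterInputStream {
        int reads = 0;

        CountingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            reads++; // stand-in for one syscall on a real file or socket
            return super.read(b, off, len);
        }
    }

    /**
     * Reads totalBytes in chunkSize pieces through a BufferedInputStream of
     * bufferSize bytes and returns how many reads hit the underlying stream.
     */
    static int countUnderlyingReads(int totalBytes, int bufferSize, int chunkSize) {
        CountingInputStream raw =
            new CountingInputStream(new ByteArrayInputStream(new byte[totalBytes]));
        try (InputStream in = new BufferedInputStream(raw, bufferSize)) {
            byte[] chunk = new byte[chunkSize];
            while (in.read(chunk, 0, chunkSize) != -1) {
                // discard the data; only the read pattern matters here
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return raw.reads;
    }

    public static void main(String[] args) {
        // 1 MB of data, 64K buffer, 512-byte reads (the sizes from the discussion)
        int reads = countUnderlyingReads(1 << 20, 64 * 1024, 512);
        System.out.println("512-byte reads issued: " + ((1 << 20) / 512));
        System.out.println("reads reaching the underlying stream: " + reads);
    }
}
```

With these sizes the caller issues 2048 small reads, while the underlying stream should see roughly one read per 64K buffer fill, which is consistent with the profile showing little time in the raw read path.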
As a test of the CPU overhead, I put an 800M file (checksummed) in /dev/shm and
profiled hadoop fs -cat with io.file.buffer.size=64K. With no actual disk
involved, this isolates the CPU and syscall overhead. The top
consumers are:
{noformat}
1 61.17% 61.17% 4363 300617 org.apache.hadoop.fs.FSInputChecker.readChecksumChunk
2 13.11% 74.28% 935 300618 java.io.FileInputStream.readBytes
3 7.71% 82.00% 550 300632 java.io.DataInputStream.read
4 5.02% 87.02% 358 300600 java.io.FileOutputStream.writeBytes
5 3.76% 90.77% 268 300657 java.io.DataInputStream.readFully
6 1.67% 92.44% 119 300631 java.io.DataInputStream.readFully
{noformat}
The particular line of readChecksumChunk that's consuming the time is line 241
(sum.update) - this indicates that the overhead here is just from checksumming
and not from memory copies. The one possible gain I could see here would be to
revert to a JNI implementation of CRC32 that can do multiple checksum chunks at
once - we found that JNI was slow due to a constant overhead "jumping the gap"
to C for small sizes, but we can probably get a 50% checksum speedup for some
buffers. This was originally rejected in HADOOP-6148 due to the complexity of
maintaining two different CRC32 implementations.
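For reference, the per-chunk checksumming pattern that dominates the profile can be sketched as follows. This is an illustration only, not the FSInputChecker implementation: ChunkedChecksum and checksumChunks are hypothetical names, and the JDK's java.util.zip.CRC32 is used as a stand-in for whatever CRC32 implementation Hadoop uses. The 512-byte chunk size is the one mentioned in this thread.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

class ChunkedChecksum {
    // Chunk size taken from the discussion; hypothetical constant name.
    static final int BYTES_PER_CHECKSUM = 512;

    /** Computes one CRC32 value per 512-byte chunk; the last chunk may be shorter. */
    static long[] checksumChunks(byte[] data) {
        int chunks = (data.length + BYTES_PER_CHECKSUM - 1) / BYTES_PER_CHECKSUM;
        long[] sums = new long[chunks];
        CRC32 sum = new CRC32();
        for (int i = 0; i < chunks; i++) {
            int off = i * BYTES_PER_CHECKSUM;
            int len = Math.min(BYTES_PER_CHECKSUM, data.length - off);
            sum.reset();                // each chunk is checksummed independently
            sum.update(data, off, len); // one update call per chunk
            sums[i] = sum.getValue();
        }
        return sums;
    }

    public static void main(String[] args) {
        long[] sums = checksumChunks(new byte[64 * 1024]);
        System.out.println("chunks checksummed: " + sums.length);

        byte[] check = "123456789".getBytes(StandardCharsets.US_ASCII);
        System.out.println("CRC32 of \"123456789\": "
            + Long.toHexString(checksumChunks(check)[0]));
    }
}
```

Note the one update call per 512 bytes: any fixed per-call cost is paid once per chunk, which is why an implementation that handles several chunks per call could amortize that cost.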
Are you suggesting here that we could do away with the internal buffer and
assume that users are always going to do large reads? Doesn't that violate the
contract of fs.open taking a buffer size?
> FSInputChecker and FSOutputSummer should allow better access to user buffer
> ---------------------------------------------------------------------------
>
> Key: HADOOP-3205
> URL: https://issues.apache.org/jira/browse/HADOOP-3205
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs
> Reporter: Raghu Angadi
> Assignee: Raghu Angadi
>
> Implementations of FSInputChecker and FSOutputSummer like DFS do not have
> access to full user buffer. At any time DFS can access only up to 512 bytes
> even though user usually reads with a much larger buffer (often controlled by
> io.file.buffer.size). This requires implementations to double buffer data if
> an implementation wants to read or write larger chunks of data from
> underlying storage.
> We could separate changes for FSInputChecker and FSOutputSummer into two
> separate jiras.