[
https://issues.apache.org/jira/browse/HDFS-10958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Arpit Agarwal updated HDFS-10958:
---------------------------------
Attachment: HDFS-10958.05.patch
Thank you for the thorough review [~xyao]! I really appreciate it. The v05
patch addresses most of your feedback (comments below). [This
commit|https://github.com/arp7/hadoop/commit/1a5601460e830a161da1ad8ed586b3150c09971e#diff-5d7a0451c23486b45f5bda85b7022e5f]
shows the delta between the v04 and v05 patches.
Comments below:
# bq. NIT: Can you update the comment (line 745, line 747) to reflect the
changes of the returned type? "FileInputStream" -> "FileDescriptor"
Fixed.
# bq. Line 149: BlockMetadataHeader#readHeader(File file) can be removed
Removed.
# bq. NIT: Line 1033: BlockReceiver#adjustCrcFilePosition(). can we use
streams.flushChecksumOut() here?
We need to call flush on the buffered output stream here. Calling
streams.flushChecksumOut() will not flush the buffered data down to the
underlying FileOutputStream.
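For illustration, a minimal self-contained sketch of the distinction (ByteArrayOutputStream stands in for the underlying FileOutputStream; the names here are illustrative, not from the patch):

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

public class FlushDemo {
    // Returns the bytes visible in the sink after flushing only the inner
    // stream, then after flushing the outer BufferedOutputStream.
    public static int[] bytesVisible() throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream(); // stand-in for FileOutputStream
        BufferedOutputStream buffered = new BufferedOutputStream(sink, 4096);
        buffered.write(new byte[16]);  // data sits in the 4 KB buffer
        sink.flush();                  // flushing the inner stream moves nothing
        int afterInner = sink.size();  // still 0
        buffered.flush();              // flushing the outer stream drains the buffer
        int afterOuter = sink.size();  // now 16
        return new int[] { afterInner, afterOuter };
    }

    public static void main(String[] args) throws IOException {
        int[] v = bytesVisible();
        System.out.println("after inner flush: " + v[0] + ", after outer flush: " + v[1]);
    }
}
```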
# bq. NIT: Line 59: Can we move DatanodeUtil#createFileWithExistsCheck to
FileIoProvider like we do for mkdirsWithExistsCheck/deleteWithExistsCheck?
This method was awkward to adapt to the call pattern in FileIoProvider. However,
I do pass the individual operations to the FileIoProvider, so the exists/create
calls will be instrumented. Let me know if you feel strongly about it. :)
# bq. Line 1365: DataStorage#fullyDelete(). I'm OK with deprecate it.
Done. Removed the unused method.
# bq. NIT: Can you add a short description for the new key added or add cross
reference to the description in FileIoProvider class description.
I intentionally haven't documented this key as it's not targeted for end users.
I have the following text in the FileIoProvider javadoc. Let me know if this
looks sufficient for now.
{code}
* Behavior can be injected into these events by implementing
* {@link FileIoEvents} and replacing the default implementation
* with {@link DFSConfigKeys#DFS_DATANODE_FILE_IO_EVENTS_CLASS_KEY}.
{code}
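For context, a rough sketch of how such a class-name key is typically resolved (the real code presumably goes through the Hadoop Configuration/ReflectionUtils machinery, so these names are illustrative only, not the patch's actual wiring):

```java
// Hedged sketch: resolve a pluggable events implementation from a configured
// class name, falling back to a default. All names here are hypothetical.
public class PluginLoaderDemo {
    public interface FileIoEvents {
        String name();
    }

    public static class DefaultEvents implements FileIoEvents {
        public String name() { return "default"; }
    }

    // Falls back to the default implementation when no class is configured.
    public static FileIoEvents load(String configuredClassName) throws Exception {
        String className = (configuredClassName != null)
            ? configuredClassName
            : DefaultEvents.class.getName();
        return (FileIoEvents) Class.forName(className)
            .getDeclaredConstructor()
            .newInstance();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("loaded: " + load(null).name());
    }
}
```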
# bq. NIT: these imports re-ordered with the imports below it
I don't see this issue in my diffs. Let me know if you still see it.
# bq. Line 1075: DatanodeUtil.dirNoFilesRecursive() can be wrapped into
FileIoProvider.java to get some aggregated metrics of dirNoFilesRecursive() in
addition to FileIoProvider#listFiles().
I deferred doing this since any disk slowness will show up in the
fileIoProvider.listFiles call. Can we re-evaluate instrumenting the recursive
call in a follow-up jira?
# bq. Line: 202: this is a bug. We should delete the tmpFile instead of the
file.
Good catch, fixed.
# bq. Line 322,323: Should we close crcOut like blockOut and metataRAF here?
Can this be improved with try-with-resources to avoid leaking.
Good catch, fixed it. It looks like this is a pre-existing bug. We can't use
try-with-resources, though, as we only want to close the streams when there is
an exception.
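To illustrate why try-with-resources doesn't fit here, a simplified sketch of the close-on-exception pattern (ByteArrayOutputStream stands in for the real block/checksum file streams; the names are illustrative):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class CloseOnFailureDemo {
    // On success the caller takes ownership of the streams, so they must stay
    // open -- which is exactly why try-with-resources does not fit.
    public static OutputStream[] openStreams(boolean failMidway) throws IOException {
        OutputStream blockOut = null;
        OutputStream crcOut = null;
        try {
            blockOut = new ByteArrayOutputStream();
            crcOut = new ByteArrayOutputStream();
            if (failMidway) {
                throw new IOException("simulated failure while opening");
            }
            return new OutputStream[] { blockOut, crcOut };
        } catch (IOException e) {
            // Close only on the failure path to avoid leaking the streams.
            if (blockOut != null) { blockOut.close(); }
            if (crcOut != null) { crcOut.close(); }
            throw e;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("opened " + openStreams(false).length + " streams");
    }
}
```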
# bq. Line 89: FileIoEvents#onFailure() can we add a begin parameter for the
failure code path so that we can track the time spent on FileIo/Metadata before
failure.
Done.
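A rough sketch of the idea behind the begin parameter (a simplified, hypothetical interface, not the patch's actual FileIoEvents signature): without the begin timestamp, the failure hook cannot attribute the time spent before the operation failed.

```java
public class TimedIoDemo {
    // Hypothetical, simplified stand-in for the events interface.
    public interface Events {
        void onSuccess(long beginNanos, long endNanos);
        void onFailure(long beginNanos, long endNanos, Exception e);
    }

    // Runs an IO operation, reporting elapsed time on both paths.
    public static void timedIo(Runnable io, Events events) {
        final long begin = System.nanoTime();
        try {
            io.run();
            events.onSuccess(begin, System.nanoTime());
        } catch (RuntimeException e) {
            // The begin timestamp lets the hook record time spent before failing.
            events.onFailure(begin, System.nanoTime(), e);
            throw e;
        }
    }

    public static void main(String[] args) {
        Events printer = new Events() {
            public void onSuccess(long b, long e) {
                System.out.println("success after " + (e - b) + " ns");
            }
            public void onFailure(long b, long e, Exception ex) {
                System.out.println("failure after " + (e - b) + " ns: " + ex.getMessage());
            }
        };
        timedIo(() -> { /* fast no-op IO */ }, printer);
        try {
            timedIo(() -> { throw new RuntimeException("disk error"); }, printer);
        } catch (RuntimeException expected) {
            // rethrown after the hook fires
        }
    }
}
```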
# bq. CountingFileIoEvents.java - Should we count the number of errors in
onFailure()?
Done.
# bq. FileIoProvider.java - NIT: some of the methods are missing Javadocs for
the last few added @param such as flush()/listDirectory()/linkCount()/mkdirs,
etc.
Added.
# bq. Line 105: NIT: We can add a tag to the enum FileIoProvider#OPERATION to
explicitly describe the operation type FileIo/Metadata, which could simplify
the FileIoEvents interface. I'm OK with the current implementation, which is
also good and easy to follow.
Leaving it as it is for now to avoid complicating the patch further, but we can
definitely revise the interface as we work on implementations.
# bq. Line 155: I think we should put sync() under fileIo op instead of
metadata op based on we are passing true
Done.
# bq. Line 459: FileIoProvider#fullyDelete() should we declare exception just
for fault injection purpose? FileUtil.fullyDelete() itself does not throw.
Good point. The only exception we could get in fullyDelete is a
RuntimeException so there is no change to the signature. I decided to pass all
exceptions to the failure handler (except errors) and let it decide which ones
are interesting to it.
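An illustrative sketch of that policy (all names hypothetical): RuntimeExceptions are reported to the failure handler and rethrown, while Errors such as OutOfMemoryError are deliberately left uncaught, so the method's signature is unchanged.

```java
public class FailurePolicyDemo {
    public interface FailureHandler {
        void onFailure(Exception e);
    }

    // Reports RuntimeExceptions to the handler; Errors fall straight through.
    public static void deleteWithHook(Runnable deleteOp, FailureHandler handler) {
        try {
            deleteOp.run();            // e.g. the underlying delete operation
        } catch (RuntimeException e) {
            handler.onFailure(e);      // the handler decides what's interesting
            throw e;                   // rethrow: signature stays unchecked-only
        }
    }

    public static void main(String[] args) {
        final int[] failures = { 0 };
        FailureHandler counter = e -> failures[0]++;
        deleteWithHook(() -> {}, counter);  // success path: handler not called
        try {
            deleteWithHook(() -> { throw new RuntimeException("boom"); }, counter);
        } catch (RuntimeException expected) {
            // rethrown after the handler observed it
        }
        System.out.println("failures observed: " + failures[0]);
    }
}
```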
# bq. Line 575: NIT: File f -> File dir, Line 598: NIT: File f -> File dir
Fixed both.
# bq. Line 148: ReplicaOutputStreams#writeDataToDisk(), should we change the
dataOut/checksumOut to use the FileIoProvider#WrappedFileOutputStream to get
the FileIo write counted properly?
These are already wrapped output streams. See LocalReplicaInPipeline.java:310.
# bq. Line 83 readDataFully() should we change the dataIn/checksumIn to use
the FileIoProvider#WrappedFileInputStream to get the FileIo read counted
properly?
These are also wrapped input streams. See LocalReplica#getDataInputStream where
the streams are allocated.
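For readers following along, a minimal sketch of the wrapped-stream idea: a FilterOutputStream that meters bytes written through it (the patch's actual WrappedFileOutputStream will differ in detail; this class and its names are illustrative only).

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class CountingStreamDemo {
    public static class CountingOutputStream extends FilterOutputStream {
        private long bytesWritten;

        public CountingOutputStream(OutputStream out) {
            super(out);
        }

        @Override
        public void write(int b) throws IOException {
            out.write(b);
            bytesWritten++;
        }

        @Override
        public void write(byte[] b, int off, int len) throws IOException {
            // Delegate directly so the bulk write is counted exactly once.
            out.write(b, off, len);
            bytesWritten += len;
        }

        public long getBytesWritten() {
            return bytesWritten;
        }
    }

    public static void main(String[] args) throws IOException {
        CountingOutputStream cos =
            new CountingOutputStream(new ByteArrayOutputStream());
        cos.write(new byte[128]);
        System.out.println("bytes written: " + cos.getBytesWritten());
    }
}
```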
I also removed the gson dependency per offline feedback from [~anu].
> Add instrumentation hooks around Datanode disk IO
> -------------------------------------------------
>
> Key: HDFS-10958
> URL: https://issues.apache.org/jira/browse/HDFS-10958
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: datanode
> Reporter: Xiaoyu Yao
> Assignee: Arpit Agarwal
> Attachments: HDFS-10958.01.patch, HDFS-10958.02.patch,
> HDFS-10958.03.patch, HDFS-10958.04.patch, HDFS-10958.05.patch
>
>
> This ticket is opened to add instrumentation hooks around Datanode disk IO
> based on the refactoring work from HDFS-10930.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]