[ 
https://issues.apache.org/jira/browse/HDFS-10958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-10958:
---------------------------------
    Attachment: HDFS-10958.05.patch

Thank you for the thorough review [~xyao]! I really appreciate it. The v05 
patch addresses most of your feedback (comments below). [This 
commit|https://github.com/arp7/hadoop/commit/1a5601460e830a161da1ad8ed586b3150c09971e#diff-5d7a0451c23486b45f5bda85b7022e5f]
 shows the delta between the v04 and v05 patches.

Comments below:
# bq. NIT: Can you update the comment (line 745, line 747) to reflect the 
changes of the returned type? "FileInputStream" -> "FileDescriptor"
Fixed.
#  bq. Line 149: BlockMetadataHeader#readHeader(File file) can be removed
Removed.
# bq. NIT: Line 1033: BlockReceiver#adjustCrcFilePosition().  can we use 
streams.flushChecksumOut() here?
We need to call flush on the buffered output stream here. Calling 
streams.flushChecksumOut() will not flush the buffered data to the underlying 
FileOutputStream.
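To illustrate the distinction (a standalone sketch, not the actual BlockReceiver code): bytes written through a BufferedOutputStream sit in its buffer until the *buffered* stream is flushed; flushing only the underlying FileOutputStream is a no-op and leaves the buffered data off disk.

```java
import java.io.*;

public class BufferedFlushDemo {
    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("buffered-flush", ".dat");
        f.deleteOnExit();
        FileOutputStream fos = new FileOutputStream(f);
        BufferedOutputStream bos = new BufferedOutputStream(fos);

        bos.write(new byte[16]);        // bytes sit in the BufferedOutputStream buffer

        fos.flush();                    // flushes only the FileOutputStream; buffer untouched
        System.out.println(f.length()); // 0 -- nothing on disk yet

        bos.flush();                    // drains the buffer into the FileOutputStream
        System.out.println(f.length()); // 16
        bos.close();
    }
}
```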
# bq. NIT: Line 59: Can we move DatanodeUtil#createFileWithExistsCheck to 
FileIoProvider like we do for mkdirsWithExistsCheck/deleteWithExistsCheck?
This method was awkward to adapt to the call pattern in FileIoProvider. However 
I do pass individual operations to the FileIoProvider so the exists/create 
calls will be instrumented. Let me know if you feel strongly about it. :)
# bq. Line 1365: DataStorage#fullyDelete(). I'm OK with deprecate it.
Done. Removed the unused method.
# bq. NIT: Can you add a short description for the new key added or add cross 
reference to the description in FileIoProvider class description.
I intentionally haven't documented this key as it's not targeted for end users. 
I have the following text in the FileIoProvider javadoc. Let me know if this 
looks sufficient for now.
{code}
 * Behavior can be injected into these events by implementing
 * {@link FileIoEvents} and replacing the default implementation
 * with {@link DFSConfigKeys#DFS_DATANODE_FILE_IO_EVENTS_CLASS_KEY}.
{code}
# bq. NIT: these imports re-ordered with the imports below it
I don't see this issue in my diffs. Let me know if you still see it.
# bq. Line 1075: DatanodeUtil.dirNoFilesRecursive() can be wrapped into 
FileIoProvider.java to get some aggregated metrics of dirNoFilesRecursive() in 
addition to FileIoProvider#listFiles().
I deferred doing this since any disk slowness will show up in the 
fileIoProvider.listFiles call. Can we re-evaluate instrumenting the recursive 
call in a follow-up jira?
# bq.  Line: 202: this is a bug. We should delete the tmpFile instead of the 
file.
Good catch, fixed.
# bq. Line 322,323: Should we close crcOut like blockOut and metataRAF here? 
Can this be improved with a try-with-resource to avoid leaking.
Good catch, fixed it. It looks like this is a pre-existing bug. We can't use 
try-with-resources, though, as we only want to close the streams when there is 
an exception.
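For context, the close-on-exception pattern looks roughly like this (a hypothetical, simplified sketch -- `openStreams` and `closeQuietly` are illustrative names, not the HDFS code). try-with-resources would close the streams unconditionally, but here they must stay open on success for the caller to use:

```java
import java.io.*;

public class CloseOnErrorDemo {
    static void closeQuietly(Closeable c) {
        if (c != null) {
            try { c.close(); } catch (IOException ignored) { }
        }
    }

    // Open two streams; on a partial failure close whatever was opened and
    // rethrow, on success hand both back to the caller still open.
    static OutputStream[] openStreams(File blockFile, File metaFile) throws IOException {
        FileOutputStream blockOut = null;
        FileOutputStream crcOut = null;
        try {
            blockOut = new FileOutputStream(blockFile);
            crcOut = new FileOutputStream(metaFile);
            return new OutputStream[] { blockOut, crcOut };
        } catch (IOException e) {
            closeQuietly(blockOut);  // close only on the failure path
            closeQuietly(crcOut);
            throw e;
        }
    }

    public static void main(String[] args) throws IOException {
        File block = File.createTempFile("block", ".dat");
        File meta = File.createTempFile("meta", ".dat");
        OutputStream[] streams = openStreams(block, meta);
        System.out.println(streams.length);  // both streams open on success
        for (OutputStream s : streams) s.close();
    }
}
```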
# bq. Line 89: FileIoEvents#onFailure() can we add a begin parameter for the 
failure code path so that we can track the time spent on FileIo/Metadata before 
failure.
Done.
# bq. CountingFileIoEvents.java - Should we count the number of errors in 
onFailure()? 
Done.
# bq. FileIoProvider.java - NIT: some of the methods are missing Javadocs for 
the last few added @param such as flush()/listDirectory()/linkCount()/mkdirs, 
etc.
Added.
# bq. Line 105: NIT: We can add a tag to the enum FileIoProvider#OPERATION to 
explicitly describe the operation type FileIo/Metadata, which could simplify 
the FileIoEvents interface. I'm OK with the current implementation, which is 
also good and easy to follow. 
Leaving it as it is for now to avoid complicating the patch further, but we can 
definitely revise the interface as we work on implementations.
# bq. Line 155: I think we should put sync() under fileIo op instead of 
metadata op based on we are passing true
Done.
# bq. Line 459: FileIoProvider#fullyDelete() should we declare exception just 
for fault injection purpose? FileUtil.fullyDelete() itself does not throw. 
Good point. The only exception we could get in fullyDelete is a 
RuntimeException, so there is no change to the signature. I decided to pass all 
exceptions (but not Errors) to the failure handler and let it decide which ones 
are interesting to it.
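The policy above can be sketched as follows (hypothetical names -- `runGuarded` and this `FailureHandler` interface are illustrative, not the patch's actual FileIoEvents API): catch Throwable, let Errors propagate untouched, and report everything else to the handler.

```java
public class FailurePolicyDemo {
    interface FailureHandler {
        void onFailure(Throwable t, long beginMs);
    }

    // Run an IO operation; report exceptions to the handler, rethrow Errors.
    static boolean runGuarded(Runnable op, FailureHandler handler) {
        long begin = System.currentTimeMillis();
        try {
            op.run();
            return true;
        } catch (Throwable t) {
            if (t instanceof Error) {
                throw (Error) t;          // never swallow Errors
            }
            handler.onFailure(t, begin);  // handler decides what is interesting
            return false;
        }
    }

    public static void main(String[] args) {
        FailureHandler handler = (t, begin) ->
            System.out.println("failure: " + t.getClass().getSimpleName());
        runGuarded(() -> { throw new RuntimeException("disk error"); }, handler);
        // prints: failure: RuntimeException
    }
}
```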
# bq. Line 575: NIT: File f -> File dir, Line 598: NIT: File f -> File dir
Fixed both.
# bq. Line 148: ReplicaOutputStreams#writeDataToDisk(), should we change the 
dataOut/checksumOut to use the FileIoProvider#WrappedFileoutputStream to get 
the FileIo write counted properly?
These are already wrapped output streams. See LocalReplicaInPipeline.java:310.
# bq.  Line 83 readDataFully() should we change the dataIn/checksumIn  to use 
the FileIoProvider#WrappedFileInputStream to get the FileIo read counted 
properly? 
These are also wrapped input streams. See LocalReplica#getDataInputStream where 
the streams are allocated.
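The counting that the wrapped streams provide can be sketched like this (a simplified, hypothetical stand-in for FileIoProvider's wrapper classes, not their actual implementation): a FilterInputStream subclass tallies each read, so callers get their IO counted without any change on their side.

```java
import java.io.*;
import java.util.concurrent.atomic.AtomicLong;

// Simplified sketch: count read operations on a wrapped stream.
public class CountingInputStream extends FilterInputStream {
    private final AtomicLong readOps;

    public CountingInputStream(InputStream in, AtomicLong readOps) {
        super(in);
        this.readOps = readOps;
    }

    @Override
    public int read() throws IOException {
        readOps.incrementAndGet();   // tally the single-byte read
        return super.read();
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        readOps.incrementAndGet();   // tally the bulk read
        return super.read(b, off, len);
    }
}
```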

I also removed the gson dependency per offline feedback from [~anu].


> Add instrumentation hooks around Datanode disk IO
> -------------------------------------------------
>
>                 Key: HDFS-10958
>                 URL: https://issues.apache.org/jira/browse/HDFS-10958
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode
>            Reporter: Xiaoyu Yao
>            Assignee: Arpit Agarwal
>         Attachments: HDFS-10958.01.patch, HDFS-10958.02.patch, 
> HDFS-10958.03.patch, HDFS-10958.04.patch, HDFS-10958.05.patch
>
>
> This ticket is opened to add instrumentation hooks around Datanode disk IO 
> based on refactor work from HDFS-10930.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
