[ https://issues.apache.org/jira/browse/HDFS-11847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Manoj Govindassamy updated HDFS-11847:
--------------------------------------
    Attachment: HDFS-11847.03.patch

Thanks for the detailed review [~xiaochen]. Attached v03 patch to address the following. Please take a look.

1. Deprecated the old API and added a new one that accepts an additional type argument to filter the results by.
2. Updated {{FSN#listOpenFiles}} to check for the {{ALL_OPEN_FILES}} type first and then for the combination filtering options. However, the result set and the reporting don't yet differentiate entries by type; for that we need to add the type to {{OpenFileEntry}}. Will do this.
3. About printing DataNode details in the results: planning to take this enhancement, along with the pending item in (2), in a separate jira if you are ok. I need to change the proto, the handling of the
4. Yes, better to return as many results as possible. Made {{DatanodeAdminManager#processBlocksInternal}} log a warning message on unexpected open files and continue to the next one.
5. In {{DatanodeAdminManager#processBlocksInternal}}, the computation is at the DataNode level. There can be multiple blocks across DNs for the same file, and the full count needs to be tracked for JMX reporting purposes. So, retaining the existing {{lowRedundancyBlocksInOpenFiles}} field. When I removed this field and piggybacked on {{lowRedundancyOpenFiles.size()}}, the actual count was lower than expected for a few tests.
6. In {{LeavingServiceStatus}}, both members are needed due to (5).
7. Updated the comment for the {{LeavingServiceStatus}} class.
8. Added {{hasReadLock()}} to {{FSN#getFilesBlockingDecom}}.
9. In {{TestDecommission#verifyOpenFilesBlockingDecommission}}, the PrintStream is now copied before the exchange and restored to the old one afterwards. Good find.
10. The {{DFS_NAMENODE_REDUNDANCY_INTERVAL_SECONDS_KEY}} value is actually in seconds, so it is 1000 seconds and not 1 second. Anyway, updated this to the max value.
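The deprecate-and-delegate shape described in (1)/(2) can be sketched standalone. The real Hadoop types ({{OpenFileEntry}}, the filter enum, {{FSN#listOpenFiles}}) are not reproduced here; {{OpenFilesApiSketch}}, its {{OpenFile}} record, and the {{add}} helper are simplified stand-ins, assuming only what the comment states: an {{ALL_OPEN_FILES}} type that is checked first, and a decommission-blocking filter checked after it.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.stream.Collectors;

// Illustrative stand-ins only; not the actual HDFS API.
public class OpenFilesApiSketch {

    enum OpenFilesType { ALL_OPEN_FILES, BLOCKING_DECOMMISSION }

    // Minimal stand-in for an open-file entry.
    record OpenFile(String path, boolean blockingDecommission) {}

    private final List<OpenFile> openFiles = new ArrayList<>();

    /** @deprecated use {@link #listOpenFiles(EnumSet)} instead. */
    @Deprecated
    public List<OpenFile> listOpenFiles() {
        // Old no-arg API delegates to the new one with the broadest filter.
        return listOpenFiles(EnumSet.of(OpenFilesType.ALL_OPEN_FILES));
    }

    /** New API: filter the result by one or more open-file types. */
    public List<OpenFile> listOpenFiles(EnumSet<OpenFilesType> types) {
        // ALL_OPEN_FILES is checked first, before any combination filtering.
        if (types.contains(OpenFilesType.ALL_OPEN_FILES)) {
            return new ArrayList<>(openFiles);
        }
        if (types.contains(OpenFilesType.BLOCKING_DECOMMISSION)) {
            return openFiles.stream()
                .filter(OpenFile::blockingDecommission)
                .collect(Collectors.toList());
        }
        return List.of();
    }

    public void add(String path, boolean blockingDecommission) {
        openFiles.add(new OpenFile(path, blockingDecommission));
    }
}
```

Delegating the deprecated overload keeps a single code path, so the old callers automatically pick up any fixes made to the filtered implementation.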
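The "return as much result as possible" behavior in (4) amounts to a warn-and-continue loop rather than failing the whole pass on the first bad entry. This is a hypothetical sketch of that pattern, not the actual {{DatanodeAdminManager#processBlocksInternal}} code; the method and the emptiness check standing in for "unexpected open file" are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Logger;

public class SkipUnexpectedSketch {
    private static final Logger LOG =
        Logger.getLogger(SkipUnexpectedSketch.class.getName());

    /** Collect valid entries, logging and skipping unexpected ones. */
    public static List<String> collectOpenFiles(List<String> candidates) {
        List<String> results = new ArrayList<>();
        for (String path : candidates) {
            if (path == null || path.isEmpty()) {
                // An unexpected entry no longer aborts the scan: warn and
                // move on to the next candidate so partial results survive.
                LOG.warning("Unexpected open file entry, skipping: " + path);
                continue;
            }
            results.add(path);
        }
        return results;
    }
}
```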
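The test fix in (9) is the standard save/swap/restore pattern for {{System.out}}: copy the original stream before the exchange and restore it in a {{finally}} block so later tests see the real stdout. The sketch below is generic, not the {{TestDecommission}} code itself.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

public class CaptureStdoutSketch {
    /** Run an action with System.out captured, then restore the old stream. */
    public static String captureOutput(Runnable action) {
        PrintStream oldOut = System.out;          // copy before the exchange
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try {
            System.setOut(new PrintStream(buf, true));
            action.run();
        } finally {
            System.setOut(oldOut);                // always restore the old one
        }
        return buf.toString();
    }
}
```

Restoring in {{finally}} matters: if the action throws, a test that forgot the restore would leave every subsequent test writing into the dead buffer.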
> Enhance dfsadmin listOpenFiles command to list files blocking datanode
> decommissioning
> --------------------------------------------------------------------------------------
>
>                 Key: HDFS-11847
>                 URL: https://issues.apache.org/jira/browse/HDFS-11847
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Manoj Govindassamy
>            Assignee: Manoj Govindassamy
>        Attachments: HDFS-11847.01.patch, HDFS-11847.02.patch, HDFS-11847.03.patch
>
> HDFS-10480 adds a {{listOpenFiles}} option to the {{dfsadmin}} command to list all the open files in the system.
> Additionally, it would be very useful to list only the open files that are blocking DataNode decommissioning. With thousand+ node clusters, where machines may be added and removed regularly for maintenance, any option to monitor and debug decommissioning status is very helpful. The proposal here is to add suboptions to {{listOpenFiles}} for the above case.

--
This message was sent by Atlassian JIRA (v6.4.14#64029)