[ 
https://issues.apache.org/jira/browse/HADOOP-5958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated HADOOP-5958:
----------------------------------

    Fix Version/s:     (was: 0.21.0)
                   0.22.0
           Status: Open  (was: Patch Available)

bq. what happens when you run that same benchmark against an NFS drive?

Every class using DF assumes, reasonably, that the resource behaves like a 
local drive. Configuring e.g. LocalDirAllocator or FSDataset to use a remote FS 
and then worrying that DF might take milliseconds instead of microseconds is 
focusing on the noise.

I don't understand the virtue of the current approach, as DF seems like a fixed 
set of functionality not meriting an abstract class, factory, etc. Is PosixDF 
needed for anything but getMount and getFilesystem? The latter has no callers 
and the former has one that creates an instance and discards all but the mount. 
This suggests two approaches:
# If nearly all uses of DF involve delegating to a java.io.File, is there any 
reason not to simply replace uses of DF with File? Everywhere DF is used, a 
local FS is assumed. As Steve points out, if this were otherwise, other designs 
would be preferred.
# If DF retained Shell as a subtype and used the java.io.File methods where 
appropriate, is the cost really prohibitive? Few of these are created and other 
than a faster implementation of most calls, nothing else changes. The costs 
incurred for keeping everything intact appear trivial.

Neither of these is an incompatible change, assuming java.io.File is correctly 
implemented. They're not even mutually exclusive.
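
To make the two approaches concrete, here is a minimal sketch of what the disk-space calls reduce to when they delegate to java.io.File. The class name {{FileBackedDF}} and its method names are hypothetical and not taken from the patch; it assumes "used" means total space minus free space, and it leaves out Shell, the refresh interval, and the mount lookup.

```java
import java.io.File;

// Hypothetical stand-in for DF, for illustration only; not the class in the patch.
final class FileBackedDF {
  private final File dir;

  FileBackedDF(File dir) {
    this.dir = dir;
  }

  /** Size in bytes of the partition holding dir; 0 if dir names no partition. */
  long getCapacity() {
    return dir.getTotalSpace();
  }

  /** Bytes in use, assumed here to be total space minus unallocated space. */
  long getUsed() {
    return dir.getTotalSpace() - dir.getFreeSpace();
  }

  /** Bytes this JVM could write, taking permissions and quotas into account. */
  long getAvailable() {
    return dir.getUsableSpace();
  }

  public static void main(String[] args) {
    FileBackedDF df = new FileBackedDF(new File(args.length > 0 ? args[0] : "."));
    System.out.println("capacity=" + df.getCapacity()
        + " used=" + df.getUsed()
        + " available=" + df.getAvailable());
  }
}
```

No process is spawned for any of these calls, which is the point of the issue.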

A few nits:
* Incompatibly moving {{DF::main}} seems unnecessary
* The comment on {{DF_INTERVAL_DEFAULT}} should be javadoc
* While the original didn't have it either, DF methods should have javadoc
* In {{getPercentUsed}}, using {{cap}} in the calculation of {{used}} avoids 
the second call to {{getCapacity}}
* If the current design is retained (because some architecture has a faulty 
java.io.File impl?), it should be possible to use PosixDF exclusively using the 
config passed to {{DF::getDF}} (could be named {{DF::get}}).
* The current patch also requires changes to HDFS that must be committed with 
these. If retained, please open an issue and link it.
* {{getFilesystem}} and {{getMounts}} should probably be deprecated, even 
removed from DF since one needs to explicitly instantiate PosixDF to make the 
call. The only reason {{getMounts}} is there is because that's the command it's 
scraped from, anyway.
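
For the {{getPercentUsed}} nit, a rough sketch of the suggestion follows. The class and helper names are hypothetical, and the formula (capacity minus available, over capacity) is an assumption about what the method computes rather than a quote from the patch. The zero-capacity guard is an addition for the sketch.

```java
import java.io.File;

// Hypothetical helper, for illustration only; not code from the patch.
final class PercentUsedSketch {

  /** Percent used, reusing cap rather than fetching the capacity a second time. */
  static int percentUsed(long cap, long available) {
    if (cap == 0) {
      return 0; // guard added for the sketch: avoid dividing by zero
    }
    long used = cap - available; // cap is reused here
    return (int) (used * 100.0 / cap);
  }

  /** Reads the capacity once and passes it along. */
  static int getPercentUsed(File dir) {
    long cap = dir.getTotalSpace();
    return percentUsed(cap, dir.getUsableSpace());
  }
}
```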

> Use JDK 1.6 File APIs in DF.java wherever possible
> --------------------------------------------------
>
>                 Key: HADOOP-5958
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5958
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs
>            Reporter: Devaraj Das
>            Assignee: Aaron Kimball
>             Fix For: 0.22.0
>
>         Attachments: HADOOP-5958-hdfs.patch, HADOOP-5958-mapred.patch, 
> HADOOP-5958.2.patch, HADOOP-5958.3.patch, HADOOP-5958.4.patch, 
> HADOOP-5958.patch
>
>
> JDK 1.6 has File APIs like File.getFreeSpace() which should be used instead 
> of spawning a command process for getting the various disk/partition related 
> attributes. This would avoid spikes in memory consumption by tasks when 
> things like LocalDirAllocator are used for creating paths on the filesystem.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
