Tom White wrote:
You can allow S3 as the default FS; it's just that you then can't run HDFS at all. You would only do this if you don't want to use HDFS, for example if you were running a MapReduce job that read from S3 and wrote to S3.
Can't one work around this by using a different configuration on the client than on the namenodes and datanodes? The client should be able to set fs.default.name to an s3: URI, while the namenode and datanode must have it set to an hdfs: URI, no?
Would it be useful to add command-line options to namenode and datanode that override the configuration, so that one could start non-default HDFS daemons?
It might be less confusing if the HDFS daemons didn't use fs.default.name to define the namenode host and port. Just like mapred.job.tracker defines the host and port for the jobtracker, dfs.namenode.address (or similar) could define the namenode. Would this be a good change to make?
Probably. For back-compatibility we could leave it empty by default, deferring to fs.default.name; it would be used only if folks specify a non-empty dfs.namenode.address.
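
To make the fallback concrete, here is a rough sketch in plain Java. It is not Hadoop code: the class and method names are made up for illustration, the configuration is modelled as a simple Map, dfs.namenode.address is only the name proposed in this thread, and the host:port value format is assumed by analogy with mapred.job.tracker.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Hadoop: illustrates the proposed lookup order.
public class NamenodeAddressResolver {
    static final String DEFAULT_FS_KEY = "fs.default.name";
    // Proposed property name from this thread; it is an assumption, not an existing key.
    static final String NAMENODE_KEY = "dfs.namenode.address";

    // Returns "host:port" for the namenode.
    static String resolveNamenodeAddress(Map<String, String> conf) {
        // 1. A non-empty dfs.namenode.address wins.
        String explicit = conf.get(NAMENODE_KEY);
        if (explicit != null && !explicit.trim().isEmpty()) {
            return explicit.trim();
        }
        // 2. Otherwise defer to fs.default.name, which must then be an hdfs: URI.
        String defaultFs = conf.get(DEFAULT_FS_KEY);
        if (defaultFs == null) {
            throw new IllegalStateException("neither " + NAMENODE_KEY
                    + " nor " + DEFAULT_FS_KEY + " is set");
        }
        URI uri = URI.create(defaultFs.trim());
        if (!"hdfs".equals(uri.getScheme()) || uri.getAuthority() == null) {
            throw new IllegalStateException(DEFAULT_FS_KEY + " is not an hdfs: URI: "
                    + defaultFs);
        }
        return uri.getAuthority();
    }

    public static void main(String[] args) {
        // S3 is the default FS, but the HDFS daemons can still find the namenode.
        Map<String, String> conf = new HashMap<>();
        conf.put(DEFAULT_FS_KEY, "s3://example-bucket");
        conf.put(NAMENODE_KEY, "namenode.example.com:9000");
        System.out.println(resolveNamenodeAddress(conf));
    }
}
```

With this lookup order, existing configurations that set only fs.default.name to an hdfs: URI keep working unchanged, while a site that wants S3 as the default FS can still run HDFS daemons by setting the new property.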
Doug
