On Thu, Jul 10, 2008 at 10:06 PM, Lincoln Ritter <[EMAIL PROTECTED]> wrote:
> Thank you, Tom.
>
> Forgive me for being dense, but I don't understand your reply:
>
Sorry! I'll try to explain it better (see below).

>
> Do you mean that it is possible to use the Hadoop daemons with S3 but
> the default filesystem must be HDFS?

The HDFS daemons use the value of "fs.default.name" to set the namenode host and port, so if you set it to an S3 URI, you can't run the HDFS daemons. In that case you would use the start-mapred.sh script instead of start-all.sh.

> If that is the case, can I
> specify the output filesystem on a per-job basis and can that be an S3
> FS?

Yes, that's exactly how you do it.

>
> Also, is there a particular reason to not allow S3 as the default FS?

You can use S3 as the default FS; it's just that you then can't run HDFS at all. You would only do this if you don't want to use HDFS, for example if you were running a MapReduce job that read from S3 and wrote to S3.

It might be less confusing if the HDFS daemons didn't use fs.default.name to define the namenode host and port. Just as mapred.job.tracker defines the host and port for the jobtracker, dfs.namenode.address (or similar) could define the namenode. Would this be a good change to make?

Tom
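
P.S. To make the rule above concrete, here is a minimal sketch in Python. It is not part of Hadoop: the function name start_script_for is hypothetical, and it assumes fs.default.name is written as a full URI with either an hdfs or an s3 scheme. It only encodes the choice described above, namely that the HDFS daemons can be started only when the default filesystem is HDFS.

```python
from urllib.parse import urlparse


def start_script_for(default_fs_uri: str) -> str:
    """Return the start script implied by a given fs.default.name value.

    Hypothetical helper for illustration only; it assumes the value is a
    full URI such as hdfs://host:port or s3://bucket.
    """
    scheme = urlparse(default_fs_uri).scheme.lower()

    if scheme == "hdfs":
        # The HDFS daemons read the namenode host and port from
        # fs.default.name, so HDFS and MapReduce can both be started.
        return "start-all.sh"

    if scheme == "s3":
        # No namenode address is available, so the HDFS daemons cannot
        # run; start only the MapReduce daemons.
        return "start-mapred.sh"

    # Other filesystems are outside the scope of this sketch.
    raise ValueError(f"unsupported scheme in this sketch: {scheme!r}")


if __name__ == "__main__":
    # The host, port, and bucket names here are made up.
    for uri in ("hdfs://namenode.example.com:9000", "s3://example-bucket"):
        print(uri, "->", start_script_for(uri))
```

Keeping HDFS as the default and naming an S3 path only for a given job's output corresponds to the first branch: both sets of daemons run, and S3 is addressed per job.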
