Phantom wrote:
Which would mean that if I want my logs to reside in HDFS, I will have to move them using copyFromLocal (or some variant of it) and then run a Map/Reduce process against them? Am I right?
Yes. HDFS is probably not currently suitable for directly storing log output as it is generated. But I don't think append is actually the missing feature you need. Rather, the problem is that, currently in HDFS, until a file is closed, it does not exist. So if your server crashes and does not close its log, the log would disappear, which is probably not what you'd want.
If copying log files to HDFS is prohibitive, an alternative might be to make them available via HTTP and to write an HttpFileSystem, so that they could be accessed directly as MapReduce inputs (assuming that's what you want to do with them). An HttpFileSystem should be easy to implement and would be useful for lots of things. It need not implement operations like 'delete', 'rename', or even 'create', but just 'open' and 'list', so it could only be used for inputs.
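To make the read-only idea concrete, here is a minimal standalone sketch. It is not Hadoop's FileSystem API; a real implementation would presumably subclass Hadoop's abstract FileSystem class. The class name HttpFileSystemSketch, the method parseIndex, and the assumption that the web server exposes an HTML directory index (whose relative href links name the files) are all hypothetical. 'open' is a plain HTTP GET. 'list' scrapes the index page. The write operations simply throw.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical read-only "file system" over HTTP: only open and list work. */
class HttpFileSystemSketch {
    // Matches href="..." values that contain no query string or fragment.
    private static final Pattern HREF =
        Pattern.compile("href=\"([^\"?#]+)\"", Pattern.CASE_INSENSITIVE);

    private final URI base; // e.g. http://logs.example.com/logs/ (trailing '/' matters)

    HttpFileSystemSketch(URI base) {
        this.base = base;
    }

    /** Open a file for reading with a plain HTTP GET. */
    public InputStream open(String path) throws IOException {
        return base.resolve(path).toURL().openStream();
    }

    /** List a directory (path must end in '/') by fetching and parsing its index page. */
    public List<URI> list(String dir) throws IOException {
        URI dirUri = base.resolve(dir);
        try (InputStream in = dirUri.toURL().openStream()) {
            String html = new String(in.readAllBytes(), StandardCharsets.UTF_8);
            return parseIndex(dirUri, html);
        }
    }

    /** Extract relative links from an index page; skip parent, absolute, and external links. */
    public static List<URI> parseIndex(URI dirUri, String html) {
        List<URI> result = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            String link = m.group(1);
            if (link.startsWith("/") || link.startsWith("..") || link.contains("://")) {
                continue;
            }
            result.add(dirUri.resolve(link));
        }
        return result;
    }

    // Write operations are deliberately unsupported: this source can only serve as input.
    public void create(String path) {
        throw new UnsupportedOperationException("read-only: create " + path);
    }

    public void delete(String path) {
        throw new UnsupportedOperationException("read-only: delete " + path);
    }

    public void rename(String src, String dst) {
        throw new UnsupportedOperationException("read-only: rename " + src);
    }
}
```

Parsing an HTML index is the fragile part. A server that published a plain-text manifest of its log files would make 'list' trivial.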
Doug