Phantom wrote:
Which would mean that if I want my logs to reside in HDFS, I will
have to move them using copyFromLocal or some version thereof and then run
a Map/Reduce process against them? Am I right?

Yes. HDFS is probably not currently suitable for directly storing log output as it is generated. But I don't think append is actually the missing feature you need. Rather, the problem is that, currently in HDFS, until a file is closed, it does not exist. So if your server crashes and does not close its log, the log would disappear, which is probably not what you'd want.

If copying log files to HDFS is prohibitive, an alternative might be to make them available via HTTP and to write an HttpFileSystem through which they could be accessed directly as MapReduce inputs (assuming that's what you want). An HttpFileSystem should be easy to implement and would be useful for lots of things. It need not implement operations like 'delete', 'rename', or even 'create', only 'open' and 'list', so it could be used only for inputs.
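To make the 'open' and 'list' idea concrete, here is a rough sketch in plain Java. It is not Hadoop's FileSystem API: the ReadOnlyFileSystem interface, the HttpReadOnlyFileSystem class, and its method names are all hypothetical. It also assumes the web server returns a plain-text index for a directory URL, one file name per line, which HTTP itself does not guarantee.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical read-only view: just the two operations an input source needs. */
interface ReadOnlyFileSystem {
    InputStream open(String path) throws IOException;
    List<String> list(String dir) throws IOException;
}

/** Sketch of an HTTP-backed implementation; no delete, rename, or create. */
class HttpReadOnlyFileSystem implements ReadOnlyFileSystem {
    private final URI base;

    HttpReadOnlyFileSystem(String baseUrl) {
        // Keep a trailing slash so relative paths resolve underneath the base URL.
        this.base = URI.create(baseUrl.endsWith("/") ? baseUrl : baseUrl + "/");
    }

    /** Maps a file-system style path onto a URL under the base. */
    URI resolve(String path) {
        String relative = path.startsWith("/") ? path.substring(1) : path;
        return base.resolve(relative);
    }

    /** Opens the file with a plain HTTP GET and returns the response body. */
    @Override
    public InputStream open(String path) throws IOException {
        return resolve(path).toURL().openStream();
    }

    /** Fetches the directory URL and parses the assumed one-name-per-line index. */
    @Override
    public List<String> list(String dir) throws IOException {
        String dirPath = dir.endsWith("/") ? dir : dir + "/";
        try (InputStream in = open(dirPath)) {
            return parseListing(new String(in.readAllBytes(), StandardCharsets.UTF_8));
        }
    }

    /** Assumption: the index is plain text, one name per line; blank lines are skipped. */
    static List<String> parseListing(String body) {
        List<String> names = new ArrayList<>();
        for (String line : body.split("\n")) {
            String name = line.trim();
            if (!name.isEmpty()) {
                names.add(name);
            }
        }
        return names;
    }
}
```

A job would call list on the log directory to enumerate its inputs and then open each name to read it. Anything that writes simply has no method here, which is what restricts this to inputs.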

Doug
