Arun C Murthy wrote:
Do you guys think it makes sense to provide this as part of the MR framework itself? That is, extend TextInputFormat into (say) URIInputFormat, so that the MR framework 'fetches' the data pointed to by the URI and then provides a 'stream' (as 'key') to the map function? The 'fetcher'/'reader' would be configurable, with reasonable defaults provided in the framework, e.g. for dfs://, http://, etc.
Yes, I think this would be a good addition to org.apache.hadoop.mapred.lib. I'm not sure that it should be a subclass of TextInputFormat, although it might share a RecordReader implementation with TextInputFormat. It should probably be extensible, permitting folks to supply other RecordReader implementations.
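To make the "configurable fetcher with reasonable defaults" part concrete, here is a rough sketch in plain Java with no Hadoop dependency. All of the names here (UriFetcherSketch, Fetcher, Registry, the "mem" scheme) are hypothetical and only for illustration; none of them is an existing Hadoop API. A dfs:// fetcher is left out because it would have to go through Hadoop's FileSystem classes, and how the resulting stream is handed to the map function (via a RecordReader) is not shown.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: scheme-based lookup of pluggable "fetchers".
final class UriFetcherSketch {

    /** Hypothetical plug-in point: opens the data that a URI points to. */
    interface Fetcher {
        InputStream open(URI uri) throws IOException;
    }

    /** Hypothetical registry that picks a Fetcher by URI scheme. */
    static final class Registry {
        private final Map<String, Fetcher> byScheme = new HashMap<>();

        Registry() {
            // Assumed defaults; a real framework would likely read these from config.
            register("file", uri -> Files.newInputStream(Paths.get(uri)));
            register("http", uri -> uri.toURL().openStream());
        }

        /** Adds or replaces the fetcher for a scheme (case-insensitive). */
        void register(String scheme, Fetcher fetcher) {
            byScheme.put(scheme.toLowerCase(Locale.ROOT), fetcher);
        }

        /** Opens a stream for the URI, or fails if no fetcher handles its scheme. */
        InputStream open(String uriText) {
            URI uri = URI.create(uriText);
            String scheme = uri.getScheme();
            Fetcher fetcher =
                (scheme == null) ? null : byScheme.get(scheme.toLowerCase(Locale.ROOT));
            if (fetcher == null) {
                throw new IllegalArgumentException("No fetcher registered for " + uriText);
            }
            try {
                return fetcher.open(uri);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    /** Reads a stream fully as UTF-8 text and closes it. */
    static String readAll(InputStream in) {
        try (InputStream stream = in) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = stream.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Registry registry = new Registry();
        // A user-supplied fetcher for a made-up in-memory scheme.
        registry.register("mem", uri -> new ByteArrayInputStream(
            uri.getSchemeSpecificPart().getBytes(StandardCharsets.UTF_8)));
        System.out.println(readAll(registry.open("mem:hello")));
    }
}
```

Keeping the fetcher lookup separate from the record-reading side means the same fetchers could feed whichever RecordReader implementation people choose to supply.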
Doug
