No, it does not mean you can't use them in MapReduce operations; they are built with exactly that in mind.
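To make that concrete, here is a minimal sketch of why one copy of the data can serve both uses. A MapFile is a directory holding a sorted SequenceFile (the data) plus a small index that records every Nth key, so a job can scan the data from start to finish while a client uses the index to seek close to a single key. The class below is only a toy, in-memory model in plain Java: ToyMapFile, its methods and the sample file names are hypothetical and are not Hadoop API. In Hadoop itself the corresponding pieces would be MapFile.Reader.get() for the lookup and SequenceFileInputFormat for the scan.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiConsumer;

// Toy in-memory model of the MapFile layout. ToyMapFile is a hypothetical
// name used only for illustration; it is not part of Hadoop.
class ToyMapFile {
    // "data" part: records kept in ascending key order.
    private final List<String> keys = new ArrayList<>();
    private final List<String> values = new ArrayList<>();
    // "index" part: every indexInterval-th key mapped to its position in the data.
    private final TreeMap<String, Integer> index = new TreeMap<>();
    private final int indexInterval;

    ToyMapFile(int indexInterval) {
        if (indexInterval < 1) {
            throw new IllegalArgumentException("indexInterval must be >= 1");
        }
        this.indexInterval = indexInterval;
    }

    // Appends one record. Keys must arrive in ascending order
    // (this toy also rejects duplicate keys).
    void append(String key, String value) {
        if (!keys.isEmpty() && key.compareTo(keys.get(keys.size() - 1)) <= 0) {
            throw new IllegalArgumentException("key out of order: " + key);
        }
        if (keys.size() % indexInterval == 0) {
            index.put(key, keys.size());
        }
        keys.add(key);
        values.add(value);
    }

    // Random access: jump to the nearest indexed key at or before the
    // target, then read forward until the key is found or passed.
    String get(String key) {
        Map.Entry<String, Integer> start = index.floorEntry(key);
        if (start == null) {
            return null; // empty, or the key sorts before the first record
        }
        for (int i = start.getValue(); i < keys.size(); i++) {
            int cmp = keys.get(i).compareTo(key);
            if (cmp == 0) {
                return values.get(i);
            }
            if (cmp > 0) {
                break; // passed the place where the key would be
            }
        }
        return null;
    }

    // Sequential access: visit every record in order, ignoring the index.
    // This is the access pattern a record reader needs.
    void scan(BiConsumer<String, String> visitor) {
        for (int i = 0; i < keys.size(); i++) {
            visitor.accept(keys.get(i), values.get(i));
        }
    }

    public static void main(String[] args) {
        ToyMapFile files = new ToyMapFile(2);
        files.append("a.txt", "alpha");
        files.append("b.txt", "beta");
        files.append("c.txt", "gamma");
        // One lookup by name, then a full pass over the same data.
        System.out.println(files.get("b.txt"));
        files.scan((name, body) -> System.out.println(name + " -> " + body));
    }
}
```

The scan never touches the index and the lookup never reads the whole data list, which is why the two use cases do not need two copies of the data.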
An InputFormat generally wraps a simple Reader class for the file format in question, and it is not difficult to write one. Given your specific requirements for reading files, you may also find it better to write your own InputFormat classes for TFiles.

MapFiles are essentially SequenceFiles, as already explained, and can be used with SequenceFile's InputFormat class (SequenceFileInputFormat) in MapReduce operations. For fine-tuned reading of a MapFile you will need your own implementation, which isn't hard to write either.

Hadoop is very modular at the I/O level. Please look into the Reader and Writer class implementations, or the API, of each file format you are interested in; writing an InputFormat class should then be quite doable.

On Oct 4, 2010 2:34 PM, "Sina Samangooei" <[email protected]> wrote:

Hi,

Thanks for the quick response. It's good that there are provisions being made for the kind of problem I'm trying to solve. However, I can't seem to find any sort of TFileInputFormat or MapFileInputFormat. Does this mean TFiles and MapFiles can't be used for both random access and map-reduce tasks?

If this is the case, TFiles and MapFiles are not suitable for my purposes. I require the ability to perform large-scale map-reduce operations on ALL of the files, while at the same time having the ability to quickly access an individual file. These are two separate use cases, but both are quite important. An option might be to duplicate the data, literally holding two copies, but that just doesn't sit right.

Therefore, for now at least, I will continue with my index generation scheme. I think I've found a workaround that involves generating the index outside of Hadoop (i.e. not through a map-reduce task). This is slightly slower than generating the index as part of a map-reduce task, but once generated, the index should make access to files and various other operations much faster.

Thanks again,
- Sina

On 2 Oct 2010, at 17:36, Owen O'Malley wrote:

> On Sat, Oct 2, 2010 at 5:25 AM, Harsh J <qwertyman...
