Hi Roshan and Julian,

The har file system can be used as an input filesystem. You can just provide the input to map reduce as har:///something/some.har, where some.har is your har archive. This way map reduce will use the har filesystem as its input. The only limitation is that maps cannot run across logical files in the har.
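
To illustrate that last point, here is a rough, self-contained model of how splits fall when every logical file is split on its own. It is only a sketch: it is not Hadoop's FileInputFormat code, the class and method names (PerFileSplits, splitsFor) and the file names are made up for the example, and the 64 MB split size is just the figure mentioned in this thread.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified model only; all names here are hypothetical, not Hadoop APIs.
public class PerFileSplits {

    // Returns splits formatted as "name:offset+length".
    // Each logical file is cut up separately, so no split spans two files.
    public static List<String> splitsFor(Map<String, Long> logicalFiles, long splitSize) {
        List<String> splits = new ArrayList<>();
        for (Map.Entry<String, Long> file : logicalFiles.entrySet()) {
            long remaining = file.getValue();
            long offset = 0;
            while (remaining > 0) {
                long length = Math.min(splitSize, remaining);
                splits.add(file.getKey() + ":" + offset + "+" + length);
                offset += length;
                remaining -= length;
            }
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        Map<String, Long> files = new LinkedHashMap<>();
        files.put("a.log", 10 * mb);   // smaller than the split size: one split
        files.put("b.log", 100 * mb);  // larger than the split size: two splits
        for (String split : splitsFor(files, 64 * mb)) {
            System.out.println(split);
        }
    }
}
```

In this model the 10 MB file gets one split and the 100 MB file gets two, and the offsets are always relative to the logical file rather than to the part file inside the archive.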
You can specify whatever input format these files had before you included them in the har archive. The point is that har:/// can be used as an input filesystem for map reduce, which gives map reduce a view of the logical files inside the har.

Hope this helps.
mahadev

On 6/26/09 2:37 AM, "jchernandez" <jchernan...@agnitio.es> wrote:

> I also need help with this. I need to know how to handle a HAR file when it
> is the input to a MapReduce task. How do we read the HAR file so we can work
> on the individual logical files? I suppose we need to create our own
> InputFormat and RecordReader files, but I'm not sure how to proceed.
>
> Julian
>
>
> Roshan James-3 wrote:
>>
>> When I run map reduce task over a har file as the input, I see that the
>> input splits refer to 64mb byte boundaries inside the part file.
>>
>> My mappers only know how to process the contents of each logical file
>> inside the har file. Is there some way by which I can take the offset range
>> specified by the input split and determine which logical files lie in that
>> offset range? (How else would one do map reduce over a har file?)
>>
>> Roshan