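We ended up subclassing TextInputFormat and adding a custom RecordReader
that begins each record at the record's opening tag and ends it at the
matching closing tag, so a record is never cut in the middle. The
StreamXmlRecordReader class (in Hadoop streaming) is a good reference for this.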
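
To make the tag-scanning idea concrete, here is a minimal sketch in plain Java.
It is not our actual reader and not the StreamXmlRecordReader code. The class
name XmlRecordScanner and the helper readAll are made up for illustration, and
the Hadoop plumbing is left out on purpose. It assumes the start tag has no
attributes and that records of the same tag are not nested.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: pulls out the byte ranges between a start tag and an end tag.
class XmlRecordScanner {
    private final InputStream in;
    private final byte[] startTag;
    private final byte[] endTag;

    XmlRecordScanner(InputStream in, String startTag, String endTag) {
        this.in = in;
        this.startTag = startTag.getBytes(StandardCharsets.UTF_8);
        this.endTag = endTag.getBytes(StandardCharsets.UTF_8);
    }

    // Returns the next record including its tags, or null when no complete record is left.
    String nextRecord() throws IOException {
        // Skip everything up to and including the next start tag.
        if (!readUntilMatch(startTag, null)) {
            return null;
        }
        ByteArrayOutputStream record = new ByteArrayOutputStream();
        record.write(startTag, 0, startTag.length);
        // Copy bytes into the record until the end tag has been seen.
        if (!readUntilMatch(endTag, record)) {
            return null; // input ended in the middle of a record
        }
        return new String(record.toByteArray(), StandardCharsets.UTF_8);
    }

    // Reads bytes until `tag` has been matched; copies them to `sink` if it is non-null.
    // The simple restart logic relies on '<' appearing only as the first byte of the tag.
    private boolean readUntilMatch(byte[] tag, ByteArrayOutputStream sink) throws IOException {
        int matched = 0;
        int b;
        while ((b = in.read()) != -1) {
            if (sink != null) {
                sink.write(b);
            }
            if (b == tag[matched]) {
                matched++;
                if (matched == tag.length) {
                    return true;
                }
            } else {
                // Restart the match; the current byte may itself begin the tag.
                matched = (b == tag[0]) ? 1 : 0;
            }
        }
        return false;
    }

    // Convenience wrapper for trying the scanner on an in-memory string.
    static List<String> readAll(String xml, String startTag, String endTag) {
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        XmlRecordScanner scanner = new XmlRecordScanner(in, startTag, endTag);
        List<String> records = new ArrayList<>();
        try {
            String record;
            while ((record = scanner.nextRecord()) != null) {
                records.add(record);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return records;
    }
}
```

In a real RecordReader you would additionally seek to the start of the split,
stop looking for new start tags once the stream position passes the end of the
split, and still read past the split end to finish a record that began inside
it. That way each record is handed to exactly one mapper.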
Prasan Ary wrote:
Hi All,
I am writing a Java implementation of my map/reduce functions on Hadoop.
The input is an XML file, and the map function has to process well-formed
XML records. So far I have been unable to split the XML file at XML
record boundaries to feed into my map function.
Can anybody point me to resources that explain how to force file splits at
a desired boundary?
thx,
Pra.