Just adapt TextInputFormat so that it reads to the next file boundary instead of the next newline.
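To make the idea concrete, here is a minimal sketch of the reading loop in plain Java with no Hadoop dependency. It assumes the container file separates the small XML files with a marker line; the marker `--FILE-BOUNDARY--` and the class name `BoundaryRecordReader` are made up for illustration and are not part of Hadoop. In a real job this loop would presumably sit inside a custom RecordReader returned by your own InputFormat, and it would also need to handle a split that starts in the middle of a record by skipping ahead to the first boundary, which is left out here.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reads records delimited by a boundary marker
// instead of by newlines.
class BoundaryRecordReader {
    // Assumed marker between the small XML files in a container file.
    static final byte[] BOUNDARY =
        "\n--FILE-BOUNDARY--\n".getBytes(StandardCharsets.UTF_8);

    private final InputStream in;

    BoundaryRecordReader(InputStream in) {
        this.in = in;
    }

    // True if the last BOUNDARY.length bytes of data equal BOUNDARY.
    private static boolean endsWithBoundary(byte[] data) {
        if (data.length < BOUNDARY.length) {
            return false;
        }
        int offset = data.length - BOUNDARY.length;
        for (int i = 0; i < BOUNDARY.length; i++) {
            if (data[offset + i] != BOUNDARY[i]) {
                return false;
            }
        }
        return true;
    }

    // Returns the next record without its trailing boundary,
    // or null once the input is exhausted.
    String next() throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte last = BOUNDARY[BOUNDARY.length - 1];
        int b;
        while ((b = in.read()) != -1) {
            buf.write(b);
            // Only check the tail when the byte could end a boundary.
            // The buffer copy is not efficient; this is a sketch.
            if ((byte) b == last) {
                byte[] data = buf.toByteArray();
                if (endsWithBoundary(data)) {
                    int len = data.length - BOUNDARY.length;
                    return new String(data, 0, len, StandardCharsets.UTF_8);
                }
            }
        }
        if (buf.size() == 0) {
            return null;
        }
        // Last record: no boundary follows it.
        return new String(buf.toByteArray(), StandardCharsets.UTF_8);
    }

    // Convenience wrapper that collects every record from the stream.
    static List<String> readAll(InputStream in) {
        BoundaryRecordReader reader = new BoundaryRecordReader(in);
        List<String> records = new ArrayList<>();
        try {
            String rec;
            while ((rec = reader.next()) != null) {
                records.add(rec);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return records;
    }
}
```

Each string returned by `next()` is one complete small XML document, newlines included, so a map function would receive a whole file as its value rather than a single line.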
There is also a jira out for file archiving that would do all of this (and more) for you. If you don't want to wait, then the mod to TIF is pretty easy.

On 4/28/08 5:14 PM, "Kayla Jay" <[EMAIL PROTECTED]> wrote:

> Yes, I'm talking about a collection of small xml files stored in "container"
> files. I.e there's a lot and lots of small xml files collected into big
> files. Not one gargantuan XML file. How would you go about using hadoop with
> splits and processing and handling these sorts of XML files?
>
> ----- Original Message ----
> From: Ted Dunning <[EMAIL PROTECTED]>
> To: [email protected]
> Sent: Monday, April 28, 2008 4:16:20 PM
> Subject: Re: Map/Reduce with XML files ..
>
> The only real problem with xml and map-reduce is if you are talking about
> one gargantuan XML file. That makes correct splitting difficult.
>
> If you are talking about millions or billions of small xml files (stored in
> some sort of container file), then hadoop should be pretty easy to use.
>
> On 4/28/08 9:39 AM, "Kayla Jay" <[EMAIL PROTECTED]> wrote:
>
>> Hello
>>
>> Has anyone had any experience with processing xml files within Hadoop within
>> their maps/reduces?
>> In particular, has anyone used any sort of XQuery/XPath processing within
>> their maps/reduces?
>> Say I have XML string passed to the map and now I want to find something in
>> particular via XQuery/XPath or some sort to run numbers on occurrences or
>> parse out a particular section within the XML.
>>
>> Anyone done any XML processing looking for things within XML? Then, aggregate
>> common pieces together in the reduces ?
>>
>> On another note,
>> Has anyone figured out splits for XML files?
>> Has anyone written a custom XML reader other than the StreamXmlRecordReader?
>> The only one I've read about and can find anything is:
>> http://www.nabble.com/map-reduce-function-on-xml-string-td15816818.html
>>
>> Thanks.
