https://issues.apache.org/jira/browse/HADOOP-3307
On 4/29/08 9:25 AM, "Kayla Jay" <[EMAIL PROTECTED]> wrote:

> Thanks. Do you have the jira issue number for that so that I can keep an eye
> out on it?
>
> Thanks.
>
> ----- Original Message ----
> From: Ted Dunning <[EMAIL PROTECTED]>
> To: [email protected]
> Sent: Tuesday, April 29, 2008 12:07:32 PM
> Subject: Re: Map/Reduce with XML files ..
>
> Just adapt TextInput format so that it reads to the next file boundary
> instead of the next new line.
>
> There is also a jira out for file archiving that would do all of this (and
> more) for you. If you don't want to wait, then the mod to TIF is pretty
> easy.
>
> On 4/28/08 5:14 PM, "Kayla Jay" <[EMAIL PROTECTED]> wrote:
>
>> Yes, I'm talking about a collection of small xml files stored in "container"
>> files. I.e there's a lot and lots of small xml files collected into big
>> files. Not one gargantuan XML file. How would you go about using hadoop with
>> splits and processing and handling these sorts of XML files?
>>
>> ----- Original Message ----
>> From: Ted Dunning <[EMAIL PROTECTED]>
>> To: [email protected]
>> Sent: Monday, April 28, 2008 4:16:20 PM
>> Subject: Re: Map/Reduce with XML files ..
>>
>> The only real problem with xml and map-reduce is if you are talking about
>> one gargantuan XML file. That makes correct splitting difficult.
>>
>> If you are talking about millions or billions of small xml files (stored in
>> some sort of container file), then hadoop should be pretty easy to use.
>>
>> On 4/28/08 9:39 AM, "Kayla Jay" <[EMAIL PROTECTED]> wrote:
>>
>>> Hello
>>>
>>> Has anyone had any experience with processing xml files within Hadoop within
>>> their maps/reduces?
>>> In particular, has anyone used any sort of XQuery/XPath processing within
>>> their maps/reduces?
>>> Say I have XML string passed to the map and now I want to find something in
>>> particular via XQuery/XPath or some sort to run numbers on occurrences or
>>> parse out a particular section within the XML.
>>>
>>> Anyone done any XML processing looking for things within XML? Then, aggregate
>>> common pieces together in the reduces ?
>>>
>>> On another note,
>>> Has anyone figured out splits for XML files?
>>> Has anyone written a custom XML reader other than the StreamXmlRecordReader?
>>> The only one I've read about and can find anything is:
>>> http://www.nabble.com/map-reduce-function-on-xml-string-td15816818.html
>>>
>>> Thanks.
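
For reference, the adaptation suggested in the quoted thread (read to the next document boundary instead of the next newline) can be sketched outside Hadoop roughly as follows. This is only a minimal Python illustration of the splitting logic, not Hadoop's TextInputFormat API. It assumes every small XML document in the container file is wrapped in a non-nested <doc>...</doc> element; those tag names and the functions read_records and make_splits are hypothetical, chosen for the example.

```python
def read_records(data: bytes, start: int, end: int,
                 open_tag: bytes = b"<doc>", close_tag: bytes = b"</doc>"):
    """Yield each whole document whose opening tag begins in [start, end).

    A document that begins inside the split is read to its closing tag even
    if that lies past `end`; a document that began before `start` is skipped,
    because the previous split is responsible for it.
    """
    pos = data.find(open_tag, start)          # skip ahead to the first boundary
    while pos != -1 and pos < end:
        close = data.find(close_tag, pos)
        if close == -1:
            break                             # truncated final document: stop
        stop = close + len(close_tag)
        yield data[pos:stop]
        pos = data.find(open_tag, stop)       # next boundary after this record


def make_splits(length: int, split_size: int):
    """Cut [0, length) into fixed-size byte ranges, ignoring record boundaries."""
    return [(off, min(off + split_size, length))
            for off in range(0, length, split_size)]


if __name__ == "__main__":
    container = b"<doc>a</doc>\n<doc>bb</doc>\n<doc>ccc</doc>\n"
    for s, e in make_splits(len(container), 10):
        print((s, e), list(read_records(container, s, e)))
```

Because each document belongs to the split that contains the first byte of its opening tag, every document is handed to exactly one map task no matter where the byte ranges fall. This mirrors how a line-oriented reader treats lines that straddle a split boundary.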
