pig-user  

Re: Pig + xml ?

Mark Chadwick
Tue, 01 Jul 2008 12:00:48 -0700

XML isn't particularly well suited to a Map/Reduce environment, however.
Hierarchical data is very tough to partition out to mappers. On top of
that, because the data will be split into 64 MB blocks (by default),
there's a very good chance that a random 64 MB chunk of a large XML file will
not even be parsable on its own (an element's opening tag in one block, its
closing tag in another).
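To illustrate the block-boundary problem, here is a small Python sketch (not Pig code) that cuts an XML byte string into fixed-size chunks as a stand-in for HDFS blocks. Parsing each chunk on its own fails, while a split-aware reader recovers every record exactly once. It assumes a simple convention similar in spirit to Hadoop's streaming XML record readers: a split owns every record whose start tag begins inside its byte range, and it may read past the end of its range to reach that record's end tag. The tag name `record`, the tiny 50-byte block size, and the helper names are all hypothetical.

```python
import xml.etree.ElementTree as ET

START, END = b"<record>", b"</record>"  # hypothetical record tags


def split_ranges(size, block):
    """Cut [0, size) into consecutive byte ranges of at most `block` bytes."""
    return [(s, min(s + block, size)) for s in range(0, size, block)]


def naive_parse(chunk):
    """Try to parse a raw chunk as a standalone XML document."""
    try:
        ET.fromstring(chunk)
        return True
    except ET.ParseError:
        return False


def read_split(data, split_start, split_end):
    """Return the records owned by the byte range [split_start, split_end).

    A record belongs to this split if its start tag begins inside the range;
    the reader may scan beyond split_end to find that record's end tag.
    """
    records = []
    pos = data.find(START, split_start)
    while pos != -1 and pos < split_end:
        end = data.find(END, pos)
        if end == -1:  # truncated record at end of input
            break
        end += len(END)
        records.append(data[pos:end])
        pos = data.find(START, end)
    return records


doc = b"<root>" + b"".join(
    b"<record><id>%d</id></record>" % i for i in range(10)
) + b"</root>"
ranges = split_ranges(len(doc), 50)  # 50 bytes stands in for 64 MB

naive_ok = [naive_parse(doc[s:e]) for s, e in ranges]
ids = [
    ET.fromstring(rec).findtext("id")
    for s, e in ranges
    for rec in read_split(doc, s, e)
]
print(naive_ok)
print(ids)
```

Because ownership is decided by where a start tag begins, a record that straddles two blocks is emitted by the first block's reader and skipped by the second, so nothing is lost or duplicated.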


On Tue, Jul 1, 2008 at 2:30 PM, Olga Natkovich <[EMAIL PROTECTED]> wrote:

> This can work but you would need to write a custom loader to parse the
> data: http://wiki.apache.org/pig/StorageFunction
>
> Olga
>
> > -----Original Message-----
> > From: Kayla Jay [EMAIL PROTECTED]
> > Sent: Tuesday, July 01, 2008 11:24 AM
> > To: pig-user@incubator.apache.org
> > Subject: Pig + xml ?
> >
> > Hi
> >
> > Can you use Pig with XML data files?  If so, does anyone have
> > any examples?
> > I want to do something that would equate to an XPath query
> > against the XML.
> >
> > Thanks.
> >
>
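As a rough illustration of the per-record work a custom loader would have to do for an XPath-style query, the Python sketch below turns one XML record into flat tuples: one row per element matching a row path, one field per child path. It does not use Pig's loader interface (see the StorageFunction wiki page linked above for that); it only uses the standard library's `xml.etree.ElementTree`, which supports a limited subset of XPath (child paths, `//`, and `[@attr='value']` predicates, among others). The `library`/`book` document and the `xpath_rows` helper are hypothetical.

```python
import xml.etree.ElementTree as ET


def xpath_rows(record, row_path, field_paths):
    """Return one tuple per element matching row_path.

    Each tuple holds the text of the first match for each field path,
    or None when that field is missing (ElementTree's findtext default).
    """
    root = ET.fromstring(record)
    return [
        tuple(row.findtext(p) for p in field_paths)
        for row in root.findall(row_path)
    ]


library = b"""<library>
  <book lang="en"><title>Pig Latin</title><year>2008</year></book>
  <book lang="fr"><title>Cochon</title><year>2007</year></book>
  <book lang="en"><title>Hadoop</title></book>
</library>"""

rows = xpath_rows(library, "book[@lang='en']", ["title", "year"])
print(rows)  # [('Pig Latin', '2008'), ('Hadoop', None)]
```

In a real loader, each such tuple would become one Pig tuple, and the record boundaries would have to be found first, since a single large XML file cannot simply be handed to `ET.fromstring` block by block.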