Kayla Jay
Wed, 02 Jul 2008 13:01:15 -0700
I have two types of XML documents.
1) One very large file that contains many XML documents
2) Lots of mid-size individual XML documents in a directory.
I'm not exactly sure how to use Pig to find associations once I can parse an
XML document.
----- Original Message ----
From: Mridul Muralidharan <[EMAIL PROTECTED]>
To: pig-user@incubator.apache.org
Sent: Tuesday, July 1, 2008 6:32:28 PM
Subject: Re: Pig + xml ?
Hi,
What does the input look like?
- A single file with multiple XML documents?
- A directory containing a lot of individual XML files?
If the former, then it might be slightly tricky since you will need to
identify when an xml document "ends" and another "begins".
In simple cases, it could be an entire document in a single line - in
which case, a simple line based loader will handle things fine.
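As a rough illustration of the one-document-per-line case, here is a minimal sketch in plain Python (not Pig code). The function name `load_line_documents` and the `<record>`, `<user>` and `<item>` element names are made up for the example; it assumes every non-blank line holds exactly one complete, well-formed document.

```python
import xml.etree.ElementTree as ET


def load_line_documents(lines):
    """Yield (line_number, root_element) for each non-blank line.

    Assumption: each non-blank line is one complete XML document.
    """
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines between documents
        yield number, ET.fromstring(line)


# Hypothetical sample input: two documents, one per line.
lines = [
    "<record><user>alice</user><item>book</item></record>",
    "",
    "<record><user>bob</user><item>pen</item></record>",
]

# Turn each document into a flat (user, item) pair.
pairs = [
    (root.findtext("user"), root.findtext("item"))
    for _, root in load_line_documents(lines)
]
print(pairs)  # [('alice', 'book'), ('bob', 'pen')]
```

Flat pairs like these are the kind of rows that could then be grouped or joined to look for associations.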
If you have a special root node in your schema, you can use its opening
and closing tags to find the start and end of each document, and then
parse each one out as a document (DOM), with SAX, etc.
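The root-node approach could be sketched roughly as follows, again in plain Python rather than Pig. The names `split_documents` and `query`, the `<record>` root and the sample data are all hypothetical. The sketch assumes the root element never nests inside itself, is never self-closing, and that its tags do not appear inside comments or CDATA. It also reads the whole input into memory, which a real loader for a huge file would want to avoid.

```python
import re
import xml.etree.ElementTree as ET


def split_documents(text, root_tag):
    """Yield each <root_tag>...</root_tag> span found in text."""
    pattern = re.compile(
        r"<{0}(?:\s[^>]*)?>.*?</{0}\s*>".format(re.escape(root_tag)),
        re.DOTALL,  # documents may span several lines
    )
    for match in pattern.finditer(text):
        yield match.group(0)


def query(text, root_tag, path):
    """Parse each document and collect the text of elements matching path.

    ElementTree supports only a limited subset of XPath.
    """
    results = []
    for doc in split_documents(text, root_tag):
        root = ET.fromstring(doc)
        results.extend(el.text for el in root.findall(path))
    return results


# Hypothetical input: two documents concatenated in one file.
sample = (
    '<?xml version="1.0"?>\n'
    "<record><user>alice</user><item>book</item></record>\n"
    '<?xml version="1.0"?>\n'
    "<record>\n  <user>bob</user>\n  <item>pen</item>\n</record>\n"
)

print(query(sample, "record", "./user"))  # ['alice', 'bob']
```

Because only the text between the root tags is kept, the repeated XML declarations between documents are dropped before parsing.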
Regards,
Mridul
Kayla Jay wrote:
> Hi
>
> Can you use Pig with XML data files? If so, does anyone have any examples?
> I want to do something that would equate to an XPath query against the XML.
>
> Thanks.