[ https://issues.apache.org/jira/browse/PIG-4242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Daniel Dai resolved PIG-4242.
-----------------------------
Resolution: Fixed
Fix Version/s: 0.15.0
Assignee: Geza Radics
Hadoop Flags: Reviewed
Patch committed to trunk. Thanks Geza!
> For indented XML with multiline content (e.g. Wikipedia), XMLLoader cuts out
> the beginning of every line
> -------------------------------------------------------------------------------------------------------
>
> Key: PIG-4242
> URL: https://issues.apache.org/jira/browse/PIG-4242
> Project: Pig
> Issue Type: Bug
> Components: piggybank
> Reporter: Geza Radics
> Assignee: Geza Radics
> Fix For: 0.15.0
>
> Attachments: XMLLoaderMissingContent.patch
>
>
> XMLLoader finds the position of the first match for the requested tag, but
> then applies the same offset to every following line up to the closing tag.
> This causes content loss in indented XML with multiline content, such as the
> Wikipedia XML dump:
> --- example input ---
> {code:xml}
> <page>Look,
> not a thing is missing.</page>
> {code}
> --- current output ---
> {code:xml}
> <page>Look, a thing is missing.</page>
> {code}
> --- expected output ---
> {code:xml}
> <page>Look, not a thing is missing.</page>
> {code}
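
The behaviour described above can be reproduced with a small standalone sketch. This is not the piggybank XMLLoader code and not the attached patch. It is a hypothetical Python model of the described logic, and it assumes the opening line of the example was indented by four spaces (the indentation appears to have been lost in the rendering above) and that lines are joined with a single space, as in the example output. The function names `extract_buggy` and `extract_fixed` are made up for illustration.

```python
def extract_buggy(lines, tag):
    """Model of the reported bug: the column offset of the opening tag
    is reused to trim every following line as well."""
    open_tag, close_tag = f"<{tag}>", f"</{tag}>"
    offset = None
    parts = []
    for line in lines:
        if offset is None:
            pos = line.find(open_tag)
            if pos < 0:
                continue  # still searching for the opening tag
            offset = pos
        # Bug: the first line's offset is applied to all later lines too.
        parts.append(line[offset:])
        if close_tag in line:
            break
    return " ".join(parts)


def extract_fixed(lines, tag):
    """Model of the expected behaviour: only the line holding the opening
    tag is trimmed; continuation lines are kept whole."""
    open_tag, close_tag = f"<{tag}>", f"</{tag}>"
    inside = False
    parts = []
    for line in lines:
        if not inside:
            pos = line.find(open_tag)
            if pos < 0:
                continue  # still searching for the opening tag
            inside = True
            parts.append(line[pos:])
        else:
            parts.append(line)
        if close_tag in line:
            break
    return " ".join(parts)


# Assumed input: the opening tag is indented by four spaces.
sample = ["    <page>Look,", "not a thing is missing.</page>"]
print(extract_buggy(sample, "page"))
print(extract_fixed(sample, "page"))
```

With the assumed four-space indentation, the buggy variant drops the first four characters of the second line ("not "), which matches the reported output. The sketch ignores text after the closing tag and tags with attributes, which a real loader would have to handle.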