[
https://issues.apache.org/jira/browse/PIG-4242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Geza Radics updated PIG-4242:
-----------------------------
Description:
XMLLoader finds the position of the first match for the required tag, but then
applies that offset to every following line up to the closing tag. This causes
content loss in indented XML with multiline content, such as the Wikipedia XML
dump:
--- example input ---
{code:xml}
<page>Look,
not a thing is missing.</page>
{code}
--- current output ---
{code:xml}
<page>Look, a thing is missing.</page>
{code}
--- expected output ---
{code:xml}
<page>Look, not a thing is missing.</page>
{code}
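The behavior described above can be illustrated with a small stand-alone sketch. This is not the piggybank XMLLoader code: the function names `extract_buggy` and `extract_fixed` are hypothetical, and the sketch assumes that the opening tag in the input is indented by four spaces and that lines are joined with a single space, which would explain why exactly "not " disappears in the example.

```python
def extract_buggy(lines, tag):
    """Hypothetical model of the reported bug: the column of the first
    match is reused as the cut position for every following line."""
    open_tag, close_tag = f"<{tag}>", f"</{tag}>"
    offset = None
    parts = []
    for line in lines:
        if offset is None:
            pos = line.find(open_tag)
            if pos == -1:
                continue  # still before the element
            offset = pos  # remembered and wrongly reused below
        parts.append(line[offset:])
        if close_tag in line:
            break
    return " ".join(parts)


def extract_fixed(lines, tag):
    """Hypothetical fix: apply the offset only to the line that holds
    the opening tag and keep all later lines whole."""
    open_tag, close_tag = f"<{tag}>", f"</{tag}>"
    inside = False
    parts = []
    for line in lines:
        if not inside:
            pos = line.find(open_tag)
            if pos == -1:
                continue
            inside = True
            parts.append(line[pos:])
        else:
            parts.append(line)
        if close_tag in line:
            break
    return " ".join(parts)


# Assumed input: the opening tag is indented by four spaces.
sample = ["    <page>Look,", "not a thing is missing.</page>"]
print(extract_buggy(sample, "page"))
print(extract_fixed(sample, "page"))
```

Under these assumptions the first call reproduces the "current output" shown above and the second reproduces the "expected output".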
was:
XMLLoader finds the position of the first match for the required tag, but then
applies that offset to every following line up to the closing tag. This causes
content loss in indented XML with multiline content, such as the Wikipedia XML
dump:
--- example input ---
{code:xml}
<page>You have
not missed it</page>
{code}
--- output ---
{code:xml}
<page>You have missed it</page>
{code}
> For indented XML with multiline content (e.g. Wikipedia) XMLLoader cuts off
> the beginning of every line
> -------------------------------------------------------------------------------------------------------
>
> Key: PIG-4242
> URL: https://issues.apache.org/jira/browse/PIG-4242
> Project: Pig
> Issue Type: Bug
> Components: piggybank
> Reporter: Geza Radics
> Attachments: XMLLoaderMissingContent.patch
>
>
> XMLLoader finds the position of the first match for the required tag, but then
> applies that offset to every following line up to the closing tag. This
> causes content loss in indented XML with multiline content, such as the
> Wikipedia XML dump:
> --- example input ---
> {code:xml}
> <page>Look,
> not a thing is missing.</page>
> {code}
> --- current output ---
> {code:xml}
> <page>Look, a thing is missing.</page>
> {code}
> --- expected output ---
> {code:xml}
> <page>Look, not a thing is missing.</page>
> {code}