[
https://issues.apache.org/jira/browse/PIG-3373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ahmed Eldawy updated PIG-3373:
------------------------------
Affects Version/s: site
Release Note:
I added a new patch that fixes this bug. It turns out that the bug occurs only
when the input file is .bz2 compressed and the non-matching tag spans two file
splits in the compressed file. Because the compressed output is practically
unpredictable, it is almost impossible to hand-craft an example that triggers
the bug, so I included a random generator that produces this test case.
I don't like relying on a randomly generated file to expose a bug, since by
definition it is non-deterministic, so I also attached the test file for
reference.
The fix is still the same as in the previous patch, but this time the test
fails without the fix.
Status: Patch Available (was: Open)
> XMLLoader returns non-matching nodes when a tag name spans through the block
> boundary
> -------------------------------------------------------------------------------------
>
> Key: PIG-3373
> URL: https://issues.apache.org/jira/browse/PIG-3373
> Project: Pig
> Issue Type: Bug
> Components: piggybank
> Affects Versions: site
> Reporter: Ahmed Eldawy
> Assignee: Ahmed Eldawy
> Labels: patch
> Attachments: PIG3373.patch, PIG3373_1.patch, bad-file.xml.bz2
>
>
> When a node's start tag spans two blocks, the tag is returned even if it is
> not of the requested type.
> Example: For the following input file
> <event id="3423">
> <ev
> -------- BLOCK BOUNDARY
> entually id="dfasd">
> XMLLoader with tag type 'event' should return only the first one, but it
> actually returns both of them.
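
To make the example above concrete, here is a minimal Python sketch of the
matching problem. It is not the XMLLoader code and not the attached patch: the
function names, the split_end parameter, and the delimiter set are assumptions
made for illustration, and the real cause inside XMLLoader may differ. It only
shows why a tag name must be read to its end, past the block boundary if
necessary, before a start tag is accepted.

```python
# Hypothetical sketch; not the piggybank XMLLoader implementation.

# Bytes that may legally follow a complete tag name in a start tag.
DELIMITERS = (b" ", b"\t", b"\r", b"\n", b">", b"/")


def naive_matches(data: bytes, tag: str, split_end: int) -> list:
    """Accept any '<tag' prefix that starts before split_end."""
    needle = b"<" + tag.encode()
    hits = []
    pos = data.find(needle)
    while pos != -1 and pos < split_end:
        hits.append(pos)  # no check of what follows the name
        pos = data.find(needle, pos + 1)
    return hits


def strict_matches(data: bytes, tag: str, split_end: int) -> list:
    """Accept '<tag' only when the name is followed by a delimiter.

    The byte after the name may lie beyond split_end; it is still read,
    because the tag itself started inside this split.
    """
    needle = b"<" + tag.encode()
    hits = []
    pos = data.find(needle)
    while pos != -1 and pos < split_end:
        following = data[pos + len(needle):pos + len(needle) + 1]
        if following in DELIMITERS:
            hits.append(pos)
        pos = data.find(needle, pos + 1)
    return hits


# The input from the report: the boundary falls right after "<ev".
first_block = b'<event id="3423">\n<ev'
second_block = b'entually id="dfasd">\n'
data = first_block + second_block
split_end = len(first_block)

naive = naive_matches(data, "event", split_end)
strict = strict_matches(data, "event", split_end)
print(naive)   # offsets of both tags, including <eventually
print(strict)  # offset of the <event> tag only
```

The strict variant rejects <eventually because the byte after "event" is "u",
not a delimiter, even though that byte belongs to the next block.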