[
https://issues.apache.org/jira/browse/SQOOP-721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13507060#comment-13507060
]
Cheolsoo Park commented on SQOOP-721:
-------------------------------------
+1.
I diff'ed {{CombineFileInputFormat.java}} from Sqoop and Hadoop-2.0.x and
confirmed that there is one change as follows:
{code}
154c160,163
< return codec instanceof SplittableCompressionCodec;
---
>
> // Once we remove support for Hadoop < 2.0
> //return codec instanceof SplittableCompressionCodec;
> return false;
{code}
As far as I understand, the only impact of this difference is that compressed
files won't be split even when their codec is splittable. That affects
performance, not correctness.
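To illustrate the point, here is a minimal, self-contained sketch of the two variants of the check. It is not the actual Sqoop or Hadoop code: {{Codec}}, {{SplittableCodec}}, {{isSplitableUpstream}}, {{isSplitablePatched}} and {{splitCount}} are hypothetical stand-ins, the split count is a deliberately simplified model, and I am assuming that a file with no codec (uncompressed) is treated as splittable in both versions.

```java
public class SplitDecisionSketch {

    // Hypothetical stand-ins for Hadoop's CompressionCodec and
    // SplittableCompressionCodec; they are NOT the real interfaces.
    interface Codec {}
    interface SplittableCodec extends Codec {}

    // Mirrors the "<" side of the diff: split only if the codec is splittable.
    // Assumption: a null codec means the file is uncompressed and splittable.
    static boolean isSplitableUpstream(Codec codec) {
        if (codec == null) {
            return true;
        }
        return codec instanceof SplittableCodec;
    }

    // Mirrors the ">" side of the diff: never split a compressed file.
    static boolean isSplitablePatched(Codec codec) {
        if (codec == null) {
            return true;
        }
        return false;
    }

    // Simplified model: an unsplittable file is always read as one piece,
    // a splittable one is cut into chunks of at most maxSplitSize bytes.
    static int splitCount(long fileLength, long maxSplitSize, boolean splitable) {
        if (!splitable || fileLength <= maxSplitSize) {
            return 1;
        }
        return (int) ((fileLength + maxSplitSize - 1) / maxSplitSize);
    }

    public static void main(String[] args) {
        Codec splittable = new SplittableCodec() {};
        long fileLength = 300L;
        long maxSplitSize = 128L;

        int upstream = splitCount(fileLength, maxSplitSize, isSplitableUpstream(splittable));
        int patched = splitCount(fileLength, maxSplitSize, isSplitablePatched(splittable));

        // Both variants read every byte exactly once; only the parallelism differs.
        System.out.println("upstream pieces: " + upstream);
        System.out.println("patched pieces:  " + patched);
    }
}
```

In this model the patched check only reduces the number of pieces a compressed file is read in, so fewer tasks can work on it in parallel, but the same data is read either way.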
I didn't run any tests with this patch, but given that the patch is identical
to what's committed in MAPREDUCE-1597, I think that it is fine. Please let me
know if anyone has any concerns.
Thanks!
> Duplicating rows on export when exporting from compressed files.
> ----------------------------------------------------------------
>
> Key: SQOOP-721
> URL: https://issues.apache.org/jira/browse/SQOOP-721
> Project: Sqoop
> Issue Type: Bug
> Affects Versions: 1.4.2
> Reporter: Jarek Jarcec Cecho
> Assignee: Jarek Jarcec Cecho
> Priority: Blocker
> Attachments: bugSQOOP-721.patch, bugSQOOP-721.patch
>
>
> In some situations, export appears to duplicate rows. This seems to happen
> when the user is exporting compressed files that are "big enough".