[ 
https://issues.apache.org/jira/browse/SQOOP-721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13507060#comment-13507060
 ] 

Cheolsoo Park commented on SQOOP-721:
-------------------------------------

+1.

I diffed the copies of {{CombineFileInputFormat.java}} in Sqoop and 
Hadoop-2.0.x and confirmed that they differ in exactly one place:
{code}
154c160,163
<     return codec instanceof SplittableCompressionCodec;
---
> 
>     // Once we remove support for Hadoop < 2.0
>     //return codec instanceof SplittableCompressionCodec;
>     return false;
{code}
As far as I understand, the only impact of this difference is that 
compressed files won't be split even when their codec is splittable. That 
affects performance, but not correctness.
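
To make the difference concrete, here is a minimal Java sketch of the two 
variants of the check. It is an illustration, not the actual source: the 
nested {{Codec}} and {{SplittableCodec}} interfaces are hypothetical stand-ins 
for Hadoop's {{CompressionCodec}} and {{SplittableCompressionCodec}} so that it 
compiles without Hadoop, and it assumes that the surrounding method treats a 
file with no codec as splittable in both copies.

```java
class SplitabilitySketch {
    // Hypothetical stand-ins for Hadoop's CompressionCodec and
    // SplittableCompressionCodec (assumption: used only for illustration).
    interface Codec {}
    interface SplittableCodec extends Codec {}

    // Variant on the '<' side of the diff: a compressed file is
    // splittable only if its codec supports splitting.
    static boolean hadoopIsSplitable(Codec codec) {
        if (codec == null) {
            return true; // assumed: no codec means an uncompressed file
        }
        return codec instanceof SplittableCodec;
    }

    // Variant on the '>' side of the diff: a compressed file is never
    // split, even if its codec could support it.
    static boolean sqoopIsSplitable(Codec codec) {
        if (codec == null) {
            return true; // assumed: same handling of uncompressed files
        }
        return false;
    }

    public static void main(String[] args) {
        Codec nonSplittable = new Codec() {};
        Codec splittable = new SplittableCodec() {};

        // The two variants disagree only for a splittable codec.
        System.out.println(hadoopIsSplitable(nonSplittable) + " " + sqoopIsSplitable(nonSplittable));
        System.out.println(hadoopIsSplitable(splittable) + " " + sqoopIsSplitable(splittable));
    }
}
```

The two variants disagree only for a splittable codec, where the second one 
gives up parallelism by reading the whole file in a single split. Each record 
is still read exactly once, which is why only performance is affected.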

I didn't run any tests with this patch, but given that the patch is identical 
to what's committed in MAPREDUCE-1597, I think that it is fine. Please let me 
know if anyone has any concerns.

Thanks!
                
> Duplicating rows on export when exporting from compressed files.
> ----------------------------------------------------------------
>
>                 Key: SQOOP-721
>                 URL: https://issues.apache.org/jira/browse/SQOOP-721
>             Project: Sqoop
>          Issue Type: Bug
>    Affects Versions: 1.4.2
>            Reporter: Jarek Jarcec Cecho
>            Assignee: Jarek Jarcec Cecho
>            Priority: Blocker
>         Attachments: bugSQOOP-721.patch, bugSQOOP-721.patch
>
>
> It appears that in some situations export will duplicate rows. It seems that 
> this happens when a user exports compressed files that are 
> "big enough".

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
