[ https://issues.apache.org/jira/browse/SQOOP-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gwen Shapira updated SQOOP-1606:
--------------------------------
    Description: 
Pretty rare, but it can happen (in fact, it did):

Assume a table TABLE1 with an index that is also named TABLE1.

In that case, the input splits generated by getOracleDataChunksExtent will 
include block ranges that belong to the index, because the extents are matched 
by segment name without checking the object type. These block ranges can 
overlap with some of the "correct" block ranges of the table, which can lead to 
duplicate data when importing.

The solution should be to also filter on object_type, limiting the lookup to 
tables and partitions.
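The filtering idea can be illustrated with a small, self-contained Java model. This is only a sketch: the Extent record, the ExtentSplitFilter class and the set of accepted object types are hypothetical and do not reflect Sqoop's actual classes or the exact query issued by getOracleDataChunksExtent. It assumes, as the earlier version of this description states, that the table and index segments sit in different tablespaces but share the same relative_fno, so their block ranges can collide.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class ExtentSplitFilter {

    // Hypothetical model of one extent row (not Sqoop's real data structure).
    record Extent(String segmentName, String objectType, int relativeFno,
                  long startBlock, long endBlock) {}

    // Assumed set of object types that hold table data; the real fix may differ.
    static final Set<String> TABLE_TYPES =
        Set.of("TABLE", "TABLE PARTITION", "TABLE SUBPARTITION");

    /** Keeps only extents whose object type belongs to a table or partition. */
    static List<Extent> tableExtentsOnly(List<Extent> extents) {
        List<Extent> result = new ArrayList<>();
        for (Extent e : extents) {
            if (TABLE_TYPES.contains(e.objectType())) {
                result.add(e);
            }
        }
        return result;
    }

    /** True if two extents share a relative_fno and their block ranges overlap. */
    static boolean hasOverlap(List<Extent> extents) {
        for (int i = 0; i < extents.size(); i++) {
            for (int j = i + 1; j < extents.size(); j++) {
                Extent a = extents.get(i);
                Extent b = extents.get(j);
                if (a.relativeFno() == b.relativeFno()
                        && a.startBlock() <= b.endBlock()
                        && b.startBlock() <= a.endBlock()) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // TABLE1 (table) and TABLE1 (index) share segment name and relative_fno.
        List<Extent> extents = List.of(
            new Extent("TABLE1", "TABLE", 5, 128, 255),
            new Extent("TABLE1", "INDEX", 5, 200, 263));
        System.out.println(hasOverlap(extents));                   // true
        System.out.println(hasOverlap(tableExtentsOnly(extents))); // false
    }
}
```

Matching on segment name alone lets the index extent through and produces overlapping ranges; adding the object type check drops it before any split is built.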

  was:
Pretty rare, but it can happen (in fact, it did):

Assume a table TABLE1 with an index that is also named TABLE1.
The segments for the TABLE1 table and the TABLE1 index are in two different 
tablespaces, and both have identical relative_fno values.

In that case, the input splits generated by getOracleDataChunksExtent will 
include block ranges that belong to the index and can overlap with some of the 
"correct" block ranges. This can lead to duplicate data when importing.

The solution should be to use object_type when filtering to limit ourselves to 
tables and partitions.


> Oraoop import can end up with overlapping input splits, generating duplicate 
> data
> ---------------------------------------------------------------------------------
>
>                 Key: SQOOP-1606
>                 URL: https://issues.apache.org/jira/browse/SQOOP-1606
>             Project: Sqoop
>          Issue Type: Bug
>            Reporter: Gwen Shapira
>            Assignee: Gwen Shapira



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
