[jira] [Updated] (SQOOP-2811) Sqoop2: Extracting sequence files may result in duplicates

Abraham Fine (JIRA) Fri, 29 Jan 2016 14:22:00 -0800

     [ 
https://issues.apache.org/jira/browse/SQOOP-2811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Abraham Fine updated SQOOP-2811:
--------------------------------
    Description: 
In the hdfs extractor we use:
{code:java}
    if (start > filereader.getPosition()) {
      filereader.sync(start); // sync to start
    }
{code}

to jump to the point in the sequence file where extraction should start.

If the sequence file is small, multiple start points may `sync` to the same 
point, and we could end up extracting the same record multiple times.
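
A minimal sketch of the collision, assuming a simplified model rather than the real Hadoop reader: `syncTo` stands in for `filereader.sync(start)` and returns the first sync marker at or after the requested offset, and the class name, method names, header size, and marker offsets are all hypothetical. It only shows where each split would begin reading. Whether duplicates actually result also depends on how each split decides where to stop, which is not modeled here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public class SyncDuplicateSketch {

    /**
     * Simplified stand-in for filereader.sync(start): returns the offset of the
     * first sync marker at or after start, or fileLength if no marker remains.
     */
    public static long syncTo(TreeSet<Long> syncMarkers, long start, long fileLength) {
        Long next = syncMarkers.ceiling(start);
        return next != null ? next : fileLength;
    }

    /**
     * Computes where each of `splits` equal-sized splits would begin reading.
     * headerEnd models the reader position right after the file header.
     */
    public static List<Long> resolveStarts(TreeSet<Long> syncMarkers, long headerEnd,
                                           long fileLength, int splits) {
        List<Long> resolved = new ArrayList<>();
        long splitSize = fileLength / splits;
        for (int i = 0; i < splits; i++) {
            long start = i * splitSize;
            long position = headerEnd;
            // Same guard as in the extractor: only sync when start is ahead of us.
            if (start > position) {
                position = syncTo(syncMarkers, start, fileLength);
            }
            resolved.add(position);
        }
        return resolved;
    }

    public static void main(String[] args) {
        // A small file: 1000 bytes, header ends at 100, a single sync marker at 900.
        TreeSet<Long> markers = new TreeSet<>();
        markers.add(900L);
        // Split starts are 0, 250, 500 and 750.
        System.out.println(resolveStarts(markers, 100L, 1000L, 4));
        // prints [100, 900, 900, 900]
    }
}
```

In this model, three of the four splits begin at offset 900, so any record stored after that marker is a candidate for being read by more than one split.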

  was:
In the hdfs extractor we use:
```
    if (start > filereader.getPosition()) {
      filereader.sync(start); // sync to start
    }
```

to jump to the point in the sequence file where extraction should start.

If the sequence file is small, multiple start points may `sync` to the same 
point, and we could end up extracting the same record multiple times.


> Sqoop2: Extracting sequence files may result in duplicates
> ----------------------------------------------------------
>
>                 Key: SQOOP-2811
>                 URL: https://issues.apache.org/jira/browse/SQOOP-2811
>             Project: Sqoop
>          Issue Type: Bug
>    Affects Versions: 1.99.6
>            Reporter: Abraham Fine
>            Assignee: Abraham Fine
>
> In the hdfs extractor we use:
> {code:java}
>     if (start > filereader.getPosition()) {
>       filereader.sync(start); // sync to start
>     }
> {code}
> to jump to the point in the sequence file where extraction should start.
> If the sequence file is small, multiple start points may `sync` to the same 
> point, and we could end up extracting the same record multiple times.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
