[ https://issues.apache.org/jira/browse/SQOOP-2811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Abraham Fine updated SQOOP-2811:
--------------------------------
Description:
In the hdfs extractor we use:
{code:java}
if (start > filereader.getPosition()) {
  filereader.sync(start); // sync to start
}
{code}
to jump to the point in the sequence file where extraction should begin.
If the sequence file is small, multiple start points may `sync` to the same
point, and we could end up extracting the same records multiple times.
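
The overlap can be reproduced without Hadoop in a minimal sketch. Everything here is hypothetical and simplified for illustration: the class `SyncOverlapSketch`, its `sync` helper (a stand-in that assumes `sync` moves to the first sync marker past the given position), and the offsets used are not taken from the extractor code.

```java
import java.util.ArrayList;
import java.util.List;

class SyncOverlapSketch {
    // Hypothetical stand-in for the reader's sync(position): returns the offset of the
    // first sync marker past `position`, or the file length if no marker remains.
    public static long sync(long[] syncMarkers, long fileLength, long position) {
        for (long marker : syncMarkers) {
            if (marker > position) {
                return marker;
            }
        }
        return fileLength;
    }

    // For each start point, compute the offset at which reading would actually begin,
    // using the same guard as the snippet above.
    public static List<Long> readStarts(long[] syncMarkers, long fileLength, long[] starts) {
        List<Long> result = new ArrayList<>();
        for (long start : starts) {
            long position = 0; // simplified: a real reader begins just after the file header
            if (start > position) {
                position = sync(syncMarkers, fileLength, start); // sync to start
            }
            result.add(position);
        }
        return result;
    }

    public static void main(String[] args) {
        long[] markers = {100L};             // a small file with a single sync marker
        long[] starts = {0L, 25L, 50L, 75L}; // four start points over the same file
        System.out.println(readStarts(markers, 120L, starts));
    }
}
```

With these made-up offsets, the start points 25, 50 and 75 all land on the marker at 100, so three of the four readers would begin at the same place and read the same trailing records.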
was:
In the hdfs extractor we use:
```
if (start > filereader.getPosition()) {
filereader.sync(start); // sync to start
}
```
to jump to the correct point in the sequence file that we want to extract.
If the sequence file is small, multiple start points may `sync` to the same
point and we could end up extracting the same record multiple times.
> Sqoop2: Extracting sequence files may result in duplicates
> ----------------------------------------------------------
>
> Key: SQOOP-2811
> URL: https://issues.apache.org/jira/browse/SQOOP-2811
> Project: Sqoop
> Issue Type: Bug
> Affects Versions: 1.99.6
> Reporter: Abraham Fine
> Assignee: Abraham Fine