[ https://issues.apache.org/jira/browse/SQOOP-738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13527532#comment-13527532 ]
Jarek Jarcec Cecho commented on SQOOP-738:
------------------------------------------
I've continued my investigation and I believe the problem is located in the
following method of our own RecordWriter implementation:
{code:title=org.apache.sqoop.job.mr.SqoopOutputFormatLoadExecutor:83}
@Override
public void close(TaskAttemptContext context) throws InterruptedException {
  LOG.info("Closing SqoopOutputFormat RecordWriter");
  checkConsumerCompletion();
  free.acquire();
  writerFinished = true;
  // This will interrupt only the acquire call in the consumer class,
  // since we have acquired the free semaphore, and close is called from
  // the same thread that writes - so filled has not been released since then
  // so the consumer is definitely blocked on the filled semaphore.
  consumerFuture.cancel(true);
}
{code}
The contract of the RecordWriter.close() method is to finish all writing (flush
and close all resources) so that Hadoop can proceed with committing the results.
Instead of waiting, our implementation cancels the consumer through
consumerFuture.cancel(true). I believe it should wait here for the reader
(consumer) thread to finish in order to fulfill that contract.
Jarcec
> Sqoop is not importing all data in Sqoop 2
> ------------------------------------------
>
> Key: SQOOP-738
> URL: https://issues.apache.org/jira/browse/SQOOP-738
> Project: Sqoop
> Issue Type: Bug
> Affects Versions: 2.0.0
> Reporter: Jarek Jarcec Cecho
> Assignee: Jarek Jarcec Cecho
> Priority: Blocker
> Fix For: 1.99.1
>
>
> I've tried to import exactly 408,957 rows (nice round number, right?) using 10
> mappers and I've noticed that the mappers do not always supply all of their
> data. For example, in one run I got 6 files of the expected size of 10MB,
> whereas the other 4, seemingly random, files were completely empty. In another
> run I got 8 files of 10MB and only 2 empty files. I could not find any pattern
> in how many or which files end up empty. We definitely need to address this.