[ 
https://issues.apache.org/jira/browse/SQOOP-738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13527532#comment-13527532
 ] 

Jarek Jarcec Cecho commented on SQOOP-738:
------------------------------------------

I've continued my investigation and I believe that the problem is located in 
this method of our own RecordWriter instance:

{code:title=org.apache.sqoop.job.mr.SqoopOutputFormatLoadExecutor:83}
    @Override
    public void close(TaskAttemptContext context) throws InterruptedException {
      LOG.info("Closing SqoopOutputFormat RecordWriter");
      checkConsumerCompletion();
      free.acquire();
      writerFinished = true;
      // This will interrupt only the acquire call in the consumer class,
      // since we have acquired the free semaphore, and close is called from
      // the same thread that writes - so filled has not been released since
      // then, so the consumer is definitely blocked on the filled semaphore.
      consumerFuture.cancel(true);
    }
{code}

The contract of the RecordWriter::close() method is to finish all writing (flush 
and close all resources) so that Hadoop can continue with committing the 
results. Our implementation instead calls consumerFuture.cancel(true) and 
returns right away. I believe it should wait for the reader thread to finish 
here in order to fulfill the contract.
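
To illustrate the idea, here is a minimal, self-contained sketch of a writer whose close() blocks until the consumer has drained everything. It is not the Sqoop code and not a proposed patch: the class name BlockingCloseSketch, the EOF marker, and the use of a BlockingQueue in place of the free/filled semaphores are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical stand-in for a RecordWriter that hands records to a consumer thread.
public class BlockingCloseSketch {
    // Hypothetical end-of-input marker; not part of Sqoop.
    private static final String EOF = "\u0000EOF";

    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final List<String> sink = Collections.synchronizedList(new ArrayList<>());
    private final ExecutorService executor = Executors.newSingleThreadExecutor();
    private final Future<?> consumerFuture;

    public BlockingCloseSketch() {
        // The consumer drains the queue until it sees the EOF marker.
        consumerFuture = executor.submit(() -> {
            try {
                String record;
                while (!(record = queue.take()).equals(EOF)) {
                    sink.add(record);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
    }

    public void write(String record) throws InterruptedException {
        queue.put(record);
    }

    // Signals end of input, then waits for the consumer instead of cancelling it.
    public void close() throws InterruptedException {
        queue.put(EOF);
        try {
            consumerFuture.get(); // blocks until every queued record is consumed
        } catch (ExecutionException e) {
            throw new IllegalStateException("Consumer failed", e.getCause());
        } finally {
            executor.shutdown();
        }
    }

    // Snapshot of what the consumer has written so far.
    public List<String> written() {
        return new ArrayList<>(sink);
    }
}
```

The point of the sketch is only the ordering in close(): the writer first signals that no more records are coming and then waits on the Future, so nothing that was handed over can be dropped by a cancel.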

Jarcec
                
> Sqoop is not importing all data in Sqoop 2
> ------------------------------------------
>
>                 Key: SQOOP-738
>                 URL: https://issues.apache.org/jira/browse/SQOOP-738
>             Project: Sqoop
>          Issue Type: Bug
>    Affects Versions: 2.0.0
>            Reporter: Jarek Jarcec Cecho
>            Assignee: Jarek Jarcec Cecho
>            Priority: Blocker
>             Fix For: 1.99.1
>
>
> I've tried to import exactly 408,957 (a nice round number, right?) rows using 
> 10 mappers and I've noticed that not every mapper writes out all of its data 
> every time. For example, in one run I got 6 files of the expected size of 
> 10MB, whereas the other 4 files were completely empty. In another run I got 8 
> files of 10MB and just 2 empty files. I could not find any pattern in how 
> many and which files end up empty. We definitely need to address this.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
