[
https://issues.apache.org/jira/browse/MAPREDUCE-1339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Aaron Kimball resolved MAPREDUCE-1339.
--------------------------------------
Resolution: Duplicate
Sqoop has been removed from MapReduce; closing this issue. Also, Oracle
functionality has been improved in the meantime so as to obviate this bug.
> Sqoop full table import job times out when using the split-by attribute
> -----------------------------------------------------------------------
>
> Key: MAPREDUCE-1339
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1339
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: contrib/sqoop
> Affects Versions: 0.22.0
> Reporter: Leonid Furman
> Priority: Critical
> Fix For: 0.22.0
>
>
> Problem
> ------------
> When running the sqoop command for a full table import with the split-by
> attribute specified, as follows:
> sqoop --connect CONNECT_STRING --username USER_NAME --password PASSWORD
> --table TABLE_NAME --fields-terminated-by \\0x01 --as-textfile
> --warehouse-dir OUTPUT_DIR --split-by RECORD_ID
> Sqoop transforms the split-by attribute into an ORDER BY clause and runs
> the following SQL query against the database (say, Oracle):
> SELECT * FROM TABLE_NAME ORDER BY RECORD_ID
> If the table has, for example, 20 million records, the ORDER BY clause
> increases the query's running time significantly, eventually causing a
> timeout and resulting in no output being written to the Hadoop file system.
> Proposed solution
> -------------------------
> Do not append the ORDER BY clause to the SQL query if no WHERE clause is
> specified.
> Can there be any issues with this solution?
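The proposal above can be illustrated with a minimal sketch. This is not Sqoop's actual query builder; the function name build_import_query and its parameters are hypothetical and exist only to show the conditional logic being suggested: the ORDER BY clause is appended only when a WHERE clause is also present.

```python
from typing import Optional


def build_import_query(table: str,
                       split_by: Optional[str] = None,
                       where: Optional[str] = None) -> str:
    """Hypothetical helper illustrating the proposal; not Sqoop code."""
    query = f"SELECT * FROM {table}"
    if where:
        query += f" WHERE {where}"
        # Per the proposal, sort only when a WHERE clause is specified,
        # so a full table import skips the expensive ORDER BY.
        if split_by:
            query += f" ORDER BY {split_by}"
    return query


# Full table import: no WHERE clause, so no ORDER BY is appended.
full_import = build_import_query("TABLE_NAME", split_by="RECORD_ID")

# Filtered import: the ORDER BY clause is kept.
filtered_import = build_import_query(
    "TABLE_NAME", split_by="RECORD_ID", where="RECORD_ID < 1000")
```

Under this sketch, the full table import from the report would issue a plain SELECT * FROM TABLE_NAME, while a filtered import would still be ordered by the split-by column. Whether dropping the ordering is safe for how splits are assigned to mappers is exactly the open question raised in the issue.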