[ 
https://issues.apache.org/jira/browse/MAPREDUCE-907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Kimball updated MAPREDUCE-907:
------------------------------------

    Attachment: MAPREDUCE-907.patch

Attaching a new patch that uses the data-driven splits from MAPREDUCE-885. 
This allows most databases to scan separate ranges of a table in parallel, 
leading to much better performance.

Some notable changes:

* The {{\-\-order-by}} parameter has been renamed to {{\-\-split-by}}. Entries 
are no longer strictly ordered, eliminating a database scalability chokepoint.
** TestOrderBy has been renamed to TestSplitBy.
* With data-driven splits, multiple mappers make sense again. This patch adds a 
{{\-\-num-mappers}} / {{\-m}} parameter to control the degree of parallelism in 
reading.
* DataDrivenDBInputFormat is currently incompatible with Oracle. Oracle still 
uses the old DBInputFormat-based import path.
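
For illustration only, here is a minimal sketch of the general idea behind 
range-based splits: the value range of a numeric split column is cut into 
contiguous, non-overlapping conditions, one per mapper. This is not code from 
the patch or from MAPREDUCE-885. The class name RangeSplitSketch, the method 
splitConditions, and the generated condition strings are all hypothetical, and 
the sketch assumes an integer column whose minimum and maximum are already 
known.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of range-based splitting; not taken from the patch. */
final class RangeSplitSketch {

    /**
     * Divides the closed interval [min, max] of an integer column into at most
     * numSplits contiguous WHERE-clause conditions that together cover every
     * value exactly once. Assumes (max - min + 1) does not overflow a long.
     */
    static List<String> splitConditions(String column, long min, long max, int numSplits) {
        if (numSplits < 1) {
            throw new IllegalArgumentException("numSplits must be >= 1");
        }
        if (min > max) {
            throw new IllegalArgumentException("min must be <= max");
        }
        List<String> conditions = new ArrayList<>();
        long span = max - min + 1;
        // Ceiling division so that no more than numSplits ranges are produced.
        long size = Math.max(1, (span + numSplits - 1) / numSplits);
        long lo = min;
        while (lo <= max) {
            long hi = lo + size; // exclusive upper bound of this range
            if (hi > max) {
                // Last range: close it with an inclusive bound on max.
                conditions.add(column + " >= " + lo + " AND " + column + " <= " + max);
            } else {
                conditions.add(column + " >= " + lo + " AND " + column + " < " + hi);
            }
            lo = hi;
        }
        return conditions;
    }
}
```

Under these assumptions, splitConditions("id", 0, 99, 4) yields four 
conditions, each of which could be appended to a separate mapper's query. 
Because each mapper only needs its own range, no global ordering of the rows 
is required.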

> Sqoop should use more intelligent splits
> ----------------------------------------
>
>                 Key: MAPREDUCE-907
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-907
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: contrib/sqoop
>            Reporter: Aaron Kimball
>            Assignee: Aaron Kimball
>         Attachments: MAPREDUCE-907.patch
>
>
> Sqoop should use the new split generation / InputFormat in MAPREDUCE-885

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
