[GitHub] [incubator-dolphinscheduler] Eights-Li opened a new issue #2917: [Feature] Sqoop component optimization

GitBox Sat, 06 Jun 2020 09:26:19 -0700


Eights-Li opened a new issue #2917:
URL: https://github.com/apache/incubator-dolphinscheduler/issues/2917



   **Is your feature request related to a problem? Please describe.**
   The Sqoop task on the dev branch needs enhancement.
   Optimization points:
       • Sqoop data import and export do not support Hadoop-level custom parameters, that is, -D parameters, such as:
           – the MR job name
           – MR map and reduce memory, the number of map and reduce tasks, etc.
       • The split-by field is not supported. When -m is greater than 1 and the primary key of the relational database table is not auto-increment, Sqoop may import duplicate data into Hadoop. The usual solution is to specify a split-by field, so split-by needs to be supported.
       • Custom Sqoop parameters cannot be set. For example, when importing from MySQL, --direct can be added for some tables to speed up the import.
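   For illustration, a minimal Java sketch of the command layout these points ask for, assuming Sqoop 1.x command-line syntax, where generic Hadoop -D arguments must come right after the tool name and before tool-specific arguments such as --split-by, -m and --direct. The class name, connection URL, table and column names are hypothetical; mapreduce.job.name and mapreduce.map.memory.mb are standard Hadoop 2 properties chosen only as examples.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class for illustration; not part of DolphinScheduler.
final class SqoopImportCommandSketch {

    // Builds a "sqoop import" argument list. Generic Hadoop -D arguments are
    // added directly after the tool name, because Sqoop requires generic
    // arguments to precede tool-specific ones.
    static List<String> buildImportArgs(Map<String, String> hadoopProps,
                                        String connect, String table,
                                        String splitBy, int mappers,
                                        List<String> extraSqoopArgs) {
        List<String> args = new ArrayList<>();
        args.add("sqoop");
        args.add("import");
        for (Map.Entry<String, String> e : hadoopProps.entrySet()) {
            args.add("-D");
            args.add(e.getKey() + "=" + e.getValue());
        }
        args.add("--connect");
        args.add(connect);
        args.add("--table");
        args.add(table);
        // An explicit split column tells Sqoop how to partition rows across
        // mappers instead of relying on the table's primary key.
        if (splitBy != null && !splitBy.isEmpty()) {
            args.add("--split-by");
            args.add(splitBy);
        }
        args.add("-m");
        args.add(String.valueOf(mappers));
        // Task-level extras such as --direct or --fetch-size go last.
        args.addAll(extraSqoopArgs);
        return args;
    }

    public static void main(String[] argv) {
        Map<String, String> hadoop = new LinkedHashMap<>();
        hadoop.put("mapreduce.job.name", "orders_import");
        hadoop.put("mapreduce.map.memory.mb", "4096");
        List<String> cmd = buildImportArgs(hadoop,
                "jdbc:mysql://db-host:3306/shop", "orders", "order_id", 4,
                Arrays.asList("--direct"));
        // Prints: sqoop import -D mapreduce.job.name=orders_import -D mapreduce.map.memory.mb=4096 --connect jdbc:mysql://db-host:3306/shop --table orders --split-by order_id -m 4 --direct
        System.out.println(String.join(" ", cmd));
    }
}
```

   Keeping the arguments as a list (for example, to hand to a ProcessBuilder) rather than a single shell string avoids quoting problems with values that contain spaces.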
   
   **Describe the solution you'd like**
   Ideas:
       • The Sqoop job name is currently generic; make it a required parameter on the Sqoop task page.
       • Add an input box for custom Hadoop parameters, used to set MR memory and similar settings.
       • Add task-level custom Sqoop parameters, such as --direct, --fetch-size and other parameters needed in specific situations.
       • Add an option to choose between a custom script and the template script, following the design of the DataX node.
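   A rough Java sketch of how these ideas could fit together, under stated assumptions: the JobType enum, the Param class, buildCommand and all of its parameters are hypothetical names, not DolphinScheduler's actual API, and passing the job name as -D mapreduce.job.name is one possible choice. In template mode the job name is required and the custom Hadoop and Sqoop parameters are placed around the template arguments; in custom mode the user's script is used unchanged.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; names do not reflect DolphinScheduler's real classes.
final class SqoopTaskSketch {

    // Hypothetical toggle for the "custom script or template script" option.
    enum JobType { TEMPLATE, CUSTOM }

    // Hypothetical key/value pair typed into a custom-parameter input box.
    static final class Param {
        final String key;
        final String value;
        Param(String key, String value) { this.key = key; this.value = value; }
    }

    // Returns the command line a worker could run for the task.
    static String buildCommand(JobType type, String customShell, String tool,
                               String jobName, List<Param> hadoopParams,
                               String templateArgs, List<Param> sqoopParams) {
        if (type == JobType.CUSTOM) {
            // Custom mode: the user's script is used verbatim.
            return customShell;
        }
        // Template mode: the job name is required instead of a generic default.
        if (jobName == null || jobName.trim().isEmpty()) {
            throw new IllegalArgumentException("Sqoop job name is required");
        }
        List<String> parts = new ArrayList<>();
        parts.add("sqoop " + tool);
        // Generic Hadoop -D arguments go before tool-specific arguments.
        parts.add("-D mapreduce.job.name=" + jobName);
        for (Param p : hadoopParams) {
            parts.add("-D " + p.key + "=" + p.value);
        }
        parts.add(templateArgs);
        // Task-level Sqoop arguments; flags such as --direct have an empty value.
        for (Param p : sqoopParams) {
            parts.add(p.value.isEmpty() ? p.key : p.key + " " + p.value);
        }
        return String.join(" ", parts);
    }

    public static void main(String[] argv) {
        String cmd = buildCommand(JobType.TEMPLATE, null, "import", "orders_import",
                Arrays.asList(new Param("mapreduce.map.memory.mb", "4096")),
                "--connect jdbc:mysql://db-host:3306/shop --table orders",
                Arrays.asList(new Param("--fetch-size", "1000"),
                              new Param("--delete-target-dir", "")));
        // Prints: sqoop import -D mapreduce.job.name=orders_import -D mapreduce.map.memory.mb=4096 --connect jdbc:mysql://db-host:3306/shop --table orders --fetch-size 1000 --delete-target-dir
        System.out.println(cmd);
    }
}
```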
   
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
