Scott Kuehn created SQOOP-2861:
----------------------------------

             Summary: Sqoop2: Scheduler Pool Support
                 Key: SQOOP-2861
                 URL: https://issues.apache.org/jira/browse/SQOOP-2861
             Project: Sqoop
          Issue Type: New Feature
          Components: sqoop2-framework
    Affects Versions: 2.0.0
            Reporter: Scott Kuehn


Provide a mechanism to limit cluster-wide Sqoop access to a particular FROM 
resource. The use case is to configure a YARN scheduler pool that limits 
the vcores and memory available to jobs accessing a sensitive resource. A subset 
of Sqoop2 jobs could be configured to run in this pool, whereas other Sqoop2 
jobs would fall back to the default pool configured for the Sqoop2 server.

The extractor throttling mechanics are useful for preventing a single job from 
saturating the resource, but they cannot limit aggregate resource access 
across jobs. This ticket aims to enable the use of scheduler pools for 
scenarios where multiple Sqoop2 jobs access the same resource.

Possible implementation strategies:
# Enable clients to pass through job-specific MapReduce configuration, such as 
key=value pairs in the CLI. A Sqoop2 client would specify the scheduler pool by 
passing a {{mapreduce.job.queuename}} value from the CLI.
# Expose scheduler semantics to the client. An execution engine can 
subsequently decide whether to honor the scheduler request. For example, the 
MapReduce execution engine can interpret a pool property and set it as the 
{{mapreduce.job.queuename}} value of the Hadoop configuration.
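
To make the difference between the two strategies concrete, here is a minimal 
sketch in plain Java. It is not Sqoop2 code: the class name, the method names, 
and the {{scheduler.pool}} property are hypothetical, and a plain Map stands 
in for the Hadoop configuration. Only {{mapreduce.job.queuename}} is an 
existing Hadoop property.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative sketch only; none of these names exist in Sqoop2. */
final class SchedulerPoolSketch {

    // Existing Hadoop MapReduce property that selects the scheduler queue.
    static final String QUEUE_KEY = "mapreduce.job.queuename";

    // Hypothetical engine-neutral property a client could set (strategy 2).
    static final String POOL_KEY = "scheduler.pool";

    /** Strategy 1, step 1: parse key=value CLI arguments into overrides. */
    static Map<String, String> parseOverrides(String... args) {
        Map<String, String> overrides = new LinkedHashMap<>();
        for (String arg : args) {
            int eq = arg.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("Expected key=value, got: " + arg);
            }
            overrides.put(arg.substring(0, eq).trim(), arg.substring(eq + 1).trim());
        }
        return overrides;
    }

    /** Strategy 1, step 2: copy the overrides verbatim over the server defaults. */
    static Map<String, String> passThrough(Map<String, String> serverDefaults,
                                           Map<String, String> overrides) {
        Map<String, String> conf = new LinkedHashMap<>(serverDefaults);
        conf.putAll(overrides);
        return conf;
    }

    /**
     * Strategy 2: the execution engine translates an abstract pool request
     * into its own setting. Jobs without a pool keep the server default.
     */
    static Map<String, String> applyPool(Map<String, String> serverDefaults,
                                         Map<String, String> jobRequest) {
        Map<String, String> conf = new LinkedHashMap<>(serverDefaults);
        String pool = jobRequest.get(POOL_KEY);
        if (pool != null && !pool.isEmpty()) {
            conf.put(QUEUE_KEY, pool);
        }
        return conf;
    }

    public static void main(String[] args) {
        Map<String, String> defaults = new LinkedHashMap<>();
        defaults.put(QUEUE_KEY, "default");

        // Strategy 1: the client names the MapReduce property directly.
        Map<String, String> direct =
            passThrough(defaults, parseOverrides(QUEUE_KEY + "=restricted"));
        System.out.println(direct.get(QUEUE_KEY)); // prints "restricted"

        // Strategy 2: the client names a pool; the engine maps it to a queue.
        Map<String, String> request = new LinkedHashMap<>();
        request.put(POOL_KEY, "restricted");
        System.out.println(applyPool(defaults, request).get(QUEUE_KEY)); // prints "restricted"

        // A job that requests no pool falls back to the server default.
        Map<String, String> noPool = new LinkedHashMap<>();
        System.out.println(applyPool(defaults, noPool).get(QUEUE_KEY)); // prints "default"
    }
}
```

In this sketch the first strategy ties the client to MapReduce property 
names, while the second keeps the client engine-neutral and leaves it to each 
execution engine to decide how, or whether, to honor the pool request.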



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
