Scott Kuehn created SQOOP-2861:
----------------------------------
Summary: Sqoop2: Scheduler Pool Support
Key: SQOOP-2861
URL: https://issues.apache.org/jira/browse/SQOOP-2861
Project: Sqoop
Issue Type: New Feature
Components: sqoop2-framework
Affects Versions: 2.0.0
Reporter: Scott Kuehn
Provide a mechanism to limit cluster-wide Sqoop access to a particular FROM
resource. The use case is to configure a YARN scheduler pool that limits
the vcores and RAM available to jobs accessing a sensitive resource. A subset
of Sqoop2 jobs could be configured to run in this pool, while all other Sqoop2
jobs would fall back to the default pool configured for the Sqoop2 server.
The extractor throttling mechanism is useful for preventing a single job from
saturating the resource, but it cannot limit aggregate resource access across
jobs. This ticket aims to enable the use of scheduler pools for scenarios in
which multiple Sqoop2 jobs access the same resource.
Possible implementation strategies:
# Enable clients to pass through job-specific MapReduce configuration, such as
key=value pairs on the CLI. A Sqoop2 client would then specify the scheduler pool
by passing {{mapreduce.job.queuename}} from the CLI.
# Expose scheduler semantics to the client. Each execution engine can then
decide whether to honor the scheduler request. For example, the MapReduce
execution engine could interpret a pool property and set it as the
{{mapreduce.job.queuename}} value in the Hadoop configuration.
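A minimal sketch of how the two strategies might fit together, assuming a job configuration that starts from the server defaults. A plain {{Map}} stands in for Hadoop's {{Configuration}} so the example has no external dependencies. The class and method names ({{QueuePoolSketch}}, {{parseOverrides}}, {{applySchedulerPool}}, {{buildJobConf}}) are hypothetical and not part of the Sqoop2 API; only {{mapreduce.job.queuename}} is a real Hadoop property.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of the two strategies proposed in SQOOP-2861.
 * A plain Map stands in for Hadoop's Configuration to stay self-contained.
 */
final class QueuePoolSketch {
    // Real MapReduce property that selects the YARN queue (scheduler pool).
    static final String QUEUE_NAME_KEY = "mapreduce.job.queuename";

    private QueuePoolSketch() {}

    /** Strategy 1: parse client-supplied key=value pairs into config overrides. */
    static Map<String, String> parseOverrides(List<String> pairs) {
        Map<String, String> overrides = new HashMap<>();
        for (String pair : pairs) {
            int eq = pair.indexOf('=');
            String key = eq < 0 ? "" : pair.substring(0, eq).trim();
            if (key.isEmpty()) {
                throw new IllegalArgumentException("Expected key=value, got: " + pair);
            }
            // Split on the first '=' only, so values may themselves contain '='.
            overrides.put(key, pair.substring(eq + 1).trim());
        }
        return overrides;
    }

    /**
     * Strategy 2: an engine-neutral "pool" hint that a MapReduce engine
     * (hypothetically) translates into its queue property.
     */
    static void applySchedulerPool(Map<String, String> jobConf, String pool) {
        // Jobs without a pool keep whatever queue the server configured.
        if (pool != null && !pool.isEmpty()) {
            jobConf.put(QUEUE_NAME_KEY, pool);
        }
    }

    /** Builds a per-job config: server defaults, then overrides, then the pool. */
    static Map<String, String> buildJobConf(Map<String, String> serverDefaults,
                                            Map<String, String> overrides,
                                            String pool) {
        // Copy so one job's settings never leak into the server defaults.
        Map<String, String> jobConf = new HashMap<>(serverDefaults);
        jobConf.putAll(overrides);
        applySchedulerPool(jobConf, pool);
        return jobConf;
    }
}
```

In this sketch an explicit pool (strategy 2) takes precedence over a raw queue override (strategy 1), and jobs that specify neither inherit the server default. That precedence order is an assumption for illustration; the ticket does not decide it.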