Jarek Jarcec Cecho created SQOOP-2579:
-----------------------------------------
Summary: Sqoop2: Refactor RepositoryManager to not load structure
from repository when loading jobs and links
Key: SQOOP-2579
URL: https://issues.apache.org/jira/browse/SQOOP-2579
Project: Sqoop
Issue Type: Bug
Affects Versions: 1.99.6
Reporter: Jarek Jarcec Cecho
This one will take a bit longer to explain, so please _beer_ with me :)
The current flow for a configuration class is as follows:
* The connector developer defines a class and marks the inputs that are required
from users with the
[{{@Input}}|https://github.com/apache/sqoop/blob/sqoop2/common/src/main/java/org/apache/sqoop/model/Input.java]
annotation. This annotation can have several parameters ({{sensitive}},
{{size}}, ...).
* As we can't use those classes everywhere (mainly, we can't send them to
clients over the wire or serialize them to the repository), we have the
[{{ConfigUtils}}|https://github.com/apache/sqoop/blob/sqoop2/common/src/main/java/org/apache/sqoop/model/ConfigUtils.java]
class that serializes them to [model
objects|https://github.com/apache/sqoop/tree/sqoop2/common/src/main/java/org/apache/sqoop/model].
* The {{ConnectorManager}} then takes the model objects and keeps them in memory
for the lifetime of the Sqoop 2 server process (after a restart we simply run
{{ConfigUtils}} again, so nothing is lost).
* The {{RepositoryManager}} takes the model objects and persists them in the
database. This is desirable because this way we can detect whether the
structures have actually changed (e.g. the connector code has been updated) and
call the upgrade routines.
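To make the flow above more concrete, here is a minimal sketch of the idea. It does not use the real Sqoop classes: the nested {{Input}} annotation, {{ExampleLinkConfig}}, {{InputModel}} and {{toModel}} are simplified, hypothetical stand-ins, and only illustrate how an annotated class can be turned into plain model objects via reflection.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

public class ConfigModelSketch {

    // Hypothetical stand-in for the @Input annotation (only two parameters shown).
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface Input {
        boolean sensitive() default false;
        int size() default -1;
    }

    // Hypothetical configuration class as a connector developer might write it.
    static class ExampleLinkConfig {
        @Input(size = 128)
        String connectionString;

        @Input(size = 40, sensitive = true)
        String password;

        String internalState; // not annotated, so not exposed as an input
    }

    // Hypothetical model object: plain data that can be sent over the wire
    // or stored, unlike the annotated class itself.
    static final class InputModel {
        final String name;
        final boolean sensitive;
        final int size;

        InputModel(String name, boolean sensitive, int size) {
            this.name = name;
            this.sensitive = sensitive;
            this.size = size;
        }
    }

    // Assumed equivalent of the conversion step: read the annotations via
    // reflection and build one model object per annotated field.
    static List<InputModel> toModel(Class<?> configClass) {
        List<InputModel> inputs = new ArrayList<>();
        for (Field field : configClass.getDeclaredFields()) {
            Input input = field.getAnnotation(Input.class);
            if (input != null) {
                inputs.add(new InputModel(field.getName(), input.sensitive(), input.size()));
            }
        }
        return inputs;
    }

    public static void main(String[] args) {
        for (InputModel model : toModel(ExampleLinkConfig.class)) {
            System.out.println(model.name + " sensitive=" + model.sensitive + " size=" + model.size);
        }
    }
}
```

In this sketch the list returned by {{toModel}} plays the role of the structure that is both kept in memory and persisted in the repository.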
On the REST layer, we have the following behavior:
* For {{/connector/}} we always call
[{{ConnectorManager}}|https://github.com/apache/sqoop/blob/sqoop2/server/src/main/java/org/apache/sqoop/handler/ConnectorRequestHandler.java]
to get the in-memory structures.
* For {{/job}} and {{/links}} - e.g. to get the actual values for those model
objects - we use
[{{RepositoryManager}}|https://github.com/apache/sqoop/blob/sqoop2/server/src/main/java/org/apache/sqoop/handler/JobRequestHandler.java]
instead. As the {{RepositoryManager}} currently does not use
{{ConnectorManager}}, it relies on the model objects that are serialized in the
database.
This behavior is a bit limiting, because if we now want to improve the model
classes and add new information
([Validations|https://issues.apache.org/jira/browse/SQOOP-1442], [Sensitive
keys for maps|https://issues.apache.org/jira/browse/SQOOP-2549]), we absolutely
have to serialize that information in the database. That means we have to do a
database upgrade, and if this information changes we have to do a data
upgrade, ... None of this should be required, as this kind of information can
be kept in memory: we don't need to take any action when it changes.
Hence I would like to propose changing the current behavior and making
{{RepositoryManager}} use {{ConnectorManager}} to read job and link objects.
This way we can have more state in memory than what is persisted in the
database.
Please note that I'm not suggesting dropping the model classes from the
repository - we still need them to detect whether the metadata (the {{@Input}}
annotation) has changed and to run an upgrade if required. I'm only suggesting
not using this serialized metadata when we are reading job and link objects.
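A rough sketch of the proposed read path, under heavy assumptions: none of these types are the real Sqoop classes. {{InMemoryConnectors}} stands in for {{ConnectorManager}}, {{StoredJob}} for the values kept in the repository, and {{readJob}} for the lookup that {{RepositoryManager}} would perform. The masking of sensitive values is only there to show in-memory metadata being used without it ever being stored in the database.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class StructureLookupSketch {

    // Hypothetical input structure; "sensitive" is metadata that lives only in memory.
    static final class InputModel {
        final String name;
        final boolean sensitive;

        InputModel(String name, boolean sensitive) {
            this.name = name;
            this.sensitive = sensitive;
        }
    }

    // Hypothetical stand-in for ConnectorManager: structures are rebuilt from
    // the connector classes on every server start and kept in memory.
    static final class InMemoryConnectors {
        private final Map<String, List<InputModel>> structures = new HashMap<>();

        void register(String connector, List<InputModel> inputs) {
            structures.put(connector, inputs);
        }

        List<InputModel> structureFor(String connector) {
            return structures.getOrDefault(connector, Collections.<InputModel>emptyList());
        }
    }

    // Hypothetical stand-in for what the repository returns for a job:
    // the owning connector and the persisted values only.
    static final class StoredJob {
        final String connector;
        final Map<String, String> values;

        StoredJob(String connector, Map<String, String> values) {
            this.connector = connector;
            this.values = values;
        }
    }

    // Proposed read path: structure comes from memory, values from the database.
    static Map<String, String> readJob(InMemoryConnectors connectors, StoredJob job) {
        Map<String, String> view = new LinkedHashMap<>();
        for (InputModel input : connectors.structureFor(job.connector)) {
            String value = job.values.get(input.name);
            // In-memory metadata is applied without being stored in the database.
            view.put(input.name, (input.sensitive && value != null) ? "*****" : value);
        }
        return view;
    }
}
```

With this split, adding a new piece of metadata to the in-memory structure would not by itself force a database or data upgrade, while the serialized structure can stay in the repository for change detection.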