Jarek Jarcec Cecho created SQOOP-2579:
-----------------------------------------

             Summary: Sqoop2: Refactor RepositoryManager to not load structure 
from repository when loading jobs and links
                 Key: SQOOP-2579
                 URL: https://issues.apache.org/jira/browse/SQOOP-2579
             Project: Sqoop
          Issue Type: Bug
    Affects Versions: 1.99.6
            Reporter: Jarek Jarcec Cecho


This one will take a bit longer to explain, so please _beer_ with me :)

The current flow for a configuration class is as follows:

* The connector developer defines a class and marks the inputs that are 
required from users with the 
[{{@Input}}|https://github.com/apache/sqoop/blob/sqoop2/common/src/main/java/org/apache/sqoop/model/Input.java]
 annotation. This annotation can have several parameters ({{sensitive}}, 
{{size}}, ...).
* As we can't use those classes everywhere (mainly, we can't send them to 
clients over the wire or serialize them to the repository), we have the 
[{{ConfigUtils}}|https://github.com/apache/sqoop/blob/sqoop2/common/src/main/java/org/apache/sqoop/model/ConfigUtils.java]
 class that converts them to [model 
objects|https://github.com/apache/sqoop/tree/sqoop2/common/src/main/java/org/apache/sqoop/model].
* The {{ConnectorManager}} takes the model objects and keeps them in memory 
for the lifetime of the Sqoop 2 server process (after a restart we simply run 
{{ConfigUtils}} again, so nothing is lost).
* The {{RepositoryManager}} takes the model objects and persists them in the 
database. This is desirable because this way we can detect whether the 
structures actually changed (e.g. the connector code has been updated) and 
call the upgrade routines.
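
The first two steps above can be sketched roughly as follows. This is only an illustration: {{DemoInput}}, {{DemoLinkConfig}}, {{InputModel}} and {{toModel}} are hypothetical stand-ins, not the real {{@Input}} annotation, {{ConfigUtils}} or model classes, and the real code handles far more cases.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

public class ConfigFlowSketch {

    // Hypothetical stand-in for the @Input annotation (not the real Sqoop one).
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    public @interface DemoInput {
        boolean sensitive() default false;
        int size() default -1;
    }

    // Hypothetical configuration class as a connector developer might write it.
    public static class DemoLinkConfig {
        @DemoInput(size = 128) public String connectionString;
        @DemoInput(sensitive = true, size = 40) public String password;
        public String notAnInput; // not annotated, so it is skipped
    }

    // Hypothetical model object: plain data that can be sent or stored.
    public static class InputModel {
        public final String name;
        public final boolean sensitive;
        public final int size;

        public InputModel(String name, boolean sensitive, int size) {
            this.name = name;
            this.sensitive = sensitive;
            this.size = size;
        }
    }

    // Reads the annotations via reflection and builds one model object per input.
    public static List<InputModel> toModel(Class<?> configClass) {
        List<InputModel> inputs = new ArrayList<>();
        for (Field field : configClass.getDeclaredFields()) {
            DemoInput annotation = field.getAnnotation(DemoInput.class);
            if (annotation != null) {
                inputs.add(new InputModel(field.getName(),
                        annotation.sensitive(), annotation.size()));
            }
        }
        return inputs;
    }

    public static void main(String[] args) {
        for (InputModel input : toModel(DemoLinkConfig.class)) {
            System.out.println(input.name + " sensitive=" + input.sensitive
                    + " size=" + input.size);
        }
    }
}
```

The point of the sketch is only that the annotated class never leaves the server; everything downstream works with the derived model objects.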

On the REST layer, we have the following behavior:
* For {{/connector/}} we always call 
[{{ConnectorManager}}|https://github.com/apache/sqoop/blob/sqoop2/server/src/main/java/org/apache/sqoop/handler/ConnectorRequestHandler.java]
 to get the in-memory structures.
* For {{/job}} and {{/links}} (that is, to get the actual values for those 
model objects) we use 
[{{RepositoryManager}}|https://github.com/apache/sqoop/blob/sqoop2/server/src/main/java/org/apache/sqoop/handler/JobRequestHandler.java]
 instead. As the {{RepositoryManager}} currently does not use 
{{ConnectorManager}}, it relies on the model objects that are serialized in 
the database.

This behavior is a bit limiting, because if we want to improve the model 
classes and add new information 
([Validations|https://issues.apache.org/jira/browse/SQOOP-1442], [Sensitive 
keys for maps|https://issues.apache.org/jira/browse/SQOOP-2549]) we absolutely 
have to serialize that information in the database. That means we have to do a 
database upgrade, and whenever this information changes we have to do a data 
upgrade, ... None of this is actually required: these kinds of things can be 
kept in memory, as we don't need to take any action when they change.

Hence I would like to propose changing the current behavior and making 
{{RepositoryManager}} use {{ConnectorManager}} to read job and link objects. 
This way we can keep more state in memory and limit what state is persisted in 
the database.
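
A minimal sketch of the proposed read path, under the assumption that the repository stores only the input values per link and the structure comes from memory. {{DemoConnectorManager}}, {{DemoRepository}} and {{InputModel}} are hypothetical names for illustration, not the existing Sqoop classes or their APIs.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class InMemoryStructureSketch {

    // Hypothetical model object: structure (name, sensitive) plus a value.
    public static class InputModel {
        public final String name;
        public final boolean sensitive;
        public String value;

        public InputModel(String name, boolean sensitive) {
            this.name = name;
            this.sensitive = sensitive;
        }
    }

    // Stand-in for ConnectorManager: structure is rebuilt at every server start.
    public static class DemoConnectorManager {
        private final Map<String, Boolean> structure = new LinkedHashMap<>();

        public void register(String inputName, boolean sensitive) {
            structure.put(inputName, sensitive);
        }

        // Returns a fresh copy of the structure with no values filled in.
        public Map<String, InputModel> newEmptyConfig() {
            Map<String, InputModel> config = new LinkedHashMap<>();
            for (Map.Entry<String, Boolean> e : structure.entrySet()) {
                config.put(e.getKey(), new InputModel(e.getKey(), e.getValue()));
            }
            return config;
        }
    }

    // Stand-in for the repository: only values are read back when loading a link.
    public static class DemoRepository {
        private final Map<String, Map<String, String>> valuesByLink = new HashMap<>();

        public void saveLink(String linkName, Map<String, String> values) {
            valuesByLink.put(linkName, new HashMap<>(values));
        }

        // Structure comes from memory, values come from the "database".
        public Map<String, InputModel> loadLink(String linkName,
                                                DemoConnectorManager connectors) {
            Map<String, InputModel> config = connectors.newEmptyConfig();
            Map<String, String> stored =
                    valuesByLink.getOrDefault(linkName, Collections.emptyMap());
            for (InputModel input : config.values()) {
                input.value = stored.get(input.name);
            }
            return config;
        }
    }

    public static void main(String[] args) {
        DemoConnectorManager connectors = new DemoConnectorManager();
        connectors.register("password", true);

        DemoRepository repository = new DemoRepository();
        repository.saveLink("my-link", Collections.singletonMap("password", "secret"));

        InputModel loaded = repository.loadLink("my-link", connectors).get("password");
        System.out.println(loaded.name + " sensitive=" + loaded.sensitive);
    }
}
```

In this arrangement, a new attribute such as {{sensitive}} only has to exist on the in-memory structure; loading a link picks it up without any change to what is stored.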

Please note that I'm not suggesting to drop the model classes from the 
repository - we still need them to detect whether the metadata (the {{@Input}} 
annotation) has changed and to run an upgrade if required. I'm only suggesting 
not to use this serialized metadata when we are reading job and link objects.
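
To illustrate what the persisted metadata would still be used for, here is a rough sketch of a change check that compares the persisted structure with the one built in memory at startup. {{InputModel}} and {{structureChanged}} are hypothetical, and the real upgrade detection is assumed to be more involved than a plain set comparison.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;

public class StructureChangeSketch {

    // Hypothetical, simplified view of one input's persisted metadata.
    public static class InputModel {
        public final String name;
        public final int size;

        public InputModel(String name, int size) {
            this.name = name;
            this.size = size;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof InputModel)) return false;
            InputModel other = (InputModel) o;
            return name.equals(other.name) && size == other.size;
        }

        @Override
        public int hashCode() {
            return Objects.hash(name, size);
        }
    }

    // True when the two structures differ, ignoring the order of inputs.
    public static boolean structureChanged(List<InputModel> persisted,
                                           List<InputModel> current) {
        return !new HashSet<>(persisted).equals(new HashSet<>(current));
    }

    public static void main(String[] args) {
        List<InputModel> persisted = Arrays.asList(new InputModel("password", 40));
        List<InputModel> current = Arrays.asList(
                new InputModel("password", 40),
                new InputModel("username", 40)); // input added by a newer connector
        if (structureChanged(persisted, current)) {
            System.out.println("structure changed, upgrade routines would run");
        }
    }
}
```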




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
