> On July 19, 2014, 9:06 p.m., Jarek Cecho wrote:
> > Good work Gwen! Couple of high level notes:
> > 
> > 1) Please always put the patch on JIRA as well, we do have pre-commit build 
> > that will test your changes. Also we can't commit your changes unless they 
> > are attached to JIRA.
> > 2) The loader tool will load entire repository dump into memory which seems 
> > fine for now, but we might need to think about file format that would 
> > enable us process the dump in streaming fashion in the future.

2) I seriously seriously doubt we'll ever run into an issue here. Even large 
Sqoop users have around 50 jobs, we can easily have a year of daily submissions 
in less than 0.5G RAM. 
But in the unlikely case we'll need it - Jackson has streaming JSON Parser. The 
only part that may not fit into memory is the "submissions" records, and we can 
load them one-by-one easily. 


> On July 19, 2014, 9:06 p.m., Jarek Cecho wrote:
> > tools/src/main/java/org/apache/sqoop/tools/tool/JSONConstants.java, line 21
> > <https://reviews.apache.org/r/21898/diff/6/?file=635434#file635434line21>
> >
> >     I don't think that interface is the correct type here, what about final 
> > class?

Makes sense. 

Since most of those JSON constants are used all over the code, I'm wondering if 
it makes sense to collect all of them into a class in 
org.apache.sqoop.json.util?

(As a separate Jira)


> On July 19, 2014, 9:06 p.m., Jarek Cecho wrote:
> > tools/src/main/java/org/apache/sqoop/tools/tool/RepositoryDumpTool.java, 
> > lines 143-147
> > <https://reviews.apache.org/r/21898/diff/6/?file=635435#file635435line143>
> >
> >     Wouldn't be cleaner to just use ConnectorManager.getConnector API to 
> > get the connector name?
> >     
> >     
> > https://github.com/apache/sqoop/blob/sqoop2/core/src/main/java/org/apache/sqoop/connector/ConnectorManager.java#L135

It looks like there are two interfaces to get connector metadata - one in 
ConnectorManager and one in RepositoryManager. I was using the 
RepositoryManager everywhere here and didn't notice that ConnectorManager 
actually provides better API.

I'm wondering if there's a good reason for the redundancy or whether we should 
clean this up.


> On July 19, 2014, 9:06 p.m., Jarek Cecho wrote:
> > tools/src/main/java/org/apache/sqoop/tools/tool/RepositoryLoadTool.java, 
> > lines 137-144
> > <https://reviews.apache.org/r/21898/diff/6/?file=635436#file635436line137>
> >
> >     What about creating another parent tool that will initialize entire 
> > Sqoop infrastructure? Something like the ConfiguredTool - we should be able 
> > to use the same tool for dump and load.

I'm not sure its a good idea - all the initialize methods take parameters, 
which may be different for different tools.

In this case I'm initializing RepositoryManager as immutable in both dump and 
load tools, but it should probably be mutable in load.


> On July 19, 2014, 9:06 p.m., Jarek Cecho wrote:
> > tools/src/main/java/org/apache/sqoop/tools/tool/RepositoryLoadTool.java, 
> > line 231
> > <https://reviews.apache.org/r/21898/diff/6/?file=635436#file635436line231>
> >
> >     I'm thinking if this check is indeed necessary. User might want to load 
> > older backup to a different Sqoop version right?

Users may indeed want to (although it shouldn't be common - backup best 
practices include a fresh backup immediately following a successful upgrade), 
it should work (we don't plan on making backward incompatible changes) and its 
trivial to work around this check for any sufficiently determined user (or 
support).

The reason its there is that otherwise we are implicitly promising that dump 
and load will work between any two versions. I doubt we want to promise (and 
test!) that.

If you think its important, I can check that the repository is newer than the 
json. Importing new dump into an old repo is much more likely to break (new 
fields in forms).


> On July 19, 2014, 9:06 p.m., Jarek Cecho wrote:
> > tools/src/main/java/org/apache/sqoop/tools/tool/RepositoryLoadTool.java, 
> > line 243
> > <https://reviews.apache.org/r/21898/diff/6/?file=635436#file635436line243>
> >
> >     Similar update code is already in the code base, so I'm wondering if it 
> > would make sense to abstract and reuse it rather then have similar code on 
> > two places?
> >     
> >     
> > https://github.com/apache/sqoop/blob/sqoop2/core/src/main/java/org/apache/sqoop/repository/Repository.java#L386

Lets do it in a separate Jira for refactoring? 
I don't want to add changes to the Repository into the scope of this patch.


> On July 19, 2014, 9:06 p.m., Jarek Cecho wrote:
> > tools/src/main/java/org/apache/sqoop/tools/tool/JSONConstants.java, line 18
> > <https://reviews.apache.org/r/21898/diff/6/?file=635434#file635434line18>
> >
> >     I can't help by my OCD is complaining about the location of the file - 
> > it's in generic package, but only two tools are using it. What about 
> > creating sub-package "repository" and put repository related tools there?

See my comment below. I agree this isn't a good place, but I think the right 
place is outside Tools completely, so same constants can be used by clients, 
and other parts of the framework that need to work with JSON.


> On July 19, 2014, 9:06 p.m., Jarek Cecho wrote:
> > tools/src/main/java/org/apache/sqoop/tools/tool/RepositoryLoadTool.java, 
> > lines 361-366
> > <https://reviews.apache.org/r/21898/diff/6/?file=635436#file635436line361>
> >
> >     Seems like good use case to use ConnectorManager.getConnector() API?
> >     
> >     
> > https://github.com/apache/sqoop/blob/sqoop2/core/src/main/java/org/apache/sqoop/connector/ConnectorManager.java#L140

It would be, if it was possible. 

ConnectorManager.getConnector returns a connector. I need the connector 
PersistenceID, which is part of the connector metadata.
ConnectorManager.getConnectorMetadata takes an ID as a parameter, but not a 
name. The repository API has the same issue.

I'll open a separate JIRA for the cleanup. 


- Gwen


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/21898/#review48179
-----------------------------------------------------------


On July 18, 2014, 4:11 p.m., Gwen Shapira wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/21898/
> -----------------------------------------------------------
> 
> (Updated July 18, 2014, 4:11 p.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Repository: sqoop-sqoop2
> 
> 
> Description
> -------
> 
> Added tool for dumping user-generated data - connections, jobs and 
> submissions. There's an option to dump sensitive data (i.e. passwords) as 
> well. 
> 
> 
> Diffs
> -----
> 
>   docs/src/site/sphinx/Tools.rst ad72cd1 
>   pom.xml 1e2f005 
>   tools/pom.xml 31eda1c 
>   tools/src/main/java/org/apache/sqoop/tools/tool/BuiltinTools.java b24cb35 
>   tools/src/main/java/org/apache/sqoop/tools/tool/JSONConstants.java 
> PRE-CREATION 
>   tools/src/main/java/org/apache/sqoop/tools/tool/RepositoryDumpTool.java 
> PRE-CREATION 
>   tools/src/main/java/org/apache/sqoop/tools/tool/RepositoryLoadTool.java 
> PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/21898/diff/
> 
> 
> Testing
> -------
> 
> Manual testing. Dumping repository with and without sensitive data. 
> Validating resulting JSON.
> 
> 
> Thanks,
> 
> Gwen Shapira
> 
>

Reply via email to