> On July 19, 2014, 9:06 p.m., Jarek Cecho wrote:
> > Good work Gwen! Couple of high level notes:
> >
> > 1) Please always put the patch on JIRA as well, we do have pre-commit build
> > that will test your changes. Also we can't commit your changes unless they
> > are attached to JIRA.
> > 2) The loader tool will load entire repository dump into memory which seems
> > fine for now, but we might need to think about file format that would
> > enable us to process the dump in streaming fashion in the future.

2) I seriously doubt we'll ever run into an issue here. Even large Sqoop users have around 50 jobs, so we can easily hold a year of daily submissions in less than 0.5G of RAM. But in the unlikely case that we need it, Jackson has a streaming JSON parser. The only part that may not fit into memory is the "submissions" records, and we can easily load them one by one.


> On July 19, 2014, 9:06 p.m., Jarek Cecho wrote:
> > tools/src/main/java/org/apache/sqoop/tools/tool/JSONConstants.java, line 21
> > <https://reviews.apache.org/r/21898/diff/6/?file=635434#file635434line21>
> >
> > I don't think that interface is the correct type here, what about final
> > class?

Makes sense. Since most of those JSON constants are used all over the code, I'm wondering if it makes sense to collect all of them into a class in org.apache.sqoop.json.util? (As a separate Jira)


> On July 19, 2014, 9:06 p.m., Jarek Cecho wrote:
> > tools/src/main/java/org/apache/sqoop/tools/tool/RepositoryDumpTool.java,
> > lines 143-147
> > <https://reviews.apache.org/r/21898/diff/6/?file=635435#file635435line143>
> >
> > Wouldn't it be cleaner to just use ConnectorManager.getConnector API to
> > get the connector name?
> >
> >
> > https://github.com/apache/sqoop/blob/sqoop2/core/src/main/java/org/apache/sqoop/connector/ConnectorManager.java#L135

It looks like there are two interfaces to get connector metadata - one in ConnectorManager and one in RepositoryManager. I was using the RepositoryManager everywhere here and didn't notice that ConnectorManager actually provides a better API. I'm wondering if there's a good reason for the redundancy or whether we should clean this up.


> On July 19, 2014, 9:06 p.m., Jarek Cecho wrote:
> > tools/src/main/java/org/apache/sqoop/tools/tool/RepositoryLoadTool.java,
> > lines 137-144
> > <https://reviews.apache.org/r/21898/diff/6/?file=635436#file635436line137>
> >
> > What about creating another parent tool that will initialize entire
> > Sqoop infrastructure?
> > Something like the ConfiguredTool - we should be able
> > to use the same tool for dump and load.

I'm not sure it's a good idea - all the initialize methods take parameters, which may be different for different tools. In this case I'm initializing RepositoryManager as immutable in both dump and load tools, but it should probably be mutable in load.


> On July 19, 2014, 9:06 p.m., Jarek Cecho wrote:
> > tools/src/main/java/org/apache/sqoop/tools/tool/RepositoryLoadTool.java,
> > line 231
> > <https://reviews.apache.org/r/21898/diff/6/?file=635436#file635436line231>
> >
> > I'm thinking if this check is indeed necessary. User might want to load
> > older backup to a different Sqoop version right?

Users may indeed want to (although it shouldn't be common - backup best practices include a fresh backup immediately following a successful upgrade), it should work (we don't plan on making backward incompatible changes), and it's trivial to work around this check for any sufficiently determined user (or support).
The reason it's there is that otherwise we are implicitly promising that dump and load will work between any two versions. I doubt we want to promise (and test!) that.
If you think it's important, I can check that the repository is newer than the JSON. Importing a new dump into an old repo is much more likely to break (new fields in forms).


> On July 19, 2014, 9:06 p.m., Jarek Cecho wrote:
> > tools/src/main/java/org/apache/sqoop/tools/tool/RepositoryLoadTool.java,
> > line 243
> > <https://reviews.apache.org/r/21898/diff/6/?file=635436#file635436line243>
> >
> > Similar update code is already in the code base, so I'm wondering if it
> > would make sense to abstract and reuse it rather than have similar code in
> > two places?
> >
> >
> > https://github.com/apache/sqoop/blob/sqoop2/core/src/main/java/org/apache/sqoop/repository/Repository.java#L386

Let's do it in a separate Jira for refactoring? I don't want to add changes to the Repository into the scope of this patch.
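
To make the alternative version check (RepositoryLoadTool.java, line 231) concrete, here is a minimal sketch of "the repository must not be older than the dump". It is only an illustration under stated assumptions: VersionCheck, parse, compare and isLoadAllowed are hypothetical names, not part of the patch or of any Sqoop API; versions are assumed to be dotted numbers with an optional qualifier such as -SNAPSHOT; and equal versions are treated as allowed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the patch: compares dotted version strings
// such as "1.99.3" or "1.99.4-SNAPSHOT" by their numeric components only.
final class VersionCheck {

  private VersionCheck() {
    // Static helpers only; no instances.
  }

  // Splits "1.99.4-SNAPSHOT" into [1, 99, 4]; anything after the first '-' is ignored.
  static List<Integer> parse(String version) {
    String numeric = version.split("-", 2)[0];
    List<Integer> parts = new ArrayList<>();
    for (String part : numeric.split("\\.")) {
      parts.add(Integer.parseInt(part));
    }
    return parts;
  }

  // Negative, zero or positive when a is older than, equal to or newer than b.
  // Missing trailing components count as 0, so "2.0" equals "2.0.0".
  static int compare(String a, String b) {
    List<Integer> left = parse(a);
    List<Integer> right = parse(b);
    int length = Math.max(left.size(), right.size());
    for (int i = 0; i < length; i++) {
      int l = i < left.size() ? left.get(i) : 0;
      int r = i < right.size() ? right.get(i) : 0;
      if (l != r) {
        return Integer.compare(l, r);
      }
    }
    return 0;
  }

  // Assumption: a dump may be loaded only into a repository that is at least as new.
  static boolean isLoadAllowed(String dumpVersion, String repositoryVersion) {
    return compare(repositoryVersion, dumpVersion) >= 0;
  }
}
```

With this sketch, isLoadAllowed("1.99.3", "1.99.4") accepts an older dump, while isLoadAllowed("1.99.4", "1.99.3") rejects a newer one. Qualifiers are ignored, so a SNAPSHOT compares equal to its release, and a non-numeric component makes Integer.parseInt throw NumberFormatException; a real check would need to decide how to handle both.
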


> On July 19, 2014, 9:06 p.m., Jarek Cecho wrote:
> > tools/src/main/java/org/apache/sqoop/tools/tool/JSONConstants.java, line 18
> > <https://reviews.apache.org/r/21898/diff/6/?file=635434#file635434line18>
> >
> > I can't help it, but my OCD is complaining about the location of the file -
> > it's in generic package, but only two tools are using it. What about
> > creating sub-package "repository" and put repository related tools there?

See my reply to the comment on line 21 of this file. I agree this isn't a good place, but I think the right place is outside Tools completely, so the same constants can be used by clients and by other parts of the framework that need to work with JSON.


> On July 19, 2014, 9:06 p.m., Jarek Cecho wrote:
> > tools/src/main/java/org/apache/sqoop/tools/tool/RepositoryLoadTool.java,
> > lines 361-366
> > <https://reviews.apache.org/r/21898/diff/6/?file=635436#file635436line361>
> >
> > Seems like good use case to use ConnectorManager.getConnector() API?
> >
> >
> > https://github.com/apache/sqoop/blob/sqoop2/core/src/main/java/org/apache/sqoop/connector/ConnectorManager.java#L140

It would be, if it were possible. ConnectorManager.getConnector returns a connector. I need the connector PersistenceID, which is part of the connector metadata. ConnectorManager.getConnectorMetadata takes an ID as a parameter, but not a name. The repository API has the same issue. I'll open a separate JIRA for the cleanup.


- Gwen


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/21898/#review48179
-----------------------------------------------------------


On July 18, 2014, 4:11 p.m., Gwen Shapira wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/21898/
> -----------------------------------------------------------
>
> (Updated July 18, 2014, 4:11 p.m.)
>
>
> Review request for Sqoop.
>
>
> Repository: sqoop-sqoop2
>
>
> Description
> -------
>
> Added tool for dumping user-generated data - connections, jobs and
> submissions. There's an option to dump sensitive data (i.e. passwords) as
> well.
>
>
> Diffs
> -----
>
>   docs/src/site/sphinx/Tools.rst ad72cd1
>   pom.xml 1e2f005
>   tools/pom.xml 31eda1c
>   tools/src/main/java/org/apache/sqoop/tools/tool/BuiltinTools.java b24cb35
>   tools/src/main/java/org/apache/sqoop/tools/tool/JSONConstants.java PRE-CREATION
>   tools/src/main/java/org/apache/sqoop/tools/tool/RepositoryDumpTool.java PRE-CREATION
>   tools/src/main/java/org/apache/sqoop/tools/tool/RepositoryLoadTool.java PRE-CREATION
>
> Diff: https://reviews.apache.org/r/21898/diff/
>
>
> Testing
> -------
>
> Manual testing. Dumping repository with and without sensitive data.
> Validating resulting JSON.
>
>
> Thanks,
>
> Gwen Shapira
>
>
