[ https://issues.apache.org/jira/browse/SQOOP-931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jarek Jarcec Cecho updated SQOOP-931:
-------------------------------------
Hi [~venkatnrangan],
thank you very much for working on this! I've read the proposal and I have a
couple of notes:
* Can we introduce a {{\--hcatalog-database}} parameter, similar to the one we
added for Hive in SQOOP-912? I have two main reasons for that: first, having the
database inside the table parameter is inconsistent with the rest of the
framework ({{\--table}} does not accept a database), which is very confusing for
the end user; second, it unfortunately won't work for the {{import-all-tables}}
tool.
* I understand the reasoning behind using HCatalog as a convenient way to get
support for all additional output types. However, I feel that forcing users to
create the table definition prior to import goes against the idea of Sqoop and
might be a showstopper for most users. The most obvious advantage of using Sqoop
is that it can propagate all the metadata for the user automatically. I would
strongly prefer to have an option like {{--create-hcatalog-table}} from day one.
* I would like to understand the possible implications of not supporting
{{--hive-drop-import-delims}}. We introduced this parameter because we were
creating "shadow" rows: rows containing a newline character were split into two
lines and thus seen as two rows by Hive. I assume that this is not an issue for
the binary output types supported by HCatalog (Avro, SequenceFile), but what
about plain text files?
* Not supporting the {{--direct}} option is a bummer, but I guess we can live
with it. Would it be possible in that case to import the data using the usual
means and then load it into HCatalog, similar to what we do for Hive? I
understand that this would limit our ability to reuse HCatalog SerDes in this
case.
* It seems that we're proposing only manual tests that require setting up
third-party dependencies. As this is a very significant feature, I would argue
for having normal tests that run during the usual Jenkins builds. Can we reuse
some sort of MiniHCatalogCluster, as in the Hadoop/MR/Hive case?
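To make the "shadow row" problem from the delimiter note concrete, here is a minimal Python sketch (not Sqoop code) that writes rows to a newline-delimited, \x01-separated text format and reads them back one line per row. The helper names (`drop_delims`, `write_text_rows`, `read_text_rows`) are hypothetical, and the dropped characters (\n, \r, \x01) are an assumption meant to mirror what Sqoop's Hive delimiter-dropping option removes from string fields.

```python
# Hypothetical illustration of the "shadow row" problem; this is NOT Sqoop's code.
# Rows are written as newline-terminated text lines with \x01 as the field
# separator (Hive's default text delimiter).

FIELD_SEP = "\x01"
# Assumed set of characters that collide with the row/field delimiters.
HIVE_DELIMS = ("\n", "\r", "\x01")


def drop_delims(value: str) -> str:
    """Remove characters that collide with the text file's row/field delimiters."""
    for ch in HIVE_DELIMS:
        value = value.replace(ch, "")
    return value


def write_text_rows(rows, clean=False):
    """Serialize rows into one newline-delimited text blob."""
    lines = []
    for row in rows:
        fields = [drop_delims(f) if clean else f for f in row]
        lines.append(FIELD_SEP.join(fields))
    return "\n".join(lines) + "\n"


def read_text_rows(blob):
    """Parse the blob like a line-oriented reader: one line becomes one row."""
    return [line.split(FIELD_SEP) for line in blob.splitlines()]


source_rows = [
    ["1", "plain comment"],
    ["2", "first line\nsecond line"],  # embedded newline in a string column
]

raw = read_text_rows(write_text_rows(source_rows))
cleaned = read_text_rows(write_text_rows(source_rows, clean=True))

print(len(raw))      # 3: the embedded newline produced a "shadow" row
print(len(cleaned))  # 2: one parsed row per source row
```

With `clean=True` each source row maps to exactly one parsed row, at the cost of losing the newline that was in the data. Container formats such as Avro or SequenceFile frame records explicitly rather than by newlines, which is why the open question is mainly about plain text output.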
Jarcec
> Integrate HCatalog with Sqoop
> -----------------------------
>
> Key: SQOOP-931
> URL: https://issues.apache.org/jira/browse/SQOOP-931
> Project: Sqoop
> Issue Type: New Feature
> Affects Versions: 1.4.2, 1.4.3
> Environment: All 1.x Sqoop versions
> Reporter: Venkat Ranganathan
> Assignee: Venkat Ranganathan
> Attachments: SQOOP-931.patch, SQOOP HCatalog Integration.pdf
>
>
> Apache HCatalog is a table and storage management service that provides a
> shared schema, data types and table abstraction, freeing users from being
> concerned about where or how their data is stored. It provides
> interoperability across Pig, MapReduce, and Hive.
> A Sqoop HCatalog connector will help support the storage formats that are
> abstracted by HCatalog.