[ https://issues.apache.org/jira/browse/SQOOP-2874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957738#comment-16957738 ]
Pankaj Arora commented on SQOOP-2874:
-------------------------------------

Is there any ETA for the issue?

> Highlight Sqoop import with --as-parquetfile use cases (Dataset name <NAME>
> is not alphanumeric (plus '_'))
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: SQOOP-2874
>                 URL: https://issues.apache.org/jira/browse/SQOOP-2874
>             Project: Sqoop
>          Issue Type: Improvement
>          Components: docs
>            Reporter: Markus Kemper
>            Assignee: Markus Kemper
>            Priority: Major
>         Attachments: Jira_SQOOP-2874_TestCases.txt
>
>
> Hello Sqoop Community,
> Would it be possible to request some documentation enhancements?
> The ask here is to proactively help raise awareness and improve user
> experience with a few specific use cases [1] where some Sqoop commands have
> restricted character options when using import with --as-parquetfile.
> My understanding is Sqoop1 currently relies on Kite Datasets to write Parquet
> files. From the Kite documentation [3] we see that to ensure compatibility
> (with Hive, etc.), Kite imposes some restrictions on Names and Namespaces
> which bubble up in Sqoop.
> The following Sqoop use cases when using import with --as-parquetfile result
> in the error [2] below. Full test cases for each scenario are attached. If
> it is an option to enhance the Sqoop documentation for these use cases I am
> happy to provide proposed changes, let me know.
>
> [1] Use Cases:
> 1. sqoop import --as-parquetfile + --target-dir
>    /<path>/<rdbms_database>.<table>
>    1.1. The '.' is not allowed
> 2. sqoop import --as-parquetfile + --table <rdbms_database>.<table> + (no
>    --target-dir)
>    2.1. The '.' is not allowed, this is essentially the same as (1)
> 3. sqoop import --as-parquetfile + --hive-import --table
>    <hive_database>.<table>
>    3.1. The proper usage is to use --hive-database with --hive-table; however,
>         with --as-textfile, --hive-table works with <hive_database>.<table>
>
> [2] Kite Error:
> 16/03/06 08:45:56 ERROR sqoop.Sqoop: Got exception running Sqoop:
> org.kitesdk.data.ValidationException: Dataset name DATABASE.TABLE is not
> alphanumeric (plus '_')
> org.kitesdk.data.ValidationException: Dataset name DATABASE.TABLE is not
> alphanumeric (plus '_')
>     at org.kitesdk.data.ValidationException.check(ValidationException.java:55)
>     at org.kitesdk.data.spi.Compatibility.checkDatasetName(Compatibility.java:105)
>     at org.kitesdk.data.spi.Compatibility.check(Compatibility.java:68)
>     at org.kitesdk.data.spi.filesystem.FileSystemMetadataProvider.create(FileSystemMetadataProvider.java:209)
>     at org.kitesdk.data.spi.filesystem.FileSystemDatasetRepository.create(FileSystemDatasetRepository.java:137)
>     at org.kitesdk.data.Datasets.create(Datasets.java:239)
>     at org.kitesdk.data.Datasets.create(Datasets.java:307)
>     at org.apache.sqoop.mapreduce.ParquetJob.createDataset(ParquetJob.java:141)
>     at org.apache.sqoop.mapreduce.ParquetJob.configureImportJob(ParquetJob.java:119)
>     at org.apache.sqoop.mapreduce.DataDrivenImportJob.configureMapper(DataDrivenImportJob.java:130)
>     at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:260)
>     at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:673)
>     at org.apache.sqoop.manager.OracleManager.importTable(OracleManager.java:444)
>     at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:497)
>     at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
>     at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>     at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
>     at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
>     at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
>     at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
>
> [3] Kite Documentation:
> http://kitesdk.org/docs/1.0.0/introduction-to-datasets.html
> Names and Namespaces
> URIs also define a name and namespace for your dataset. Kite uses these
> values when the underlying system has the same concept (for example, Hive).
> The name and namespace are typically the last two values in a URI. For
> example, if you create a dataset using the URI
> dataset:hive:fact_tables/ratings, Kite stores a Hive table ratings in the
> fact_tables Hive database. If you create a dataset using the URI
> dataset:hdfs:/user/cloudera/fact_tables/ratings, Kite stores an HDFS dataset
> named ratings in the fact_tables namespace. To ensure compatibility with
> Hive and other underlying systems, names and namespaces in URIs must be made
> of alphanumeric or underscore (_) characters and cannot start with a number.
>
> Thanks, Markus



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
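
For anyone hitting the same ValidationException, the naming rule quoted from the Kite documentation above can be checked before running an import. The following is a minimal Python sketch of that rule, not Kite's actual check (Compatibility.checkDatasetName in the stack trace). The function names, the ASCII-only pattern, and the idea that the last path segment of --target-dir ends up as the dataset name (inferred from use case 1) are assumptions made for illustration.

```python
import re

# Hypothetical pattern mirroring the rule quoted from the Kite docs:
# alphanumeric or underscore characters, not starting with a number.
# ASCII-only matching is an assumption of this sketch.
_NAME_RULE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_dataset_name(name: str) -> bool:
    """Return True if `name` satisfies the quoted naming rule."""
    return _NAME_RULE.fullmatch(name) is not None


def last_path_segment(target_dir: str) -> str:
    """Return the final path component of a directory path.

    Assumption: this component is what ends up as the dataset name
    when --target-dir is used with --as-parquetfile.
    """
    return target_dir.rstrip("/").rsplit("/", 1)[-1]


if __name__ == "__main__":
    # Print each candidate name alongside whether it passes the rule.
    for candidate in ("DATABASE.TABLE", "DATABASE_TABLE", "1ratings", "ratings"):
        print(candidate, is_valid_dataset_name(candidate))
```

A name such as DATABASE.TABLE fails this check because of the '.', which matches the error reported in [2]; replacing the '.' with '_' in the target directory name, or passing the database and table separately via --hive-database and --hive-table as described in use case 3, avoids it.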