[ 
https://issues.apache.org/jira/browse/SQOOP-2874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957738#comment-16957738
 ] 

Pankaj Arora commented on SQOOP-2874:
-------------------------------------

Is there any ETA for the issue?

> Highlight Sqoop import with --as-parquetfile use cases (Dataset name <NAME> 
> is not alphanumeric (plus '_'))
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: SQOOP-2874
>                 URL: https://issues.apache.org/jira/browse/SQOOP-2874
>             Project: Sqoop
>          Issue Type: Improvement
>          Components: docs
>            Reporter: Markus Kemper
>            Assignee: Markus Kemper
>            Priority: Major
>         Attachments: Jira_SQOOP-2874_TestCases.txt
>
>
> Hello Sqoop Community,
> Would it be possible to request some documentation enhancements?
> The ask here is to proactively help raise awareness and improve user 
> experience with a few specific use cases [1] where some Sqoop commands have 
> restricted character options when using import with --as-parquetfile.  
> My understanding is Sqoop1 currently relies on Kite Datasets to write Parquet 
> files.  From the Kite documentation [3] we see that to ensure compatibility 
> (with Hive, etc.), Kite imposes some restrictions on Names and Namespaces 
> which bubble up in Sqoop.
> The following Sqoop use cases when using import with --as-parquetfile result 
> in the error [2] below.  Full test cases for each scenario are attached.  If 
> enhancing the Sqoop documentation for these use cases is an option, I am 
> happy to provide proposed changes; let me know.
> [1] Use Cases:
> 1. sqoop import --as-parquetfile + --target-dir 
> /<path>/<rdbms_database>.<table>
> 1.1. The '.' is not allowed
> 2. sqoop import --as-parquetfile + --table <rdbms_database>.<table>  + (no 
> --target-dir)
> 2.1. The '.' is not allowed, this is essentially the same as (1)
> 3. sqoop import --as-parquetfile + --hive-import --table 
> <hive_database>.<table> 
> 3.1. The proper usage is --hive-database together with --hive-table; 
> however, with --as-textfile, --hive-table works with <hive_database>.<table>
> [2] Kite Error:
> 16/03/06 08:45:56 ERROR sqoop.Sqoop: Got exception running Sqoop: org.kitesdk.data.ValidationException: Dataset name DATABASE.TABLE is not alphanumeric (plus '_')
> org.kitesdk.data.ValidationException: Dataset name DATABASE.TABLE is not alphanumeric (plus '_')
>       at org.kitesdk.data.ValidationException.check(ValidationException.java:55)
>       at org.kitesdk.data.spi.Compatibility.checkDatasetName(Compatibility.java:105)
>       at org.kitesdk.data.spi.Compatibility.check(Compatibility.java:68)
>       at org.kitesdk.data.spi.filesystem.FileSystemMetadataProvider.create(FileSystemMetadataProvider.java:209)
>       at org.kitesdk.data.spi.filesystem.FileSystemDatasetRepository.create(FileSystemDatasetRepository.java:137)
>       at org.kitesdk.data.Datasets.create(Datasets.java:239)
>       at org.kitesdk.data.Datasets.create(Datasets.java:307)
>       at org.apache.sqoop.mapreduce.ParquetJob.createDataset(ParquetJob.java:141)
>       at org.apache.sqoop.mapreduce.ParquetJob.configureImportJob(ParquetJob.java:119)
>       at org.apache.sqoop.mapreduce.DataDrivenImportJob.configureMapper(DataDrivenImportJob.java:130)
>       at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:260)
>       at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:673)
>       at org.apache.sqoop.manager.OracleManager.importTable(OracleManager.java:444)
>       at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:497)
>       at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
>       at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
>       at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>       at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
>       at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
>       at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
>       at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
> [3] Kite Documentation:
> http://kitesdk.org/docs/1.0.0/introduction-to-datasets.html
> Names and Namespaces
> URIs also define a name and namespace for your dataset. Kite uses these 
> values when the underlying system has the same concept (for example, Hive). 
> The name and namespace are typically the last two values in a URI. For 
> example, if you create a dataset using the URI 
> dataset:hive:fact_tables/ratings, Kite stores a Hive table ratings in the 
> fact_tables Hive database. If you create a dataset using the URI 
> dataset:hdfs:/user/cloudera/fact_tables/ratings, Kite stores an HDFS dataset 
> named ratings in the fact_tables namespace.  To ensure compatibility with 
> Hive and other underlying systems, names and namespaces in URIs must be made 
> of alphanumeric or underscore (_) characters and cannot start with a number.
> Thanks, Markus
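
For anyone hitting this before the documentation is updated, the naming rule
quoted from the Kite documentation above can be checked before running an
import. The following is only a sketch in Python, not Kite's or Sqoop's actual
code: the regular expression is an assumption derived from the quoted sentence
(alphanumeric or underscore, not starting with a number), and both function
names are hypothetical. It also assumes, based on use case 1, that the last
component of --target-dir is what ends up as the dataset name.

```python
import re

# Assumed pattern, derived from the quoted Kite documentation: alphanumeric or
# underscore characters only, and the first character must not be a digit.
# This is not copied from Kite's source.
_KITE_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_kite_name(name: str) -> bool:
    """Return True if `name` satisfies the rule quoted from the Kite docs."""
    return _KITE_NAME.fullmatch(name) is not None


def dataset_name_from_target_dir(target_dir: str) -> str:
    """Return the last path component (assumed to become the dataset name)."""
    return target_dir.rstrip("/").rsplit("/", 1)[-1]


if __name__ == "__main__":
    # The first candidate mirrors the name in the error message above.
    for candidate in ("DATABASE.TABLE", "DATABASE_TABLE", "2016_sales", "ratings"):
        print(candidate, is_valid_kite_name(candidate))
```

A check like this could run in a wrapper script on the --table value or on the
last component of --target-dir, so that a name containing '.' is caught before
the job is submitted.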



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
