[ https://issues.apache.org/jira/browse/SQOOP-2907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15583729#comment-15583729 ]
Ruslan Dautkhanov commented on SQOOP-2907:
------------------------------------------

Any workarounds for this? Is there any way we can generate, from a parquet schema, the .metadata that Kite SDK / Sqoop expects? I was trying to do:
# beeline: create table avro_table stored as avro as select * from parquet_table where 1=0
# $ hadoop fs -get /hivewarehouse/avro_table/000000_0 ./
# $ avro-tools getschema /hivewarehouse/avro_table/000000_0 > 000000_0.schema
# $ kite-dataset -v create amf_trans -s 000000_0.schema
The last command of this four-step process finally produced a .metadata directory, but once I tried to run sqoop export, I got the following exception:
{noformat}
16/10/17 16:10:25 ERROR sqoop.Sqoop: Got exception running Sqoop: org.kitesdk.data.DatasetIOException: Unable to load descriptor file:hdfs://epsdatalake/hivewarehouse/disc_dv.db/amf_trans_dv_09142016/.metadata/descriptor.properties for dataset:amf_trans_dv_09142016
org.kitesdk.data.DatasetIOException: Unable to load descriptor file:hdfs://epsdatalake/hivewarehouse/disc_dv.db/amf_trans_dv_09142016/.metadata/descriptor.properties for dataset:amf_trans_dv_09142016
	at org.kitesdk.data.spi.filesystem.FileSystemMetadataProvider.load(FileSystemMetadataProvider.java:127)
	at org.kitesdk.data.spi.filesystem.FileSystemDatasetRepository.load(FileSystemDatasetRepository.java:197)
	at org.kitesdk.data.Datasets.load(Datasets.java:108)
	at org.kitesdk.data.Datasets.load(Datasets.java:140)
	at org.kitesdk.data.mapreduce.DatasetKeyInputFormat$ConfigBuilder.readFrom(DatasetKeyInputFormat.java:92)
	at org.kitesdk.data.mapreduce.DatasetKeyInputFormat$ConfigBuilder.readFrom(DatasetKeyInputFormat.java:139)
	at org.apache.sqoop.mapreduce.JdbcExportJob.configureInputFormat(JdbcExportJob.java:84)
	at org.apache.sqoop.mapreduce.ExportJobBase.runExport(ExportJobBase.java:424)
	at org.apache.sqoop.manager.oracle.OraOopConnManager.exportTable(OraOopConnManager.java:320)
	at org.apache.sqoop.tool.ExportTool.exportTable(ExportTool.java:81)
	at org.apache.sqoop.tool.ExportTool.run(ExportTool.java:100)
{noformat}
sqoop export's Kite SDK code looks for a *.metadata/descriptor.properties* file, but what the kite-dataset utility generates contains only *.metadata/schemas/1.avsc*.
The process has to be repeatable/scriptable; that's why we were looking at different options to generate .metadata automatically, including the kite-dataset commands (a scripted version of the four steps is sketched at the end of this message).
It would be awesome if sqoop generated the .metadata that Kite SDK expects whenever .metadata is not found.

> Export parquet files to RDBMS: don't require .metadata for parquet files
> ------------------------------------------------------------------------
>
>                 Key: SQOOP-2907
>                 URL: https://issues.apache.org/jira/browse/SQOOP-2907
>             Project: Sqoop
>          Issue Type: Improvement
>          Components: metastore
>    Affects Versions: 1.4.6
>         Environment: sqoop 1.4.6
> export parquet files to Oracle
>            Reporter: Ruslan Dautkhanov
>
> Kite currently requires .metadata.
> Parquet files have their own metadata stored alongside the data files.
> It would be great for an export of parquet files to an RDBMS not to require .metadata.
> Most of our files are created by Spark and Hive, and they don't create .metadata; only Kite does.
> This severely limits the usability of sqoop export for parquet files.
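
For the record, here is the four-step workaround above as a single script. This is a sketch only: the beeline JDBC URL is a placeholder, table and path names are the ones from my comment, and beeline, hadoop, avro-tools and kite-dataset are assumed to be on the PATH.
{noformat}
#!/bin/bash
set -e

# 1. Create an empty Avro copy of the parquet table, so Hive writes an Avro
#    file whose embedded schema matches the parquet columns.
beeline -u "jdbc:hive2://HOST:10000/default" -e \
  "create table avro_table stored as avro as select * from parquet_table where 1=0"

# 2. Pull the (empty) Avro data file down from HDFS.
hadoop fs -get /hivewarehouse/avro_table/000000_0 ./

# 3. Extract the Avro schema embedded in the file header.
avro-tools getschema 000000_0 > 000000_0.schema

# 4. Create the Kite dataset (and its .metadata directory) from that schema.
kite-dataset -v create amf_trans -s 000000_0.schema
{noformat}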
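
For context, the sqoop export invocation that hit the exception was shaped roughly like this (connection string, username and table name are placeholders; --direct is what routes the job through OraOopConnManager, which is visible in the stack trace):
{noformat}
sqoop export \
  --connect "jdbc:oracle:thin:@//dbhost:1521/SERVICE" \
  --username SCOTT \
  --table AMF_TRANS \
  --export-dir hdfs://epsdatalake/hivewarehouse/disc_dv.db/amf_trans_dv_09142016 \
  --direct
{noformat}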