Dovy Paukstys created SQOOP-3445:
------------------------------------

Summary: Spark with Sqoop and Kite - Parquet Mismatch in Command?
Key: SQOOP-3445
URL: https://issues.apache.org/jira/browse/SQOOP-3445
Project: Sqoop
Issue Type: Bug
Components: sqoop2-kite-connector
Affects Versions: 1.4.7
Environment: System:
* Debian 9
* Hadoop 2.9
* Spark 2.3
Installed Dependencies (JARs):
* sqoop-1.4.7-hadoop260
* kite-data-mapreduce-1.1.0
* kite-hadoop-compatibility-1.1.0.jar
* kite-data-crunch-1.1.0
* kite-data-core-1.1.0
* avro-tools-1.8.2.jar
* mysql-connector-java-5.1.42
* parquet-tools-1.8.3

Reporter: Dovy Paukstys

I'm not sure whether the error is deep in Sqoop or in Kite, so I cross-posted here: [https://github.com/kite-sdk/kite/issues/490].

I am reading from a MySQL database and trying to write out to Parquet. When writing to Avro there are no issues, but when Kite is involved (Parquet), all hell breaks loose.

First, I had to manually add a ton of JARs just to get it to run, but that all seems resolved. I have also tried various versions of the installed dependencies, downgrading and upgrading Sqoop accordingly.

When Sqoop is used without Kite (i.e., Avro, not Parquet), there are no issues. The moment the job tries to write Parquet output, everything blows up. Kite may be the offender, but the problem may also be in the Sqoop code that invokes Kite.
Error:
{code:java}
19/07/09 17:55:28 INFO mapreduce.Job: Job job_1562682312457_0020 failed with state FAILED due to: Job setup failed : java.lang.IllegalArgumentException: Parquet only supports generic and specific data models, type parameter must implement IndexedRecord
	at org.kitesdk.shaded.com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
	at org.kitesdk.data.spi.filesystem.FileSystemDataset.<init>(FileSystemDataset.java:96)
	at org.kitesdk.data.spi.filesystem.FileSystemDataset.<init>(FileSystemDataset.java:128)
	at org.kitesdk.data.spi.filesystem.FileSystemDataset$Builder.build(FileSystemDataset.java:687)
	at org.kitesdk.data.spi.filesystem.FileSystemDatasetRepository.load(FileSystemDatasetRepository.java:199)
	at org.kitesdk.data.Datasets.load(Datasets.java:108)
	at org.kitesdk.data.Datasets.load(Datasets.java:165)
	at org.kitesdk.data.mapreduce.DatasetKeyOutputFormat.load(DatasetKeyOutputFormat.java:542)
	at org.kitesdk.data.mapreduce.DatasetKeyOutputFormat.loadOrCreateJobDataset(DatasetKeyOutputFormat.java:569)
	at org.kitesdk.data.mapreduce.DatasetKeyOutputFormat.access$300(DatasetKeyOutputFormat.java:67)
	at org.kitesdk.data.mapreduce.DatasetKeyOutputFormat$MergeOutputCommitter.setupJob(DatasetKeyOutputFormat.java:369)
	at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.handleJobSetup(CommitterEventHandler.java:255)
	at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.run(CommitterEventHandler.java:235)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
19/07/09 17:55:28 INFO mapreduce.Job: Counters: 2
{code}

Again, it only fails on the final conversion to Parquet. I don't have the full details of the command, since it runs inside a parallel process. Any direction would be appreciated.
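The error text says that, for Parquet, Kite requires the record type to implement Avro's `IndexedRecord` (a generic or specific Avro record). One way to narrow down whether Sqoop or Kite is at fault might be to check whether a candidate record class, such as the one Sqoop's code generation produces for the table, actually satisfies that requirement. Below is a minimal diagnostic sketch, assuming Avro's interface is `org.apache.avro.generic.IndexedRecord`. The `IndexedRecordCheck` class is hypothetical and not part of Sqoop or Kite, and it only reports whether a given class passes the check, not which class Kite resolved at runtime.

```java
/**
 * Hypothetical diagnostic helper (not part of Sqoop or Kite).
 * Reports whether a record class implements Avro's IndexedRecord,
 * the interface the Kite error message says Parquet requires.
 * Both classes are loaded reflectively, so this compiles without Avro.
 */
public class IndexedRecordCheck {

    // Assumed fully qualified name of Avro's IndexedRecord interface.
    static final String INDEXED_RECORD = "org.apache.avro.generic.IndexedRecord";

    /** True if candidate is required itself or a subtype of it. */
    static boolean isSubtype(Class<?> required, Class<?> candidate) {
        return required.isAssignableFrom(candidate);
    }

    /** Returns a one-line verdict for the named record class. */
    static String check(String recordClassName, ClassLoader loader) {
        Class<?> indexedRecord;
        try {
            // initialize=false: load without running static initializers
            indexedRecord = Class.forName(INDEXED_RECORD, false, loader);
        } catch (ClassNotFoundException e) {
            return "Avro is not on the classpath";
        }
        Class<?> recordClass;
        try {
            recordClass = Class.forName(recordClassName, false, loader);
        } catch (ClassNotFoundException | LinkageError e) {
            // LinkageError covers a class whose own dependencies are missing
            return "could not load " + recordClassName + ": " + e;
        }
        return isSubtype(indexedRecord, recordClass)
                ? recordClassName + " implements IndexedRecord"
                : recordClassName + " does NOT implement IndexedRecord";
    }

    public static void main(String[] args) {
        ClassLoader loader = IndexedRecordCheck.class.getClassLoader();
        // Each argument is a fully qualified class name to inspect.
        for (String name : args) {
            System.out.println(check(name, loader));
        }
    }
}
```

For example, run `java -cp <avro jar>:<sqoop jar>:<generated classes> IndexedRecordCheck <generated class name>`, where the angle-bracket parts are placeholders. The Sqoop JAR is included because a generated class depends on Sqoop's library classes; without it, loading fails and the sketch reports the `LinkageError` instead of a verdict.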