abdkumar opened a new issue, #8614: URL: https://github.com/apache/hudi/issues/8614
**Description:**

We are attempting to test Deltastreamer on local files and apply some transformations before writing the data into the destination table. However, we keep encountering the SQL-related error mentioned in the title. I am running HoodieDeltaStreamer with the SqlQueryBasedTransformer on Spark 3.3.2 and Apache Hudi 0.13.0. When running the spark-submit command, I encounter the error shown in the stacktrace below.

**To Reproduce**

Steps to reproduce the behavior:

1. Download the data from https://drive.google.com/drive/folders/1BwNEK649hErbsWcYLZhqCWnaXFX3mIsg and the utilities bundle from https://mvnrepository.com/artifact/org.apache.hudi/hudi-utilities-bundle_2.12/0.13.0
2. Create the required folders
3. Run the spark-submit command:

```
spark-submit \
  --conf spark.jars=/home/kumar/hudi-utilities-bundle_2.12-0.13.0.jar \
  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
  --conf spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension \
  --conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog \
  --conf spark.sql.hive.convertMetastoreParquet=false \
  --conf mapreduce.fileoutputcommitter.marksuccessfuljobs=false \
  --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer /home/kumar/hudi-utilities-bundle_2.12-0.13.0.jar \
  --enable-sync \
  --source-ordering-field replicadmstimestamp \
  --source-class org.apache.hudi.utilities.sources.ParquetDFSSource \
  --target-table invoice \
  --target-base-path /home/kumar/hudi_project/deltastreamer_file_transformer/transformed_data \
  --table-type COPY_ON_WRITE \
  --op UPSERT \
  --hoodie-conf hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.ComplexKeyGenerator \
  --hoodie-conf hoodie.datasource.write.recordkey.field=invoiceid \
  --hoodie-conf hoodie.deltastreamer.source.dfs.root=file:////home/kumar/hudi_project/deltastreamer_file_transformer/sample_data_files \
  --hoodie-conf hoodie.datasource.write.precombine.field=replicadmstimestamp \
  --hoodie-conf hoodie.database.name=metastore \
  --hoodie-conf hoodie.datasource.hive_sync.enable=true \
  --hoodie-conf hoodie.datasource.hive_sync.table=invoice \
  --transformer-class org.apache.hudi.utilities.transform.SqlQueryBasedTransformer \
  --hoodie-conf hoodie.deltastreamer.transformer.sql="SELECT *, extract(year from replicadmstimestamp) as year, extract(month from replicadmstimestamp) as month, extract(day from replicadmstimestamp) as day FROM <SRC> a;" \
  --hoodie-conf hoodie.datasource.write.partitionpath.field=year,month,day \
  --hoodie-conf hoodie.datasource.hive_sync.partition_fields=year,month,day
```

**Environment Description**

* Hudi version : 0.13.0
* Spark version : 3.3.2
* Hive version : 3.1.3
* Hadoop version : 3.3.2
* Storage (HDFS/S3/GCS..) : Local
* Running on Docker? (yes/no) : no

**Additional context**

Added the required jars in the spark-defaults.conf file:

```
spark.jars.packages io.delta:delta-core_2.12:2.3.0,org.apache.hudi:hudi-spark3.3-bundle_2.12:0.13.0,org.apache.spark:spark-avro_2.12:3.3.2
```
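For reference, the transformation itself can be sanity-checked outside of Deltastreamer: `SqlQueryBasedTransformer` registers the incoming batch as a temporary view (visible in the log below as `HOODIE_SRC_TMP_TABLE_...`) and substitutes `<SRC>` in the configured SQL with that view name before executing it. Below is a minimal, untested sketch that can be pasted into `spark-shell` to replay the same query against the sample files; the path and column name are taken from the command above, the view name `hoodie_src_tmp` is made up, and the trailing `;` is dropped to be safe since `spark.sql()` expects a single bare statement:

```scala
// Sketch of what SqlQueryBasedTransformer effectively runs, with a fixed
// view name standing in for the generated HOODIE_SRC_TMP_TABLE_<uuid>.
val src = spark.read.parquet(
  "file:////home/kumar/hudi_project/deltastreamer_file_transformer/sample_data_files")
src.createOrReplaceTempView("hoodie_src_tmp")

// Same query as hoodie.deltastreamer.transformer.sql, with <SRC> replaced
// by the registered view name.
val transformed = spark.sql(
  """SELECT *,
    |       extract(year  from replicadmstimestamp) as year,
    |       extract(month from replicadmstimestamp) as month,
    |       extract(day   from replicadmstimestamp) as day
    |FROM hoodie_src_tmp a""".stripMargin)

transformed.printSchema()
transformed.show(5, false)
```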
**Stacktrace**

```
Warning: Ignoring non-Spark config property: mapreduce.fileoutputcommitter.marksuccessfuljobs
:: loading settings :: url = jar:file:/home/kumar/spark/jars/ivy-2.5.1.jar!/org/apache/ivy/core/settings/ivysettings.xml
Ivy Default Cache set to: /home/kumar/.ivy2/cache
The jars for the packages stored in: /home/kumar/.ivy2/jars
io.delta#delta-core_2.12 added as a dependency
org.apache.hudi#hudi-spark3.3-bundle_2.12 added as a dependency
org.apache.spark#spark-avro_2.12 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-91f79469-e32d-4f8c-8dc0-3be403ce410d;1.0
    confs: [default]
    found io.delta#delta-core_2.12;2.3.0 in central
    found io.delta#delta-storage;2.3.0 in central
    found org.antlr#antlr4-runtime;4.8 in central
    found org.apache.hudi#hudi-spark3.3-bundle_2.12;0.13.0 in central
    found org.apache.spark#spark-avro_2.12;3.3.2 in central
    found org.tukaani#xz;1.9 in central
    found org.spark-project.spark#unused;1.0.0 in central
:: resolution report :: resolve 616ms :: artifacts dl 29ms
    :: modules in use:
    io.delta#delta-core_2.12;2.3.0 from central in [default]
    io.delta#delta-storage;2.3.0 from central in [default]
    org.antlr#antlr4-runtime;4.8 from central in [default]
    org.apache.hudi#hudi-spark3.3-bundle_2.12;0.13.0 from central in [default]
    org.apache.spark#spark-avro_2.12;3.3.2 from central in [default]
    org.spark-project.spark#unused;1.0.0 from central in [default]
    org.tukaani#xz;1.9 from central in [default]
    ---------------------------------------------------------------------
    |                  |            modules            ||   artifacts   |
    |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
    ---------------------------------------------------------------------
    |      default     |   7   |   0   |   0   |   0   ||   7   |   0   |
    ---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent-91f79469-e32d-4f8c-8dc0-3be403ce410d
    confs: [default]
    0 artifacts copied, 7 already retrieved (0kB/23ms)
23/05/01 13:23:10 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
23/05/01 13:23:10 WARN SchedulerConfGenerator: Job Scheduling Configs will not be in effect as spark.scheduler.mode is not set to FAIR at instantiation time. Continuing without scheduling configs
23/05/01 13:23:10 INFO SparkContext: Running Spark version 3.3.2
23/05/01 13:23:10 INFO ResourceUtils: ==============================================================
23/05/01 13:23:10 INFO ResourceUtils: No custom resources configured for spark.driver.
23/05/01 13:23:10 INFO ResourceUtils: ==============================================================
23/05/01 13:23:10 INFO SparkContext: Submitted application: delta-streamer-invoice
23/05/01 13:23:11 INFO ResourceProfile: Default ResourceProfile created, executor resources: Map(cores -> name: cores, amount: 1, script: , vendor: , memory -> name: memory, amount: 1024, script: , vendor: , offHeap -> name: offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0)
23/05/01 13:23:11 INFO ResourceProfile: Limiting resource is cpu
23/05/01 13:23:11 INFO ResourceProfileManager: Added ResourceProfile id: 0
23/05/01 13:23:11 INFO SecurityManager: Changing view acls to: kumar
23/05/01 13:23:11 INFO SecurityManager: Changing modify acls to: kumar
23/05/01 13:23:11 INFO SecurityManager: Changing view acls groups to:
23/05/01 13:23:11 INFO SecurityManager: Changing modify acls groups to:
23/05/01 13:23:11 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(kumar); groups with view permissions: Set(); users with modify permissions: Set(kumar); groups with modify permissions: Set()
23/05/01 13:23:11 INFO deprecation: mapred.output.compression.codec is deprecated. Instead, use mapreduce.output.fileoutputformat.compress.codec
23/05/01 13:23:11 INFO deprecation: mapred.output.compress is deprecated. Instead, use mapreduce.output.fileoutputformat.compress
23/05/01 13:23:11 INFO deprecation: mapred.output.compression.type is deprecated. Instead, use mapreduce.output.fileoutputformat.compress.type
23/05/01 13:23:11 INFO Utils: Successfully started service 'sparkDriver' on port 43729.
23/05/01 13:23:11 INFO SparkEnv: Registering MapOutputTracker
23/05/01 13:23:11 INFO SparkEnv: Registering BlockManagerMaster
23/05/01 13:23:12 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
23/05/01 13:23:12 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
23/05/01 13:23:12 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
23/05/01 13:23:12 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-227cf5db-3cef-4bff-9c4b-17f81aa8aa1b
23/05/01 13:23:12 INFO MemoryStore: MemoryStore started with capacity 366.3 MiB
23/05/01 13:23:12 INFO SparkEnv: Registering OutputCommitCoordinator
23/05/01 13:23:12 INFO Utils: Successfully started service 'SparkUI' on port 8090.
23/05/01 13:23:12 INFO SparkContext: Added JAR file:///home/kumar/hudi-utilities-bundle_2.12-0.13.0.jar at spark://hudi-vm.internal.cloudapp.net:43729/jars/hudi-utilities-bundle_2.12-0.13.0.jar with timestamp 1682947390924
23/05/01 13:23:12 INFO SparkContext: Added JAR file:///home/kumar/.ivy2/jars/io.delta_delta-core_2.12-2.3.0.jar at spark://hudi-vm.internal.cloudapp.net:43729/jars/io.delta_delta-core_2.12-2.3.0.jar with timestamp 1682947390924
23/05/01 13:23:12 INFO SparkContext: Added JAR file:///home/kumar/.ivy2/jars/org.apache.hudi_hudi-spark3.3-bundle_2.12-0.13.0.jar at spark://hudi-vm.internal.cloudapp.net:43729/jars/org.apache.hudi_hudi-spark3.3-bundle_2.12-0.13.0.jar with timestamp 1682947390924
23/05/01 13:23:12 INFO SparkContext: Added JAR file:///home/kumar/.ivy2/jars/org.apache.spark_spark-avro_2.12-3.3.2.jar at spark://hudi-vm.internal.cloudapp.net:43729/jars/org.apache.spark_spark-avro_2.12-3.3.2.jar with timestamp 1682947390924
23/05/01 13:23:12 INFO SparkContext: Added JAR file:///home/kumar/.ivy2/jars/io.delta_delta-storage-2.3.0.jar at spark://hudi-vm.internal.cloudapp.net:43729/jars/io.delta_delta-storage-2.3.0.jar with timestamp 1682947390924
23/05/01 13:23:12 INFO SparkContext: Added JAR file:///home/kumar/.ivy2/jars/org.antlr_antlr4-runtime-4.8.jar at spark://hudi-vm.internal.cloudapp.net:43729/jars/org.antlr_antlr4-runtime-4.8.jar with timestamp 1682947390924
23/05/01 13:23:12 INFO SparkContext: Added JAR file:///home/kumar/.ivy2/jars/org.tukaani_xz-1.9.jar at spark://hudi-vm.internal.cloudapp.net:43729/jars/org.tukaani_xz-1.9.jar with timestamp 1682947390924
23/05/01 13:23:12 INFO SparkContext: Added JAR file:///home/kumar/.ivy2/jars/org.spark-project.spark_unused-1.0.0.jar at spark://hudi-vm.internal.cloudapp.net:43729/jars/org.spark-project.spark_unused-1.0.0.jar with timestamp 1682947390924
23/05/01 13:23:12 INFO SparkContext: The JAR file:/home/kumar/hudi-utilities-bundle_2.12-0.13.0.jar at spark://hudi-vm.internal.cloudapp.net:43729/jars/hudi-utilities-bundle_2.12-0.13.0.jar has been added already. Overwriting of added jar is not supported in the current version.
23/05/01 13:23:12 INFO Executor: Starting executor ID driver on host hudi-vm.internal.cloudapp.net
23/05/01 13:23:12 INFO Executor: Starting executor with user classpath (userClassPathFirst = false): ''
23/05/01 13:23:12 INFO Executor: Fetching spark://hudi-vm.internal.cloudapp.net:43729/jars/io.delta_delta-storage-2.3.0.jar with timestamp 1682947390924
23/05/01 13:23:12 INFO TransportClientFactory: Successfully created connection to hudi-vm.internal.cloudapp.net/10.1.0.4:43729 after 49 ms (0 ms spent in bootstraps)
23/05/01 13:23:12 INFO Utils: Fetching spark://hudi-vm.internal.cloudapp.net:43729/jars/io.delta_delta-storage-2.3.0.jar to /tmp/spark-5d07761f-de34-404e-ba00-73a56a32d135/userFiles-84300702-dcb6-46f6-aba7-b2af5f66e946/fetchFileTemp6875321353190243857.tmp
23/05/01 13:23:12 INFO Executor: Adding file:/tmp/spark-5d07761f-de34-404e-ba00-73a56a32d135/userFiles-84300702-dcb6-46f6-aba7-b2af5f66e946/io.delta_delta-storage-2.3.0.jar to class loader
23/05/01 13:23:12 INFO Executor: Fetching spark://hudi-vm.internal.cloudapp.net:43729/jars/org.apache.hudi_hudi-spark3.3-bundle_2.12-0.13.0.jar with timestamp 1682947390924
23/05/01 13:23:12 INFO Utils: Fetching spark://hudi-vm.internal.cloudapp.net:43729/jars/org.apache.hudi_hudi-spark3.3-bundle_2.12-0.13.0.jar to /tmp/spark-5d07761f-de34-404e-ba00-73a56a32d135/userFiles-84300702-dcb6-46f6-aba7-b2af5f66e946/fetchFileTemp7911131019630374440.tmp
23/05/01 13:23:14 INFO Executor: Adding file:/tmp/spark-5d07761f-de34-404e-ba00-73a56a32d135/userFiles-84300702-dcb6-46f6-aba7-b2af5f66e946/org.apache.hudi_hudi-spark3.3-bundle_2.12-0.13.0.jar to class loader
23/05/01 13:23:14 INFO Executor: Fetching spark://hudi-vm.internal.cloudapp.net:43729/jars/hudi-utilities-bundle_2.12-0.13.0.jar with timestamp 1682947390924
23/05/01 13:23:14 INFO Utils: Fetching spark://hudi-vm.internal.cloudapp.net:43729/jars/hudi-utilities-bundle_2.12-0.13.0.jar to /tmp/spark-5d07761f-de34-404e-ba00-73a56a32d135/userFiles-84300702-dcb6-46f6-aba7-b2af5f66e946/fetchFileTemp3321144361109611039.tmp
23/05/01 13:23:16 INFO Executor: Adding file:/tmp/spark-5d07761f-de34-404e-ba00-73a56a32d135/userFiles-84300702-dcb6-46f6-aba7-b2af5f66e946/hudi-utilities-bundle_2.12-0.13.0.jar to class loader
23/05/01 13:23:16 INFO Executor: Fetching spark://hudi-vm.internal.cloudapp.net:43729/jars/io.delta_delta-core_2.12-2.3.0.jar with timestamp 1682947390924
23/05/01 13:23:16 INFO Utils: Fetching spark://hudi-vm.internal.cloudapp.net:43729/jars/io.delta_delta-core_2.12-2.3.0.jar to /tmp/spark-5d07761f-de34-404e-ba00-73a56a32d135/userFiles-84300702-dcb6-46f6-aba7-b2af5f66e946/fetchFileTemp738084884534076528.tmp
23/05/01 13:23:17 INFO Executor: Adding file:/tmp/spark-5d07761f-de34-404e-ba00-73a56a32d135/userFiles-84300702-dcb6-46f6-aba7-b2af5f66e946/io.delta_delta-core_2.12-2.3.0.jar to class loader
23/05/01 13:23:17 INFO Executor: Fetching spark://hudi-vm.internal.cloudapp.net:43729/jars/org.apache.spark_spark-avro_2.12-3.3.2.jar with timestamp 1682947390924
23/05/01 13:23:17 INFO Utils: Fetching spark://hudi-vm.internal.cloudapp.net:43729/jars/org.apache.spark_spark-avro_2.12-3.3.2.jar to /tmp/spark-5d07761f-de34-404e-ba00-73a56a32d135/userFiles-84300702-dcb6-46f6-aba7-b2af5f66e946/fetchFileTemp6464200383284256273.tmp
23/05/01 13:23:17 INFO Executor: Adding file:/tmp/spark-5d07761f-de34-404e-ba00-73a56a32d135/userFiles-84300702-dcb6-46f6-aba7-b2af5f66e946/org.apache.spark_spark-avro_2.12-3.3.2.jar to class loader
23/05/01 13:23:17 INFO Executor: Fetching spark://hudi-vm.internal.cloudapp.net:43729/jars/org.antlr_antlr4-runtime-4.8.jar with timestamp 1682947390924
23/05/01 13:23:17 INFO Utils: Fetching spark://hudi-vm.internal.cloudapp.net:43729/jars/org.antlr_antlr4-runtime-4.8.jar to /tmp/spark-5d07761f-de34-404e-ba00-73a56a32d135/userFiles-84300702-dcb6-46f6-aba7-b2af5f66e946/fetchFileTemp5710292961021659934.tmp
23/05/01 13:23:17 INFO Executor: Adding file:/tmp/spark-5d07761f-de34-404e-ba00-73a56a32d135/userFiles-84300702-dcb6-46f6-aba7-b2af5f66e946/org.antlr_antlr4-runtime-4.8.jar to class loader
23/05/01 13:23:17 INFO Executor: Fetching spark://hudi-vm.internal.cloudapp.net:43729/jars/org.tukaani_xz-1.9.jar with timestamp 1682947390924
23/05/01 13:23:17 INFO Utils: Fetching spark://hudi-vm.internal.cloudapp.net:43729/jars/org.tukaani_xz-1.9.jar to /tmp/spark-5d07761f-de34-404e-ba00-73a56a32d135/userFiles-84300702-dcb6-46f6-aba7-b2af5f66e946/fetchFileTemp5943086360298064670.tmp
23/05/01 13:23:17 INFO Executor: Adding file:/tmp/spark-5d07761f-de34-404e-ba00-73a56a32d135/userFiles-84300702-dcb6-46f6-aba7-b2af5f66e946/org.tukaani_xz-1.9.jar to class loader
23/05/01 13:23:17 INFO Executor: Fetching spark://hudi-vm.internal.cloudapp.net:43729/jars/org.spark-project.spark_unused-1.0.0.jar with timestamp 1682947390924
23/05/01 13:23:17 INFO Utils: Fetching spark://hudi-vm.internal.cloudapp.net:43729/jars/org.spark-project.spark_unused-1.0.0.jar to /tmp/spark-5d07761f-de34-404e-ba00-73a56a32d135/userFiles-84300702-dcb6-46f6-aba7-b2af5f66e946/fetchFileTemp2308988093976298818.tmp
23/05/01 13:23:17 INFO Executor: Adding file:/tmp/spark-5d07761f-de34-404e-ba00-73a56a32d135/userFiles-84300702-dcb6-46f6-aba7-b2af5f66e946/org.spark-project.spark_unused-1.0.0.jar to class loader
23/05/01 13:23:17 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 40465.
23/05/01 13:23:17 INFO NettyBlockTransferService: Server created on hudi-vm.internal.cloudapp.net:40465
23/05/01 13:23:17 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
23/05/01 13:23:17 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, hudi-vm.internal.cloudapp.net, 40465, None)
23/05/01 13:23:17 INFO BlockManagerMasterEndpoint: Registering block manager hudi-vm.internal.cloudapp.net:40465 with 366.3 MiB RAM, BlockManagerId(driver, hudi-vm.internal.cloudapp.net, 40465, None)
23/05/01 13:23:17 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, hudi-vm.internal.cloudapp.net, 40465, None)
23/05/01 13:23:17 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, hudi-vm.internal.cloudapp.net, 40465, None)
23/05/01 13:23:18 WARN DFSPropertiesConfiguration: Cannot find HUDI_CONF_DIR, please set it as the dir of hudi-defaults.conf
23/05/01 13:23:18 WARN DFSPropertiesConfiguration: Properties file file:/etc/hudi/conf/hudi-defaults.conf not found. Ignoring to load props file
23/05/01 13:23:18 INFO UtilHelpers: Adding overridden properties to file properties.
23/05/01 13:23:18 WARN SparkContext: Using an existing SparkContext; some configuration may not take effect.
23/05/01 13:23:18 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient from /home/kumar/hudi_project/deltastreamer_file_transformer/transformed_data
23/05/01 13:23:18 INFO HoodieTableConfig: Loading table properties from /home/kumar/hudi_project/deltastreamer_file_transformer/transformed_data/.hoodie/hoodie.properties
23/05/01 13:23:18 INFO HoodieTableMetaClient: Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from /home/kumar/hudi_project/deltastreamer_file_transformer/transformed_data
23/05/01 13:23:18 INFO SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir.
23/05/01 13:23:18 INFO SharedState: Warehouse path is 'file:/home/kumar/hudi_project/deltastreamer_file_transformer/spark-warehouse'.
23/05/01 13:23:21 INFO HoodieDeltaStreamer: Creating delta streamer with configs:
hoodie.auto.adjust.lock.configs: true
hoodie.database.name: metastore
hoodie.datasource.hive_sync.enable: true
hoodie.datasource.hive_sync.partition_fields: year,month,day
hoodie.datasource.hive_sync.table: invoice
hoodie.datasource.write.keygenerator.class: org.apache.hudi.keygen.ComplexKeyGenerator
hoodie.datasource.write.partitionpath.field: year,month,day
hoodie.datasource.write.precombine.field: replicadmstimestamp
hoodie.datasource.write.reconcile.schema: false
hoodie.datasource.write.recordkey.field: invoiceid
hoodie.deltastreamer.source.dfs.root: file:////home/kumar/hudi_project/deltastreamer_file_transformer/sample_data_files
hoodie.deltastreamer.transformer.sql: SELECT * ,extract(year from replicadmstimestamp) as year, extract(month from replicadmstimestamp) as month, extract(day from replicadmstimestamp) as day FROM <SRC> a;
23/05/01 13:23:21 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient from /home/kumar/hudi_project/deltastreamer_file_transformer/transformed_data
23/05/01 13:23:21 INFO HoodieTableConfig: Loading table properties from /home/kumar/hudi_project/deltastreamer_file_transformer/transformed_data/.hoodie/hoodie.properties
23/05/01 13:23:21 INFO HoodieTableMetaClient: Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from /home/kumar/hudi_project/deltastreamer_file_transformer/transformed_data
23/05/01 13:23:21 INFO HoodieActiveTimeline: Loaded instants upto : Optional.empty
23/05/01 13:23:21 INFO DFSPathSelector: Using path selector org.apache.hudi.utilities.sources.helpers.DFSPathSelector
23/05/01 13:23:21 INFO HoodieDeltaStreamer: Delta Streamer running only single round
23/05/01 13:23:21 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient from /home/kumar/hudi_project/deltastreamer_file_transformer/transformed_data
23/05/01 13:23:21 INFO HoodieTableConfig: Loading table properties from /home/kumar/hudi_project/deltastreamer_file_transformer/transformed_data/.hoodie/hoodie.properties
23/05/01 13:23:21 INFO HoodieTableMetaClient: Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from /home/kumar/hudi_project/deltastreamer_file_transformer/transformed_data
23/05/01 13:23:21 INFO HoodieActiveTimeline: Loaded instants upto : Optional.empty
23/05/01 13:23:21 INFO DeltaSync: Checkpoint to resume from : Optional.empty
23/05/01 13:23:21 INFO DFSPathSelector: Root path => file:////home/kumar/hudi_project/deltastreamer_file_transformer/sample_data_files source limit => 9223372036854775807
23/05/01 13:23:21 INFO InMemoryFileIndex: It took 64 ms to list leaf files for 2 paths.
23/05/01 13:23:23 INFO SparkContext: Starting job: parquet at ParquetDFSSource.java:55
23/05/01 13:23:23 INFO DAGScheduler: Got job 0 (parquet at ParquetDFSSource.java:55) with 1 output partitions
23/05/01 13:23:23 INFO DAGScheduler: Final stage: ResultStage 0 (parquet at ParquetDFSSource.java:55)
23/05/01 13:23:23 INFO DAGScheduler: Parents of final stage: List()
23/05/01 13:23:23 INFO DAGScheduler: Missing parents: List()
23/05/01 13:23:23 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[1] at parquet at ParquetDFSSource.java:55), which has no missing parents
23/05/01 13:23:23 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 106.0 KiB, free 366.2 MiB)
23/05/01 13:23:23 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 38.2 KiB, free 366.2 MiB)
23/05/01 13:23:23 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on hudi-vm.internal.cloudapp.net:40465 (size: 38.2 KiB, free: 366.3 MiB)
23/05/01 13:23:23 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1513
23/05/01 13:23:24 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at parquet at ParquetDFSSource.java:55) (first 15 tasks are for partitions Vector(0))
23/05/01 13:23:24 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks resource profile 0
23/05/01 13:23:24 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0) (hudi-vm.internal.cloudapp.net, executor driver, partition 0, PROCESS_LOCAL, 4449 bytes) taskResourceAssignments Map()
23/05/01 13:23:24 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
23/05/01 13:23:25 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 1606 bytes result sent to driver
23/05/01 13:23:25 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 1696 ms on hudi-vm.internal.cloudapp.net (executor driver) (1/1)
23/05/01 13:23:25 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
23/05/01 13:23:25 INFO DAGScheduler: ResultStage 0 (parquet at ParquetDFSSource.java:55) finished in 2.676 s
23/05/01 13:23:25 INFO DAGScheduler: Job 0 is finished. Cancelling potential speculative or zombie tasks for this job
23/05/01 13:23:25 INFO TaskSchedulerImpl: Killing all running tasks in stage 0: Stage finished
23/05/01 13:23:25 INFO DAGScheduler: Job 0 finished: parquet at ParquetDFSSource.java:55, took 2.834363 s
23/05/01 13:23:26 INFO BlockManagerInfo: Removed broadcast_0_piece0 on hudi-vm.internal.cloudapp.net:40465 in memory (size: 38.2 KiB, free: 366.3 MiB)
23/05/01 13:23:30 INFO SqlQueryBasedTransformer: Registering tmp table : HOODIE_SRC_TMP_TABLE_a1962c9c_edde_45ce_88db_a73c519c000d
23/05/01 13:23:30 INFO DeltaSync: Shutting down embedded timeline server
23/05/01 13:23:30 INFO HoodieDeltaStreamer: Shut down delta streamer
23/05/01 13:23:30 INFO SparkUI: Stopped Spark web UI at http://hudi-vm.internal.cloudapp.net:8090/
23/05/01 13:23:30 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
23/05/01 13:23:31 INFO MemoryStore: MemoryStore cleared
23/05/01 13:23:31 INFO BlockManager: BlockManager stopped
23/05/01 13:23:31 INFO BlockManagerMaster: BlockManagerMaster stopped
23/05/01 13:23:31 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
23/05/01 13:23:31 INFO SparkContext: Successfully stopped SparkContext
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(Lscala/PartialFunction;)Lorg/apache/spark/sql/catalyst/plans/logical/LogicalPlan;
    at org.apache.spark.sql.hudi.analysis.HoodiePruneFileSourcePartitions.apply(HoodiePruneFileSourcePartitions.scala:42)
    at org.apache.spark.sql.hudi.analysis.HoodiePruneFileSourcePartitions.apply(HoodiePruneFileSourcePartitions.scala:40)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$2(RuleExecutor.scala:211)
    at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
    at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
    at scala.collection.immutable.List.foldLeft(List.scala:91)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1(RuleExecutor.scala:208)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1$adapted(RuleExecutor.scala:200)
    at scala.collection.immutable.List.foreach(List.scala:431)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:200)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$executeAndTrack$1(RuleExecutor.scala:179)
    at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:88)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor.executeAndTrack(RuleExecutor.scala:179)
    at org.apache.spark.sql.execution.QueryExecution.$anonfun$optimizedPlan$1(QueryExecution.scala:126)
    at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
    at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$2(QueryExecution.scala:185)
    at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:510)
    at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:185)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
    at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:184)
    at org.apache.spark.sql.execution.QueryExecution.optimizedPlan$lzycompute(QueryExecution.scala:122)
    at org.apache.spark.sql.execution.QueryExecution.optimizedPlan(QueryExecution.scala:118)
    at org.apache.spark.sql.execution.QueryExecution.assertOptimized(QueryExecution.scala:136)
    at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:154)
    at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:151)
    at org.apache.spark.sql.execution.QueryExecution.simpleString(QueryExecution.scala:204)
    at org.apache.spark.sql.execution.QueryExecution.org$apache$spark$sql$execution$QueryExecution$$explainString(QueryExecution.scala:249)
    at org.apache.spark.sql.execution.QueryExecution.explainString(QueryExecution.scala:218)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:103)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:169)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:95)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
    at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:98)
    at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:94)
    at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:584)
    at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:176)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:584)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:560)
    at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:94)
    at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:81)
    at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:79)
    at org.apache.spark.sql.Dataset.<init>(Dataset.scala:220)
    at org.apache.spark.sql.Dataset$.$anonfun$ofRows$1(Dataset.scala:92)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
    at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:89)
    at org.apache.spark.sql.Dataset.withPlan(Dataset.scala:3887)
    at org.apache.spark.sql.Dataset.createOrReplaceTempView(Dataset.scala:3455)
    at org.apache.hudi.utilities.transform.SqlQueryBasedTransformer.apply(SqlQueryBasedTransformer.java:63)
    at org.apache.hudi.utilities.transform.ChainedTransformer.apply(ChainedTransformer.java:50)
    at org.apache.hudi.utilities.deltastreamer.DeltaSync.lambda$fetchFromSource$0(DeltaSync.java:495)
    at org.apache.hudi.common.util.Option.map(Option.java:108)
    at org.apache.hudi.utilities.deltastreamer.DeltaSync.fetchFromSource(DeltaSync.java:495)
    at org.apache.hudi.utilities.deltastreamer.DeltaSync.readFromSource(DeltaSync.java:460)
    at org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:364)
    at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.lambda$sync$2(HoodieDeltaStreamer.java:206)
    at org.apache.hudi.common.util.Option.ifPresent(Option.java:97)
    at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.sync(HoodieDeltaStreamer.java:204)
    at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:573)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:958)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1046)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1055)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
23/05/01 13:23:31 INFO ShutdownHookManager: Shutdown hook called
23/05/01 13:23:31 INFO ShutdownHookManager: Deleting directory /tmp/spark-5d07761f-de34-404e-ba00-73a56a32d135
23/05/01 13:23:31 INFO ShutdownHookManager: Deleting directory /tmp/spark-7c0ada38-e349-47fb-bf8d-9f7c288f4e24
```

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
