matthiasdg commented on issue #3868: URL: https://github.com/apache/hudi/issues/3868#issuecomment-951981890
Meanwhile I experimented with some other versions of Hive Metastore + MySQL running in Docker containers (e.g. 2.3.7, matching Spark). Same problems, like the Hive partition columns missing in the data:

```
21/10/26 16:05:26 WARN HoodieFileIndex: Cannot do the partition prune for table abfss://[email protected]/devs/degeyt70/partitiontests/datalakehouse/vmm.aq_msm. The partitionFragments size (10893,2021,06,30) is not equal to the partition columns size(StructField(sensorId,LongType,false),StructField(timestamp,TimestampType,true))
21/10/26 16:05:28 ERROR Executor: Exception in task 0.0 in stage 6.0 (TID 15)
java.io.IOException: Required column is missing in data file. Col: [hiveid]
	at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.initializeInternal(VectorizedParquetRecordReader.java:314)
	at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.initialize(VectorizedParquetRecordReader.java:154)
	at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat.$anonfun$buildReaderWithPartitionValues$2(ParquetFileFormat.scala:329)
```

Or is querying only supposed to work via JDBC?
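For context, the errors show up when reading the Hive-synced table through Spark SQL rather than JDBC, roughly along these lines (the database/table and column names here are inferred from the log output above, not the exact statement used):

```sql
-- Hypothetical query against the Hive-synced Hudi table; names are taken
-- from the WARN/ERROR messages above and may differ from the real setup.
SELECT sensorId, `timestamp`, hiveid
FROM vmm.aq_msm
WHERE sensorId = 10893;
```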
