matthiasdg commented on issue #3868: URL: https://github.com/apache/hudi/issues/3868#issuecomment-951981890
Meanwhile I experimented with some other versions of Hive Metastore + MySQL running in Docker containers (e.g. 2.3.7, matching Spark). Same problems, like the Hive partition columns missing in the data:

```
21/10/26 16:05:26 WARN HoodieFileIndex: Cannot do the partition prune for table abfss://[email protected]/devs/degeyt70/partitiontests/datalakehouse/vmm.aq_msm. The partitionFragments size (10893,2021,06,30) is not equal to the partition columns size(StructField(sensorId,LongType,false),StructField(timestamp,TimestampType,true))
21/10/26 16:05:28 ERROR Executor: Exception in task 0.0 in stage 6.0 (TID 15)
java.io.IOException: Required column is missing in data file. Col: [hiveid]
	at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.initializeInternal(VectorizedParquetRecordReader.java:314)
	at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.initialize(VectorizedParquetRecordReader.java:154)
	at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat.$anonfun$buildReaderWithPartitionValues$2(ParquetFileFormat.scala:329)
```

Or is querying only supposed to work via JDBC?
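For context, the errors show up when reading the Hive-synced table through Spark SQL rather than JDBC, roughly along these lines (the database/table and column names here are inferred from the log output above, not the exact statement used):

```sql
-- Hypothetical query against the Hive-synced Hudi table; names are taken
-- from the WARN/ERROR messages above and may differ from the real setup.
SELECT sensorId, `timestamp`, hiveid
FROM vmm.aq_msm
WHERE sensorId = 10893;
```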
