All,

After we upgraded to the new release 0.4.7, a strange exception occurred when 
we read the com.uber.hoodie dataset from Parquet.
This exception never occurred in the previous version. I would appreciate it if 
anyone could help me locate the cause of this exception.
I attach part of the exception log below.

An exception or error caused a run to abort.
java.lang.ExceptionInInitializerError
    at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat.buildReaderWithPartitionValues(ParquetFileFormat.scala:293)
    at org.apache.spark.sql.execution.FileSourceScanExec.inputRDD$lzycompute(DataSourceScanExec.scala:285)
    at org.apache.spark.sql.execution.FileSourceScanExec.inputRDD(DataSourceScanExec.scala:283)
    at org.apache.spark.sql.execution.FileSourceScanExec.inputRDDs(DataSourceScanExec.scala:303)
    at org.apache.spark.sql.execution.aggregate.HashAggregateExec.inputRDDs(HashAggregateExec.scala:141)
    at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:386)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
    ................

Caused by: org.apache.parquet.schema.InvalidSchemaException: A group type can not be empty. Parquet does not support empty group without leaves. Empty group: spark_schema
    at org.apache.parquet.schema.GroupType.<init>(GroupType.java:92)
    at org.apache.parquet.schema.GroupType.<init>(GroupType.java:48)
    at org.apache.parquet.schema.MessageType.<init>(MessageType.java:50)
    at org.apache.parquet.schema.Types$MessageTypeBuilder.named(Types.java:1256)

It seems that this exception is caused by the schema of the dataframe written to the 
Hudi dataset. I carefully compared the dataframes in our test cases, and the only 
difference is the nullable flag on the fields.
All test cases in the Hudi test schema have nullable set to true, whereas 
some of my test cases have fields with nullable set to false.
I tried converting every field's nullable flag to true in our dataset, but the 
same exception still occurred.
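For reference, this is roughly how I forced nullable to true. A minimal stdlib-only sketch that operates on the JSON form of a Spark schema (the format produced by StructType.json() / accepted by StructType.fromJson()); the field names are just illustrative, not from our real dataset:

```python
import json

def force_nullable(node):
    """Recursively set "nullable": true on every field of a Spark
    StructType schema given in its JSON dict representation."""
    if isinstance(node, dict):
        if "nullable" in node:
            node["nullable"] = True
        for value in node.values():
            force_nullable(value)
    elif isinstance(node, list):
        for item in node:
            force_nullable(item)
    return node

# Illustrative schema with one non-nullable field, as Spark serializes it.
schema = json.loads("""
{"type": "struct", "fields": [
  {"name": "id",   "type": "long",   "nullable": false, "metadata": {}},
  {"name": "name", "type": "string", "nullable": true,  "metadata": {}}
]}
""")

relaxed = force_nullable(schema)
print(all(f["nullable"] for f in relaxed["fields"]))  # True
```

Even with every field relaxed like this before writing, the InvalidSchemaException above still appears on read.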


Best regards

Yuanbin Cheng
CR/PJ-AI-S1

