All,
After we upgraded to the new release 0.4.7, a strange exception started occurring when
we read a com.uber.hoodie dataset from Parquet.
This exception never occurred in the previous version. I would greatly appreciate it if
anyone could help me locate the cause.
I have attached part of the exception log below.
An exception or error caused a run to abort.
java.lang.ExceptionInInitializerError
at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat.buildReaderWithPartitionValues(ParquetFileFormat.scala:293)
at org.apache.spark.sql.execution.FileSourceScanExec.inputRDD$lzycompute(DataSourceScanExec.scala:285)
at org.apache.spark.sql.execution.FileSourceScanExec.inputRDD(DataSourceScanExec.scala:283)
at org.apache.spark.sql.execution.FileSourceScanExec.inputRDDs(DataSourceScanExec.scala:303)
at org.apache.spark.sql.execution.aggregate.HashAggregateExec.inputRDDs(HashAggregateExec.scala:141)
at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:386)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
................
Caused by: org.apache.parquet.schema.InvalidSchemaException: A group type can
not be empty. Parquet does not support empty group without leaves. Empty group:
spark_schema
at org.apache.parquet.schema.GroupType.<init>(GroupType.java:92)
at org.apache.parquet.schema.GroupType.<init>(GroupType.java:48)
at org.apache.parquet.schema.MessageType.<init>(MessageType.java:50)
at org.apache.parquet.schema.Types$MessageTypeBuilder.named(Types.java:1256)
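For context, Parquet raises this InvalidSchemaException whenever it is asked to build a group type that has no leaf fields, so the failing read appears to be requesting a message type with no columns at all, something like this (illustrative only):

```
message spark_schema {
}
```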
It seems that this exception is caused by the schema of the DataFrame written to the
Hudi dataset. I carefully compared the DataFrames in our test cases, and the only
difference is the nullable attribute on the fields.
All schemas in the Hudi tests have nullable set to true on every field, whereas
some of my test cases contain fields with nullable set to false.
I tried setting nullable to true on every field in our dataset, but the same
exception still occurs.
Best regards
Yuanbin Cheng
CR/PJ-AI-S1