Hi,
The test case is really very simple, just like the Hudi test cases.
I have two DataFrames. Using copy-on-write, I first write one with
Overwrite mode and then write the second with Append mode; both operations use
the format "com.uber.hoodie".
However, the exception occurs when I read the dataset after these two write
operations.
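For reference, the write sequence can be sketched roughly as below (a PySpark sketch, assuming Hudi 0.4.x with the "com.uber.hoodie" data source; the table name, record-key/partition/precombine field names, and the target path are placeholders, not the actual values from our test case):

```python
# Write options for a copy-on-write Hudi 0.4.x table, using the option keys
# of the "com.uber.hoodie" data source. Field names below are placeholders.
hudi_opts = {
    "hoodie.table.name": "test_table",
    "hoodie.datasource.write.storage.type": "COPY_ON_WRITE",
    "hoodie.datasource.write.recordkey.field": "id",        # placeholder
    "hoodie.datasource.write.partitionpath.field": "part",  # placeholder
    "hoodie.datasource.write.precombine.field": "ts",       # placeholder
}

def write_hudi(df, mode, path="/tmp/hudi/test_table"):
    """Write a Spark DataFrame to a Hudi dataset with the given save mode."""
    (df.write.format("com.uber.hoodie")
       .options(**hudi_opts)
       .mode(mode)  # "overwrite" for the first write, "append" for the second
       .save(path))

# With a SparkSession `spark` and DataFrames `df1`, `df2` in scope:
#   write_hudi(df1, "overwrite")
#   write_hudi(df2, "append")
#   spark.read.format("com.uber.hoodie").load("/tmp/hudi/test_table/*/*")
# The final read is where the exception is raised on 0.4.7.
```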
I use Maven to manage the dependencies; here is the relevant part of my Maven
dependencies:
<dependency>
  <groupId>com.uber.hoodie</groupId>
  <artifactId>hoodie-spark-bundle</artifactId>
  <version>0.4.7</version>
</dependency>
This exception only happens with 0.4.7; if I change the version to 0.4.6, it
works fine.
I have run the same test against
1. the GitHub repository compiled on my laptop
2. the source code of the 0.4.7 release compiled on my laptop
Both worked fine.
Maybe it is because of the Maven release artifact.
Best regards
Yuanbin Cheng
CR/PJ-AI-S1
-----Original Message-----
From: Vinoth Chandar <[email protected]>
Sent: Wednesday, May 29, 2019 8:00 PM
To: [email protected]
Subject: Re: Strange exception after upgrade to 0.4.7
Also curious if this error does not happen with 0.4.6? Can you please confirm
that? It would be helpful to narrow it down
On Wed, May 29, 2019 at 6:25 PM [email protected] <[email protected]>
wrote:
> Hi Yuanbin,
>
> Not sure if I completely understood the problem. Are you using the
> "com.uber.hoodie" format for reading the dataset? Are you using
> hoodie-spark-bundle?
> From the Stack Overflow link
> https://stackoverflow.com/questions/48034825/why-does-streaming-query-fail-with-invalidschemaexception-a-group-type-can-not?noredirect=1&lq=1
> this could be because of the Parquet version. Assuming this is the
> issue, I just checked the spark-bundle and the Parquet class dependencies
> are all shaded, so the new version of hoodie-spark-bundle should not
> be a problem as such. Please make sure you are only using
> hoodie-spark-bundle and that no other Hudi packages are on the classpath. Also,
> make sure Spark does not pull in an older version of Parquet.
> Balaji.V
>
> On Wednesday, May 29, 2019, 4:58:37 PM PDT, FIXED-TERM Cheng
> Yuanbin
> (CR/PJ-AI-S1) <[email protected]> wrote:
>
> All,
>
> After we upgraded to the new release 0.4.7, a strange exception
> occurred when we read the com.uber.hoodie dataset from Parquet.
> This exception never occurred in the previous version. I would
> appreciate it if anyone could help me locate the cause of this exception.
> Here I attach part of the exception log.
>
> An exception or error caused a run to abort.
> java.lang.ExceptionInInitializerError
>     at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat.buildReaderWithPartitionValues(ParquetFileFormat.scala:293)
>     at org.apache.spark.sql.execution.FileSourceScanExec.inputRDD$lzycompute(DataSourceScanExec.scala:285)
>     at org.apache.spark.sql.execution.FileSourceScanExec.inputRDD(DataSourceScanExec.scala:283)
>     at org.apache.spark.sql.execution.FileSourceScanExec.inputRDDs(DataSourceScanExec.scala:303)
>     at org.apache.spark.sql.execution.aggregate.HashAggregateExec.inputRDDs(HashAggregateExec.scala:141)
>     at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:386)
>     at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
> ................
>
> Caused by: org.apache.parquet.schema.InvalidSchemaException: A group
> type can not be empty. Parquet does not support empty group without leaves.
> Empty group: spark_schema
>     at org.apache.parquet.schema.GroupType.<init>(GroupType.java:92)
>     at org.apache.parquet.schema.GroupType.<init>(GroupType.java:48)
>     at org.apache.parquet.schema.MessageType.<init>(MessageType.java:50)
>     at org.apache.parquet.schema.Types$MessageTypeBuilder.named(Types.java:1256)
>
> It seems that this exception is caused by the schema of the DataFrame
> written to the Hudi dataset. I carefully compared the DataFrames in our
> test cases; the only difference is the nullable field.
> All schemas in the Hudi test cases have nullable set to true on every field,
> whereas some of my test cases contain fields with nullable set to false.
> I tried setting nullable to true on every field of our dataset, but
> the same exception still occurs.
>
>
> Best regards
>
> Yuanbin Cheng
> CR/PJ-AI-S1
>
>