Hi,
The test case is really very simple, just like the Hudi test cases.
I have two DataFrames. Using copy-on-write, I first write one with
Overwrite mode and then write the second with Append mode; both operations use
the format "com.uber.hoodie".
However, the exception occurs when I read the dataset after these two write
operations.
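For reference, the write sequence can be sketched roughly as below (a PySpark sketch, assuming Hudi 0.4.x with the "com.uber.hoodie" data source; the table name, record-key/partition/precombine field names, and the target path are placeholders, not the actual values from our test case):

```python
# Write options for a copy-on-write Hudi 0.4.x table, using the option keys
# of the "com.uber.hoodie" data source. Field names below are placeholders.
hudi_opts = {
    "hoodie.table.name": "test_table",
    "hoodie.datasource.write.storage.type": "COPY_ON_WRITE",
    "hoodie.datasource.write.recordkey.field": "id",        # placeholder
    "hoodie.datasource.write.partitionpath.field": "part",  # placeholder
    "hoodie.datasource.write.precombine.field": "ts",       # placeholder
}

def write_hudi(df, mode, path="/tmp/hudi/test_table"):
    """Write a Spark DataFrame to a Hudi dataset with the given save mode."""
    (df.write.format("com.uber.hoodie")
       .options(**hudi_opts)
       .mode(mode)  # "overwrite" for the first write, "append" for the second
       .save(path))

# With a SparkSession `spark` and DataFrames `df1`, `df2` in scope:
#   write_hudi(df1, "overwrite")
#   write_hudi(df2, "append")
#   spark.read.format("com.uber.hoodie").load("/tmp/hudi/test_table/*/*")
# The final read is where the exception is raised on 0.4.7.
```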
I use Maven to manage the dependencies; here is the relevant part of my Maven
dependencies:
<dependency>
  <groupId>com.uber.hoodie</groupId>
  <artifactId>hoodie-spark-bundle</artifactId>
  <version>0.4.7</version>
</dependency>
This exception only happens with 0.4.7; if I change the version to 0.4.6, it
works fine.
I have run the same test against
1. the GitHub repository compiled on my laptop
2. the source code of the 0.4.7 release compiled on my laptop
Both worked fine.
Maybe it is because of the Maven release artifact.
Best regards
Yuanbin Cheng
CR/PJ-AI-S1
-----Original Message-----
From: Vinoth Chandar <[email protected]>
Sent: Wednesday, May 29, 2019 8:00 PM
To: [email protected]
Subject: Re: Strange exception after upgrade to 0.4.7
Also curious if this error does not happen with 0.4.6? Can you please confirm
that? It would be helpful to narrow it down
On Wed, May 29, 2019 at 6:25 PM [email protected] <[email protected]>
wrote:
> Hi Yuanbin,
>
> Not sure if I completely understood the problem. Are you using the
> "com.uber.hoodie" format for reading the dataset? Are you using
> hoodie-spark-bundle?
> From the Stack Overflow link
> https://stackoverflow.com/questions/48034825/why-does-streaming-query-fail-with-invalidschemaexception-a-group-type-can-not?noredirect=1&lq=1
> this could be because of the Parquet version. Assuming this is the
> issue, I just checked the spark-bundle and the Parquet class dependencies
> are all shaded, so the new version of hoodie-spark-bundle should not
> be a problem as such. Please make sure you are only using
> hoodie-spark-bundle and that no other Hudi packages are on the classpath. Also,
> make sure Spark does not pull in an older version of Parquet.
> Balaji.V
>
> On Wednesday, May 29, 2019, 4:58:37 PM PDT, FIXED-TERM Cheng
> Yuanbin
> (CR/PJ-AI-S1) <[email protected]> wrote:
>
> All,
>
> After we upgraded to the new release 0.4.7, a strange exception
> occurred when we read the com.uber.hoodie dataset from Parquet.
> This exception never occurred in the previous version. I would
> appreciate it if anyone could help me locate the cause of this exception.
> Here I attach part of the exception log.
>
> An exception or error caused a run to abort.
> java.lang.ExceptionInInitializerError
>     at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat.buildReaderWithPartitionValues(ParquetFileFormat.scala:293)
>     at org.apache.spark.sql.execution.FileSourceScanExec.inputRDD$lzycompute(DataSourceScanExec.scala:285)
>     at org.apache.spark.sql.execution.FileSourceScanExec.inputRDD(DataSourceScanExec.scala:283)
>     at org.apache.spark.sql.execution.FileSourceScanExec.inputRDDs(DataSourceScanExec.scala:303)
>     at org.apache.spark.sql.execution.aggregate.HashAggregateExec.inputRDDs(HashAggregateExec.scala:141)
>     at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:386)
>     at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
> ................
>
> Caused by: org.apache.parquet.schema.InvalidSchemaException: A group
> type can not be empty. Parquet does not support empty group without leaves.
> Empty group: spark_schema
>     at org.apache.parquet.schema.GroupType.<init>(GroupType.java:92)
>     at org.apache.parquet.schema.GroupType.<init>(GroupType.java:48)
>     at org.apache.parquet.schema.MessageType.<init>(MessageType.java:50)
>     at org.apache.parquet.schema.Types$MessageTypeBuilder.named(Types.java:1256)
>
> It seems that this exception is caused by the schema of the DataFrame
> written to the Hudi dataset. I carefully compared the DataFrames in our
> test cases; the only difference is the nullable field.
> All schemas in the Hudi test cases have nullable set to true on every field,
> whereas some of my test cases contain fields with nullable set to false.
> I tried setting nullable to true on every field of our dataset, but
> the same exception still occurs.
>
>
> Best regards
>
> Yuanbin Cheng
> CR/PJ-AI-S1
>
>