Maybe that alters the order of classes on the classpath? Hard to tell. We are
definitely going to look into shading Jackson and a few other dependencies in
a much better way. 0.4.7 has a Jackson version change, so that could be a
difference. But again, that version is the same as what Spark uses, so :(
Let me try to reproduce your issue as well when we rethink the deps.
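
For what it's worth, the ordering effect you are seeing is consistent with how
Maven resolves conflicting transitive dependencies: it takes the nearest
declaration, and at equal depth the dependency declared first wins. So the
arrangement you report as working would be roughly the sketch below (untested
on my end; artifact names assume the Scala 2.11 build of Spark 2.2.0):

<dependencies>
  <!-- Declared first, so Spark's transitive parquet/jackson versions win
       any equal-depth conflicts against the ones Hudi pulls in. -->
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.11</artifactId>
    <version>2.2.0</version>
  </dependency>
  <!-- Hudi after Spark; flipping these two blocks can change which
       transitive versions actually land on the classpath. -->
  <dependency>
    <groupId>com.uber.hoodie</groupId>
    <artifactId>hoodie-spark-bundle</artifactId>
    <version>0.4.7</version>
  </dependency>
</dependencies>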

On Fri, May 31, 2019 at 4:54 PM FIXED-TERM Cheng Yuanbin (CR/PJ-AI-S1)
<[email protected]> wrote:

> Hi,
>
> I am using Spark 2.2.0. I found that if I put the Hudi dependency below the
> Spark dependency in Maven, Hudi runs correctly.
> However, if I put Hudi before the Spark dependency, the exception always
> occurs, no matter whether I use hoodie-spark or hoodie-spark-bundle.
>
> Do you have any idea about the reason for this? It only happens in 0.4.7.
>
> Best regards
>
> Yuanbin Cheng
> CR/PJ-AI-S1
>
> -----Original Message-----
> From: Vinoth Chandar <[email protected]>
> Sent: Friday, May 31, 2019 1:19 AM
> To: [email protected]
> Subject: Re: Strange exception after upgrade to 0.4.7
>
> Hi,
>
> This does sound like a jar mismatch issue from the Spark version. I have
> seen a similar ticket associated with Spark 2.1.x, IIRC. If you are
> building your own uber/fat jar, it is probably better to depend on the
> hoodie-spark module than on hoodie-spark-bundle, which is an uber jar
> itself.
>
> What version of Spark are you using?
>
> Thanks
> Vinoth
>
> On Thu, May 30, 2019 at 11:24 AM FIXED-TERM Cheng Yuanbin (CR/PJ-AI-S1)
> <[email protected]> wrote:
>
> > Hi,
> >
> > The test case is really very simple, just like the Hudi test cases.
> > I have two dataframes and use CopyOnWrite: first I write one dataframe
> > with Overwrite, then I write the other with Append; both writes use the
> > format "com.uber.hoodie".
> > However, the exception occurs when I read the dataset back after these
> > two write operations.
> > I use Maven to manage the dependencies; here is the relevant part of my
> > Maven dependencies:
> >
> > <dependency>
> >   <groupId>com.uber.hoodie</groupId>
> >   <artifactId>hoodie-spark-bundle</artifactId>
> >   <version>0.4.7</version>
> > </dependency>
> >
> > This exception only happens in 0.4.7; if I change it to 0.4.6, it works
> > very well.
> > I have run the same test against
> > 1. the GitHub repository compiled on my laptop
> > 2. the source code of 0.4.7 compiled on my laptop
> > Both worked very well.
> >
> > Maybe it is because of the Maven release.
> >
> > Best regards
> >
> > Yuanbin Cheng
> > CR/PJ-AI-S1
> >
> > -----Original Message-----
> > From: Vinoth Chandar <[email protected]>
> > Sent: Wednesday, May 29, 2019 8:00 PM
> > To: [email protected]
> > Subject: Re: Strange exception after upgrade to 0.4.7
> >
> > Also curious whether this error happens with 0.4.6 as well. Can you
> > please confirm? It would help narrow it down.
> >
> > On Wed, May 29, 2019 at 6:25 PM [email protected]
> > <[email protected]> wrote:
> >
> > > Hi Yuanbin,
> > >
> > > Not sure if I completely understood the problem. Are you using the
> > > "com.uber.hoodie" format for reading the dataset? Are you using
> > > hoodie-spark-bundle?
> > > From the Stack Overflow link
> > > https://stackoverflow.com/questions/48034825/why-does-streaming-query-fail-with-invalidschemaexception-a-group-type-can-not?noredirect=1&lq=1
> > > this could be because of the parquet version. Assuming this is the
> > > issue: I just checked the spark-bundle and the parquet class
> > > dependencies are all shaded, so the new version of hoodie-spark-bundle
> > > should not be a problem as such. Please make sure you are only using
> > > hoodie-spark-bundle and that no other Hudi packages are on the
> > > classpath. Also, make sure Spark does not pull in an older version of
> > > parquet.
> > >
> > > Balaji.V
> > >
> > > On Wednesday, May 29, 2019, 4:58:37 PM PDT, FIXED-TERM Cheng Yuanbin
> > > (CR/PJ-AI-S1) <[email protected]> wrote:
> > >
> > > All,
> > >
> > > After we upgraded to the new release 0.4.7, a strange exception
> > > occurred when we read a com.uber.hoodie dataset from parquet.
> > > This exception never occurred in the previous version. I would
> > > appreciate it if anyone could help me locate the cause.
> > > Here I attach part of the exception log.
> > >
> > > An exception or error caused a run to abort.
> > > java.lang.ExceptionInInitializerError
> > >     at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat.buildReaderWithPartitionValues(ParquetFileFormat.scala:293)
> > >     at org.apache.spark.sql.execution.FileSourceScanExec.inputRDD$lzycompute(DataSourceScanExec.scala:285)
> > >     at org.apache.spark.sql.execution.FileSourceScanExec.inputRDD(DataSourceScanExec.scala:283)
> > >     at org.apache.spark.sql.execution.FileSourceScanExec.inputRDDs(DataSourceScanExec.scala:303)
> > >     at org.apache.spark.sql.execution.aggregate.HashAggregateExec.inputRDDs(HashAggregateExec.scala:141)
> > >     at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:386)
> > >     at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
> > > ................
> > >
> > > Caused by: org.apache.parquet.schema.InvalidSchemaException: A group
> > > type can not be empty. Parquet does not support empty group without
> > > leaves. Empty group: spark_schema
> > >     at org.apache.parquet.schema.GroupType.<init>(GroupType.java:92)
> > >     at org.apache.parquet.schema.GroupType.<init>(GroupType.java:48)
> > >     at org.apache.parquet.schema.MessageType.<init>(MessageType.java:50)
> > >     at org.apache.parquet.schema.Types$MessageTypeBuilder.named(Types.java:1256)
> > >
> > > It seems that this exception is caused by the schema of the dataframe
> > > written to the Hudi dataset. I carefully compared the dataframes in
> > > our test cases; the only difference is the nullable flag on the
> > > fields.
> > > The schemas in the Hudi test cases have nullable set to true on every
> > > field, whereas some of my test cases contain fields with nullable set
> > > to false.
> > > I tried setting nullable to true on every field in our dataset, but
> > > the same exception still occurs.
> > >
> > > Best regards
> > >
> > > Yuanbin Cheng
> > > CR/PJ-AI-S1
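
One more thing worth trying, since the stack trace is from org.apache.parquet:
an untested sketch that pins parquet in dependencyManagement to the version
your Spark distribution ships (1.8.2 for Spark 2.2.0, if I remember right), so
declaration order no longer decides which parquet wins. Check dependency:tree
to confirm the actual versions in your build:

<dependencyManagement>
  <dependencies>
    <!-- Untested: force the parquet version Spark 2.2.0 ships so the same
         parquet classes load no matter where Hudi sits in the POM. -->
    <dependency>
      <groupId>org.apache.parquet</groupId>
      <artifactId>parquet-column</artifactId>
      <version>1.8.2</version>
    </dependency>
    <dependency>
      <groupId>org.apache.parquet</groupId>
      <artifactId>parquet-hadoop</artifactId>
      <version>1.8.2</version>
    </dependency>
  </dependencies>
</dependencyManagement>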
