Maybe that alters the order of classes on the classpath? Hard to tell. We are
definitely going to look into shading Jackson and a few other dependencies in
a much better way. 0.4.7 has a Jackson version change, so that could be a
difference. But again, that version is the same as what Spark uses, so :(
Let me try to reproduce your issue as well when we rethink the deps.
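
For what it's worth, the ordering effect you are seeing is consistent with how
Maven resolves conflicting transitive dependencies: it takes the nearest
declaration, and at equal depth the dependency declared first wins. So the
arrangement you report as working would be roughly the sketch below (untested
on my end; artifact names assume the Scala 2.11 build of Spark 2.2.0):

<dependencies>
  <!-- Declared first, so Spark's transitive parquet/jackson versions win
       any equal-depth conflicts against the ones Hudi pulls in. -->
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.11</artifactId>
    <version>2.2.0</version>
  </dependency>
  <!-- Hudi after Spark; flipping these two blocks can change which
       transitive versions actually land on the classpath. -->
  <dependency>
    <groupId>com.uber.hoodie</groupId>
    <artifactId>hoodie-spark-bundle</artifactId>
    <version>0.4.7</version>
  </dependency>
</dependencies>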

On Fri, May 31, 2019 at 4:54 PM FIXED-TERM Cheng Yuanbin (CR/PJ-AI-S1)
<[email protected]> wrote:

> Hi,
>
> I am using Spark 2.2.0. I found that if I put the Hudi dependency below the
> Spark dependency in Maven, Hudi runs correctly.
> However, if I put Hudi before the Spark dependency, the exception always
> occurs, no matter whether I use hoodie-spark or hoodie-spark-bundle.
>
> Do you have any idea about the reason for this? It only happens in 0.4.7.
>
> Best regards
>
> Yuanbin Cheng
> CR/PJ-AI-S1
>
> -----Original Message-----
> From: Vinoth Chandar <[email protected]>
> Sent: Friday, May 31, 2019 1:19 AM
> To: [email protected]
> Subject: Re: Strange exception after upgrade to 0.4.7
>
> Hi,
>
> This does sound like a jar mismatch issue from the Spark version. I have
> seen a similar ticket associated with Spark 2.1.x, IIRC. If you are
> building your own uber/fat jar, it is probably better to depend on the
> hoodie-spark module than on hoodie-spark-bundle, which is an uber jar
> itself.
>
> What version of Spark are you using?
>
> Thanks
> Vinoth
>
> On Thu, May 30, 2019 at 11:24 AM FIXED-TERM Cheng Yuanbin (CR/PJ-AI-S1)
> <[email protected]> wrote:
>
> > Hi,
> >
> > The test case is really very simple, just like the Hudi test cases.
> > I have two dataframes and use CopyOnWrite: first I write one dataframe
> > with Overwrite, then I write the other with Append; both writes use the
> > format "com.uber.hoodie".
> > However, the exception occurs when I read the dataset back after these
> > two write operations.
> > I use Maven to manage the dependencies; here is the relevant part of my
> > Maven dependencies:
> >
> > <dependency>
> >   <groupId>com.uber.hoodie</groupId>
> >   <artifactId>hoodie-spark-bundle</artifactId>
> >   <version>0.4.7</version>
> > </dependency>
> >
> > This exception only happens in 0.4.7; if I change it to 0.4.6, it works
> > very well.
> > I have run the same test against
> > 1. the GitHub repository compiled on my laptop
> > 2. the source code of 0.4.7 compiled on my laptop
> > Both worked very well.
> >
> > Maybe it is because of the Maven release.
> >
> > Best regards
> >
> > Yuanbin Cheng
> > CR/PJ-AI-S1
> >
> > -----Original Message-----
> > From: Vinoth Chandar <[email protected]>
> > Sent: Wednesday, May 29, 2019 8:00 PM
> > To: [email protected]
> > Subject: Re: Strange exception after upgrade to 0.4.7
> >
> > Also curious whether this error happens with 0.4.6 as well. Can you
> > please confirm? It would help narrow it down.
> >
> > On Wed, May 29, 2019 at 6:25 PM [email protected]
> > <[email protected]> wrote:
> >
> > > Hi Yuanbin,
> > >
> > > Not sure if I completely understood the problem. Are you using the
> > > "com.uber.hoodie" format for reading the dataset? Are you using
> > > hoodie-spark-bundle?
> > > From the Stack Overflow link
> > > https://stackoverflow.com/questions/48034825/why-does-streaming-query-fail-with-invalidschemaexception-a-group-type-can-not?noredirect=1&lq=1
> > > this could be because of the parquet version. Assuming this is the
> > > issue: I just checked the spark-bundle and the parquet class
> > > dependencies are all shaded, so the new version of hoodie-spark-bundle
> > > should not be a problem as such. Please make sure you are only using
> > > hoodie-spark-bundle and that no other Hudi packages are on the
> > > classpath. Also, make sure Spark does not pull in an older version of
> > > parquet.
> > >
> > > Balaji.V
> > >
> > > On Wednesday, May 29, 2019, 4:58:37 PM PDT, FIXED-TERM Cheng Yuanbin
> > > (CR/PJ-AI-S1) <[email protected]> wrote:
> > >
> > > All,
> > >
> > > After we upgraded to the new release 0.4.7, a strange exception
> > > occurred when we read a com.uber.hoodie dataset from parquet.
> > > This exception never occurred in the previous version. I would
> > > appreciate it if anyone could help me locate the cause.
> > > Here I attach part of the exception log.
> > >
> > > An exception or error caused a run to abort.
> > > java.lang.ExceptionInInitializerError
> > >     at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat.buildReaderWithPartitionValues(ParquetFileFormat.scala:293)
> > >     at org.apache.spark.sql.execution.FileSourceScanExec.inputRDD$lzycompute(DataSourceScanExec.scala:285)
> > >     at org.apache.spark.sql.execution.FileSourceScanExec.inputRDD(DataSourceScanExec.scala:283)
> > >     at org.apache.spark.sql.execution.FileSourceScanExec.inputRDDs(DataSourceScanExec.scala:303)
> > >     at org.apache.spark.sql.execution.aggregate.HashAggregateExec.inputRDDs(HashAggregateExec.scala:141)
> > >     at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:386)
> > >     at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
> > > ................
> > >
> > > Caused by: org.apache.parquet.schema.InvalidSchemaException: A group
> > > type can not be empty. Parquet does not support empty group without
> > > leaves. Empty group: spark_schema
> > >     at org.apache.parquet.schema.GroupType.<init>(GroupType.java:92)
> > >     at org.apache.parquet.schema.GroupType.<init>(GroupType.java:48)
> > >     at org.apache.parquet.schema.MessageType.<init>(MessageType.java:50)
> > >     at org.apache.parquet.schema.Types$MessageTypeBuilder.named(Types.java:1256)
> > >
> > > It seems that this exception is caused by the schema of the dataframe
> > > written to the Hudi dataset. I carefully compared the dataframes in
> > > our test cases; the only difference is the nullable flag on the
> > > fields.
> > > The schemas in the Hudi test cases have nullable set to true on every
> > > field, whereas some of my test cases contain fields with nullable set
> > > to false.
> > > I tried setting nullable to true on every field in our dataset, but
> > > the same exception still occurs.
> > >
> > > Best regards
> > >
> > > Yuanbin Cheng
> > > CR/PJ-AI-S1
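
One more thing worth trying, since the stack trace is from org.apache.parquet:
an untested sketch that pins parquet in dependencyManagement to the version
your Spark distribution ships (1.8.2 for Spark 2.2.0, if I remember right), so
declaration order no longer decides which parquet wins. Check dependency:tree
to confirm the actual versions in your build:

<dependencyManagement>
  <dependencies>
    <!-- Untested: force the parquet version Spark 2.2.0 ships so the same
         parquet classes load no matter where Hudi sits in the POM. -->
    <dependency>
      <groupId>org.apache.parquet</groupId>
      <artifactId>parquet-column</artifactId>
      <version>1.8.2</version>
    </dependency>
    <dependency>
      <groupId>org.apache.parquet</groupId>
      <artifactId>parquet-hadoop</artifactId>
      <version>1.8.2</version>
    </dependency>
  </dependencies>
</dependencyManagement>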
