chinmay-032 commented on issue #8625: URL: https://github.com/apache/hudi/issues/8625#issuecomment-1534142841
#### A small update

I tried to run the exact code from my PySpark notebook as a spark-submit job. It worked fine, with the partial updates applied as expected. This narrows down where the problem might be. The one difference between the two is how the DataFrame is built: in the working code (from the notebook), we create the DataFrame with the pyspark.sql.Row API, while in the non-working code, we parse a column of JSON strings into a DataFrame using a dynamically fetched schema. I have inspected the intermediate DataFrames in the non-working code, and they look as expected. Hope this new information helps.

_Note: Some people suggested this might be a version issue (since partial updates are only supported from v0.13.0). However, the Hudi version we're using is an Amazon distribution, and the jars provided include the PartialUpdateAvroPayload class. Also, if it were a version issue, it wouldn't work via the Row API either. So I suspect it is not a version issue._
