chinmay-032 commented on issue #8625: URL: https://github.com/apache/hudi/issues/8625#issuecomment-1534142841
#### A small update

I tried to run the exact code from my PySpark notebook as a spark-submit job. It worked fine, with the partial updates applied as expected. This narrows down where the problem might be. The one difference between the two is how the DataFrame is built: in the working code (from the notebook), we create the DataFrame with the pyspark.sql.Row API, while in the non-working code, we parse a column of JSON strings into a DataFrame using a dynamically fetched schema. I have inspected the intermediate DataFrames in the non-working code, and they look as expected. Hope this new information helps.

_Note: Some people suggested this might be a version issue (since partial updates are only supported from v0.13.0). However, the Hudi version we're using is an Amazon distribution, and the jars provided include the PartialUpdateAvroPayload class. Also, if it were a version issue, it wouldn't work via the Row API either. So I suspect it is not a version issue._
