nsivabalan commented on issue #10803: URL: https://github.com/apache/hudi/issues/10803#issuecomment-1979906166
Hey, I wrote a tool that prints some metadata about our log blocks and records. The branch is here: https://github.com/nsivabalan/hudi/tree/printAllVersionsOfRecordTool

Can you run the tool and share the output with us? It is a spark-submit command, and it logs information about the log files we are interested in.

Sample command:

```
./bin/spark-submit \
  --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
  --conf 'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension' \
  --class org.apache.hudi.utilities.PrintRecordsTool \
  PATH_TO_BUNDLE/hudi-utilities-bundle_2.12-0.15.0-SNAPSHOT.jar \
  --props /tmp/props.in \
  --base-path /tmp/hudi_trips_mor/ \
  --partition-path asia/india/chennai \
  --file-id c3ef010f-61ae-4aa3-a033-25b278da17c6-0 \
  --base-instant-time 20240302002723362 \
  --print-log-blocks-info
```

Contents of the props file:

```
cat /tmp/props.in
hoodie.datasource.write.recordkey.field=uuid
hoodie.datasource.write.partitionpath.field=partitionpath
hoodie.datasource.write.precombine.field=ts
```

Make sure you set the right values for the partition path, the file ID, and the base instant time for your table. This should help with our triaging.
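If it helps with filling in the file ID and base instant time, here is a rough sketch of pulling both out of a log file name. It assumes the usual Hudi log file naming of `.<fileId>_<baseInstantTime>.log.<version>_<writeToken>`; the helper name `parse_log_file_name` and the example file name (including its version and write token) are made up for illustration, so please check the result against the actual files in your partition.

```shell
#!/usr/bin/env bash
# Sketch only: derive --file-id and --base-instant-time from a log file name.
# Assumption: the name looks like ".<fileId>_<baseInstantTime>.log.<version>_<writeToken>".
parse_log_file_name() {
  local name="${1##*/}"          # drop any directory prefix
  name="${name#.}"               # drop the leading dot of the log file
  local stem="${name%%.log.*}"   # keep "<fileId>_<baseInstantTime>"
  local file_id="${stem%_*}"     # everything before the last underscore
  local base_instant="${stem##*_}"  # everything after the last underscore
  printf -- '--file-id %s --base-instant-time %s\n' "$file_id" "$base_instant"
}

# Hypothetical log file name, built from the values in the sample command.
parse_log_file_name "asia/india/chennai/.c3ef010f-61ae-4aa3-a033-25b278da17c6-0_20240302002723362.log.1_0-22-26"
```

The printed flags can be pasted into the spark-submit command in place of the sample values.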
