Re: [I] Concurrent and Roll back commit issue [hudi]

via GitHub Mon, 18 Dec 2023 19:54:51 -0800


khajaasmath786 commented on issue #10356:
URL: https://github.com/apache/hudi/issues/10356#issuecomment-1862075740


   I will try this and see how it goes:
   from pyspark.sql import SparkSession
   
   # Initialize the Spark session. The Hudi SQL extension is needed for
   # CALL procedures such as rollback_to_instant.
   spark = SparkSession.builder \
       .appName("Hudi Rollback") \
       .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer") \
       .config("spark.sql.extensions", "org.apache.spark.sql.hudi.HoodieSparkSessionExtension") \
       .getOrCreate()
   
   # Set the base path and table name of the Hudi dataset
   basePath = "<your-hudi-table-base-path>"
   tableName = "<your-hudi-table-name>"
   
   # Load the Hudi dataset
   hudi_df = spark.read.format("hudi").load(basePath)
   
   # Display the distinct commit times that still have records in the table.
   # Note: this scans the whole table and is not the full timeline; a commit
   # whose records were all updated by later commits will not appear here.
   commit_times = (
       hudi_df.select("_hoodie_commit_time")
       .distinct()
       .orderBy("_hoodie_commit_time")
       .collect()
   )
   print("Commit times in the dataset:")
   for commit in commit_times:
       print(commit["_hoodie_commit_time"])
   
   # Specify the commit time you want to roll back to
   target_commit_time = "20231214220739609"
   
   # Identify commits newer than the target commit. Instant times of the same
   # width (e.g. 17-digit yyyyMMddHHmmssSSS) compare chronologically as strings.
   newer_commits = [
       commit["_hoodie_commit_time"]
       for commit in commit_times
       if commit["_hoodie_commit_time"] > target_commit_time
   ]
   
   # Roll back newer commits in reverse order (newest first)
   for commit in reversed(newer_commits):
       print(f"Rolling back commit: {commit}")
       # Placeholder: uncomment to perform the rollback. rollback_to_instant is
       # a Hudi Spark SQL procedure; check its name and parameters against your
       # Hudi version and setup before running.
       # spark.sql(f"CALL rollback_to_instant(table => '{tableName}', instant_time => '{commit}')")
   
   spark.stop()
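
The commit-selection step can also be pulled into plain functions and checked without a Spark cluster. This is a minimal sketch, assuming every instant time is a digit string of the same width as the target; `commits_to_roll_back` and `rollback_statements` are hypothetical helper names, and the generated SQL assumes Hudi's `rollback_to_instant` procedure with `table` and `instant_time` parameters, which should be verified for the Hudi version in use.

```python
from typing import Iterable, List


def commits_to_roll_back(commit_times: Iterable[str], target: str) -> List[str]:
    """Return instant times newer than `target`, newest first.

    Hypothetical helper: assumes every instant time is a digit string of the
    same width as `target`, so string order equals chronological order.
    """
    commits = set(commit_times)
    # Refuse mixed or malformed formats instead of silently misordering them.
    for t in commits | {target}:
        if not t.isdigit() or len(t) != len(target):
            raise ValueError(f"unexpected instant time format: {t!r}")
    # Newest first: later commits are undone before earlier ones.
    return sorted((t for t in commits if t > target), reverse=True)


def rollback_statements(table: str, commits: Iterable[str]) -> List[str]:
    """Build one CALL statement per commit (assumes Hudi's rollback_to_instant)."""
    return [
        f"CALL rollback_to_instant(table => '{table}', instant_time => '{c}')"
        for c in commits
    ]


if __name__ == "__main__":
    times = [
        "20231214220739609",
        "20231215083000123",
        "20231213101010000",
        "20231216120000456",
    ]
    to_undo = commits_to_roll_back(times, "20231214220739609")
    # "my_hudi_table" is a hypothetical table name used only for illustration.
    for stmt in rollback_statements("my_hudi_table", to_undo):
        print(stmt)
```

In the script, the input would be `[r["_hoodie_commit_time"] for r in commit_times]`. Raising on a width mismatch is deliberate: older 14-digit and newer 17-digit instant times do not sort correctly as strings.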
   

