khajaasmath786 commented on issue #10356:
URL: https://github.com/apache/hudi/issues/10356#issuecomment-1862075740
I will try this and see how it goes:
from pyspark.sql import SparkSession
# Initialize Spark Session
spark = SparkSession.builder \
    .appName("Hudi Rollback") \
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer") \
    .getOrCreate()
# Set the base path for the Hudi dataset
basePath = "<your-hudi-table-base-path>"
# Load the Hudi dataset
hudi_df = spark.read.format("hudi").load(basePath)
# Display commit times
commit_times = (
    hudi_df.select("_hoodie_commit_time")
    .distinct()
    .orderBy("_hoodie_commit_time")
    .collect()
)
print("Commit times in the dataset:")
for commit in commit_times:
    print(commit["_hoodie_commit_time"])
# Specify the commit time you want to roll back to
target_commit_time = "20231214220739609"
# Identify commits newer than the target commit
newer_commits = [
    commit["_hoodie_commit_time"]
    for commit in commit_times
    if commit["_hoodie_commit_time"] > target_commit_time
]
# Rollback newer commits in reverse order
for commit in reversed(newer_commits):
    print(f"Rolling back commit: {commit}")
    # Perform the rollback.
    # This is a placeholder; replace it with the actual Hudi rollback command.
    # spark.sql(f"CALL hudi_rollback('{commit}')")
    # Note: The actual rollback command may vary based on Hudi version and setup.
spark.stop()
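
The selection step can be checked on its own, without Spark. Below is a minimal sketch of it. The helper name commits_to_roll_back and every sample instant time except the target are made up for illustration. It assumes all instant times are digit strings of the same width, so plain string comparison matches chronological order.

```python
from typing import Iterable, List


def commits_to_roll_back(commit_times: Iterable[str], target: str) -> List[str]:
    """Return the commits strictly newer than `target`, newest first.

    Hypothetical helper, not part of Hudi. Assumes every instant time is a
    digit string of the same width, so string order equals time order.
    """
    # A set drops duplicates in case the input was not de-duplicated upstream.
    newer = {c for c in commit_times if c > target}
    # Newest first, so later commits are undone before the earlier ones.
    return sorted(newer, reverse=True)


if __name__ == "__main__":
    # Made-up instant times; only the target comes from the script above.
    sample = [
        "20231214220739609",
        "20231215101500123",
        "20231213090000000",
        "20231216083000456",
    ]
    for instant in commits_to_roll_back(sample, "20231214220739609"):
        print(f"Would roll back: {instant}")
```

This only decides which commits to undo and in what order. The rollback call itself still has to come from whatever your Hudi version provides.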