rangareddy commented on issue #14889:
URL: https://github.com/apache/hudi/issues/14889#issuecomment-3664318714

   We have implemented the [HoodieRepairTool](https://github.com/apache/hudi/blob/master/hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieRepairTool.java), a Spark-based utility designed to repair Hudi tables by identifying and removing dangling base and log files.
   
   **Example Command:**
   
   ```sh
   spark-submit \
   --class org.apache.hudi.utilities.HoodieRepairTool \
   --driver-memory 4g \
   --executor-memory 1g \
   --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
   --conf spark.sql.catalogImplementation=hive \
   --conf spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension \
   $HUDI_DIR/packaging/hudi-utilities-bundle/target/hudi-utilities-bundle_2.12-0.11.0-SNAPSHOT.jar \
   --mode repair \
   --base-path base_path \
   --backup-path backup_path \
   --start-instant-time ts1 \
   --end-instant-time ts2 \
   --assume-date-partitioning
   ```
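
   Since the repair mode removes files, it may be worth previewing what the tool would touch first. The sketch below wraps the command above in a small shell function so the same arguments can be reused across modes. It assumes the tool also accepts `--mode dry_run` (report only) alongside `repair`; please verify this against the `HoodieRepairTool` source for your Hudi version. The names `run_repair_tool`, `SPARK_SUBMIT`, and `HUDI_UTILITIES_JAR` are hypothetical and exist only for this example.

   ```sh
   # Hypothetical wrapper around the HoodieRepairTool invocation shown above.
   # SPARK_SUBMIT can be overridden (e.g. with "echo") to inspect the command
   # without launching a Spark job.
   SPARK_SUBMIT="${SPARK_SUBMIT:-spark-submit}"

   run_repair_tool() {
     mode="$1"          # assumed values: dry_run or repair
     base_path="$2"     # table base path
     backup_path="$3"   # where removed files are backed up
     "$SPARK_SUBMIT" \
       --class org.apache.hudi.utilities.HoodieRepairTool \
       --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
       "$HUDI_UTILITIES_JAR" \
       --mode "$mode" \
       --base-path "$base_path" \
       --backup-path "$backup_path"
   }

   # Usage (commented out so that sourcing this snippet has no side effects):
   # HUDI_UTILITIES_JAR=/path/to/hudi-utilities-bundle.jar
   # run_repair_tool dry_run base_path backup_path   # preview only (assumed mode)
   # run_repair_tool repair  base_path backup_path   # remove dangling files
   ```

   Running the preview before the actual repair keeps the destructive step deliberate, and the backup path gives a copy of anything that was removed.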


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
