tooptoop4 opened a new issue #2392:
URL: https://github.com/apache/hudi/issues/2392


   **Describe the problem you faced**
   
   Under one of my Hudi table paths on S3, I have over 30,000 files with
   `.commits_.archive.` in the name, for example (only a few are listed below):
   ```
   mytablepath/.hoodie/.commits_.archive.31370_1-0-1
   mytablepath/.hoodie/.commits_.archive.31371_1-0-1
   mytablepath/.hoodie/.commits_.archive.31372_1-0-1
   mytablepath/.hoodie/.commits_.archive.31373_1-0-1
   mytablepath/.hoodie/.commits_.archive.31374_1-0-1
   mytablepath/.hoodie/.commits_.archive.31375_1-0-1
   mytablepath/.hoodie/.commits_.archive.31376_1-0-1
   mytablepath/.hoodie/.commits_.archive.31377_1-0-1
   mytablepath/.hoodie/.commits_.archive.31378_1-0-1
   mytablepath/.hoodie/.commits_.archive.31379_1-0-1
   ```
   
   I have ingested into this table over 30,000 times.
   
   There are only 51 other, non-archive files:
   ```
   mytablepath/.hoodie/.temp/20200929011850/unknown/bcb735bd-e33d-457b-8971-2818e130ec28-0_0-25-169_20200929011850.marker
   mytablepath/.hoodie/.temp/20201106041159/unknown/bcb735bd-e33d-457b-8971-2818e130ec28-0_0-25-169_20201106041159.marker
   mytablepath/.hoodie/.temp/20201125123321/unknown/bcb735bd-e33d-457b-8971-2818e130ec28-0_0-25-169_20201125123321.marker
   mytablepath/.hoodie/.temp/20201125123321/unknown/bcb735bd-e33d-457b-8971-2818e130ec28-0_0-25-222_20201125123321.marker
   mytablepath/.hoodie/.temp/20201208015244/unknown/bcb735bd-e33d-457b-8971-2818e130ec28-0_0-25-169_20201208015244.marker
   mytablepath/.hoodie/.temp/20201208015244/unknown/bcb735bd-e33d-457b-8971-2818e130ec28-0_0-25-222_20201208015244.marker
   mytablepath/.hoodie/.temp/20201208065657/unknown/bcb735bd-e33d-457b-8971-2818e130ec28-0_0-25-169_20201208065657.marker
   mytablepath/.hoodie/.temp/20201212205947/unknown/bcb735bd-e33d-457b-8971-2818e130ec28-0_0-25-169_20201212205947.marker
   mytablepath/.hoodie/.temp/20201223083212/unknown/bcb735bd-e33d-457b-8971-2818e130ec28-0_0-25-169_20201223083212.marker
   mytablepath/.hoodie/.temp/20201224042147/unknown/bcb735bd-e33d-457b-8971-2818e130ec28-0_0-25-169_20201224042147.marker
   mytablepath/.hoodie/20200929011850.commit.requested
   mytablepath/.hoodie/20200929011850.inflight
   mytablepath/.hoodie/20201020130743.commit.requested
   mytablepath/.hoodie/20201105000925.commit.requested
   mytablepath/.hoodie/20201106041159.commit.requested
   mytablepath/.hoodie/20201106041159.inflight
   mytablepath/.hoodie/20201125123321.commit.requested
   mytablepath/.hoodie/20201125123321.inflight
   mytablepath/.hoodie/20201203184433.commit.requested
   mytablepath/.hoodie/20201208015244.commit.requested
   mytablepath/.hoodie/20201208015244.inflight
   mytablepath/.hoodie/20201208065657.commit.requested
   mytablepath/.hoodie/20201208065657.inflight
   mytablepath/.hoodie/20201210023407.commit.requested
   mytablepath/.hoodie/20201212205947.commit.requested
   mytablepath/.hoodie/20201212205947.inflight
   mytablepath/.hoodie/20201213163733.commit.requested
   mytablepath/.hoodie/20201213163733.inflight
   mytablepath/.hoodie/20201216040208.commit.requested
   mytablepath/.hoodie/20201216040208.inflight
   mytablepath/.hoodie/20201223083212.commit.requested
   mytablepath/.hoodie/20201223083212.inflight
   mytablepath/.hoodie/20201224042147.commit.requested
   mytablepath/.hoodie/20201224042147.inflight
   mytablepath/.hoodie/20201229164435.clean
   mytablepath/.hoodie/20201229164435.clean.inflight
   mytablepath/.hoodie/20201229164435.clean.requested
   mytablepath/.hoodie/20201229164751.clean
   mytablepath/.hoodie/20201229164751.clean.inflight
   mytablepath/.hoodie/20201229164751.clean.requested
   mytablepath/.hoodie/20201229164751.commit
   mytablepath/.hoodie/20201229164751.commit.requested
   mytablepath/.hoodie/20201229164751.inflight
   mytablepath/.hoodie/20201229165044.commit
   mytablepath/.hoodie/20201229165044.commit.requested
   mytablepath/.hoodie/20201229165044.inflight
   mytablepath/.hoodie/hoodie.properties
   mytablepath/unknown/.hoodie_partition_metadata
   mytablepath/unknown/bcb735bd-e33d-457b-8971-2818e130ec28-0_0-25-169_20201229164435.parquet
   mytablepath/unknown/bcb735bd-e33d-457b-8971-2818e130ec28-0_0-25-169_20201229164751.parquet
   mytablepath/unknown/bcb735bd-e33d-457b-8971-2818e130ec28-0_0-25-169_20201229165044.parquet
   ```
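
For anyone who wants to reproduce these counts on their own table, here is a minimal sketch. It assumes the object keys have already been listed into a plain list of strings (for example, from the output of an S3 listing). The helper names `classify` and `tally_hoodie_files` and the four category labels are hypothetical, made up for illustration; they are not part of Hudi.

```python
from collections import Counter

# Substring that identifies the archive log files shown in this issue.
ARCHIVE_MARKER = ".commits_.archive."


def classify(key: str) -> str:
    """Assign an object key to a rough category (labels are illustrative only)."""
    name = key.rsplit("/", 1)[-1]  # file name without the directory part
    if ARCHIVE_MARKER in name:
        return "archive"
    if "/.hoodie/.temp/" in key:
        return "marker"
    if "/.hoodie/" in key:
        return "hoodie_meta"  # timeline files and hoodie.properties
    return "data"  # data files and partition metadata


def tally_hoodie_files(keys):
    """Count how many keys fall into each category."""
    return Counter(classify(k) for k in keys)
```

Calling `tally_hoodie_files(keys)` on the full listing should show whether the `archive` bucket dominates, as it does here.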
    
   **Expected behavior**
   
   How can I prevent so many `.commits_.archive.` files from being created?
   
   **Environment Description**
   
   * Hudi version : 0.5.3
   
   * Spark version : 2.4.6
   
   * Hive version : 2.3.4
   
   * Hadoop version : 2.8.5
   
   * Storage (HDFS/S3/GCS..) : S3
   
   * Running on Docker? (yes/no) : no
   
   

