AdarshKadameriTR commented on issue #7487:
URL: https://github.com/apache/hudi/issues/7487#issuecomment-1375198359

   Hi @xushiyan ,
   
   We are incrementally upserting data into our Hudi tables every 5 minutes. 
   We have set **CLEANER_POLICY** to **KEEP_LATEST_BY_HOURS** with 
**CLEANER_HOURS_RETAINED** = 48. The only operation we execute is **Upsert**. 
We have a single writer, and compaction **runs every hour**.
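
For readers without access to the screenshots, the setup described above could be expressed roughly as the Hudi write options below. This is only a sketch: the table type (`MERGE_ON_READ`, inferred from the presence of .log files), the use of inline compaction, and the helper name `build_hudi_write_options` are assumptions, not our exact configuration.

```python
# Hypothetical helper: approximates the described setup as Hudi write options.
# Assumptions (not confirmed here): MERGE_ON_READ table, inline compaction
# triggered by a number of delta commits.

def build_hudi_write_options(commit_interval_minutes=5,
                             compaction_interval_minutes=60,
                             hours_retained=48):
    # With a commit every 5 minutes, hourly compaction corresponds to
    # roughly 60 / 5 = 12 delta commits between compactions.
    delta_commits = compaction_interval_minutes // commit_interval_minutes
    return {
        "hoodie.datasource.write.operation": "upsert",
        "hoodie.datasource.write.table.type": "MERGE_ON_READ",  # assumption
        "hoodie.cleaner.policy": "KEEP_LATEST_BY_HOURS",
        "hoodie.cleaner.hours.retained": str(hours_retained),
        "hoodie.compact.inline": "true",  # assumption
        "hoodie.compact.inline.max.delta.commits": str(delta_commits),
    }


options = build_hudi_write_options()

# Number of 5-minute commits that fall inside the 48-hour retention window.
commits_in_retention_window = 48 * 60 // 5
```

With Spark, such a dictionary would typically be passed as `df.write.format("hudi").options(**options).mode("append").save(path)`. The retention arithmetic indicates that about 576 commits fall inside the 48-hour window at this cadence.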
   
   
   pls share more info like what the job is doing when this occurs - is it 
reading or writing? : 
   Our application job only performs writes, using upserts as mentioned 
above. According to AWS, they see S3 GET API calls at up to 700 requests per 
second. From the logs, we can see that Hudi is internally issuing these GET 
requests against the log files in the table partitions. Most likely, Hudi 
compaction is issuing those reads.
   
   have you run clustering for this table? 
   We have **not enabled clustering** on the tables.
   
   what do the writer configs look like? 
   They are shown in the screenshots below.
   
   
![210503366-77d47c7c-169f-4a87-8234-0971079a9347](https://user-images.githubusercontent.com/110987545/211257318-ac7a3c01-3fd7-445e-8aee-b103d9cf06c1.png)
   
![210501558-28eb3712-fed8-4c93-9c85-ccb6ef3521dc](https://user-images.githubusercontent.com/110987545/211257330-c2ffd236-c08a-4169-a651-4cd4c2b62dbe.png)
   
   **Partition structure**: s3://bucket/table/partition/, containing parquet and .log files
   
   
   **Note**: We also have an open issue about old log files not being cleaned 
by the Hudi cleaner: **https://github.com/apache/hudi/issues/7600**

