AdarshKadameriTR commented on issue #7487: URL: https://github.com/apache/hudi/issues/7487#issuecomment-1375198359
Hi @xushiyan,

We are incrementally upserting data into our Hudi table(s) every 5 minutes. We have set **CLEANER_POLICY** to **KEEP_LATEST_BY_HOURS** with **CLEANER_HOURS_RETAINED** = 48. The only operation we run is **Upsert**. We have a single writer, and compaction **runs every hour**.

> pls share more info like what the job is doing when this occurs - is it reading or writing?

Our application job only writes, using the upserts described above. According to our discussion with AWS, they see S3 GET requests at up to 700 per second. Our logs show Hudi itself issuing these GET requests against the log files in the table partitions, so we suspect compaction is the source of these reads.

> have you run clustering for this table?

We have **not enabled clustering** on the tables.

> what do the writer configs look like?

They are shown in the screenshots below.

**Partition structure**: s3://bucket/table/partition/ containing parquet and .log files

**Note**: We have an open issue about old log files not being cleaned by the Hudi cleaner: **https://github.com/apache/hudi/issues/7600**
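For readers without access to the screenshots, here is a minimal sketch of how the settings described in this comment might map onto Hudi write options in PySpark. It is an assumption-based reconstruction, not our actual config: the table type (`MERGE_ON_READ`, inferred from the presence of .log files and compaction), inline compaction every 12 delta commits (60 min / 5 min), and the helper names `build_hudi_options` and `delta_commits_retained` are hypothetical. Record key and precombine fields are omitted because they are not stated here.

```python
# Hypothetical reconstruction of the writer settings described in this comment.
# Only the cleaner policy, 48 retained hours, upsert operation, 5-minute
# commit cadence, hourly compaction and "no clustering" come from the comment;
# everything else is an assumption.

COMMIT_INTERVAL_MINUTES = 5        # upserts every 5 minutes
CLEANER_HOURS_RETAINED = 48        # KEEP_LATEST_BY_HOURS retention
COMPACTION_INTERVAL_MINUTES = 60   # compaction runs every hour


def build_hudi_options(table_name: str) -> dict:
    """Return a dict of Hudi write options (hypothetical helper)."""
    return {
        "hoodie.table.name": table_name,
        # Assumption: .log files plus compaction imply a MERGE_ON_READ table.
        "hoodie.datasource.write.table.type": "MERGE_ON_READ",
        "hoodie.datasource.write.operation": "upsert",
        "hoodie.cleaner.policy": "KEEP_LATEST_BY_HOURS",
        "hoodie.cleaner.hours.retained": str(CLEANER_HOURS_RETAINED),
        # Clustering is not enabled on these tables.
        "hoodie.clustering.inline": "false",
        # Assumption: hourly compaction expressed as inline compaction
        # after every 60 / 5 = 12 delta commits.
        "hoodie.compact.inline": "true",
        "hoodie.compact.inline.max.delta.commits": str(
            COMPACTION_INTERVAL_MINUTES // COMMIT_INTERVAL_MINUTES
        ),
    }


def delta_commits_retained() -> int:
    """Number of 5-minute commits that fall inside the 48-hour cleaner window."""
    return CLEANER_HOURS_RETAINED * 60 // COMMIT_INTERVAL_MINUTES


if __name__ == "__main__":
    opts = build_hudi_options("table")
    for key, value in opts.items():
        print(f"{key}={value}")
    print("commits inside cleaner window:", delta_commits_retained())
    # With a Spark DataFrame `df`, these would be passed roughly as:
    # df.write.format("hudi").options(**opts).mode("append").save("s3://bucket/table")
```

The arithmetic shows the scale involved: 48 hours of 5-minute upserts is 576 commits inside the cleaner window, with roughly 12 delta commits of log files accumulating per file group between hourly compactions.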
