jenu9417 commented on issue #7991: URL: https://github.com/apache/hudi/issues/7991#issuecomment-1465110980
@nsivabalan Thanks for the update.

`/data/testfolder` is the base path for the table. To clarify, the folder structure is below:

```
/data/testfolder/
/data/testfolder/.hoodie/
/data/testfolder/.hoodie/.aux/
/data/testfolder/.hoodie/.aux/.bootstrap/.fileids/
/data/testfolder/.hoodie/.aux/.bootstrap/.partitions/
/data/testfolder/.hoodie/.temp/
/data/testfolder/.hoodie/.temp/20230303104616/
/data/testfolder/.hoodie/archived/
/data/testfolder/.hoodie/hoodie.properties
```

There are no other non-Hudi folders inside `/data/testfolder/`, and I'm seeing a lot of HEAD operations on `/data/` and `/data/testfolder/`.

A few examples from the S3 access logs:

LIST
```
"GET /?prefix=repo%2Fsms_data_1_newtable_ind_mor%2F&delimiter=%2F&max-keys=2&encoding-type=url HTTP/1.1"
"GET /?prefix=repo%2Fsms_data_1_newtable_ind_mor%2F.hoodie%2F&delimiter=%2F&max-keys=2&encoding-type=url HTTP/1.1"
"GET /?prefix=repo%2Fsms_data_1_newtable_ind_mor%2F.hoodie%2F.aux%2F.bootstrap%2F.partitions%2F&delimiter=%2F&max-keys=2&encoding-type=url HTTP/1.1"
"GET /?prefix=repo%2Fsms_data_1_newtable_ind_mor%2F&delimiter=%2F&max-keys=2&encoding-type=url HTTP/1.1"
```

HEAD
```
"HEAD /repo HTTP/1.1"
"HEAD /repo/sms_data_1_newtable_ind_mor HTTP/1.1"
"HEAD /repo/sms_data_1_newtable_ind_mor/.hoodie HTTP/1.1"
"HEAD /repo/sms_data_1_newtable_ind_mor HTTP/1.1"
```

Such requests repeat throughout the write operation.

The major issue we face is the frequency of these API hits per write to 1 partition: we see around 100 LIST and 100 HEAD operations per write to 1 partition. Since LIST is the costlier operation, this many LIST calls per write makes the overall approach more expensive.

If we could understand the correlation between the various types of API hits (specifically LIST and HEAD) and a write to 1 partition, it would help us decide.

--
This is an automated message from the Apache Git Service.
