OliLay commented on PR #41564: URL: https://github.com/apache/arrow/pull/41564#issuecomment-2110203513
> My point is that _if the path cannot be written to_, the error happens when opening the file, not later on.

That is true. I guess the question is whether `arrow`'s OutputStream API makes an explicit _guarantee_ that `Open` should throw if the target does not exist. My guess would be that you shouldn't build code upon this assumption if it isn't explicitly stated in `arrow`'s API/docs (which it is not), but of course real-world usage deviates from that (Hyrum's Law).

But checking whether the bucket exists would cost at least one more RTT to S3, and the goal of the PR was to reduce the number of blocking calls to S3 to reduce overall latency. If we add another check here, we'll have a total of 2x RTT to S3 for small uploads, which is better than the initial 3x RTT without this change, but still not optimal from a performance point of view.
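
To make the trade-off concrete, here is a rough sketch that counts blocking round trips for a small upload under the two behaviours discussed: checking the bucket when the stream is opened versus deferring all network I/O until the data is written. `FakeS3`, `UploadWithEagerCheck` and `UploadDeferred` are hypothetical names invented for this illustration. They are not Arrow or AWS SDK APIs, and the model assumes a small upload is a single request.

```cpp
#include <set>
#include <string>

// Hypothetical stand-in for an S3 client. Every call counts as one
// blocking round trip (RTT). Not an Arrow or AWS SDK type.
struct FakeS3 {
  std::set<std::string> buckets;
  int round_trips = 0;

  bool HeadBucket(const std::string& bucket) {
    ++round_trips;
    return buckets.count(bucket) > 0;
  }

  bool PutObject(const std::string& bucket, const std::string& /*data*/) {
    ++round_trips;
    return buckets.count(bucket) > 0;
  }
};

// Eager variant: "open" validates the bucket, so a missing target fails
// immediately, at the price of one extra round trip on the happy path.
bool UploadWithEagerCheck(FakeS3& s3, const std::string& bucket,
                          const std::string& data, bool* failed_at_open) {
  *failed_at_open = !s3.HeadBucket(bucket);
  if (*failed_at_open) return false;
  return s3.PutObject(bucket, data);
}

// Deferred variant: "open" performs no network I/O, so a missing target
// is only reported once the data is actually sent.
bool UploadDeferred(FakeS3& s3, const std::string& bucket,
                    const std::string& data, bool* failed_at_open) {
  *failed_at_open = false;
  return s3.PutObject(bucket, data);
}
```

In this model a successful small upload costs two round trips with the eager check and one without it. A missing bucket costs one round trip either way; the only difference is whether the failure is reported at open or at write time, which is exactly the behaviour callers may have come to rely on.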
