timwaizenegger commented on issue #388: URL: https://github.com/apache/arrow-rs-object-store/issues/388#issuecomment-3155091963
Sorry for the late reply. Our use case is this: our application processes objects/files in batches. Each batch execution is an independent, stateless function call, and I can only pass strings and other primitive types between these calls. So today we use a continuation token: the very first invocation gets an empty string, so we start processing from the beginning. Each invocation then returns the last object name, and the execution framework feeds that into the next invocation. With S3 and the other object stores, I can do paginated listing with such a token.

A bit more background: our code is part of a custom Postgres DB extension where each batch runs in its own "transaction context"; PG memory safety imposes constraints on what data we can pass.

### Options I can see

1. Application rewrite, e.g. separating the listing from the processing logic. That is a heavy lift for us, so I'm looking for other options.
2. Lift the sorting/pagination logic into the app rather than asking object_store to do it. This is possible, but it is only needed for local storage, not for other object stores. So I'd have to break the generic access pattern I can use today and introduce a special case for one type of object_store backend.
3. Have a feature/config option on object_store that makes it return sorted results, to match the behavior of other stores.

(3) is just a nice and clean solution for our use case. I agree it will have poor performance; it's a tradeoff users can decide to make. Would you support adding a Rust feature or config option (e.g. a `with_sorted_listing`) to the `LocalFileSystem` implementation?
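For illustration, here is a rough sketch of what option 2 (sorting and paginating in the app) could look like for a local directory. It uses only the Rust standard library and does not go through object_store at all. The names `collect_keys` and `next_batch` are hypothetical, as is the choice of plain byte-wise string ordering for the keys. It also shows the cost mentioned above: every call walks and sorts the whole tree.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Recursively collect all files under `dir` as keys relative to `root`,
/// joined with `/` so they look like object-store paths.
/// (Hypothetical helper; not part of object_store.)
fn collect_keys(root: &Path, dir: &Path, out: &mut Vec<String>) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let path = entry.path();
        // `file_type` does not follow symlinks, so symlinks are treated as files.
        if entry.file_type()?.is_dir() {
            collect_keys(root, &path, out)?;
        } else {
            let rel = path.strip_prefix(root).expect("entry is under root");
            let key = rel
                .components()
                .map(|c| c.as_os_str().to_string_lossy().into_owned())
                .collect::<Vec<_>>()
                .join("/");
            out.push(key);
        }
    }
    Ok(())
}

/// Return up to `batch_size` keys strictly greater than `token`, in sorted
/// order, together with the token to pass to the next invocation.
/// An empty `token` starts from the beginning.
/// Assumption: byte-wise `String` ordering is the ordering the caller wants.
fn next_batch(root: &Path, token: &str, batch_size: usize) -> io::Result<(Vec<String>, String)> {
    let mut keys = Vec::new();
    collect_keys(root, root, &mut keys)?; // full walk on every call
    keys.sort();
    let batch: Vec<String> = keys
        .into_iter()
        .filter(|k| k.as_str() > token)
        .take(batch_size)
        .collect();
    // If nothing is left, hand back the same token so the caller can stop.
    let next = batch.last().cloned().unwrap_or_else(|| token.to_string());
    Ok((batch, next))
}
```

Each stateless invocation would call `next_batch(root, token, n)` with the string returned by the previous one and stop when the batch comes back empty. Only a `String` crosses the call boundary, which is the constraint described above.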