timwaizenegger commented on issue #388:
URL: 
https://github.com/apache/arrow-rs-object-store/issues/388#issuecomment-3155091963

   Sorry for the late reply. Our use case is this:
   
   Our application processes objects/files in batches. Each batch execution is 
an independent, stateless function call, and I can only pass strings/primitive 
types between these calls. So today we use a continuation token: the very 
first invocation receives an empty string, so we start processing from the 
beginning. Each invocation then returns the name of the last object it 
processed, and the execution framework feeds that into the next invocation.
   
   With S3 and the other object stores I can do paginated listing with such a 
token, since their listings come back in sorted order. 
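
   To make the token scheme concrete, here is a minimal sketch of one 
stateless batch step. It uses only the Rust standard library; `next_batch` is 
a hypothetical name, not part of object_store, and it assumes the listing it 
receives is already sorted in ascending order.

```rust
/// Hypothetical helper (not an object_store API): one stateless batch step.
/// Returns the names that sort strictly after `token`, limited to
/// `batch_size`, plus the token to pass to the next invocation.
/// Assumes `sorted_names` is in ascending lexicographic order.
fn next_batch(
    sorted_names: &[String],
    token: &str,
    batch_size: usize,
) -> (Vec<String>, Option<String>) {
    // An empty token means "start from the beginning".
    let start = if token.is_empty() {
        0
    } else {
        // Index of the first name strictly greater than the token.
        sorted_names.partition_point(|name| name.as_str() <= token)
    };

    let batch: Vec<String> = sorted_names[start..]
        .iter()
        .take(batch_size)
        .cloned()
        .collect();

    // The last name in this batch becomes the next token;
    // `None` signals that nothing is left to process.
    let next_token = batch.last().cloned();
    (batch, next_token)
}
```

   The token is only meaningful if every invocation sees the same ordering. 
With an unsorted listing, "everything after the last name" can skip or repeat 
objects.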
   
   
   A bit more background:
   Our code is part of a custom Postgres extension where each batch runs in 
its own "transaction context"; PG memory safety constrains what data we can 
pass between batches. 
   
   
   
   ### Options I can see
   1. Rewrite the application, e.g. separate the listing from the processing 
logic. That is a heavy lift for us, so I'm looking for other options.
   2. Lift the sorting/pagination logic into the app rather than asking 
object_store to do it.
     - Possible, but only needed for local files, not for other object stores. 
So I'd have to break the generic access pattern I can use today and introduce a 
special case for one type of object_store backend.
   3. Add a feature/config option to object_store that makes it return sorted 
results, matching the behavior of the other stores.
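
   For reference, a rough sketch of what option 2 could look like for local 
files, using only the Rust standard library. The function names are 
hypothetical and this is not how object_store implements listing. It assumes 
keys are paths relative to the root, joined with `/` and compared as plain 
strings.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper: recursively collect every file under `dir` as a
/// `/`-separated key relative to `root`.
fn collect_keys(root: &Path, dir: &Path, out: &mut Vec<String>) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.is_dir() {
            collect_keys(root, &path, out)?;
        } else if let Ok(rel) = path.strip_prefix(root) {
            // Join the components with '/' so keys look the same on every OS.
            let parts: Vec<String> = rel
                .components()
                .map(|c| c.as_os_str().to_string_lossy().into_owned())
                .collect();
            out.push(parts.join("/"));
        }
    }
    Ok(())
}

/// Hypothetical helper: sorted listing of all keys strictly after `offset`.
/// An empty `offset` returns everything.
fn sorted_keys_after(root: &Path, offset: &str) -> io::Result<Vec<String>> {
    let mut keys = Vec::new();
    // The whole tree has to be walked before sorting; this is the
    // performance cost mentioned for sorted local listings.
    collect_keys(root, root, &mut keys)?;
    keys.sort();
    keys.retain(|k| k.as_str() > offset);
    Ok(keys)
}
```

   Every call walks and sorts the full tree before applying the offset, so 
the cost per batch grows with the total number of files, not with the batch 
size.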
   
   
   
   (3) is the cleanest solution for our use case. I agree it will have poor 
performance, but that is a tradeoff users can decide to make. 
   
   Would you support adding a Rust feature or config option (e.g. 
`with_sorted_listing`) to the `LocalFileSystem` implementation? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
