Re: [I] LocalFileSystem: offset for `list_with_offset` can't be identified / List results must be sorted [arrow-rs-object-store]

via GitHub Fri, 30 May 2025 09:34:59 -0700


Xuanwo commented on issue #388:
URL: 
https://github.com/apache/arrow-rs-object-store/issues/388#issuecomment-2922852088


   We've had similar discussions about adding sort support for fs several 
times. For example, see https://github.com/apache/arrow-rs/issues/6375.
   
   Let me share my previous comment:
   
   >> LocalFileSystem should mimic what ListObjectsV2 does, returning objects 
in lexicographical order of their keys.
   >
   > Hi, this behavior is expected. We can't perform sorting over the 
LocalFileSystem since we don't have such an API, and we can't collect keys and 
sort them in-memory. The best we can do is return objects in the order that the 
filesystem provides.
   
   >> Intended only for small stores that fit in memory?
   >
   > The most interesting part is this: we don't know how many items there are 
until we list them all. Behavior that changes with the number of items might 
introduce more issues. For example, users might find that object_store behaves 
one way when there are only 10 items, but differently in production services.
   
   ---
   
   The only thing we can rely on is that the file system will return the same 
results if there are no external changes. So, we can simply list from the given 
offset without worrying about missing any entries.
   
   > with the current behavior, offset listing just can't be used for local 
files. But the feature is implemented & documented and none of the docs say 
that this doesn't work on local files. Moreover, it works on the other stores.
   
   This is incorrect because offset listing doesn't require the results to be 
ordered; it only requires the filesystem to have stable results, meaning that 
the order remains consistent within the filesystem itself.
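
   To illustrate, here is a minimal std-only sketch, assuming offset listing 
means "return every key greater than `offset`". The helper `list_after_offset` 
and the sample keys are hypothetical and are not the object_store API. Under 
that assumption the set of returned keys does not depend on the order in which 
the filesystem enumerates entries:

```rust
/// Hypothetical helper (not part of object_store): keep only the keys that
/// compare strictly greater than `offset`, in whatever order they arrive.
fn list_after_offset<I>(entries: I, offset: &str) -> Vec<String>
where
    I: IntoIterator<Item = String>,
{
    entries
        .into_iter()
        .filter(|key| key.as_str() > offset)
        .collect()
}

fn main() {
    // Keys in an arbitrary, unsorted order, as a filesystem might yield them.
    let fs_order = ["c.parquet", "a.parquet", "d.parquet", "b.parquet"];
    let listed = list_after_offset(fs_order.iter().map(|s| s.to_string()), "b.parquet");
    // The output follows the enumeration order; no sorting or buffering of
    // the full listing is needed to apply the offset.
    println!("{:?}", listed);
}
```

   Note that the output order is still whatever the filesystem provides; only 
the membership of the result is fixed by the offset.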
   
   If we implement sorting for file systems or other services that do not 
return entries in order (such as AWS S3 Express, as mentioned by @tustvold), 
our users might experience the following strange issues:
   
   - The list operation hangs indefinitely without returning any entries.
   - Memory usage keeps increasing, potentially leading to an out-of-memory 
(OOM) situation.
   - The number of API requests grows significantly, even when users only want 
to fetch the first entry.
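
   The cost behind these points can be sketched with plain iterators. This is 
only an illustration under assumptions: `counted`, `first_streaming`, and 
`first_sorted` are hypothetical names, and a slice stands in for a lazy 
directory walk or a paginated list API. The counter records how many entries 
had to be pulled from the source before the first result could be returned:

```rust
use std::cell::Cell;

/// Hypothetical stand-in for a lazy listing: bumps `pulled` each time an
/// entry is actually read from the underlying source.
fn counted<'a>(
    keys: &'a [&'a str],
    pulled: &'a Cell<usize>,
) -> impl Iterator<Item = &'a str> + 'a {
    keys.iter().map(move |k| {
        pulled.set(pulled.get() + 1);
        *k
    })
}

/// Unsorted listing: the first entry is available after one read.
fn first_streaming(keys: &[&str]) -> (Option<String>, usize) {
    let pulled = Cell::new(0);
    let first = counted(keys, &pulled).next().map(str::to_string);
    (first, pulled.get())
}

/// Sorted listing: every entry must be read and buffered before the
/// smallest key is known.
fn first_sorted(keys: &[&str]) -> (Option<String>, usize) {
    let pulled = Cell::new(0);
    let mut all: Vec<&str> = counted(keys, &pulled).collect();
    all.sort();
    (all.first().map(|s| s.to_string()), pulled.get())
}

fn main() {
    let keys = ["c", "a", "d", "b"];
    println!("{:?}", first_streaming(&keys)); // reads 1 entry
    println!("{:?}", first_sorted(&keys)); // reads all 4 entries
}
```

   With millions of entries, the sorted variant is where the waiting, the 
memory growth, and the extra requests would come from.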
   
   So my suggestion is to keep the current behavior as is, and to clarify it 
in our documentation so that it does not confuse users.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
