Xuanwo commented on issue #388:
URL: https://github.com/apache/arrow-rs-object-store/issues/388#issuecomment-2922852088
We've had similar discussions about adding sort support for fs several times; see, for example, https://github.com/apache/arrow-rs/issues/6375. Let me share my previous comment:

>> LocalFileSystem should mimic what ListObjectsV2 does, returning objects in lexicographical order of their keys.
>
> Hi, this behavior is expected. We can't perform sorting over the LocalFileSystem since the filesystem offers no sorted-listing API, and we can't collect keys and sort them in memory. The best we can do is return objects in the order that the filesystem provides.
>
>> Intended only for small stores that fit in memory?
>
> The most interesting part is this: we don't know how many items there are until we list them all. Behavior that changes with the number of items might introduce more issues. For example, users might find that object_store behaves one way when there are only 10 items, but differently in production services.

---

The only thing we can rely on is that the filesystem will return the same entries, in the same order, as long as there are no external changes. So we can simply list from the given offset without worrying about missing any entries.

> with the current behavior, offset listing just can't be used for local files. But the feature is implemented & documented and none of the docs say that this doesn't work on local files. Moreover, it works on the other stores.

This is incorrect: offset listing doesn't require the results to be sorted. It only requires the filesystem to return stable results, meaning the order stays consistent from one listing to the next.

If we implement sorting for filesystems or other services that do not return entries in order (such as AWS S3 Express, as mentioned by @tustvold), our users might experience the following strange issues:

- The list operation hangs indefinitely without returning any entries.
- Memory usage keeps increasing, potentially leading to an out-of-memory (OOM) situation.
- The number of API requests grows significantly, even when users only want to fetch the first entry.

So my suggestion is to keep the current behavior as is and, for now, clarify it in our documentation so it doesn't confuse users.
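To make the stable-order argument concrete, here is a small std-only Rust sketch. It is not `object_store` code: `page_after_position`, `page_after_key`, and `paginate` are hypothetical helpers, and the listing is a made-up stable but unsorted sequence standing in for what a directory read might return. Resuming *by position* (skip until the last returned key has been seen) only needs the order to be stable, while resuming *by key comparison* (`key > offset`) is only correct if the listing is also sorted. It assumes keys are unique, as paths are.

```rust
/// Hypothetical helper: resume *by position*. Skip entries until the last key
/// of the previous page has been seen, then return the next `limit` entries.
/// Only requires the listing order to be stable across calls.
fn page_after_position<'a>(listing: &[&'a str], offset: Option<&str>, limit: usize) -> Vec<&'a str> {
    let mut iter = listing.iter().copied();
    if let Some(key) = offset {
        // Consume everything up to and including `key`.
        for entry in iter.by_ref() {
            if entry == key {
                break;
            }
        }
    }
    iter.take(limit).collect()
}

/// Hypothetical helper: resume *by key comparison*. Return the next `limit`
/// entries whose key is lexicographically greater than `offset`.
/// Only correct if the listing is also sorted.
fn page_after_key<'a>(listing: &[&'a str], offset: Option<&str>, limit: usize) -> Vec<&'a str> {
    listing
        .iter()
        .copied()
        .filter(|entry| offset.map_or(true, |key| *entry > key))
        .take(limit)
        .collect()
}

/// Drive a page function until it returns an empty page, feeding the last key
/// of each page back in as the next offset (keys are assumed to be unique).
fn paginate<'a>(
    listing: &[&'a str],
    limit: usize,
    page: fn(&[&'a str], Option<&str>, usize) -> Vec<&'a str>,
) -> Vec<&'a str> {
    let mut seen = Vec::new();
    let mut offset = None;
    loop {
        let batch = page(listing, offset, limit);
        let last = match batch.last() {
            Some(&last) => last,
            None => break,
        };
        seen.extend(batch);
        offset = Some(last);
    }
    seen
}

fn main() {
    // Stable but unsorted, like entries returned by a directory read.
    let listing = ["c.parquet", "a.parquet", "d.parquet", "b.parquet", "e.parquet"];
    println!("by position: {:?}", paginate(&listing, 2, page_after_position));
    println!("by key:      {:?}", paginate(&listing, 2, page_after_key));
}
```

On this listing, resuming by position returns each of the five entries exactly once, while resuming by key comparison returns `c.parquet` twice and never returns `b.parquet`.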
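The three failure modes listed above share one root cause: in an unsorted listing the smallest key can appear anywhere, so a sorted stream cannot emit even its first entry until the whole listing has been read and buffered. A minimal std-only sketch, assuming a hypothetical `make_listing` that stands in for a paginated backend and counts every entry it hands out (this is an illustration, not `object_store` code):

```rust
use std::cell::Cell;

/// Hypothetical stand-in for a backend listing: yields `n` keys in a stable
/// but unsorted (here: reverse) order and counts every entry it hands out.
fn make_listing(n: usize, fetched: &Cell<usize>) -> impl Iterator<Item = String> + '_ {
    (0..n)
        .rev()
        .map(|i| format!("file-{i:05}"))
        .inspect(move |_| fetched.set(fetched.get() + 1))
}

/// Pass-through: return entries in the order the backend provides them.
/// Getting the first entry costs exactly one fetch.
fn first_in_listing_order(mut listing: impl Iterator<Item = String>) -> Option<String> {
    listing.next()
}

/// Sorted: the smallest key could be the last one the backend returns, so
/// every entry must be fetched and buffered before anything can be emitted.
fn first_in_sorted_order(listing: impl Iterator<Item = String>) -> Option<String> {
    let mut buffered: Vec<String> = listing.collect(); // holds the whole listing
    buffered.sort();
    buffered.into_iter().next()
}

fn main() {
    let n = 10_000;

    let fetched = Cell::new(0);
    let first = first_in_listing_order(make_listing(n, &fetched));
    println!("listing order: {first:?} after {} fetch(es)", fetched.get());

    let fetched = Cell::new(0);
    let first = first_in_sorted_order(make_listing(n, &fetched));
    println!("sorted order:  {first:?} after {} fetch(es)", fetched.get());
}
```

With 10,000 entries the pass-through version stops after a single fetch, while the sorted version fetches and buffers all 10,000 before returning anything. On a very large directory or bucket, that full read is where the apparent hang, the memory growth, and the extra requests come from.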