BlakeOrth commented on code in PR #18146:
URL: https://github.com/apache/datafusion/pull/18146#discussion_r2500054103


##########
datafusion/core/tests/datasource/object_store_access.rs:
##########
@@ -194,17 +183,8 @@ async fn query_partitioned_csv_file() {
     +---------+-------+-------+---+----+-----+
     ------- Object Store Request Summary -------
     RequestCountingObjectStore()
-    Total Requests: 11
-    - LIST (with delimiter) prefix=data
-    - LIST (with delimiter) prefix=data/a=1
-    - LIST (with delimiter) prefix=data/a=2
-    - LIST (with delimiter) prefix=data/a=3
-    - LIST (with delimiter) prefix=data/a=1/b=10
-    - LIST (with delimiter) prefix=data/a=2/b=20
-    - LIST (with delimiter) prefix=data/a=3/b=30
-    - LIST (with delimiter) prefix=data/a=1/b=10/c=100
-    - LIST (with delimiter) prefix=data/a=2/b=20/c=200
-    - LIST (with delimiter) prefix=data/a=3/b=30/c=300
+    Total Requests: 2
+    - LIST prefix=data
     - GET  (opts) path=data/a=2/b=20/c=200/file_2.csv

Review Comment:
   The primary reason https://github.com/apache/datafusion/issues/17211 was 
written is that the caching mechanism you see in `list_all_files()` was 
never available to partitioned tables, because `list_all_files()` was never 
called in the partitioned case. The current implementation calls the underlying 
object_store's `list` method directly, bypassing the caching 
functionality entirely. That's more or less what this PR seeks to change!
   
   That being said, what you've raised with your path examples is exactly the 
case I was thinking of when I noted that cache entries for partitioned tables 
might need to be "prefix aware". If you have any thoughts on a good way to 
design the cache keys to solve this problem, I'd love your input.
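   
   To make "prefix aware" a bit more concrete, here is a rough, std-only 
sketch of one possible shape. It is not DataFusion's actual cache API: 
`PrefixAwareListCache`, `put`, `get`, and `normalize` are hypothetical names, 
and paths are plain strings rather than `object_store` types. The idea is to 
key each entry on the table's base prefix, store the full recursive listing, 
and answer a lookup for any sub-prefix by filtering an ancestor entry.

```rust
use std::collections::HashMap;

/// Hypothetical cache: maps a table's base prefix to the full
/// recursive listing (object paths) found under that prefix.
#[derive(Default)]
struct PrefixAwareListCache {
    entries: HashMap<String, Vec<String>>,
}

impl PrefixAwareListCache {
    /// Store the result of a single recursive LIST of `table_prefix`.
    fn put(&mut self, table_prefix: &str, paths: Vec<String>) {
        self.entries.insert(normalize(table_prefix), paths);
    }

    /// Answer a listing for `prefix` from any cached entry whose key is
    /// equal to `prefix` or an ancestor of it. Returns `None` on a miss,
    /// in which case the caller would fall back to the object store.
    fn get(&self, prefix: &str) -> Option<Vec<String>> {
        let wanted = normalize(prefix);
        self.entries
            .iter()
            .find(|(cached, _)| wanted.starts_with(cached.as_str()))
            .map(|(_, paths)| {
                paths
                    .iter()
                    .filter(|p| p.starts_with(wanted.as_str()))
                    .cloned()
                    .collect()
            })
    }
}

/// Force exactly one trailing '/' so that a lookup for "data/a=1"
/// cannot accidentally match "data/a=10/...".
fn normalize(prefix: &str) -> String {
    format!("{}/", prefix.trim_end_matches('/'))
}
```

   With something like this, one `LIST prefix=data` could populate the entry 
for `data`, and a later lookup for `data/a=2` would be served from that entry 
without another request. The linear scan over entries and the absence of any 
invalidation or size limit are simplifications for the sketch.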



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

