Weijun-H commented on PR #20823:
URL: https://github.com/apache/datafusion/pull/20823#issuecomment-4187154441

   > Thank you @ariel-miculas and @martin-g -- I think this is a nice written 
PR and a good approach to the problem.
   > 
   > In my opinion the only thing missing is some sort of test that ensures 
that there are fewer object store requests than before, and that future 
refactors don't accidentally change the pattern again.
   > 
   > I think you can find an example of such tests for the CSV source in 
[`73fbd48`/datafusion/core/tests/datasource/object_store_access.rs#L27-L26](https://github.com/apache/datafusion/blob/73fbd4807011ee102c9db67d40a511d0ece8c65a/datafusion/core/tests/datasource/object_store_access.rs#L27-L26)
   > 
   > Would it be possible to add the equivalent tests for the json reader so 
that we are confident that we will catch any regressions / changes in object 
store access patterns?
   > 
   > I am also going to do some local testing with the clickbench json file and 
see if I can find a performance difference too
   
   I agree with @alamb that an object-store access regression test would make 
this much safer to merge.
   
   A good minimal test might be a JSON equivalent of the CSV access-pattern 
test: repartition a single NDJSON file, run a simple query, and assert we only 
issue the expected ranged GETs for the partitions, without extra 
boundary-probing requests.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to