Weijun-H commented on PR #20823: URL: https://github.com/apache/datafusion/pull/20823#issuecomment-4187154441
> Thank you @ariel-miculas and @martin-g -- I think this is a nice written PR and a good approach to the problem. > > In my opinion the only thing missing is some sort of test that ensures that there are fewer object store requests than before, and that future refactors don't accidentally change the pattern again. > > I think you can find an example of such tests for the CSV source in [`73fbd48`/datafusion/core/tests/datasource/object_store_access.rs#L27-L26](https://github.com/apache/datafusion/blob/73fbd4807011ee102c9db67d40a511d0ece8c65a/datafusion/core/tests/datasource/object_store_access.rs#L27-L26) > > Would it be possible to add the equivalent tests for the json reader so that we are confident that we will catch any regressions / changes in object store access patterns? > > I am also going to do some local testing with the clickbench json file and see if I can find a performance difference too I agree with @alamb that an object-store access regression test would make this much safer to merge. A good minimal test might be a JSON equivalent of the CSV access-pattern test: repartition a single NDJSON file, run a simple query, and assert we only issue the expected ranged GETs for the partitions, without extra boundary-probing requests. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
