alamb commented on issue #9964: URL: https://github.com/apache/arrow-datafusion/issues/9964#issuecomment-2039954113
FYI I think this is more like an Epic that can be used to coordinate individual tasks / changes rather than a specific change itself.

> Interested in this one

Thanks @Lordworms -- one thing that would probably help to get this project started would be to gather some data.

Specifically, point the ListingTable at data on a remote object store (e.g. figure out how to write a query against 100 parquet files in an S3 bucket). Then measure:

1. time spent on object store listing
2. time spent fetching metadata
3. time spent on pruning / fetching IO
4. how many object store requests are made

Does anyone know a good public data set on S3 that we could use to test / benchmark with?
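To make the measurement idea concrete, here is a minimal sketch of one way to count requests and attribute time to phases by wrapping the store. Everything in it is hypothetical and for illustration only: `InMemoryStore`, `InstrumentedStore`, the phase names, and the fake byte ranges are assumptions. None of it is the DataFusion or `object_store` API, and a real run would go against S3 rather than an in-memory dict.

```python
import time
from collections import Counter, defaultdict
from contextlib import contextmanager


class InMemoryStore:
    """Hypothetical stand-in for a remote object store (not an S3 client)."""

    def __init__(self, objects):
        self._objects = dict(objects)

    def list(self, prefix):
        return sorted(k for k in self._objects if k.startswith(prefix))

    def get_range(self, key, start, end):
        return self._objects[key][start:end]


class InstrumentedStore:
    """Wraps a store, counting requests by kind and summing request time per phase."""

    def __init__(self, inner):
        self._inner = inner
        self.requests = Counter()          # request kind -> number of calls
        self.seconds = defaultdict(float)  # phase name -> time spent inside store calls
        self._phase = "unattributed"

    @contextmanager
    def phase(self, name):
        # Attribute every request made inside the `with` block to `name`.
        previous, self._phase = self._phase, name
        try:
            yield
        finally:
            self._phase = previous

    def _timed(self, kind, fn, *args):
        start = time.perf_counter()
        try:
            return fn(*args)
        finally:
            self.seconds[self._phase] += time.perf_counter() - start
            self.requests[kind] += 1

    def list(self, prefix):
        return self._timed("list", self._inner.list, prefix)

    def get_range(self, key, start, end):
        return self._timed("get_range", self._inner.get_range, key, start, end)


# 100 fake "parquet files" of 64 zero bytes each (made-up data).
files = {f"data/part-{i:03}.parquet": bytes(64) for i in range(100)}
store = InstrumentedStore(InMemoryStore(files))

with store.phase("listing"):
    keys = store.list("data/")
with store.phase("metadata"):
    # Pretend the last 8 bytes of each file are the footer.
    footers = [store.get_range(k, 56, 64) for k in keys]
with store.phase("io"):
    # Pretend pruning kept only the first 10 files.
    bodies = [store.get_range(k, 0, 56) for k in keys[:10]]

print(dict(store.requests))
print({phase: round(secs, 6) for phase, secs in store.seconds.items()})
```

The timings only cover time spent inside store calls, so CPU time for planning or decoding would need to be measured separately.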
