[GitHub] [arrow-datafusion] jorgecarleitao commented on pull request #811: Add support for reading remote storage systems

GitBox Thu, 19 Aug 2021 00:45:18 -0700


jorgecarleitao commented on pull request #811:
URL: https://github.com/apache/arrow-datafusion/pull/811#issuecomment-901687032



   Thanks a lot for taking a good look at this and for the proposal.
   
   > Propagate Async API all the way up and finally change the user-facing API: 
including DataFrame & ExecutionContext
   
   Could you describe which APIs would be affected by this? For example, 
creating a logical plan would become `async` because we have to read metadata 
to build a schema, correct? So, for example, things like `df = 
context.read_parquet(...).await?;`, right?
   
   I agree with making the planning `async`: there is no guarantee that we 
synchronously have all the information to build the plan in the first place, 
and imo we should not block because we need to read 50 metadata files from S3.
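
To make the question concrete, here is a minimal sketch of what an `async` planning entry point could look like. Everything in it is a hypothetical stand-in and not DataFusion's actual API: `ExecContext`, `DataFrame`, `Schema`, and `fetch_schema` are invented for illustration, and `fetch_schema` only pretends to read metadata from remote storage. The tiny `block_on` is there just to keep the example free of external crates; a real application would use an async runtime instead.

```rust
use std::future::Future;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Hypothetical stand-ins for illustration only; these are not DataFusion types.
#[derive(Debug, Clone, PartialEq)]
pub struct Schema {
    pub fields: Vec<String>,
}

#[derive(Debug)]
pub struct DataFrame {
    pub path: String,
    pub schema: Schema,
}

pub struct ExecContext;

impl ExecContext {
    // Assumption: stands in for an async metadata read from remote storage.
    // It returns a fixed schema so the example runs without any I/O.
    async fn fetch_schema(&self, path: &str) -> Result<Schema, String> {
        if path.is_empty() {
            return Err("empty path".to_string());
        }
        Ok(Schema {
            fields: vec!["id".to_string(), "value".to_string()],
        })
    }

    // Building the plan needs the schema, so this entry point becomes async
    // and the caller has to `.await` it.
    pub async fn read_parquet(&self, path: &str) -> Result<DataFrame, String> {
        let schema = self.fetch_schema(path).await?;
        Ok(DataFrame {
            path: path.to_string(),
            schema,
        })
    }
}

// Waker that does nothing; sufficient because `block_on` below polls in a loop.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

// Minimal busy-polling executor, only to drive the future in this sketch.
pub fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
        std::thread::yield_now();
    }
}
```

From synchronous code this would be driven as `block_on(ctx.read_parquet("s3://bucket/data.parquet"))`, while code that is already `async` would write `ctx.read_parquet(...).await?`, which is the user-facing change being discussed.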
   
   I agree that this would be a major change. :)

