westonpace commented on issue #33986:
URL: https://github.com/apache/arrow/issues/33986#issuecomment-1416351103
From an implementation perspective I suspect we can satisfy any of these
proposed APIs. If we need to come up with a new API then my preference is
Substrait, but if the consensus heads in some other direction I'm fine with
that too.
@jorisvandenbossche, your proposal seems fine, but I don't see anything in
there for filesystems. I think this is for on-disk data more so than purely
in-memory data, though I believe your approach could be adapted to include
filesystems.
> Does object store rs work for this?
Yes, I would assume that object store rs would be able to satisfy this, but
I'm not familiar with its capabilities. For example, my idea of how this would
work in Substrait would be:
```
// This would be usable as ReadRel::read_type
message Dataset {
  // This is already defined in ReadRel and is basically a list of files
  // and a format object which defines things like delimiter (for CSV)
  LocalFiles files = 1;
  oneof filesystem {
    LocalFilesystem local = 2;
    S3Filesystem s3 = 3;
    ExtensionFilesystem extension = 4;
  }
  message LocalFilesystem {}
  message S3Filesystem {
    string region = 1;
    string client_id = 2;
    // could be omitted if credentials negotiated elsewhere
    string client_secret = 3;
  }
  message ExtensionFilesystem {
    google.protobuf.Any details = 1;
  }
}
```
The equivalent C interface would just be structifying those messages.
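As a rough sketch of what "structifying" could look like, assuming a tagged
union stands in for the oneof: every name below (`Dataset`, `DatasetFilesystem`,
the `dataset_*` helpers) is hypothetical and not part of any existing Arrow or
Substrait API, and the file list is simplified to plain paths rather than the
full LocalFiles message with its format object.

```c
#include <stddef.h>
#include <stdint.h>

/* All names here are hypothetical; this is not an existing Arrow C interface. */

/* Discriminant standing in for the protobuf oneof. */
typedef enum {
    DATASET_FS_LOCAL = 0,
    DATASET_FS_S3 = 1,
    DATASET_FS_EXTENSION = 2
} DatasetFilesystemKind;

typedef struct {
    const char *region;
    const char *client_id;
    const char *client_secret; /* NULL if credentials are negotiated elsewhere */
} DatasetS3Filesystem;

/* Opaque payload, playing the role of google.protobuf.Any. */
typedef struct {
    const char *type_url;
    const uint8_t *value;
    size_t value_len;
} DatasetExtensionFilesystem;

typedef struct {
    DatasetFilesystemKind kind;
    union {
        DatasetS3Filesystem s3;
        DatasetExtensionFilesystem extension;
    } as; /* unused when kind == DATASET_FS_LOCAL */
} DatasetFilesystem;

/* Assumption: the file list is reduced to borrowed path strings. */
typedef struct {
    const char *const *paths;
    size_t n_paths;
    DatasetFilesystem filesystem;
} Dataset;

/* Build a dataset on the local filesystem; the paths are borrowed, not copied. */
static Dataset dataset_local(const char *const *paths, size_t n_paths) {
    Dataset ds = {0};
    ds.paths = paths;
    ds.n_paths = n_paths;
    ds.filesystem.kind = DATASET_FS_LOCAL;
    return ds;
}

/* Build a dataset on S3; client_secret may be NULL. */
static Dataset dataset_s3(const char *const *paths, size_t n_paths,
                          const char *region, const char *client_id,
                          const char *client_secret) {
    Dataset ds = {0};
    ds.paths = paths;
    ds.n_paths = n_paths;
    ds.filesystem.kind = DATASET_FS_S3;
    ds.filesystem.as.s3.region = region;
    ds.filesystem.as.s3.client_id = client_id;
    ds.filesystem.as.s3.client_secret = client_secret;
    return ds;
}

/* Returns 1 if this is an S3 dataset with no inline secret, 0 otherwise. */
static int dataset_needs_external_credentials(const Dataset *ds) {
    return ds->filesystem.kind == DATASET_FS_S3 &&
           ds->filesystem.as.s3.client_secret == NULL;
}
```

A consumer would switch on `filesystem.kind` and read only the matching union
member. An extension filesystem stays opaque to anyone who does not recognize
its `type_url`.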