[GitHub] [arrow] wjones127 commented on pull request #35568: GH-33986: [Python] Sketch out a minimal protocol interface for datasets

via GitHub Wed, 17 May 2023 20:34:48 -0700


wjones127 commented on PR #35568:
URL: https://github.com/apache/arrow/pull/35568#issuecomment-1552355842


   > I'm not sure the API you are defining helps you with that goal. I think 
what is missing is the API used to create the dataset. What you've proposed 
here isn't flexible enough. For example, if I'm trying to convert a "named 
table request" (e.g. give me all rows from table "widgets" with filter "xyz" at 
time point Y) into a "scan request" (e.g. what pyarrow datasets can read) then 
I want something like...
   
   @westonpace you are correct that this doesn't define how such dataset 
classes are built. That's left to the consumer, who will write their own 
classes that conform to this API.
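   
   As a rough sketch of what "conforming" means here (the names `ScannableDataset`, `scan`, `InMemoryDataset`, and `count_rows` are hypothetical, chosen only for illustration, and are not the interface proposed in this PR), a structural protocol lets any class with the right method shape be accepted without inheriting from a base class:

```python
from typing import Iterator, List, Optional, Protocol, runtime_checkable


@runtime_checkable
class ScannableDataset(Protocol):
    """Hypothetical minimal interface; not the protocol defined in the PR."""

    def scan(self, columns: Optional[List[str]] = None) -> Iterator[dict]:
        ...


class InMemoryDataset:
    """An independently written class: it does not inherit from the protocol,
    it only provides a method with the same name and shape."""

    def __init__(self, rows: List[dict]) -> None:
        self._rows = rows

    def scan(self, columns: Optional[List[str]] = None) -> Iterator[dict]:
        for row in self._rows:
            if columns is None:
                yield dict(row)
            else:
                # Project each row down to the requested columns.
                yield {name: row[name] for name in columns}


def count_rows(dataset: ScannableDataset) -> int:
    """Code that depends only on the protocol, not on a concrete class."""
    return sum(1 for _ in dataset.scan())
```

   Note that nothing in this sketch says how an `InMemoryDataset` gets constructed; that is the part the protocol deliberately leaves open.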
   
   However, I do like your idea for a dataset builder. I think it might be 
worth asking the PyIceberg developers whether something like that would work 
well for them. (I think Delta Lake and Lance will likely go the route of 
implementing their own classes in Rust.) I've noted this idea in 
https://docs.google.com/document/d/1-uVkSZeaBtOALVbqMOPeyV3s2UND7Wl-IGEZ-P-gMXQ/edit#heading=h.31rf5m1tlipg


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
