[GitHub] [iceberg] Fokko commented on a diff in pull request #7148: Python: Add conversion from iceberg table scan to ray dataset

via GitHub Wed, 22 Mar 2023 12:01:31 -0700


Fokko commented on code in PR #7148:
URL: https://github.com/apache/iceberg/pull/7148#discussion_r1145279476



##########
python/pyiceberg/table/__init__.py:
##########
@@ -415,3 +416,8 @@ def to_duckdb(self, table_name: str, connection: Optional[DuckDBPyConnection] =
         con.register(table_name, self.to_arrow())
 
         return con
+
+    def to_ray(self) -> ray.data.dataset.Dataset:
+        import ray
+
+        return ray.data.from_arrow(self.to_arrow())

Review Comment:
   Arrow also offers `to_batches`, which consumes the source in batches, but 
I'm not sure whether Ray can leverage this. In the end, we want to leverage the 
PyArrow dataset, but that requires much deeper integration with Arrow. I'm 
digging into this, but there is no design yet. Maybe we can send a Substrait 
plan? Open for discussion.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

