timsaucer commented on code in PR #1376:
URL: 
https://github.com/apache/datafusion-python/pull/1376#discussion_r2794347792


##########
python/datafusion/substrait.py:
##########
@@ -137,6 +155,29 @@ def deserialize_bytes(proto_bytes: bytes) -> Plan:
         """
         return Plan(substrait_internal.Serde.deserialize_bytes(proto_bytes))
 
+    @staticmethod
+    def serialize_json(sql: str, ctx: SessionContext, path: str | 
pathlib.Path) -> None:

Review Comment:
   Is the preferred method here to write to a file or to just produce a string 
output? Maybe the author of the issue has an opinion on best use case.



##########
src/errors.rs:
##########
@@ -34,6 +36,8 @@ pub enum PyDataFusionError {
     Common(String),
     PythonError(PyErr),
     EncodeError(EncodeError),
+    DecodeError(DecodeError),
+    SerdeJsonError(SerdeJsonError),

Review Comment:
   Recommend removing SerdeJsonError and mapping to EncodeError or DecodeError. 
Alternatively putting into DataFusion execution error.



##########
python/datafusion/substrait.py:
##########
@@ -137,6 +155,29 @@ def deserialize_bytes(proto_bytes: bytes) -> Plan:
         """
         return Plan(substrait_internal.Serde.deserialize_bytes(proto_bytes))
 
+    @staticmethod
+    def serialize_json(sql: str, ctx: SessionContext, path: str | 
pathlib.Path) -> None:
+        """Serialize a SQL query to a JSON-formatted Substrait plan file.

Review Comment:
   Is the output a json file? I'm trying to understand the "serialize" in this 
context.



##########
python/tests/test_substrait.py:
##########
@@ -74,3 +74,48 @@ def test_substrait_file_serialization(ctx, tmp_path, 
path_to_str):
     expected_actual_plan = ss.Consumer.from_substrait_plan(ctx, actual_plan)
 
     assert str(expected_logical_plan) == str(expected_actual_plan)
+
+def test_substrait_plan_to_from_json(ctx):
+    # Create a simple plan
+    sql = "SELECT * FROM t"
+    batch = pa.RecordBatch.from_arrays(
+        [pa.array([1, 2, 3]), pa.array([4, 5, 6])],
+        names=["a", "b"],
+    )
+    ctx.register_record_batches("t", [[batch]])
+
+    # Serialize to plan
+    plan = ss.Serde.serialize_to_plan(sql, ctx)
+
+    # Convert plan to JSON
+    json_str = plan.to_json()
+    assert isinstance(json_str, str)
+    assert "relations" in json_str  # basic sanity check

Review Comment:
   If these conversions to json are stable as the issue suggests, then I would 
think we could compare the entire output, right? What do the other libraries 
produce?



##########
python/datafusion/substrait.py:
##########
@@ -67,6 +67,24 @@ def encode(self) -> bytes:
         """
         return self.plan_internal.encode()
 
+    def to_json(self) -> str:
+        """Serialize the plan to a JSON string.
+
+        Returns a JSON string following the Protobuf JSON Mapping.

Review Comment:
   I find the statement "Returns a JSON string following the Protobuf JSON 
Mapping" to be hard to understand. What exactly does this mean? Is this LLM 
generated?



##########
python/datafusion/substrait.py:
##########
@@ -137,6 +155,29 @@ def deserialize_bytes(proto_bytes: bytes) -> Plan:
         """
         return Plan(substrait_internal.Serde.deserialize_bytes(proto_bytes))
 
+    @staticmethod
+    def serialize_json(sql: str, ctx: SessionContext, path: str | 
pathlib.Path) -> None:
+        """Serialize a SQL query to a JSON-formatted Substrait plan file.
+
+        Args:
+            sql: SQL query to serialize.
+            ctx: SessionContext to use.
+            path: Path to write the JSON-formatted Substrait plan to.
+        """
+        return substrait_internal.Serde.serialize_json(sql, ctx.ctx, str(path))
+
+    @staticmethod
+    def deserialize_json(path: str | pathlib.Path) -> Plan:
+        """Deserialize a Substrait plan from a JSON file.

Review Comment:
   Similarly, not sure if this needs to be a path to a file that we read or 
expect to pass in a string. Since it's python it's fairly easy to handle 
either/both



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to