kylebarron commented on code in PR #1222:
URL: https://github.com/apache/datafusion-python/pull/1222#discussion_r2319619566


##########
docs/source/user-guide/dataframe/index.rst:
##########
@@ -145,10 +145,39 @@ To materialize the results of your DataFrame operations:
     
     # Display results
     df.show()                         # Print tabular format to console
-    
+
     # Count rows
     count = df.count()
 
+PyArrow Streaming
+-----------------
+
+DataFusion DataFrames implement the ``__arrow_c_stream__`` protocol, enabling
+zero-copy streaming into libraries like `PyArrow <https://arrow.apache.org/>`_.
+Earlier versions eagerly converted the entire DataFrame when exporting to
+PyArrow, which could exhaust memory on large datasets. With streaming, batches
+are produced lazily so you can process arbitrarily large results without
+out-of-memory errors.
+
+.. code-block:: python
+
+    import pyarrow as pa
+
+    # Create a PyArrow RecordBatchReader without materializing all batches
+    reader = pa.RecordBatchReader._import_from_c_capsule(df.__arrow_c_stream__())
+    for batch in reader:
+        ...  # process each batch as it is produced
+
+DataFrames are also iterable, yielding :class:`pyarrow.RecordBatch` objects
+lazily so you can loop over results directly:
+
+.. code-block:: python
+
+    for batch in df:
+        ...  # process each batch as it is produced

Review Comment:
   We already have our own `RecordBatch` class: https://datafusion.apache.org/python/autoapi/datafusion/record_batch/index.html#datafusion.record_batch.RecordBatch
   
   Also, we should ensure that the dunder methods are rendered in the docs. It doesn't look like they are currently. (Or maybe the dunder methods on that `RecordBatch` aren't documented?)
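
   For reference, the lazy behaviour the new section describes can be sketched in plain Python without DataFusion or PyArrow. This is only an illustration: `LazyBatches` is a hypothetical stand-in for a DataFrame, and plain lists stand in for record batches. It shows that a batch is built only when the consumer asks for it.

```python
from typing import Iterator, List


class LazyBatches:
    """Hypothetical stand-in for a DataFrame that yields batches on demand."""

    def __init__(self, num_rows: int, batch_size: int) -> None:
        self.num_rows = num_rows
        self.batch_size = batch_size
        self.produced = 0  # number of batches built so far

    def __iter__(self) -> Iterator[List[int]]:
        # Generator: each batch is built only when the consumer requests it.
        for start in range(0, self.num_rows, self.batch_size):
            stop = min(start + self.batch_size, self.num_rows)
            self.produced += 1
            yield list(range(start, stop))


source = LazyBatches(num_rows=10, batch_size=4)
batches = iter(source)

first = next(batches)
produced_after_first = source.produced  # only the first batch exists so far

# Consume the rest one batch at a time; no full result set is ever held.
total_rows = len(first) + sum(len(batch) for batch in batches)
```

   An eager export would instead build every batch before returning the first one, which is the memory problem the streaming path is meant to avoid.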



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

