jorisvandenbossche commented on a change in pull request #12627:
URL: https://github.com/apache/arrow/pull/12627#discussion_r831413243



##########
File path: docs/source/python/data.rst
##########
@@ -512,3 +512,23 @@ a new schema and cast the data to this schema:
 
 Metadata key and value pairs are ``std::string`` objects in the C++ implementation
 and so they are bytes objects (``b'...'``) in Python.
+
+Record Batch Readers
+--------------------
+
+Many functions in PyArrow either return or take as an argument a :class:`RecordBatchReader`.
+It can be used like any iterable of record batches, but it also provides their common
+schema without having to consume any of the batches.
+
+.. ipython:: python

Review comment:
       You can already use the "doctest format" right now, since it's basically 
a plain code block formatted with Python's `>>>` prompts (I don't think we 
need to use the doctest directive, since pytest can check doctests in plain 
code blocks as well). It's the same as what you included in the docstring:
   
   ```
   ::
   
       >>> schema = pa.schema([('x', pa.int64())])
       >>> def iter_record_batches():
       ...     for i in range(2):
       ...         yield pa.RecordBatch.from_arrays([pa.array([1, 2, 3])], schema=schema)
       >>> reader = pa.RecordBatchReader.from_batches(schema, iter_record_batches())
       >>> print(reader.schema)
       pyarrow.Schema
       x: int64
       >>> for batch in reader:
       ...     print(batch)
       pyarrow.RecordBatch
       x: int64
       pyarrow.RecordBatch
       x: int64
   ```
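
   The point that `>>>` examples in plain text can be checked without any special directive can be illustrated with the standard library's `doctest` module, which is also what pytest's `--doctest-glob` support builds on. This is just an illustrative sketch — the snippet below is a made-up stand-in, not the pyarrow example itself:

   ```python
   import doctest

   # A plain-text snippet in ">>>" doctest format, analogous to a plain
   # code block in an .rst file (hypothetical example, not from data.rst).
   snippet = """
   >>> def iter_items():
   ...     for i in range(2):
   ...         yield i
   >>> list(iter_items())
   [0, 1]
   """

   # doctest can extract and run the examples from arbitrary text; this is
   # the machinery that lets tools verify ">>>" blocks outside docstrings.
   parser = doctest.DocTestParser()
   test = parser.get_doctest(snippet, {}, "snippet", None, 0)
   runner = doctest.DocTestRunner(verbose=False)
   results = runner.run(test)
   print(results.failed)  # 0 failures means the expected output matched
   ```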




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
