[GitHub] [arrow-adbc] lidavidm commented on a diff in pull request #989: feat(python/adbc_driver_manager): add fetch_record_batch

via GitHub Tue, 22 Aug 2023 17:58:09 -0700


lidavidm commented on code in PR #989:
URL: https://github.com/apache/arrow-adbc/pull/989#discussion_r1302334457



##########
python/adbc_driver_manager/adbc_driver_manager/dbapi.py:
##########
@@ -973,7 +1012,7 @@ def fetchone(self) -> Optional[tuple]:
         self.rownumber += 1
         return row
 
-    def fetchmany(self, size: int):
+    def fetchmany(self, size: int) -> List[Optional[tuple]]:

Review Comment:
   :+1: thanks!



##########
python/adbc_driver_manager/adbc_driver_manager/dbapi.py:
##########
@@ -926,6 +927,44 @@ def fetch_df(self) -> "pandas.DataFrame":
             )
         return self._results.fetch_df()
 
+    def fetch_record_batch(self, rows_per_batch: int) -> Optional["_BatchIterator"]:

Review Comment:
   This should be called `fetch_record_batch_reader`, and should return a
   `pyarrow.RecordBatchReader`, IMO



##########
python/adbc_driver_manager/adbc_driver_manager/dbapi.py:
##########
@@ -926,6 +927,44 @@ def fetch_df(self) -> "pandas.DataFrame":
             )
         return self._results.fetch_df()
 
+    def fetch_record_batch(self, rows_per_batch: int) -> Optional["_BatchIterator"]:

Review Comment:
   The implementation can just be `return self._results._reader` IMO, or
   possibly we can wrap the reader to finagle the lifetime (but I'm not sure
   it's worth doing right away)
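
   For illustration only, a minimal sketch of that simpler shape: the method
   hands back the reader the cursor already holds instead of wrapping it in a
   Python-level iterator. `FakeReader`, `FakeResults`, and `SketchCursor` are
   hypothetical stand-ins, not part of adbc_driver_manager; in the real cursor
   the reader would presumably be a `pyarrow.RecordBatchReader` and the error
   a `ProgrammingError`.

```python
from typing import Iterator, List, Optional


class FakeReader:
    """Hypothetical stand-in for a pyarrow.RecordBatchReader: yields batches."""

    def __init__(self, batches: List[list]) -> None:
        self._batches = iter(batches)

    def __iter__(self) -> Iterator[list]:
        return self

    def __next__(self) -> list:
        return next(self._batches)


class FakeResults:
    """Hypothetical stand-in for the cursor's internal result holder."""

    def __init__(self, reader: FakeReader) -> None:
        self._reader = reader


class SketchCursor:
    """Hypothetical cursor showing only the method under discussion."""

    def __init__(self) -> None:
        self._results: Optional[FakeResults] = None

    def execute(self, batches: List[list]) -> None:
        # Assumption: executing a query leaves a reader on self._results.
        self._results = FakeResults(FakeReader(batches))

    def fetch_record_batch_reader(self) -> FakeReader:
        if self._results is None:
            # The PR raises ProgrammingError here; RuntimeError keeps this
            # sketch free of driver imports.
            raise RuntimeError(
                "Cannot fetch_record_batch_reader() before execute()"
            )
        # No per-row Python objects: return the underlying reader as-is.
        return self._results._reader
```

   The caller then iterates batches straight off the returned reader; the
   lifetime question (what happens if the cursor is closed first) is left
   open here, as in the comment above.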



##########
python/adbc_driver_manager/adbc_driver_manager/dbapi.py:
##########
@@ -926,6 +927,44 @@ def fetch_df(self) -> "pandas.DataFrame":
             )
         return self._results.fetch_df()
 
+    def fetch_record_batch(self, rows_per_batch: int) -> Optional["_BatchIterator"]:
+        """
+        Fetch #(rows_per_batch) batches akin to
+        https://duckdb.org/docs/guides/python/export_arrow.html#export-as-a-recordbatchreader
+
+        Notes
+        -----
+        This is an extension and not part of the DBAPI standard.
+        """
+        if self._results is None:
+            raise ProgrammingError(
+                "Cannot fetch_record_batch() before execute()",
+                status_code=_lib.AdbcStatusCode.INVALID_STATE,
+            )
+        self._batched_results = _BatchIterator(self._results._reader, rows_per_batch)
+        return self._batched_results
+
+    def read_next_batch(self) -> List[Optional[tuple]]:

Review Comment:
   Hmm, maybe this should be called `fetch_record_batch`?



##########
python/adbc_driver_manager/adbc_driver_manager/dbapi.py:
##########
@@ -926,6 +927,44 @@ def fetch_df(self) -> "pandas.DataFrame":
             )
         return self._results.fetch_df()
 
+    def fetch_record_batch(self, rows_per_batch: int) -> Optional["_BatchIterator"]:

Review Comment:
   The intent of the issue was to get the PyArrow record batch reader, which
   avoids going through Python objects like `_BatchIterator`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
