rollokb commented on a change in pull request #6979:
URL: https://github.com/apache/arrow/pull/6979#discussion_r415027375



##########
File path: python/pyarrow/tests/test_parquet.py
##########
@@ -179,6 +179,99 @@ def alltypes_sample(size=10000, seed=0, categorical=False):
 
 
 @pytest.mark.pandas
+def test_iter_batches_columns_reader(tempdir):
+    df = alltypes_sample(size=10000, categorical=True)
+
+    filename = tempdir / 'pandas_roundtrip.parquet'
+    arrow_table = pa.Table.from_pandas(df)
+    _write_table(arrow_table, filename, version="2.0",
+                 coerce_timestamps='ms', chunk_size=1000)
+
+    columns = df.columns[4:15]
+
+    file_ = pq.ParquetFile(filename)
+
+    batches = file_.iter_batches(
+        batch_size=500,
+        columns=columns
+    )
+
+    tm.assert_frame_equal(
+        next(batches).to_pandas(),
+        df.iloc[:500, :].loc[:, columns]
+    )
+
+
+@pytest.mark.pandas
+@pytest.mark.parametrize('chunk_size', [1000])
+def test_iter_batches_reader(tempdir, chunk_size):

Review comment:
       @wjones1, thank you very much for taking over my PR. Unfortunately, one of the reasons I abandoned this (and just built a wheel for my own company's project) was not knowing what to do about the problem this test highlights.
   
   The problem with dictionary arrays is far beyond my skill. I was wondering what someone with more architectural vision thinks of this issue.
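   
   For context, a minimal standalone sketch of the dictionary-column round trip the test above touches on (the file name `dict_example.parquet`, the `id`/`cat` columns, and the use of `read_dictionary` are illustrative assumptions, not part of this PR):
   
       import pyarrow as pa
       import pyarrow.parquet as pq
   
       # Build a small table with one dictionary-encoded (categorical) column.
       table = pa.table({
           'id': pa.array(range(8), type=pa.int64()),
           'cat': pa.array(['a', 'b', 'a', 'c'] * 2).dictionary_encode(),
       })
       pq.write_table(table, 'dict_example.parquet')
   
       # read_dictionary asks the reader to keep 'cat' dictionary-encoded on read.
       pf = pq.ParquetFile('dict_example.parquet', read_dictionary=['cat'])
       for batch in pf.iter_batches(batch_size=3, columns=['cat']):
           # The open question is whether each batch still carries the
           # dictionary type here, or comes back as plain strings.
           print(batch.schema.field('cat').type)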




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

