wjones1 commented on a change in pull request #6979:
URL: https://github.com/apache/arrow/pull/6979#discussion_r415130068
##########
File path: python/pyarrow/tests/test_parquet.py
##########

@@ -179,6 +179,99 @@ def alltypes_sample(size=10000, seed=0, categorical=False):

 @pytest.mark.pandas
+def test_iter_batches_columns_reader(tempdir):
+    df = alltypes_sample(size=10000, categorical=True)
+
+    filename = tempdir / 'pandas_roundtrip.parquet'
+    arrow_table = pa.Table.from_pandas(df)
+    _write_table(arrow_table, filename, version="2.0",
+                 coerce_timestamps='ms', chunk_size=1000)
+
+    columns = df.columns[4:15]
+
+    file_ = pq.ParquetFile(filename)
+
+    batches = file_.iter_batches(
+        batch_size=500,
+        columns=columns
+    )
+
+    tm.assert_frame_equal(
+        next(batches).to_pandas(),
+        df.iloc[:500, :].loc[:, columns]
+    )
+
+
+@pytest.mark.pandas
+@pytest.mark.parametrize('chunk_size', [1000])
+def test_iter_batches_reader(tempdir, chunk_size):

Review comment:
   Strangely, after I merged the latest changes from master, I am no longer seeing this issue with dictionary arrays. I definitely saw it in the original fork, so I think it may actually have been fixed (though I am not sure where). I've removed the dictionary-array correction from the test, and hopefully CI will confirm what I am seeing.
