pitrou commented on code in PR #13264:
URL: https://github.com/apache/arrow/pull/13264#discussion_r885587724


##########
python/pyarrow/tests/test_fs.py:
##########
@@ -1649,3 +1649,20 @@ def check_copied_files(destination_dir):
     destination_dir5.mkdir()
     copy_files(source_dir, destination_dir5, chunk_size=1, use_threads=False)
     check_copied_files(destination_dir5)
+
+
+@pytest.mark.pandas
+@pytest.mark.s3
+def test_pandas_text_reader_s3(tmpdir):
+    # ARROW-16272: Pandas' read_csv() should not exhaust an S3 input stream
+    # when a small nrows is passed.
+    import pandas as pd
+    from pyarrow.fs import S3FileSystem
+
+    fs = S3FileSystem(anonymous=True, region="us-east-2")
+    f = fs.open_input_file("ursa-qa/nyctaxi/yellow_tripdata_2010-01.csv")
+
+    df = pd.read_csv(f, nrows=2)
+    assert list(df["vendor_id"]) == ["VTS", "DDS"]
+    # Some readahead occurred, but not up to the end of file (which is ~2 GB)
+    assert f.tell() <= 256 * 1024

Review Comment:
   Ah, you're right, it works with a local file as well.
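   For reference, a local-file variant of the same check might look like the
   sketch below. This is not code from the PR: the test name, the generated
   CSV contents, and the exhaustion check via f.size() are illustrative
   assumptions; only the pattern (read a couple of rows, then verify the
   stream was not drained) mirrors the S3 test above.

       import pandas as pd
       from pyarrow import fs

       def test_pandas_text_reader_local(tmpdir):
           # Hypothetical local analogue of test_pandas_text_reader_s3 above.
           path = str(tmpdir / "data.csv")
           with open(path, "w") as sink:
               sink.write("vendor_id\n")
               # Enough rows that unbounded readahead would be detectable.
               for i in range(1_000_000):
                   sink.write(f"row{i}\n")

           f = fs.LocalFileSystem().open_input_file(path)
           df = pd.read_csv(f, nrows=2)
           assert list(df["vendor_id"]) == ["row0", "row1"]
           # read_csv() should not have drained the whole file just to
           # satisfy nrows=2; the position must be well short of EOF.
           assert f.tell() < f.size()

   Comparing f.tell() against f.size() keeps the check independent of any
   particular readahead buffer size, unlike the fixed 256 KiB bound used in
   the S3 test.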


