lidavidm commented on code in PR #13264:
URL: https://github.com/apache/arrow/pull/13264#discussion_r885567674


##########
python/pyarrow/tests/test_fs.py:
##########
@@ -1649,3 +1649,20 @@ def check_copied_files(destination_dir):
     destination_dir5.mkdir()
     copy_files(source_dir, destination_dir5, chunk_size=1, use_threads=False)
     check_copied_files(destination_dir5)
+
+
+@pytest.mark.pandas
+@pytest.mark.s3
+def test_pandas_text_reader_s3(tmpdir):
+    # ARROW-16272: Pandas' read_csv() should not exhaust a S3 input stream
+    # when a small nrows is passed.
+    import pandas as pd
+    from pyarrow.fs import S3FileSystem
+
+    fs = S3FileSystem(anonymous=True, region="us-east-2")
+    f = fs.open_input_file("ursa-qa/nyctaxi/yellow_tripdata_2010-01.csv")
+
+    df = pd.read_csv(f, nrows=2)
+    assert list(df["vendor_id"]) == ["VTS", "DDS"]
+    # Some readahead occurred, but not up to the end of file (which is ~2 GB)
+    assert f.tell() <= 256 * 1024

Review Comment:
   It seems to me that S3 is unnecessary here, or at least that the 'real' S3
is unnecessary?
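
   A minimal sketch of the kind of check that needs no network access. It
   is a stand-in only: `CountingRaw` is a hypothetical helper, not a pyarrow
   API, and the standard library's `io.TextIOWrapper` plays the role of the
   pandas text reader. The CSV contents are made up for illustration.

```python
import io


class CountingRaw(io.RawIOBase):
    """Hypothetical in-memory stand-in for a large remote file."""

    def __init__(self, data):
        super().__init__()
        self._data = data
        self._pos = 0

    def readable(self):
        return True

    def readinto(self, b):
        # Copy as many bytes as fit and remember how far we have read.
        n = min(len(b), len(self._data) - self._pos)
        b[:n] = self._data[self._pos:self._pos + n]
        self._pos += n
        return n

    def tell(self):
        return self._pos


# Made-up CSV payload of roughly 2 MB.
row = b"VTS,1,2.5\n"
data = b"vendor_id,passengers,distance\n" + row * 200_000

raw = CountingRaw(data)
buffered = io.BufferedReader(raw, buffer_size=8192)
text = io.TextIOWrapper(buffered, encoding="utf-8")

header = text.readline()
first_row = text.readline()

# Some readahead happened, but nowhere near the end of the data.
bytes_consumed = raw.tell()
assert bytes_consumed <= 256 * 1024
assert bytes_consumed < len(data)
```

   The same shape of assertion (read a couple of rows, then bound `tell()`)
   could presumably be applied to a local file opened through pyarrow
   instead of a public S3 bucket.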



##########
python/pyarrow/io.pxi:
##########
@@ -436,14 +440,22 @@ cdef class NativeFile(_Weakrefable):
     def read1(self, nbytes=None):
         """Read and return up to n bytes.
 
-        Alias for read, needed to match the BufferedIOBase interface.
+        A short result does not imply that EOF is imminent.
 
         Parameters
         ----------
         nbytes : int
             The maximum number of bytes to read.
         """
-        return self.read(nbytes=None)
+        if nbytes is None:
+            # The expectation when passing `nbytes=None` is not to read the
+            # entire file but to issue a single underlying read call up to
+            # a reasonable size (the use case being to read a bufferable
+            # amount of bytes, such as with io.TextIOWrapper).
+            nbytes = self._default_chunk_size
+        else:
+            nbytes = min(nbytes, self._default_chunk_size)

Review Comment:
   Why are we limiting the read size to the chunk size when an explicit size is 
passed? 
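
   To make the question concrete, here is a small sketch contrasting the
   two behaviors. `SketchFile` and `DEFAULT_CHUNK_SIZE` are hypothetical
   names with an assumed chunk size; this is not pyarrow's `NativeFile`.
   The `BufferedIOBase.read1` contract allows returning fewer bytes than
   requested, so both variants are valid; they differ only in whether an
   explicit size is honored.

```python
import io

DEFAULT_CHUNK_SIZE = 64 * 1024  # assumed value, for illustration only


class SketchFile:
    """Hypothetical stand-in for a file object with a read1() method."""

    def __init__(self, data, cap_explicit_size):
        self._raw = io.BytesIO(data)
        self._cap_explicit_size = cap_explicit_size

    def read1(self, nbytes=None):
        if nbytes is None:
            # No size given: read one bufferable chunk, not the whole file.
            nbytes = DEFAULT_CHUNK_SIZE
        elif self._cap_explicit_size:
            # Variant from the diff: an explicit size is also capped.
            nbytes = min(nbytes, DEFAULT_CHUNK_SIZE)
        return self._raw.read(nbytes)


data = bytes(200 * 1024)
capped = SketchFile(data, cap_explicit_size=True)
uncapped = SketchFile(data, cap_explicit_size=False)

capped_len = len(capped.read1(100 * 1024))      # limited to the chunk size
uncapped_len = len(uncapped.read1(100 * 1024))  # honors the explicit size
default_len = len(uncapped.read1())             # one default-sized chunk
```

   With capping, a caller that asks for 100 KiB needs two calls to get it;
   without capping, only the `nbytes=None` case is bounded.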



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
