pspoerri opened a new pull request, #41694:
URL: https://github.com/apache/arrow/pull/41694

   ### Rationale for this change
   
   Allow reading from non-seekable FIFO paths (e.g. stdin).
   
   For example, the following snippet currently fails:
   
   ```python
   from pyarrow import csv, input_stream
   
   stdin = input_stream('/dev/stdin')
   data = csv.read_csv(stdin)
   print(data)
   ```
   
   Running this code always fails with an `OSError`:
   
   ```
   # cat test.csv | python test2.py
   Traceback (most recent call last):
     File "/mnt/nvme0n1/psp/arrow-test/test.py", line 4, in <module>
       stdin = input_stream('/dev/stdin')
     File "pyarrow/io.pxi", line 2690, in pyarrow.lib.input_stream
     File "pyarrow/io.pxi", line 1164, in pyarrow.lib.OSFile.__cinit__
     File "pyarrow/io.pxi", line 1176, in pyarrow.lib.OSFile._open_readable
     File "pyarrow/error.pxi", line 154, in pyarrow.lib.pyarrow_internal_check_status
     File "pyarrow/error.pxi", line 91, in pyarrow.lib.check_status
   OSError: lseek failed
   ```
   
   To work around this limitation, stdin currently has to be read through a Python file object:
   ```python
   import sys
   import os
   
   from pyarrow import csv
   
   stdin = os.fdopen(sys.stdin.fileno(), "rb")
   data = csv.read_csv(stdin)
   print(data)
   ```
   
   
   Example CSV (`test.csv`):
   ```
   customer_id,customer
   1,customer1
   2,customer2
   ```
   
   ### What changes are included in this PR?
   
   Set the size of the file to `-1` if the stream is not seekable, instead of failing. The same approach is already used to handle non-seekable file descriptors here: https://github.com/pspoerri/arrow/blob/main/cpp/src/arrow/io/file.cc#L94-L95
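To illustrate the idea in Python, here is a minimal sketch. `file_size_or_unknown` is a hypothetical helper, not part of PyArrow and not the C++ code in this PR. It probes the descriptor with `lseek()` and maps the `ESPIPE` ("Illegal seek") error, which pipes and FIFOs such as `/dev/stdin` under `cat test.csv | python test.py` return, to an unknown size of `-1` instead of raising:

```python
import errno
import os


def file_size_or_unknown(fd: int) -> int:
    """Return the size in bytes of the file behind ``fd``, or -1 if unknown.

    Hypothetical helper (not a PyArrow API): it mirrors the idea of this PR,
    reporting a size of -1 for non-seekable descriptors instead of failing.
    """
    try:
        position = os.lseek(fd, 0, os.SEEK_CUR)  # raises on pipes/FIFOs
    except OSError as exc:
        if exc.errno == errno.ESPIPE:  # "Illegal seek": not seekable
            return -1
        raise  # any other error is still a real failure
    size = os.lseek(fd, 0, os.SEEK_END)  # jump to the end to learn the size
    os.lseek(fd, position, os.SEEK_SET)  # restore the original position
    return size
```

Returning a sentinel rather than raising leaves it to the caller to decide whether an unknown size matters; a purely sequential reader can simply read until EOF.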
   
   ### Are these changes tested?
   
   I tested the code samples above.
   
   ### Are there any user-facing changes?
   
   Not that I am aware of.