jiale0402 opened a new issue, #38878: URL: https://github.com/apache/arrow/issues/38878
### Describe the usage question you have. Please include as many useful details as possible.

Platform: `NAME="Ubuntu" VERSION="23.04 (Lunar Lobster)"`
PyArrow version: `pyarrow 14.0.1`, `pyarrow-hotfix 0.5`
Python version: `Python 3.11.4 (main, Jun 9 2023, 07:59:55) [GCC 12.3.0] on linux`

I have a very large single-column CSV file (about 63 million rows). I was hoping to create a lazy file streamer that reads one entry from the CSV file at a time. Since I know each entry in my file is 12 characters long, I tried setting the block size to 13 (+1 for `\n`) with the `pyarrow.csv.open_csv` function:

```python
import pyarrow as pa
import pyarrow.csv as csv

c_options = csv.ConvertOptions(column_types={'dne': pa.float32()})
r_options = csv.ReadOptions(skip_rows_after_names=8200, use_threads=True,
                            column_names=["dne"], block_size=13)
stream = csv.open_csv(file, convert_options=c_options, read_options=r_options)
```

This code works as expected, but when I change the `skip_rows_after_names` parameter of the read options to 8300 I start to get segmentation faults. How do I fix this (or am I using it wrong)? I want to be able to use only a portion of the file (e.g. from row 98885 to row 111200).

I was able to reproduce this error on another computer with the exact same platform and versions. The file was created with:

```python
import random

with open(f"feature_{i}.csv", "w+") as f:
    for i in range(FILE_LEN):
        n = random.uniform(-0.5, 0.5)
        nn = str(n)[:12]
        f.write(f"{nn}\n")
```

### Component(s)

Python