jiale0402 commented on issue #38878:
URL: https://github.com/apache/arrow/issues/38878#issuecomment-1826848002

Here's a Python sample I created to reproduce this behavior. I tested it on my
end and it seems to reproduce the error. Please let me know whether it produces
`pyarrow.lib.ArrowInvalid: straddling object straddles two block boundaries 
(try to increase block size?)` for you as well. Thanks!
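```python
import random

import pyarrow.csv as csv

path = "path/to/csv"
row_len = 12

# Write 30,000 single-column rows, each at most `row_len` characters long.
# The file must be opened in write mode ("w"); the default mode is read-only.
with open(path, "w") as f:
    for _ in range(30000):
        n = str(random.uniform(-0.5, 0.5))[:row_len]
        f.write(f"{n}\n")

# Stream the file back with a block size just large enough for one row
# (row_len characters plus the trailing newline).
stream = csv.open_csv(
    path,
    read_options=csv.ReadOptions(
        skip_rows_after_names=20000,
        use_threads=False,
        block_size=row_len + 1,
    ),
)
# Consume the stream so that every block is actually read.
table = stream.read_all()
```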
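The repro relies on every row fitting into its 13-byte block (`row_len` + 1 for the newline). A quick stdlib-only sanity check of that assumption might look like the sketch below. The helpers `make_rows` and `longest_row_bytes` are hypothetical names, and the rows are built in memory with the same truncation logic rather than read back from the file.

```python
import random

ROW_LEN = 12  # same value as row_len in the repro


def make_rows(n_rows: int, row_len: int = ROW_LEN) -> list[str]:
    """Hypothetical helper: build rows like the repro (truncated float + newline)."""
    return [str(random.uniform(-0.5, 0.5))[:row_len] + "\n" for _ in range(n_rows)]


def longest_row_bytes(rows: list[str]) -> int:
    """Hypothetical helper: byte length of the longest row, newline included."""
    return max(len(row.encode("utf-8")) for row in rows)


rows = make_rows(30000)
block_size = ROW_LEN + 1  # the repro's block size: 12 characters + 1 newline
print(longest_row_bytes(rows), block_size)
```

Since each row is truncated to 12 ASCII characters, the longest row is at most 13 bytes, so no single row is larger than a block.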
   
By the way, I noticed that in the code you quoted, Arrow looks for a delimiter
in the block and raises an error otherwise. But in standard CSV format, a
single-column file would not contain any `,` delimiter. However, if I do not set
the `skip_rows_after_names` parameter of `pyarrow.csv.ReadOptions` in the sample
above, the code works properly, even though there is no delimiter anywhere in
the file. Is this intended behavior?
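To illustrate the point about single-column files, here is a small check with Python's standard `csv` module (not Arrow; the values are made up in the same shape as the repro's rows). The data contains no `,` at all, yet each line still parses as a one-field row, since CSV rows are terminated by newlines.

```python
import csv
import io

# Hypothetical single-column data shaped like the repro's file.
data = "0.1234567890\n-0.234567890\n0.3456789012\n"

rows = list(csv.reader(io.StringIO(data)))
print(rows)  # → [['0.1234567890'], ['-0.234567890'], ['0.3456789012']]
```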


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.