thisisnic commented on issue #46391:
URL: https://github.com/apache/arrow/issues/46391#issuecomment-2873121222

   I looked into this in https://github.com/apache/arrow/discussions/46383, 
but will summarise the key info below:
   
   * Parquet file is here: 
https://github.com/softcite/softcite-extractions-parquet-analysis/raw/refs/heads/main/data/softcite-extractions-oa-data/p01_one_percent_random_subset/papers.parquet
   * Rewriting the dataset (in R, but without materialising it in the R session) 
makes the error disappear, so it looks like something strange with how the 
file was written or how it's being read
   * Reading the dataset as an Arrow Table makes the error disappear, so it 
looks like something to do with Datasets (or filter pushdown?)
   * I get the same results in Python and R, so it's not an R bindings issue
   
   I guess even if there's something off with the file, the fact that it 
doesn't error is problematic.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.