thisisnic commented on issue #46391: URL: https://github.com/apache/arrow/issues/46391#issuecomment-2873121222
I looked into this here: https://github.com/apache/arrow/discussions/46383 but will summarise the key info below:

* The Parquet file is here: https://github.com/softcite/softcite-extractions-parquet-analysis/raw/refs/heads/main/data/softcite-extractions-oa-data/p01_one_percent_random_subset/papers.parquet
* Rewriting the dataset (in R, but without materialising it in the R session) makes the error disappear, so it looks like something is strange with how the file was written or how it's being read.
* Reading the dataset as an Arrow Table also makes the error disappear, so it looks like it has something to do with Datasets (or filter pushdown?).
* I get the same results in Python and R, so it's not an R bindings issue.

I guess even if there's something off with the file, the fact that it doesn't error is problematic.
