[ https://issues.apache.org/jira/browse/ARROW-15737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17495810#comment-17495810 ]
Jay Edwards commented on ARROW-15737:
-------------------------------------
I've found files that don't exhibit the behavior.
> pyarrow.parquet.read_table("parquet_file") causes bus error in ipython
> ----------------------------------------------------------------------
>
> Key: ARROW-15737
> URL: https://issues.apache.org/jira/browse/ARROW-15737
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 7.0.0
> Environment: macOS 12.2.1 aarch64
> python 3.10.1
> arrow 7.0.0
> Reporter: Jay Edwards
> Priority: Major
> Labels: parquet, pyarrow
>
> I have a parquet file with two columns (int64 and double) and 9 million rows.
> The parquet tools (parquet, parquet-reader, parquet-schema...) read it
> perfectly. (I have many files, actually, but they all exhibit the same
> behavior).
> The following code fails with "zsh bus error ipython":
> import pyarrow.parquet as pq
> pq.read_table("parquet_file")
> These snippets work properly.
> pq.read_table("parquet_file", use_legacy_dataset=True)
> f = pq.ParquetFile("parquet_file")
> f.read()
> for batch in f.iter_batches():
>     print(len(batch))
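
For anyone trying to compare the read paths quoted above, here is a minimal sketch (not part of the original report) that runs each one in its own child interpreter, so a bus error kills only the child rather than the ipython session. `run_isolated` is a hypothetical helper name, "parquet_file" is the placeholder path from the description, and the `use_legacy_dataset` keyword is assumed to be available as in the reported pyarrow 7.0.0.

```python
import signal
import subprocess
import sys


def run_isolated(snippet: str, timeout: float = 300.0) -> str:
    """Run a Python snippet in a child interpreter and describe how it ended."""
    proc = subprocess.run(
        [sys.executable, "-c", snippet],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if proc.returncode < 0:
        # On POSIX, a negative return code means the child was killed by a
        # signal; a bus error shows up as SIGBUS.
        return f"killed by {signal.Signals(-proc.returncode).name}"
    if proc.returncode != 0:
        # Ordinary failure, e.g. an uncaught Python exception in the child.
        return f"exited with status {proc.returncode}"
    return "ok"


if __name__ == "__main__":
    path = "parquet_file"  # placeholder name taken from the report
    prefix = "import pyarrow.parquet as pq; "
    snippets = {
        "read_table": prefix + f"pq.read_table({path!r})",
        "read_table (legacy)": prefix
        + f"pq.read_table({path!r}, use_legacy_dataset=True)",
        "ParquetFile.read": prefix + f"pq.ParquetFile({path!r}).read()",
    }
    for name, code in snippets.items():
        print(f"{name}: {run_isolated(code)}")
```

Running the same loop over several files may help separate the files that trigger the crash from the ones that do not.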