[
https://issues.apache.org/jira/browse/ARROW-14439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17433122#comment-17433122
]
Weston Pace commented on ARROW-14439:
-------------------------------------
This sounds a bit like ARROW-13871. I verified that your example fails on
5.0.0, but it does not fail on the current 6.0.0 code. Could you verify this
once 6.0.0 is released (we are in the process of releasing it now)?
> [Python][C++] Segfault with read_json when a field is missing
> -------------------------------------------------------------
>
> Key: ARROW-14439
> URL: https://issues.apache.org/jira/browse/ARROW-14439
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++, Python
> Affects Versions: 5.0.0
> Reporter: quentin lhoest
> Priority: Major
>
> When reading a JSON Lines file, a segfault can occur if a field is missing
> from one of the records, in particular when the missing field is supposed
> to be a list and the block size is small enough.
> Here is an example to reproduce:
> {code:python}
> import io
> import pyarrow.json as paj
> batch = b'{"a": [], "b": 1}\n{"b": 1}'
> block_size = 12
> paj.read_json(
>     io.BytesIO(batch), read_options=paj.ReadOptions(block_size=block_size)
> )
> {code}
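To make the "small enough block size" condition concrete, here is a minimal standard-library sketch that slices the example payload into fixed 12-byte blocks. It only shows where the block boundaries fall relative to the record boundaries; it does not claim to reproduce how the Arrow JSON reader chunks its input internally, and `split_blocks` is a hypothetical helper, not a pyarrow function.

```python
from typing import List


def split_blocks(data: bytes, block_size: int) -> List[bytes]:
    """Cut data into consecutive chunks of at most block_size bytes.

    Hypothetical helper for illustration only; not part of pyarrow.
    """
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


# Same payload and block size as in the reproduction above.
batch = b'{"a": [], "b": 1}\n{"b": 1}'
block_size = 12

blocks = split_blocks(batch, block_size)
# Length in bytes of each JSON Lines record, excluding the newline.
record_lengths = [len(line) for line in batch.split(b"\n")]

for index, block in enumerate(blocks):
    print(index, block)
print("record lengths:", record_lengths)
```

The first record is 17 bytes long, so with a block size of 12 it does not fit in a single block and straddles the first block boundary.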