[ https://issues.apache.org/jira/browse/ARROW-6762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Antoine Pitrou reassigned ARROW-6762:
-------------------------------------
Assignee: Antoine Pitrou
> [C++] JSON reader segfaults on newline
> --------------------------------------
>
> Key: ARROW-6762
> URL: https://issues.apache.org/jira/browse/ARROW-6762
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++
> Reporter: Joris Van den Bossche
> Assignee: Antoine Pitrou
> Priority: Major
> Labels: json
>
> Using the {{SampleRecord.jl}} attachment from ARROW-6737, I noticed that
> reading this file on master crashes (a failed internal check aborts the process):
> {code}
> In [1]: from pyarrow import json
> ...: import pyarrow.parquet as pq
> ...:
> ...: r = json.read_json('SampleRecord.jl')
> WARNING: Logging before InitGoogleLogging() is written to STDERR
> F1002 09:56:55.362766 13035 reader.cc:93] Check failed: (string_view(*next_partial).find_first_not_of(" \t\n\r")) == (string_view::npos)
> *** Check failure stack trace: ***
> Aborted (core dumped)
> {code}
> while with 0.14.1 this works fine:
> {code}
> In [24]: from pyarrow import json
> ...: import pyarrow.parquet as pq
> ...:
> ...: r = json.read_json('SampleRecord.jl')
>
> In [25]: r
> Out[25]:
> pyarrow.Table
> _type: string
> provider_name: string
> arrival: timestamp[s]
> berthed: timestamp[s]
> berth: null
> cargoes: list<item: struct<movement: string, product: string, volume: string, volume_unit: string, buyer: null, seller: null>>
> child 0, item: struct<movement: string, product: string, volume: string, volume_unit: string, buyer: null, seller: null>
> child 0, movement: string
> child 1, product: string
> child 2, volume: string
> child 3, volume_unit: string
> child 4, buyer: null
> child 5, seller: null
> departure: timestamp[s]
> eta: null
> installation: null
> port_name: string
> next_zone: null
> reported_date: timestamp[s]
> shipping_agent: null
> vessel: struct<beam: null, build_year: null, call_sign: null, dead_weight: null, dwt: null, flag_code: null, flag_name: null, gross_tonnage: null, imo: string, length: int64, mmsi: null, name: string, type: null, vessel_type: null>
> child 0, beam: null
> child 1, build_year: null
> child 2, call_sign: null
> child 3, dead_weight: null
> child 4, dwt: null
> child 5, flag_code: null
> child 6, flag_name: null
> child 7, gross_tonnage: null
> child 8, imo: string
> child 9, length: int64
> child 10, mmsi: null
> child 11, name: string
> child 12, type: null
> child 13, vessel_type: null
>
> In [26]: pa.__version__
> Out[26]: '0.14.1'
> {code}
> cc [~apitrou] [~bkietz]
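For context on the check that fails on master: `find_first_not_of(" \t\n\r") == npos` holds only when the leftover ("partial") data after the last complete block consists solely of spaces, tabs, newlines and carriage returns. The sketch below illustrates that invariant in Python. It assumes a simplified chunker that cuts at the last newline; `split_complete_and_partial` and `partial_is_whitespace` are hypothetical helpers for illustration only, not Arrow's actual `reader.cc` logic.

```python
from typing import Tuple

# Characters the C++ check treats as ignorable whitespace.
WHITESPACE = " \t\n\r"


def split_complete_and_partial(buf: str) -> Tuple[str, str]:
    """Hypothetical chunker (assumption, not Arrow's implementation):
    everything up to and including the last newline is 'complete',
    whatever follows it is the 'partial' remainder."""
    cut = buf.rfind("\n") + 1  # 0 when the buffer contains no newline
    return buf[:cut], buf[cut:]


def partial_is_whitespace(partial: str) -> bool:
    # Python analogue of:
    #   string_view(*next_partial).find_first_not_of(" \t\n\r") == npos
    # i.e. no character outside WHITESPACE exists (true for an empty string).
    return all(ch in WHITESPACE for ch in partial)


if __name__ == "__main__":
    complete_file = '{"a": 1}\n{"a": 2}\n'
    _, partial = split_complete_and_partial(complete_file)
    print(repr(partial), partial_is_whitespace(partial))

    truncated_file = '{"a": 1}\n{"a": 2'
    _, partial = split_complete_and_partial(truncated_file)
    print(repr(partial), partial_is_whitespace(partial))
```

Under this assumption, a newline-terminated file leaves an empty partial that satisfies the check, while a truncated last record leaves non-whitespace text behind and would violate it.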