[ 
https://issues.apache.org/jira/browse/ARROW-6762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou reassigned ARROW-6762:
-------------------------------------

    Assignee: Antoine Pitrou

> [C++] JSON reader segfaults on newline
> --------------------------------------
>
>                 Key: ARROW-6762
>                 URL: https://issues.apache.org/jira/browse/ARROW-6762
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++
>            Reporter: Joris Van den Bossche
>            Assignee: Antoine Pitrou
>            Priority: Major
>              Labels: json
>
> Using the {{SampleRecord.jl}} attachment from ARROW-6737, I notice that 
> trying to read this file on master results in a segfault:
> {code}
> In [1]: from pyarrow import json 
>    ...: import pyarrow.parquet as pq 
>    ...:  
>    ...: r = json.read_json('SampleRecord.jl') 
> WARNING: Logging before InitGoogleLogging() is written to STDERR
> F1002 09:56:55.362766 13035 reader.cc:93]  Check failed: (string_view(*next_partial).find_first_not_of(" \t\n\r")) == (string_view::npos)
> *** Check failure stack trace: ***
> Aborted (core dumped)
> {code}
> while with 0.14.1 this works fine:
> {code}
> In [24]: from pyarrow import json 
>     ...: import pyarrow.parquet as pq 
>     ...:  
>     ...: r = json.read_json('SampleRecord.jl')
>
> In [25]: r
> Out[25]: 
> pyarrow.Table
> _type: string
> provider_name: string
> arrival: timestamp[s]
> berthed: timestamp[s]
> berth: null
> cargoes: list<item: struct<movement: string, product: string, volume: string, volume_unit: string, buyer: null, seller: null>>
>   child 0, item: struct<movement: string, product: string, volume: string, volume_unit: string, buyer: null, seller: null>
>       child 0, movement: string
>       child 1, product: string
>       child 2, volume: string
>       child 3, volume_unit: string
>       child 4, buyer: null
>       child 5, seller: null
> departure: timestamp[s]
> eta: null
> installation: null
> port_name: string
> next_zone: null
> reported_date: timestamp[s]
> shipping_agent: null
> vessel: struct<beam: null, build_year: null, call_sign: null, dead_weight: null, dwt: null, flag_code: null, flag_name: null, gross_tonnage: null, imo: string, length: int64, mmsi: null, name: string, type: null, vessel_type: null>
>   child 0, beam: null
>   child 1, build_year: null
>   child 2, call_sign: null
>   child 3, dead_weight: null
>   child 4, dwt: null
>   child 5, flag_code: null
>   child 6, flag_name: null
>   child 7, gross_tonnage: null
>   child 8, imo: string
>   child 9, length: int64
>   child 10, mmsi: null
>   child 11, name: string
>   child 12, type: null
>   child 13, vessel_type: null
> In [26]: pa.__version__
> Out[26]: '0.14.1'
> {code}
> cc [~apitrou] [~bkietz]
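
For readers unfamiliar with the failed check in the log above: it asserts that the leftover partial block contains nothing but whitespace (space, tab, newline, carriage return). The following is a minimal Python sketch of that condition only, not Arrow's C++ code; the helper names first_not_of and trailing_partial_is_blank are hypothetical, and -1 stands in for string_view::npos.

```python
# Sketch of the condition from the "Check failed" message, in pure Python.
# Assumption: the helper names below are illustrative and do not exist in Arrow.

WHITESPACE = b" \t\n\r"  # same character set as in the failed check


def first_not_of(data: bytes, chars: bytes = WHITESPACE) -> int:
    """Return the index of the first byte not in `chars`, or -1 if none.

    -1 plays the role of string_view::npos in the C++ expression.
    """
    for index, byte in enumerate(data):
        if byte not in chars:
            return index
    return -1


def trailing_partial_is_blank(partial: bytes) -> bool:
    """True when `partial` holds only whitespace, i.e. the check would pass."""
    return first_not_of(partial) == -1


if __name__ == "__main__":
    # A partial block made only of newlines and spaces satisfies the check.
    print(trailing_partial_is_blank(b"\n \r\n"))
    # A partial block that still holds the start of a record does not.
    print(trailing_partial_is_blank(b'\n{"a": 1'))
```

This only restates the asserted condition; it does not say why the reader reaches that state for this file.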



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
