[
https://issues.apache.org/jira/browse/ARROW-1779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16243121#comment-16243121
]
Li Jin commented on ARROW-1779:
-------------------------------
cc [~cpcloud] [~wesmckinn]
This is probably a Java issue, but I am stuck figuring out what's wrong
because the error happens in the C++ integration validation (Java producing,
C++ consuming). I have files to reproduce this:
{code:java}
>>> good = pyarrow.RecordBatchFileReader("/Users/ljin/workspace/arrow/nested.good")
>>> good_batch = good.get_record_batch(1)
>>> good_batch.column(1)
<pyarrow.lib.StructArray object at 0x10c2a3548>
[
NA,
{'f1': None, 'f2': 'BSZRpGI'},
{'f1': None, 'f2': None},
{'f1': None, 'f2': None},
NA,
NA,
{'f1': None, 'f2': None},
{'f1': None, 'f2': None},
{'f1': 416507125, 'f2': None},
NA
]
{code}
{code:java}
>>> bad = pyarrow.RecordBatchFileReader("/Users/ljin/workspace/arrow/nested.bad")
>>> bad_batch = bad.get_record_batch(1)
>>> bad_batch.column(1)
<pyarrow.lib.StructArray object at 0x10c0c6b88>
[
{'f1': -1345581951, 'f2': None},
{'f1': None, 'f2': 'BSZRpGI'},
{'f1': None, 'f2': None},
{'f1': None, 'f2': None},
{'f1': -497925054, 'f2': 'E34Dqdr'},
{'f1': 94270936, 'f2': '5aksGEG'},
{'f1': None, 'f2': None},
{'f1': None, 'f2': None},
{'f1': 416507125, 'f2': None},
{'f1': None, 'f2': None}
]
{code}
They are supposed to contain the same data, but the validity vector of the
bad one is not read correctly. Can you guys help shed some light?
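One way the difference above could arise, sketched in plain Python rather than taken from the Java code: Arrow validity bitmaps store one bit per slot, least-significant bit first. If a writer only sets the bits of valid slots and never clears the rest, any bytes left over in a buffer that was not zeroed would show up as extra "valid" slots. The helper names and the 0xFF leftover bytes below are hypothetical, and whether the Java JSON reader actually behaves this way is an assumption.

```python
from typing import List, Sequence


def set_valid_bits(buf: bytearray, validity: Sequence[bool]) -> None:
    """Set bit i for every valid slot. Bits are never cleared, which
    mimics a (hypothetical) writer that assumes a zeroed buffer."""
    for i, is_valid in enumerate(validity):
        if is_valid:
            buf[i // 8] |= 1 << (i % 8)


def read_valid_bits(buf: bytearray, length: int) -> List[bool]:
    """Read the first `length` validity bits, least-significant bit first."""
    return [bool(buf[i // 8] & (1 << (i % 8))) for i in range(length)]


# Validity of the struct column in nested.good, taken from the output above
# (NA means the slot is null).
expected = [False, True, True, True, False, False, True, True, True, False]

# Case 1: the buffer is zeroed before the bits are set.
zeroed = bytearray(2)
set_valid_bits(zeroed, expected)

# Case 2: the buffer still holds leftover bytes (0xFF is an assumption).
dirty = bytearray(b"\xff\xff")
set_valid_bits(dirty, expected)

print(read_valid_bits(zeroed, 10))  # matches `expected`
print(read_valid_bits(dirty, 10))   # every slot reads as valid
```

In the second case every slot reads as valid, which matches the shape of the nested.bad output, where no struct entry is NA. This only shows that the symptom is consistent with stale bits; it does not prove that this is the cause.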
> [Java] Integration test breaks without zeroing out validity vectors
> -------------------------------------------------------------------
>
> Key: ARROW-1779
> URL: https://issues.apache.org/jira/browse/ARROW-1779
> Project: Apache Arrow
> Issue Type: Sub-task
> Reporter: Li Jin
> Fix For: 0.8.0
>
> Attachments: nested.bad, nested.good, nested.json
>
>
> This was discovered in https://github.com/apache/arrow/pull/1290
> I found that one of the integration tests (nested) failed without zeroing out
> validity vectors before loading the array from json.
> I have created three files to reproduce this:
> (1) nested.json
> (2) nested.good (zeroing out validity vector before reading)
> (3) nested.bad (not zeroing out validity vector before reading)
> (1) / (2) and (1) / (3) both pass the Java integration test; however, (1) / (3)
> fails the C++ test: one of the validity vectors in (3) doesn't seem to be read
> correctly.
> I am not sure what the issue is because I cannot reproduce an error in Java.
> I am hoping someone more familiar with C++ could take a look and give
> some insight into what's wrong with (3).