[ 
https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16213247#comment-16213247
 ] 

Paul Taylor commented on ARROW-1693:
------------------------------------

[~bhulette] thanks, I can dig into this too. FYI, there's a workaround for 
dictionary-encoded Arrow files written before ARROW-1363 was fixed 
[here|https://github.com/apache/arrow/blob/a8f518588fda471b2e3cc8e0f0064e7c4bb99899/js/src/reader/vector.ts#L64]
 and 
[here|https://github.com/apache/arrow/blob/a8f518588fda471b2e3cc8e0f0064e7c4bb99899/js/src/reader/vector.ts#L108].

If the Java writer encodes a data buffer that isn't the offsets, that 
invalidates the assumption stated in the comment under line 108:

{quote}
...if we're parsing an Arrow file written by a version of the library published 
before ARROW-1363 was fixed, the IntVector's data buffer will be null, and the 
offset buffer will be the actual data. If data is null, it's safe to assume the 
offset buffer is the data, because IntVectors don't have offsets.
{quote}
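
To make the assumption concrete, here is a minimal sketch of the fallback the quoted comment describes. It is not the code in vector.ts: the IntVectorBuffers interface and the resolveIntData function are hypothetical names made up for illustration, and the sketch assumes a missing data buffer is represented as null.

```typescript
// Hypothetical buffer layout for an integer vector; NOT the actual types in vector.ts.
interface IntVectorBuffers {
  offsets: Int32Array | null;
  data: Int32Array | null;
}

// Fallback for files written before ARROW-1363 was fixed: if `data` is null,
// assume the values were written into the offsets slot, since IntVectors
// don't have offsets of their own.
function resolveIntData(buffers: IntVectorBuffers): Int32Array {
  if (buffers.data !== null) {
    return buffers.data;
  }
  if (buffers.offsets !== null) {
    return buffers.offsets;
  }
  throw new TypeError("IntVector has neither a data buffer nor an offset buffer");
}

// Layout from an older writer: values sit in the offsets slot.
const legacy = resolveIntData({ offsets: Int32Array.of(0, 1, 2), data: null });
// Layout from a fixed writer: values sit in the data slot.
const current = resolveIntData({ offsets: null, data: Int32Array.of(3, 4, 5) });
```

The fallback only holds if a null data buffer always means the values ended up in the offsets slot. A writer that lays out its buffers differently would break that assumption.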

> [JS] Error reading dictionary-encoded integration test files
> ------------------------------------------------------------
>
>                 Key: ARROW-1693
>                 URL: https://issues.apache.org/jira/browse/ARROW-1693
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: JavaScript
>            Reporter: Brian Hulette
>            Assignee: Brian Hulette
>             Fix For: 0.8.0
>
>         Attachments: dictionary-cpp.arrow, dictionary-java.arrow, 
> dictionary.json
>
>
> The JS implementation crashes when reading the dictionary test case from the 
> integration tests.
> To replicate, first generate the test files with java and cpp impls:
> {code}
> $ cd ${ARROW_HOME}/integration/
> $ python -c 'from integration_test import generate_dictionary_case; generate_dictionary_case().write("dictionary.json")'
> $ ../cpp/debug/debug/json-integration-test --integration --json=dictionary.json --arrow=dictionary-cpp.arrow --mode=JSON_TO_ARROW
> $ java -cp ../java/tools/target/arrow-tools-0.8.0-SNAPSHOT-jar-with-dependencies.jar org.apache.arrow.tools.Integration -c JSON_TO_ARROW -a dictionary-java.arrow -j dictionary.json
> {code}
> Attempt to read the files with the JS impl:
> {code}
> $ cd ${ARROW_HOME}/js/
> $ ./bin/arrow2csv.js -s dict1_0 -f ../integration/dictionary-{java,cpp}.arrow
> {code}
> Both files result in an error for me on 
> [a8f51858|https://github.com/apache/arrow/commit/a8f518588fda471b2e3cc8e0f0064e7c4bb99899]:
> {{TypeError: Cannot read property 'buffer' of undefined}}



