[ https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16213199#comment-16213199 ]
Wes McKinney edited comment on ARROW-1693 at 10/20/17 8:40 PM:
---------------------------------------------------------------
That's interesting. I'm surprised that the integration tests work -- it must be
that Java is not using the buffer layout information. Frankly, having the JS
reader be so sensitive to the buffer layout is a source of brittleness. We do
not use this data in C++; it is there for informational purposes.
Because a record batch will contain only the indices for a dictionary-encoded
field, I think using the index buffer layout is the "right" answer, though
nowhere do we encode the buffer layout for the dictionary values. So this is a
deficiency in the Arrow format that we may contemplate fixing.
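A minimal sketch of the idea, assuming a simplified, hypothetical type model: the `LAYOUTS` table, the `Field` class and the `buffer_layout` function are illustrative only and are not part of any Arrow implementation. The reader derives the buffers it expects from the field's type, using the index type inside record batches and the value type inside dictionary batches, instead of trusting a layout declared in the schema.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical, simplified table of buffers per physical type, following
# Arrow's columnar layout: fixed-width types have a validity bitmap and a
# data buffer; variable-width utf8/binary types also need an offsets buffer.
LAYOUTS = {
    "int": ["validity", "data"],
    "float": ["validity", "data"],
    "utf8": ["validity", "offsets", "data"],
    "binary": ["validity", "offsets", "data"],
}


@dataclass
class Field:
    """Hypothetical field description (not the Arrow schema metadata)."""
    name: str
    value_type: str                   # logical type of the values, e.g. "utf8"
    index_type: Optional[str] = None  # set only if the field is dictionary-encoded


def buffer_layout(field: Field, in_dictionary_batch: bool) -> List[str]:
    """Derive the expected buffers from the type, not from a declared layout."""
    if field.index_type is not None and not in_dictionary_batch:
        # A record batch holds only the dictionary indices for this field.
        return list(LAYOUTS[field.index_type])
    # Dictionary batches (and non-encoded fields) hold the values themselves.
    return list(LAYOUTS[field.value_type])


encoded = Field("example", value_type="utf8", index_type="int")
print(buffer_layout(encoded, in_dictionary_batch=False))  # ['validity', 'data']
print(buffer_layout(encoded, in_dictionary_batch=True))   # ['validity', 'offsets', 'data']
```

With this approach the layout of the dictionary values is inferred from the value type alone, which sidesteps the fact that the format does not record it.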
> [JS] Error reading dictionary-encoded integration test files
> ------------------------------------------------------------
>
> Key: ARROW-1693
> URL: https://issues.apache.org/jira/browse/ARROW-1693
> Project: Apache Arrow
> Issue Type: Bug
> Components: JavaScript
> Reporter: Brian Hulette
> Assignee: Brian Hulette
> Fix For: 0.8.0
>
> Attachments: dictionary-cpp.arrow, dictionary-java.arrow,
> dictionary.json
>
>
> The JS implementation crashes when reading the dictionary test case from the
> integration tests.
> To reproduce, first generate the test files with the Java and C++ implementations:
> {code}
> $ cd ${ARROW_HOME}/integration/
> $ python -c 'from integration_test import generate_dictionary_case; generate_dictionary_case().write("dictionary.json")'
> $ ../cpp/debug/debug/json-integration-test --integration \
>     --json=dictionary.json --arrow=dictionary-cpp.arrow --mode=JSON_TO_ARROW
> $ java -cp ../java/tools/target/arrow-tools-0.8.0-SNAPSHOT-jar-with-dependencies.jar \
>     org.apache.arrow.tools.Integration -c JSON_TO_ARROW -a dictionary-java.arrow \
>     -j dictionary.json
> {code}
> Attempt to read the files with the JS impl:
> {code}
> $ cd ${ARROW_HOME}/js/
> $ ./bin/arrow2csv.js -s dict1_0 -f ../integration/dictionary-{java,cpp}.arrow
> {code}
> Both files result in an error for me on
> [a8f51858|https://github.com/apache/arrow/commit/a8f518588fda471b2e3cc8e0f0064e7c4bb99899]:
> {{TypeError: Cannot read property 'buffer' of undefined}}
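One plausible way an error like this can arise (an assumption, not a confirmed diagnosis of the JS reader): the reader assigns a record batch's flat list of buffers to fields positionally and, for a dictionary-encoded field, expects as many buffers as the value type's layout needs, while the batch only carries the index buffers. In JavaScript, indexing past the end of an array yields `undefined`, so reading `.buffer` from it throws a `TypeError` like the one above. The hypothetical `split_buffers` helper below sketches that positional assignment in Python with an explicit bounds check.

```python
from typing import List


def split_buffers(buffers: List[bytes], counts: List[int]) -> List[List[bytes]]:
    """Hypothetical helper: assign a batch's flat buffer list to fields in order.

    counts[i] is how many buffers the reader expects for field i. Python
    slicing silently truncates past the end, so the check below turns a
    too-high expectation into an explicit error instead of a short list.
    """
    out: List[List[bytes]] = []
    pos = 0
    for i, n in enumerate(counts):
        remaining = len(buffers) - pos
        if n > remaining:
            raise ValueError(
                f"field {i} expects {n} buffers but only {remaining} remain")
        out.append(buffers[pos:pos + n])
        pos += n
    return out


# One dictionary-encoded column: the batch carries only validity + int indices.
batch = [b"\x07", b"\x00\x01\x00"]
print(split_buffers(batch, [2]))  # index layout: both buffers are assigned
try:
    split_buffers(batch, [3])     # value (utf8) layout: one buffer too many
except ValueError as err:
    print(err)                    # field 0 expects 3 buffers but only 2 remain
```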
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)