alamb opened a new issue #859:
URL: https://github.com/apache/arrow-rs/issues/859


   **Describe the bug**
   
   `../arrow-ipc-stream/integration/1.0.0-bigendian/generated_dictionary.arrow_file` contains a UTF8 Arrow array whose buffers are encoded in big-endian byte order.
   
   When this file is read by the arrow-rs implementation, the offsets buffer remains big endian, even though the code assumes the offsets buffer holds values in native endianness (so the offsets of the created arrow-rs buffer are incorrect on little-endian machines such as x86).
   
   **To Reproduce**
   See the test `read_dictionary_be_not_implemented` in https://github.com/apache/arrow-rs/pull/810
   
   It fails with `Length spanned by offsets in Utf8 (687865856) is larger than the values array size (41)`. Note that 687865856 is 0x29000000, i.e. the expected length 41 (0x00000029) with its bytes swapped.
   
   **Expected behavior**
   The test should pass (likely by translating the offsets from big endian to native endianness when the file is read; see the sketch below).
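   
   For illustration only, here is a minimal standalone sketch (not arrow-rs code; the helper name `offsets_to_native_endian` is hypothetical) of how a big-endian offsets buffer could be translated into native-endian `i32` values before the array is built:
   
   ```rust
   use std::convert::TryInto;
   
   // Hypothetical helper: reinterpret a raw offsets buffer that was written
   // in big-endian byte order as native-endian i32 offsets.
   fn offsets_to_native_endian(raw: &[u8]) -> Vec<i32> {
       raw.chunks_exact(4)
           .map(|chunk| i32::from_be_bytes(chunk.try_into().unwrap()))
           .collect()
   }
   
   fn main() {
       // The big-endian bytes 0x00000029 encode the offset 41; read with
       // little-endian assumptions they become 0x29000000 = 687865856,
       // which is exactly the bogus length reported by the failing test.
       let raw = [0x00u8, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x29];
       assert_eq!(offsets_to_native_endian(&raw), vec![0, 41]);
   }
   ```
   
   In the actual IPC reader the swap would presumably be applied while decoding the record batch buffers, and only when the file's schema declares big endianness.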
   
   **Additional context**
   Found while adding validation in https://github.com/apache/arrow-rs/pull/810

