[GitHub] [arrow-julia] DrChainsaw opened a new issue, #437: Failure to read compressed empty table from java implementation

via GitHub Wed, 17 May 2023 09:15:13 -0700


DrChainsaw opened a new issue, #437:
URL: https://github.com/apache/arrow-julia/issues/437


   The problem is that the Java implementation sets the length field of the 
buffer in the RecordBatch to 8 bytes in this case, rather than 0. The 
consequence is that [this 
check](https://github.com/apache/arrow-julia/blob/e893c327f177f5a4d5efeab831df0fe93ab4ec5b/src/table.jl#L518)
 does nothing, so decompression continues, finds `len=0` when reading the 
first 8 bytes from the pointer, and passes a zero-length array to `transcode`, 
which then seems to hang indefinitely. 
   
   Applying the same check and early return to `len` as is already done for 
`buffer.length` resolves the issue for me. 
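
   To illustrate the two empty cases, here is a minimal Python sketch of the 
guard, not the actual `table.jl` code. It assumes the layout described in this 
issue: the first 8 bytes of a compressed buffer hold the uncompressed length as 
a little-endian signed 64-bit integer. The names `decode_buffer` and 
`encode_buffer` are hypothetical, and `zlib` is only a stand-in for the codec 
Arrow would actually use.

```python
import struct
import zlib

PREFIX_SIZE = 8  # bytes holding the uncompressed length (little-endian int64)


def encode_buffer(data: bytes) -> bytes:
    """Hypothetical writer: length prefix followed by the compressed bytes."""
    # zlib is a stand-in codec for this sketch only.
    return struct.pack("<q", len(data)) + zlib.compress(data)


def decode_buffer(raw: bytes) -> bytes:
    """Hypothetical reader that guards both ways of describing 'empty'."""
    # Case 1: the buffer length itself is 0 (pyarrow and Julia write this).
    if len(raw) == 0:
        return b""

    (uncompressed_len,) = struct.unpack_from("<q", raw, 0)

    # Case 2: the buffer is 8 bytes long and the prefix says 0 (the Java file).
    # Return early instead of handing an empty payload to the decompressor.
    if uncompressed_len == 0:
        return b""

    data = zlib.decompress(raw[PREFIX_SIZE:])
    if len(data) != uncompressed_len:
        raise ValueError("decompressed size does not match the length prefix")
    return data
```

   With this guard, both `decode_buffer(b"")` and 
`decode_buffer(struct.pack("<q", 0))` return an empty result without invoking 
the decompressor. 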
   
   It may well be the Java implementation that is doing something wrong 
here, as I couldn't find anything in the format specification on how lengths 
should be described when compression is used. I have opened [an 
issue](https://github.com/apache/arrow/issues/35639) there as well. File 
generated by the Java code in that issue: 
[randon_access_to_file.zip](https://github.com/apache/arrow-julia/files/11500356/randon_access_to_file.zip).
   
   Both pyarrow and the Julia implementation set the length to 0 when writing 
an empty table to disk.
   
   It does make some sense to set `buffer.length` to 8 and let the `len` field 
carry the information, even though those 8 bytes could have been saved. 
Both the Java implementation and pyarrow can read the attached file.
   
   


