DrChainsaw commented on PR #442:
URL: https://github.com/apache/arrow-julia/pull/442#issuecomment-1558629696

   Thanks a lot for this!
   
   In case it is helpful, here is the Java code that decides whether to set 
the -1 flag: 
https://github.com/apache/arrow/blob/fbe5f641d327ee81db00ce5f056940a69f4d8603/java/vector/src/main/java/org/apache/arrow/vector/compression/AbstractCompressionCodec.java#L42-L53
   
   The tl;dr is that they check whether the data is larger after compression 
than before, and if so they keep the uncompressed data and set the -1 flag. 
Since the outcome can differ from column to column, you can end up with a 
table that mixes compressed and uncompressed columns. 
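
   For illustration, here is a minimal sketch of that check in Python. It is 
not the Java or Julia implementation. It assumes the buffer layout as I 
understand it (an 8-byte little-endian signed length prefix, with -1 meaning 
"the body is stored uncompressed") and uses `zlib` only as a stand-in for the 
LZ4/ZSTD codecs. The names `package_buffer` and `unpack_buffer` are 
hypothetical.

```python
import struct
import zlib

# Sentinel written in the length prefix when the body is stored raw.
NO_COMPRESSION_LENGTH = -1


def package_buffer(raw: bytes) -> bytes:
    """Return an 8-byte length prefix followed by the (maybe compressed) body."""
    compressed = zlib.compress(raw)  # stand-in codec, not what Arrow uses
    if len(compressed) > len(raw):
        # Compression made the data bigger: keep the raw bytes and flag with -1.
        return struct.pack("<q", NO_COMPRESSION_LENGTH) + raw
    # Otherwise store the uncompressed length, then the compressed bytes.
    return struct.pack("<q", len(raw)) + compressed


def unpack_buffer(buf: bytes) -> bytes:
    """Reverse package_buffer, honoring the -1 'not compressed' flag."""
    (length,) = struct.unpack_from("<q", buf, 0)
    body = buf[8:]
    if length == NO_COMPRESSION_LENGTH:
        return body
    out = zlib.decompress(body)
    if len(out) != length:
        raise ValueError("uncompressed length does not match the prefix")
    return out
```

   Because the decision is made per buffer, a short or high-entropy column 
can end up stored raw while a repetitive column in the same table is 
compressed, so a reader has to check the prefix on every buffer.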
   
   I suppose this is an optimization that the Julia writer could implement as 
well, given that at least one other implementation already does it. I have no 
idea what the potential gains are, though.
   
   I have searched the "Specification and Protocols" section of the Arrow docs 
for rules on how to set the length when applying compression, but I could not 
find anything. If you happen to know where it is specified, I would be happy to 
take a look, since it might help with the [other 
issue](https://github.com/apache/arrow-julia/issues/437) I have encountered 
when reading files generated by the Java implementation.


