The following seems like good news: it suggests I should be able to decompress
just one column of a RecordBatch in the middle of a compressed Feather v2
file. Is there a Python API for this kind of access? C++?
/// Provided for forward compatibility in case we need to support different
/// strategies for compressing the IPC message body (like whole-body
/// compression rather than buffer-level) in the future
enum BodyCompressionMethod:byte {
  /// Each constituent buffer is first compressed with the indicated
  /// compressor, and then written with the uncompressed length in the first 8
  /// bytes as a 64-bit little-endian signed integer followed by the compressed
  /// buffer bytes (and then padding as required by the protocol). The
  /// uncompressed length may be set to -1 to indicate that the data that
  /// follows is not compressed, which can be useful for cases where
  /// compression does not yield appreciable savings.
  BUFFER
}
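To make the BUFFER layout above concrete, here is a minimal Python sketch of the per-buffer framing the comment describes. It assumes zlib as a stand-in codec so it runs with the standard library only (Arrow IPC itself uses LZ4_FRAME or ZSTD), and the (offset, length) spans are a hypothetical stand-in for the buffer locations that a real file records in its RecordBatch metadata. All function names here are illustrative, not Arrow APIs. If the format works as the comment says, each buffer carries its own length prefix, so one column's buffers can be decoded without decompressing the others.

```python
import struct
import zlib
from typing import List, Set, Tuple

# Assumption: zlib stands in for Arrow's real codecs (LZ4_FRAME / ZSTD).
def compress_codec(data: bytes) -> bytes:
    return zlib.compress(data)

def decompress_codec(data: bytes) -> bytes:
    return zlib.decompress(data)

def frame_buffer(raw: bytes) -> bytes:
    """Hypothetical helper: int64 little-endian uncompressed length, then the
    compressed bytes; store raw bytes with length -1 if compression doesn't help."""
    packed = compress_codec(raw)
    if len(packed) >= len(raw):
        return struct.pack("<q", -1) + raw
    return struct.pack("<q", len(raw)) + packed

def unframe_buffer(framed: bytes) -> bytes:
    """Hypothetical helper: decode a single framed buffer."""
    (uncompressed_len,) = struct.unpack_from("<q", framed, 0)
    payload = framed[8:]
    if uncompressed_len == -1:
        return payload  # stored uncompressed
    out = decompress_codec(payload)
    if len(out) != uncompressed_len:
        raise ValueError("length prefix does not match decompressed size")
    return out

def build_body(buffers: List[bytes]) -> Tuple[bytes, List[Tuple[int, int]]]:
    """Hypothetical helper: concatenate framed buffers, padding each to an
    8-byte boundary, and record (offset, length) spans for each one."""
    body = bytearray()
    spans = []
    for raw in buffers:
        framed = frame_buffer(raw)
        spans.append((len(body), len(framed)))
        body += framed
        body += b"\x00" * (-len(body) % 8)  # padding between buffers
    return bytes(body), spans

def read_selected_buffers(body: bytes, spans: List[Tuple[int, int]],
                          wanted: Set[int]) -> List[bytes]:
    """Decompress only the buffers whose indices are in `wanted`."""
    return [unframe_buffer(body[off:off + length])
            for i, (off, length) in enumerate(spans) if i in wanted]

# Usage: two "columns", decode only the second one.
col_a = bytes(range(256)) * 64
col_b = b"\x01" * 4096
body, spans = build_body([col_a, col_b])
(only_b,) = read_selected_buffers(body, spans, wanted={1})
```

In this sketch, `col_a`'s bytes are sliced past but never passed to the codec, which is the property that would make single-column decompression possible in principle. Whether a given reader actually exploits it is a separate question.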
On Wed, Sep 21, 2022 at 7:03 PM John Muehlhausen wrote:
> ``Internal structure supports random access and slicing from the middle.
> This also means that you can read a large file chunk by chunk without
> having to pull the whole thing into memory.''
> https://ursalabs.org/blog/2020-feather-v2/
>
> For a compressed v2 file, can I decompress just one column of a batch in
> the middle, or is the entire batch with all of its columns compressed as a
> unit?
>
> Unfortunately, reader.get_batch(i) seems to be doing a lot of work,
> perhaps decompressing all the columns?
>
> Thanks,
> John
>