Hi,

For variable-length types such as strings, the "Data" stream contains the
concatenated values. When decoding into, e.g., a vector of strings, is there
any constraint on whether compression chunks may break value boundaries?

For example, say we have a string column with 2 rows [r1, r2] of 100 MB each
(concatenated in "Data"). Can we end up with r2 split across two compressed
chunks?

Does the same apply to the "Length" stream?

More broadly, the question is whether, when deserializing, we need to
concatenate bytes from several decompressed chunks, or whether we can assume
that compression respects row boundaries.

Best,
Jorge
