Steinar Knutsen wrote:
Avro supports skip information, but it is somewhat inefficient to skip
across a block of an array, a record, or a map if any of these contains
a variable-length object. The headers only contain the number of objects
contained, not the length in bytes.
Arrays and maps can optionally encode their length in bytes. If the
item count is a negative number, then -count is the actual count, and
the size in bytes immediately follows the count. Java's
BlockingBinaryEncoder implements this; all decoders must implement it.
http://hadoop.apache.org/avro/docs/current/api/java/org/apache/avro/io/BlockingBinaryEncoder.html
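To make the scheme concrete, here is a minimal sketch of that wire format, using a hand-rolled zig-zag varint reader/writer rather than the actual Avro API (class and method names here are illustrative, not Avro's):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class BlockSkipDemo {
    // Avro encodes longs as zig-zag varints; minimal writer and reader.
    static void writeLong(ByteArrayOutputStream out, long n) {
        long z = (n << 1) ^ (n >> 63);          // zig-zag encode
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
    }

    static long readLong(InputStream in) throws IOException {
        long z = 0;
        int shift = 0, b;
        do {
            b = in.read();
            z |= (long) (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return (z >>> 1) ^ -(z & 1);            // zig-zag decode
    }

    public static void main(String[] args) throws IOException {
        // Encode one array block of 3 one-byte items with the optional
        // byte size: a negative count signals that the size follows.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeLong(out, -3);            // -count
        writeLong(out, 3);             // block size in bytes
        out.write(new byte[] {1, 2, 3});
        writeLong(out, 0);             // zero count terminates the array

        ByteArrayInputStream in = new ByteArrayInputStream(out.toByteArray());
        long count = readLong(in);
        if (count < 0) {
            long size = readLong(in);
            in.skip(size);             // skip the block without decoding items
            count = -count;
        }
        System.out.println("skipped block of " + count + " items");
        System.out.println("terminator=" + readLong(in));
    }
}
```

A reader that finds a non-negative count has no byte size available and must decode each item to get past it; the negative-count form is what makes the skip a single seek.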
It does not emit a size for every array and map, only for those whose
encoded size exceeds a threshold, so the overhead of adding the size is
limited. It also splits arrays and maps too large to be buffered as a
whole into a sequence of blocks.
Doug