On 03/15/2015 03:42 AM, Cheng Lian wrote:
I think the question Jianshi intended to ask is whether the null in {
'a': null } takes storage space, which I think is no. The key and value
parts of the map are treated as two separate columns in Parquet. So the
key 'a' takes space in the key column, while the value null doesn't take
space in the value column.

Cheng

Most of the time, the answer is that a null takes very little space. Parquet columns are encoded with definition levels, which track the level in a nested schema at which a value is null. The simple case is a single value that may be null, which gives 2 possible levels: 0 (null) and 1 (an encoded value). The value itself is only encoded if the definition level is 1. But if a column can be null at all, it carries a definition level of at least 1 bit for each value, and those levels are then run-length encoded and bit packed. So some space is required, but *very* little.
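
To make that concrete, here is a rough sketch in Python of the simple case (a flat column whose values may be null). It is only a toy model: the helper names are made up for illustration, and the run-length step produces plain (level, count) pairs rather than Parquet's actual RLE/bit-packing hybrid byte layout.

```python
from itertools import groupby

# Hypothetical helpers for illustration only; not part of any Parquet library.

def definition_levels(values):
    """Definition level per slot: 0 for null, 1 for a present value."""
    return [0 if v is None else 1 for v in values]

def run_length_encode(levels):
    """Collapse consecutive equal levels into (level, run_length) pairs."""
    return [(level, len(list(run))) for level, run in groupby(levels)]

def present_values(values):
    """Only non-null values are written to the value data."""
    return [v for v in values if v is not None]

# A mostly-null column: 1000 nulls, two real values, then 500 more nulls.
column = [None] * 1000 + [7, 8] + [None] * 500

levels = definition_levels(column)
runs = run_length_encode(levels)
stored = present_values(column)

print(runs)    # [(0, 1000), (1, 2), (0, 500)]
print(stored)  # [7, 8]
```

The 1500 nulls cost nothing in the value data and collapse into two runs in the definition levels, which is why long stretches of nulls are close to free.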

rb


--
Ryan Blue
Software Engineer
Cloudera, Inc.
