Efficient sparse optional fields support
----------------------------------------

                 Key: AVRO-519
                 URL: https://issues.apache.org/jira/browse/AVRO-519
             Project: Avro
          Issue Type: New Feature
          Components: spec
            Reporter: John Plevyak


One of the nice features of protobuf is efficient support for very sparse
optional fields, for example a large number of tags potentially associated with
a document, the vast majority of which are empty.

Avro does support optional fields as part of differing specifications, but not
on a per-record level after a protocol has been agreed upon.  Avro does have
support for arrays and maps; however, both of these require homogeneous types.

I would suggest adding an additional field attribute:
   * "optional" - with values "true"/"false" (where "false" is assumed)
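
As an illustration only, a record schema using the proposed attribute might
look like the Python sketch below.  The record and field names are
hypothetical, and writing the attribute as a JSON boolean rather than a string
is an assumption; "optional" is the attribute proposed in this issue, not part
of the current Avro specification.

```python
import json

# Hypothetical schema: "Document", "title", "tag_a" and "tag_b" are made-up names.
# "optional" is the attribute proposed in this issue, not an existing Avro keyword.
schema = {
    "type": "record",
    "name": "Document",
    "fields": [
        {"name": "title", "type": "string"},
        {"name": "tag_a", "type": "string", "default": "", "optional": True},
        {"name": "tag_b", "type": "string", "default": "", "optional": True},
    ],
}

# Fields that omit the attribute are treated as "optional": false.
optional_names = [f["name"] for f in schema["fields"] if f.get("optional", False)]

print(json.dumps(schema, indent=2))
print(optional_names)
```

Each optional field carries a default so that a reader can substitute it when
the field is not present.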

For the encoding I would suggest that any record which includes optional fields
would be prefixed by a presence map, which would be a sequence of int8 x* where:

  x > 0 : the lower 7 bits are presence bits for the next 7 optional fields
          (low bit first)
  -128 < x < 0 : the next present field is position x + 135 (as x runs from -1
          to -127 and the first 7 must be empty, otherwise we would use the
          x > 0 encoding)
  x == -128 : no optional fields present in the next 134 optional fields
  x == 0 : end of sequence

Further, if the map has covered all the options, the end-of-sequence marker can
be elided.  For example, a type with 3 optional fields would require only a
single byte.
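
The rules above can be sketched in Python as follows.  This is one possible
reading of the proposal, not a reference implementation: the function names are
hypothetical, optional fields are numbered from 0 in schema order, and
"position x + 135" is taken as a 1-based position counted from the first
optional field not yet covered by the map.

```python
def encode_presence(n_optional, present):
    """Return the presence map as a list of signed int8 values."""
    present = sorted(present)
    assert all(0 <= p < n_optional for p in present)
    out, pos, i = [], 0, 0            # pos = optional fields covered so far
    while pos < n_optional:
        if i == len(present):
            out.append(0)             # end of sequence: nothing else is present
            break
        gap = present[i] - pos        # absent fields before the next present one
        if gap < 7:
            bits = 0
            while i < len(present) and present[i] < pos + 7:
                bits |= 1 << (present[i] - pos)   # low bit first
                i += 1
            out.append(bits)          # always in 1..127 here
            pos += 7
        elif gap <= 133:
            k = gap + 1               # 1-based position of next present field, 8..134
            out.append(k - 135)       # -127..-1
            pos += k
            i += 1
        else:
            out.append(-128)          # the next 134 optional fields are all absent
            pos += 134
    return out


def decode_presence(n_optional, data):
    """Return the sorted indices of the optional fields marked present."""
    present, pos = [], 0
    it = iter(data)
    while pos < n_optional:           # the end marker may be elided once covered
        x = next(it)
        if x == 0:                    # end of sequence: the rest are absent
            break
        if x > 0:                     # 7 presence bits, low bit first
            for b in range(7):
                if (x >> b) & 1 and pos + b < n_optional:
                    present.append(pos + b)
            pos += 7
        elif x == -128:               # skip 134 absent fields
            pos += 134
        else:                         # skip to the next present field
            pos += x + 135
            present.append(pos - 1)
    return present


print(encode_presence(3, [0, 2]))
print(encode_presence(1000, [500]))
```

With three optional fields of which the first and third are present, a single
bitmap byte covers the whole map and no end-of-sequence marker is written.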

This will permit encoding at 8/7 of a bit per present entry (worst case) and at
a cost of 8/134 (0.06) bits/entry for all but the last not-present entries
(7.5 bytes / 1000 optional fields).
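
The figures quoted above follow from one map byte covering either 7 fields
(bitmap form) or up to 134 fields (skip form).  A small Python check of that
arithmetic, with hypothetical variable names and treating the costs as
amortized averages:

```python
BITS_PER_BYTE = 8

# Bitmap form: one byte carries presence bits for 7 optional fields.
bits_per_entry_bitmap = BITS_PER_BYTE / 7

# Skip form: one byte can step over up to 134 not-present fields.
bits_per_entry_skip = BITS_PER_BYTE / 134

# Amortized map bytes needed to step over 1000 not-present fields.
bytes_per_1000_absent = 1000 / 134

print(round(bits_per_entry_bitmap, 3))
print(round(bits_per_entry_skip, 3))
print(round(bytes_per_1000_absent, 1))
```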

This encoding is backward compatible as well, as schemas which do not contain
optional elements do not have the presence map and the encoding is therefore
identical.  Backward compatibility can be maintained by simply using the default
value for not-present fields.

Language APIs:

Efficient support could include either an explicit presence test or a function
which returns the value or the default value (if the field is not present).
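
A minimal Python sketch of the two API styles mentioned above.  The class and
method names (SparseRecord, has, get) are hypothetical and do not correspond to
any existing Avro language binding.

```python
class SparseRecord:
    """Holds only the optional fields that were present in the encoded record."""

    def __init__(self, defaults, values=None):
        self._defaults = dict(defaults)     # schema defaults, keyed by field name
        self._values = dict(values or {})   # fields actually present on the wire

    def has(self, name):
        """Explicit presence test."""
        return name in self._values

    def get(self, name):
        """Return the value, or the schema default if the field is not present."""
        return self._values.get(name, self._defaults[name])


rec = SparseRecord({"tag_a": "", "tag_b": ""}, {"tag_b": "draft"})
print(rec.has("tag_a"), rec.get("tag_a"))
print(rec.has("tag_b"), rec.get("tag_b"))
```

Storing only the present values keeps the in-memory cost proportional to the
number of present fields rather than the number of declared ones.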
 




