[ https://issues.apache.org/jira/browse/AVRO-519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12862322#action_12862322 ]
Doug Cutting commented on AVRO-519:
-----------------------------------

> Is there a place where 2.0 proposals are collected? Maybe a wiki page?

Not yet. Feel free to start one. It would be good to have.

> Efficient sparse optional fields support
> ----------------------------------------
>
>                 Key: AVRO-519
>                 URL: https://issues.apache.org/jira/browse/AVRO-519
>             Project: Avro
>          Issue Type: New Feature
>          Components: spec
>            Reporter: John Plevyak
>
> One of the nice features of protobuf is efficient support for very sparse
> optional fields, for example a large number of tags potentially associated
> with a document, the vast majority of which are empty.
> Avro does support optional fields as part of differing specifications, but
> not on a per-record level after a protocol has been agreed upon. Avro does
> have support for arrays and maps; however, both of these require
> homogeneous types.
> I would suggest adding an additional field attribute:
>   * "optional" - with values "true"/"false" (where "false" is assumed)
> For the encoding I would suggest that any record which includes optional
> fields would be prefixed by a presence map, which would be a sequence of
> int8 x* where:
>   x > 0        : the lower 7 bits are presence bits for the next 7 optional
>                  fields (low bit first)
>   -128 < x < 0 : the next present field is at position x + 135 (as x runs
>                  from -1 to -127, and the first 7 must be empty, otherwise
>                  we would use the x > 0 encoding)
>   x == -128    : no optional fields present in the next 134 optional fields
>   x == 0       : end of sequence
> Further, if the map has covered all the options, the end-of-sequence marker
> can be elided. For example, a type with 3 optional fields would require
> only a single byte.
> This will permit encoding at 8/7 of a bit per present entry (worst case)
> and at a cost of 8/134 (0.06) bits/entry for every not-present entry except
> the trailing ones (7.5 bytes / 1000 optional fields).
> This encoding is also backward compatible, as schemas which do not contain
> optional elements do not have the presence map and the encoding is
> therefore identical. Backward compatibility can be maintained by simply
> using the default value for not-present fields.
> Language APIs:
> Efficient support could include either an explicit presence test or a
> function which returns the value or the default value (if the field is
> not present).

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
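
For illustration only, the presence map proposed in the quoted issue could be
sketched in Python roughly as follows. This is one possible reading of the
proposal, not Avro code: the names encode_presence and decode_presence are
hypothetical, positions are assumed to be counted from 1 relative to the first
optional field not yet covered by the map, and an x > 0 byte is assumed to
always advance by 7 fields even when fewer remain.

```python
from typing import List, Tuple

FIELDS_PER_MASK = 7   # presence bits carried by one x > 0 byte
MAX_SKIP = 134        # fields covered by one x == -128 byte


def encode_presence(present: List[bool]) -> bytes:
    """Encode per-field presence flags as the proposed int8 sequence."""
    out = bytearray()
    cursor, n = 0, len(present)
    while cursor < n:
        try:
            nxt = present.index(True, cursor)  # next present field
        except ValueError:
            out.append(0)                      # x == 0: end of sequence
            break
        gap = nxt - cursor                     # absent fields before it
        if gap < FIELDS_PER_MASK:
            # x > 0: bitmask over the next 7 fields, low bit first
            mask = 0
            window = present[cursor:cursor + FIELDS_PER_MASK]
            for bit, flag in enumerate(window):
                if flag:
                    mask |= 1 << bit
            out.append(mask)
            cursor += FIELDS_PER_MASK
        elif gap < MAX_SKIP:
            # -128 < x < 0: present field at 1-based position x + 135
            out.append((gap + 1 - 135) & 0xFF)
            cursor = nxt + 1
        else:
            out.append(0x80)                   # x == -128: skip 134 fields
            cursor += MAX_SKIP
    # If the loop ran off the end, the end marker is elided.
    return bytes(out)


def decode_presence(data: bytes, n: int) -> Tuple[List[bool], int]:
    """Return (presence flags for n fields, number of map bytes consumed)."""
    present = [False] * n
    cursor = used = 0
    while cursor < n:
        if used >= len(data):
            raise ValueError("presence map ended before covering all fields")
        raw = data[used]
        used += 1
        x = raw - 256 if raw > 127 else raw    # reinterpret as int8
        if x == 0:
            break
        if x > 0:
            for bit in range(FIELDS_PER_MASK):
                if (x >> bit) & 1:
                    if cursor + bit >= n:
                        raise ValueError("presence bit beyond last field")
                    present[cursor + bit] = True
            cursor += FIELDS_PER_MASK
        elif x == -128:
            cursor += MAX_SKIP
        else:
            target = cursor + (x + 135) - 1    # 0-based index of the field
            if target >= n:
                raise ValueError("skip points beyond last field")
            present[target] = True
            cursor = target + 1
    return present, used
```

Under these assumptions, a record with 3 optional fields always gets a one-byte
map, and a record with 1000 optional fields of which only the last is present
gets an 8-byte map (seven -128 bytes followed by one skip byte), which is in
line with the "7.5 bytes / 1000 optional fields" estimate in the proposal.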