[ https://issues.apache.org/jira/browse/AVRO-519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12859897#action_12859897 ]
Doug Cutting commented on AVRO-519:
-----------------------------------

> What is the rationale for not permitting a name to be associated with other
> types in a union?

This is discussed in AVRO-248. One rationale is simply that it would be an incompatible change. Existing implementations should ignore the name, but they should also generate an error if a union has two "bytes" branches.

A dynamic language needs a way at runtime to distinguish whether "a" or "b" is used. So one would need to wrap the bytes in something to indicate this. Like a record. Records can add a name to any type, with no serialized overhead.

> Efficient sparse optional fields support
> ----------------------------------------
>
>                 Key: AVRO-519
>                 URL: https://issues.apache.org/jira/browse/AVRO-519
>             Project: Avro
>          Issue Type: New Feature
>          Components: spec
>            Reporter: John Plevyak
>
> One of the nice features of protobuf is efficient support for very sparse
> optional fields, for example a large number of tags potentially associated
> with a document, the vast majority of which are empty.
> Avro does support optional fields as part of differing specifications, but
> not on a per-record level after a protocol has been agreed upon. Avro does
> have support for arrays and maps; however, both of these require
> homogeneous types.
> I would suggest adding an additional field attribute:
>   * "optional" - with values "true"/"false" (where "false" is assumed)
> For the encoding I would suggest that any record which includes optional
> fields would be prefixed by a presence map, which would be a sequence of
> int8 x* where:
>   x > 0 : the lower 7 bits are presence bits for the next 7 optional fields
>           (low bit first)
>   -128 < x < 0 : the next present field is position x + 135 (as x runs from
>           0 to -127 and the first 7 must be empty, otherwise we would use
>           the x > 0 encoding)
>   x == -128 : no optional fields present in the next 134 optional fields
>   x = 0 : end of sequence
> Further, if the map has covered all the options, the end-of-sequence marker
> can be elided. For example, a type with 3 optional fields would require only
> a single byte.
> This will permit encoding at 8/7 of a bit per present entry (worst case) and
> at a cost of 8/134 (0.06) bits/entry per all but last not-present
> (7.5 bytes / 1000 optional fields).
> This encoding is backward compatible as well, as schemas which do not
> contain optional elements do not have the presence map and the encoding is
> therefore identical. Backward compatibility can be maintained by simply
> using the default value for not-present fields.
> Language APIs:
> Efficient support could include either an explicit presence test or a
> function which returns the value or default value (if the field is not
> present).

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
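
The presence map proposed in the quoted issue can be sketched in Python as follows. This is only one possible reading of the proposal, not an Avro implementation: the function names encode_presence and decode_presence are hypothetical, positions are assumed to be counted from 1 relative to the first optional field not yet covered by the map (so x + 135 ranges over 8..134), and a skip code is assumed to consume the present field it points at.

```python
from typing import List

SKIP_ALL = -128  # assumed meaning: none of the next 134 optional fields is present
SPAN = 134       # number of fields covered by one SKIP_ALL byte


def encode_presence(present: List[bool]) -> bytes:
    """Encode presence flags of a record's optional fields (hypothetical helper)."""
    out = []
    n = len(present)
    cursor = 0  # index of the first optional field not yet covered
    while cursor < n:
        try:
            nxt = present.index(True, cursor)
        except ValueError:
            out.append(0)  # nothing else is present: end-of-sequence marker
            break
        rel = nxt - cursor + 1  # 1-based distance to the next present field
        if rel <= 7:
            # Bitmap byte for the next 7 fields, low bit first (always > 0 here).
            bits = 0
            for i, flag in enumerate(present[cursor:cursor + 7]):
                if flag:
                    bits |= 1 << i
            out.append(bits)
            cursor += 7
        elif rel <= SPAN:
            out.append(rel - 135)  # skip code in -127..-1
            cursor = nxt + 1
        else:
            out.append(SKIP_ALL)
            cursor += SPAN
    # If the loop ran off the end, the map covered every field and the
    # end-of-sequence marker is elided.
    return bytes(x & 0xFF for x in out)


def decode_presence(data: bytes, n: int) -> List[bool]:
    """Inverse of encode_presence for a schema with n optional fields."""
    present = [False] * n
    cursor = 0
    pos = 0
    while cursor < n:
        raw = data[pos]  # raises IndexError on a truncated map
        pos += 1
        x = raw - 256 if raw > 127 else raw  # reinterpret the byte as int8
        if x == 0:
            break
        if x > 0:
            for i in range(7):
                if x >> i & 1 and cursor + i < n:
                    present[cursor + i] = True
            cursor += 7
        elif x == SKIP_ALL:
            cursor += SPAN
        else:
            nxt = cursor + (x + 135) - 1
            present[nxt] = True
            cursor = nxt + 1
    return present
```

Under these assumptions, encode_presence([True, False, True]) yields the single byte 0x05, matching the claim that a type with 3 optional fields needs only one byte, and 1000 optional fields with only the last one present encode to 8 bytes (seven -128 bytes followed by one skip code).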