[ https://issues.apache.org/jira/browse/AVRO-519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12859878#action_12859878 ]
John Plevyak commented on AVRO-519:
-----------------------------------

Doug, your proposed solution is made somewhat more complex by the fact that, within a union, a name can be associated only with record, fixed and enum types. One might want to write:

{ "type" : "array", "name" : "optionals", "items" : [
    { "name" : "a", "type" : "bytes" },
    { "name" : "b", "type" : "bytes" } ] }

which the C++ translator accepts but for which it nevertheless generates incorrect code (I will file a bug). As it stands, one would have to write:

{ "type" : "array", "name" : "optionals", "items" : [
    { "name" : "l", "type" : "record", "fields" : [ { "name" : "l", "type" : "long" } ] },
    { "name" : "r", "type" : "record", "fields" : [ { "name" : "r", "type" : "long" } ] } ] }

which is workable, albeit more complicated than one might want.

What is the rationale for not permitting a name to be associated with other types in a union?

> Efficient sparse optional fields support
> ----------------------------------------
>
>                 Key: AVRO-519
>                 URL: https://issues.apache.org/jira/browse/AVRO-519
>             Project: Avro
>          Issue Type: New Feature
>          Components: spec
>            Reporter: John Plevyak
>
> One of the nice features of protobuf is efficient support for very sparse optional fields,
> for example a large number of tags potentially associated with a document, the vast
> majority of which are empty.
> Avro does support optional fields as part of differing specifications, but not on a per-record
> level after a protocol has been agreed upon. Avro does have support for arrays and maps;
> however, both of these require homogeneous types.
>
> I would suggest adding an additional field attribute:
>   * "optional" - with values "true"/"false" (where "false" is assumed)
>
> For the encoding I would suggest that any record which includes optional fields
> would be prefixed by a presence map, which would be a sequence of int8 x* where:
>   x > 0 : the lower 7 bits are presence bits for the next 7 optional fields (low bit first)
>   -128 < x < 0 : the next present field is at position x + 135 (as x runs from -1 to -127, and
>       the first 7 must be empty, otherwise we would use the x > 0 encoding)
>   x == -128 : no optional fields present in the next 134 optional fields
>   x == 0 : end of sequence
> Further, if the map has covered all the options, the end-of-sequence marker can be
> elided. For example, a type with 3 optional fields would require only a single byte.
>
> This will permit encoding at 8/7 of a bit per present entry (worst case) and at a cost of
> 8/134 (0.06) bits/entry for all but the last not-present run (7.5 bytes / 1000 optional fields).
>
> This encoding is backward compatible as well, since schemas which do not contain optional
> elements do not have the presence map and their encoding is therefore identical. Backward
> compatibility can be maintained by simply using the default value for not-present fields.
>
> Language APIs:
> Efficient support could include either an explicit presence test or a function which returns
> the value or the default value (if the field is not present).
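
The presence map in the quoted proposal can be sketched in a few lines of Python. This is only an illustration of the byte layout, not part of Avro or of any existing API: the function names encode_presence and decode_presence are hypothetical, and the sketch assumes that "position" means a 1-based offset from the current field, that the cursor moves just past the field found by a skip byte, and that bits for fields beyond the end of the record are left at zero.

```python
def encode_presence(present):
    """Encode a list of booleans (one per optional field) as a presence map."""
    out = []
    i, n = 0, len(present)
    while i < n:
        if not any(present[i:]):
            out.append(0)                 # x == 0: end of sequence
            break
        window = present[i:i + 7]
        if any(window):
            # x > 0: presence bits for the next 7 optional fields, low bit first
            out.append(sum(1 << k for k, p in enumerate(window) if p))
            i += 7
            continue
        # The next 7 fields are empty, so find the next present one.
        j = next(k for k in range(i, n) if present[k])
        offset = j - i + 1                # assumed 1-based position, always >= 8 here
        if offset > 134:
            out.append(-128)              # x == -128: skip 134 empty fields
            i += 134
        else:
            out.append(offset - 135)      # -128 < x < 0: position is x + 135
            i = j + 1
    # If the loop ran past the last field, the end marker is elided.
    return bytes(x & 0xFF for x in out)


def decode_presence(data, n):
    """Decode a presence map for a record with n optional fields."""
    present = [False] * n
    i = pos = 0
    while i < n:
        b = data[pos]
        pos += 1
        x = b - 256 if b > 127 else b     # reinterpret the byte as int8
        if x == 0:
            break
        if x > 0:
            for k in range(7):
                if (x >> k) & 1 and i + k < n:
                    present[i + k] = True
            i += 7
        elif x == -128:
            i += 134
        else:
            i += x + 135                  # cursor lands just past the present field
            present[i - 1] = True
    return present


if __name__ == "__main__":
    print(encode_presence([True, False, True]))   # b'\x05'
    sparse = [False] * 1000
    sparse[999] = True
    print(len(encode_presence(sparse)))           # 8
```

Under these assumptions a record with 3 optional fields always needs a single byte, and 1000 optional fields with only the last one present need 8 bytes, in line with the 7.5 bytes / 1000 fields estimate quoted above.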