[ https://issues.apache.org/jira/browse/AVRO-654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12905980#action_12905980 ]
Doug Cutting commented on AVRO-654:
-----------------------------------

Note that full, recursive validation is not required for union dispatch.

http://avro.apache.org/docs/1.3.3/spec.html#Unions

So a typical implementation of a union writer might look something like:

{code}
writeUnion(datum, union) {
  int index = -1;
  for (int i = 0; index == -1 && i < union.length; i++) {
    case (union[i].type) {
      INT:    if (datum is int) index = i; break;
      LONG:   if (datum is long) index = i; break;
      ... other unnamed types ...
      RECORD: if (datum is record && datum.name.equals(union[i].name)) index = i; break;
      ... other named types ...
    }
  }
  writeInt(index);
  write(datum, union[index]);
}
{code}

> Recursive #validate() for union'ed schemas in Ruby cripples performance
> -----------------------------------------------------------------------
>
>                 Key: AVRO-654
>                 URL: https://issues.apache.org/jira/browse/AVRO-654
>             Project: Avro
>          Issue Type: Bug
>          Components: ruby
>    Affects Versions: 1.3.3
>            Reporter: Philip (flip) Kromer
>
> The Ruby DatumWriter calls #validate() on each #write(). In the case of a
> schema with many nested unions (cf. Cassandra's*), this requires a recursive
> depth-first search to determine which branch to take. In Ruby, these
> operations are very expensive -- enough to limit write speeds to 2k/sec on a
> machine of moderate size.
> For repeated writing of the same data structure, one idea would be to create
> a CompiledDatumWriter. This would walk through the validation and assemble a
> tree of the methods to apply to each schema element in turn:
> [ [:write_long, 'id'], [:write_bytes, 'name'], [:write_record, 'address',
> [:write_long, 'street']] ]
> ---
> *
> http://github.com/infochimps/cassandra/blob/beta1_plus_patches/interface/avro/cassandra.avpr
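
For illustration, the shallow union dispatch described in the comment could be sketched in Ruby roughly as follows. This is a minimal sketch and not the avro gem's API: the Hash-based branch layout, the Record struct, and the union_index helper are all hypothetical names chosen for the example. The point is only that a branch is selected by looking at the datum's top-level type (and, for named types, its name), without recursively validating its fields.

```ruby
# Hypothetical stand-in for a record datum that knows its schema name.
Record = Struct.new(:name, :fields)

# Returns the index of the first union branch whose top-level type matches
# the datum, or nil if no branch matches. Branches are assumed (for this
# sketch only) to be Hashes with a :type symbol and, for named types, a :name.
def union_index(datum, branches)
  branches.index do |branch|
    case branch[:type]
    when :null   then datum.nil?
    when :int    then datum.is_a?(Integer) && datum.between?(-2**31, 2**31 - 1)
    when :long   then datum.is_a?(Integer) && datum.between?(-2**63, 2**63 - 1)
    when :string then datum.is_a?(String)
    when :record
      # Shallow check: compare names only, do not descend into the fields.
      datum.is_a?(Record) && datum.name == branch[:name]
    else
      false
    end
  end
end

EXAMPLE_UNION = [
  { type: :null },
  { type: :long },
  { type: :record, name: 'Address' },
  { type: :record, name: 'Person' }
].freeze

union_index(nil, EXAMPLE_UNION)                                # => 0
union_index(42, EXAMPLE_UNION)                                 # => 1
union_index(Record.new('Person', { 'id' => 1 }), EXAMPLE_UNION) # => 3
```

A writer built on this would write the returned index and then write the datum against the selected branch, raising an error when the index is nil. Each dispatch costs one pass over the union's branches, independent of how deeply the datum is nested.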