[ https://issues.apache.org/jira/browse/AVRO-91?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12749545#action_12749545 ]
Thiruvalluvan M. G. commented on AVRO-91:
-----------------------------------------
One drawback of having {read,write}{Record,Union}{Start,End} methods is that
all clients that use the decoder/encoder will have to generate these calls. This
could be cumbersome for the clients and/or have a performance impact.
Here is an approach which is not as complicated as the Java implementation of
the parser. This parser is not as efficient as the one implemented in Java, but
I guess performance is not vital for the JSON encoder/decoder, as their main
purpose is diagnostics and debugging.
Here I describe the Encoder, but the idea can be implemented for the decoder as
well.
The JSON encoder has a stack of "Markers". Markers are of these types: SCHEMA,
RECORD_START, RECORD_END, FIELD, ARRAY_START, ARRAY_END, MAP_START, MAP_END,
REPEATER, etc. A SCHEMA marker has a schema object associated with it. A
REPEATER marker has one or two schema objects associated with it. A FIELD
marker has the field name and the field number associated with it.
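As a rough illustration of the marker types just listed, here is a minimal
Python sketch. The names Kind, Marker, schema_marker, field_marker and
repeater_marker are hypothetical, and schemas are assumed to be plain Python
values (a string such as "int", or a dict for a compound type); none of this
comes from an existing Avro API.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Any, Optional, Tuple


class Kind(Enum):
    SCHEMA = auto()
    RECORD_START = auto()
    RECORD_END = auto()
    FIELD = auto()
    ARRAY_START = auto()
    ARRAY_END = auto()
    MAP_START = auto()
    MAP_END = auto()
    REPEATER = auto()


@dataclass(frozen=True)
class Marker:
    kind: Kind
    schemas: Tuple[Any, ...] = ()       # SCHEMA: one schema; REPEATER: one or two
    field_name: Optional[str] = None    # FIELD markers only
    field_number: Optional[int] = None  # FIELD markers only


def schema_marker(schema):
    return Marker(Kind.SCHEMA, (schema,))


def field_marker(name, number):
    return Marker(Kind.FIELD, field_name=name, field_number=number)


def repeater_marker(*schemas):
    if len(schemas) not in (1, 2):
        raise ValueError("a REPEATER carries one or two schemas")
    return Marker(Kind.REPEATER, tuple(schemas))


# The stack is a plain list; the top of the stack is the end of the list.
stack = [schema_marker("int")]
```

A single Marker class with optional attributes keeps the stack homogeneous;
one class per marker type would work just as well.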
The method writeBoolean() will call advance(schema.BOOLEAN) before writing
"true" or "false" into the underlying stream. Similarly writeInt() will call
advance(schema.INT) before writing the decimal string corresponding to the int
into the underlying stream. Other write() methods for primitive types call
advance() with an appropriate schema type.
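To make the write-then-advance pattern concrete, here is a toy sketch limited
to primitive types. PrimitiveOnlyEncoder is a hypothetical name, the stack
holds bare type names instead of full markers, and the Python-style method
names write_boolean/write_int stand in for writeBoolean()/writeInt().

```python
import io


class PrimitiveOnlyEncoder:
    """Toy encoder: the stack holds only the primitive type names expected next."""

    def __init__(self, out, expected_types):
        self.out = out
        # The top of the stack is the end of the list, so push in reverse.
        self.stack = list(reversed(expected_types))

    def advance(self, type_name):
        if not self.stack:
            raise ValueError("no value expected, got %s" % type_name)
        if self.stack[-1] != type_name:
            raise TypeError("expected %s, got %s" % (self.stack[-1], type_name))
        self.stack.pop()

    def write_boolean(self, value):
        self.advance("boolean")
        self.out.write("true" if value else "false")

    def write_int(self, value):
        self.advance("int")
        self.out.write(str(int(value)))


out = io.StringIO()
encoder = PrimitiveOnlyEncoder(out, ["int"])
encoder.write_int(42)
result = out.getvalue()
```

Calling write_boolean() on an encoder that expects an int raises TypeError
before anything reaches the stream, which is the point of calling advance()
first.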
The advance() method looks at the top of the stack. If the top of the stack is
a SCHEMA marker and the schema matches the type passed to advance(), then
it simply pops the top element off the stack and returns. If the top of the
stack is a SCHEMA marker but the schema type is a compound type (such as a
record, map or array), then it "expands" the top element (see below). If the top
element is a SCHEMA marker whose schema is a non-compound type that does not
match the argument type of advance(), it is an error. If the top element is not
a SCHEMA marker, it inserts the appropriate text into the output stream. For
example, if it is a RECORD_START or MAP_START, an open brace is written.
Similarly, if it is an ARRAY_START, an open square bracket is written. If it is a
FIELD marker, the field name associated with that field is written, followed by
a colon.
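The advance() logic might be sketched as follows. Markers are plain tuples
here to keep the example short, e.g. ("SCHEMA", "int") or ("FIELD", "a", 0).
Two things are assumptions rather than part of the description: a structural
marker is popped after its text is written and the loop then continues, and
the closing markers write the matching close brace or bracket. Expansion of
compound schemas is left out.

```python
import io
import json

PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}
# Text for structural markers. The opening entries follow the description;
# the closing entries are an assumption.
TEXT = {"RECORD_START": "{", "MAP_START": "{", "ARRAY_START": "[",
        "RECORD_END": "}", "MAP_END": "}", "ARRAY_END": "]"}


def advance(stack, out, type_name):
    while stack:
        top = stack[-1]
        kind = top[0]
        if kind == "SCHEMA":
            schema = top[1]
            if schema == type_name:
                stack.pop()          # matching primitive: consume it and return
                return
            if isinstance(schema, str) and schema in PRIMITIVES:
                raise TypeError("expected %s, got %s" % (schema, type_name))
            raise NotImplementedError("compound schema: expand() would run here")
        if kind == "FIELD":
            text = json.dumps(top[1]) + ":"   # quoted field name, then a colon
        elif kind in TEXT:
            text = TEXT[kind]
        else:
            raise ValueError("unexpected marker %r" % (kind,))
        stack.pop()
        out.write(text)
    raise ValueError("no value expected, got %s" % type_name)


markers = [("RECORD_START",), ("FIELD", "a", 0), ("SCHEMA", "int"), ("RECORD_END",)]
stack = list(reversed(markers))      # top of the stack is the end of the list
out = io.StringIO()
advance(stack, out, "int")           # what writeInt() would call before the digits
prefix = out.getvalue()
```

After the call, the stream holds the open brace and the field name, and only
RECORD_END is left on the stack; the caller then writes the integer itself.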
The expand() operation pops the top of the stack and replaces it with the
expansion of that marker. Only SCHEMA markers with compound schema types and
REPEATER markers get expanded. A record SCHEMA marker gets expanded to the
sequence {RECORD_START, <FIELD, SCHEMA>*, RECORD_END}. The number of <FIELD,
SCHEMA> pairs is the same as the number of fields of the record. The expanded
sequence is pushed in reverse order; that is, RECORD_START will be at the
top of the stack after expansion. An array SCHEMA marker gets expanded to
{ARRAY_START, REPEATER, ARRAY_END}. The REPEATER has the schema of the
element type of the array. A map SCHEMA marker gets expanded to {MAP_START,
REPEATER, MAP_END}; the REPEATER will have a string schema for the key and a
schema for the value of the map.
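A minimal sketch of expand() for records, arrays and maps follows. It assumes
schemas are the dicts obtained by parsing Avro's JSON schema syntax (keys
"type", "fields", "items", "values") and that markers are plain tuples; the
function names expansion and expand are hypothetical.

```python
def expansion(schema):
    """Return the marker sequence for a compound schema, in output order."""
    kind = schema["type"]
    if kind == "record":
        sequence = [("RECORD_START",)]
        for number, field in enumerate(schema["fields"]):
            sequence.append(("FIELD", field["name"], number))
            sequence.append(("SCHEMA", field["type"]))
        sequence.append(("RECORD_END",))
        return sequence
    if kind == "array":
        return [("ARRAY_START",), ("REPEATER", schema["items"]), ("ARRAY_END",)]
    if kind == "map":
        # Map keys are strings, so the REPEATER carries two schemas.
        return [("MAP_START",), ("REPEATER", "string", schema["values"]), ("MAP_END",)]
    raise ValueError("not a record, array or map: %r" % (kind,))


def expand(stack):
    top = stack[-1]
    if top[0] != "SCHEMA" or not isinstance(top[1], dict):
        raise ValueError("top of stack is not a compound SCHEMA marker")
    stack.pop()
    # Push in reverse so the first marker of the sequence ends up on top.
    stack.extend(reversed(expansion(top[1])))


record = {"type": "record", "name": "Post", "fields": [
    {"name": "id", "type": "int"},
    {"name": "tags", "type": {"type": "array", "items": "string"}},
]}
stack = [("SCHEMA", record)]
expand(stack)
```

After the call, RECORD_START is at the top of the stack (the end of the list)
and RECORD_END is at the bottom. The nested array schema stays unexpanded
until it reaches the top.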
Expanding a union is somewhat different. It replaces the union SCHEMA marker
with a SCHEMA marker for the appropriate branch. A REPEATER marker is expanded to
{SCHEMA, REPEATER} or {SCHEMA, SCHEMA, REPEATER}, where the SCHEMAs are the
contents of the REPEATER. On reaching the end of an array/map, the REPEATER marker
at the top of the stack gets discarded.
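The union and REPEATER cases could look roughly like this. A union schema is
assumed to be a Python list of branch schemas (as in Avro's JSON schema
syntax), and the caller is assumed to supply the branch index, since the
description does not say how the branch is chosen. The function names are
hypothetical and markers are plain tuples.

```python
def expand_union(stack, branch_index):
    top = stack[-1]
    if top[0] != "SCHEMA" or not isinstance(top[1], list):
        raise ValueError("top of stack is not a union SCHEMA marker")
    stack.pop()
    stack.append(("SCHEMA", top[1][branch_index]))


def expand_repeater(stack):
    """Called when another array element or map entry is about to be written."""
    top = stack[-1]
    if top[0] != "REPEATER":
        raise ValueError("top of stack is not a REPEATER marker")
    # The REPEATER stays where it is (same result as popping and re-pushing);
    # its schemas go on top of it, in reverse so the first one is on top.
    for schema in reversed(top[1:]):
        stack.append(("SCHEMA", schema))


def end_repeater(stack):
    """Called at the end of the array or map: the REPEATER is discarded."""
    if stack[-1][0] != "REPEATER":
        raise ValueError("top of stack is not a REPEATER marker")
    stack.pop()


union_stack = [("SCHEMA", ["null", "string"])]
expand_union(union_stack, 1)         # pick the "string" branch

map_stack = [("MAP_END",), ("REPEATER", "string", "long")]
expand_repeater(map_stack)           # one map entry: a key, then a value
```

For the map, the string SCHEMA for the key ends up on top, followed by the
long SCHEMA for the value, with the REPEATER underneath ready for the next
entry.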
The above should take care of all aspects of JSON encoding except the commas
that should appear between fields in a record, or elements in an array/map. The
field number in the FIELD marker can be used to decide if a comma needs to be
inserted. Some additional information can be kept in the REPEATER to decide if a
comma is needed in arrays/maps.
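One possible way to handle the commas, sketched under assumptions: field
numbers are taken to start at 0, so every field except the first is preceded
by a comma, and the "additional information" in the REPEATER is taken to be a
count of the items written so far. field_prefix and Repeater are hypothetical
names.

```python
import json


def field_prefix(name, number):
    """Text written for a FIELD marker: optional comma, quoted name, colon."""
    comma = "," if number > 0 else ""
    return comma + json.dumps(name) + ":"


class Repeater:
    def __init__(self, *schemas):
        self.schemas = schemas
        self.count = 0               # items written so far (assumed bookkeeping)

    def item_prefix(self):
        """Text written before each array element or map entry."""
        comma = "," if self.count > 0 else ""
        self.count += 1
        return comma


fields = [("a", 1), ("b", 2)]
record_text = "{" + "".join(
    field_prefix(name, number) + str(value)
    for number, (name, value) in enumerate(fields)) + "}"

repeater = Repeater("int")
array_text = "[" + "".join(
    repeater.item_prefix() + str(item) for item in [1, 2, 3]) + "]"
```

Both strings parse as JSON, giving the object with fields a and b and the
array of three integers.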
> add json codec in python
> ------------------------
>
> Key: AVRO-91
> URL: https://issues.apache.org/jira/browse/AVRO-91
> Project: Avro
> Issue Type: New Feature
> Components: python
> Reporter: Doug Cutting
> Assignee: Ravi Gummadi
>
> Now that AVRO-50 is complete, it would be good to have a Json encoder and
> decoders in Python.