[jira] Commented: (AVRO-91) add json codec in python

Thiruvalluvan M. G. (JIRA) Mon, 31 Aug 2009 10:13:55 -0700

    [ 
https://issues.apache.org/jira/browse/AVRO-91?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12749545#action_12749545
 ]


Thiruvalluvan M. G. commented on AVRO-91:
-----------------------------------------

One drawback of having {read,write}{Record,Union}{Start,End} methods is that 
every client that uses the decoder/encoder will have to generate these calls. 
This could be cumbersome for clients and/or have a performance impact.

Here is an approach that is less complicated than the Java implementation of 
the parser. It is also less efficient than the Java parser, but I guess 
performance is not vital for the JSON encoder/decoder, as their main purpose 
is diagnostics and debugging.

Here I describe the Encoder, but the idea can be implemented for the decoder as 
well.

The JSON encoder has a stack of "Markers". Markers are of these types: SCHEMA, 
RECORD_START, RECORD_END, FIELD, ARRAY_START, ARRAY_END, MAP_START, MAP_END, 
REPEATER, etc. The SCHEMA marker has a schema object associated with it. The 
REPEATER marker has one or two schema objects associated with it. The FIELD 
marker has the field name and the field number associated with it.
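
As a rough sketch of what such a marker could look like in Python (the names 
Kind and Marker, and the use of plain strings/dicts for schemas, are my 
assumptions and not part of any existing Avro API):

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional, Tuple


class Kind(Enum):
    SCHEMA = auto()
    RECORD_START = auto()
    RECORD_END = auto()
    FIELD = auto()
    ARRAY_START = auto()
    ARRAY_END = auto()
    MAP_START = auto()
    MAP_END = auto()
    REPEATER = auto()


@dataclass
class Marker:
    kind: Kind
    # One schema for SCHEMA, one or two for REPEATER, empty otherwise.
    schemas: Tuple = ()
    # Only meaningful for FIELD markers.
    name: Optional[str] = None
    number: Optional[int] = None


# The stack is a plain list; the top of the stack is the last element.
stack = [Marker(Kind.SCHEMA, schemas=("int",))]
field = Marker(Kind.FIELD, name="id", number=0)
```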

The method writeBoolean() will call advance(schema.BOOLEAN) before writing 
"true" or "false" into the underlying stream. Similarly writeInt() will call 
advance(schema.INT) before writing the decimal string corresponding to the int 
into the underlying stream. Other write() methods for primitive types call 
advance() with an appropriate schema type.
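
To illustrate the shape of these write methods, here is a minimal sketch that 
handles primitive types only. The class name PrimitiveEncoder and the 
snake_case method names are hypothetical, and advance() here is reduced to a 
simple match-and-pop on a stack of type names:

```python
import io


class PrimitiveEncoder:
    """Hypothetical encoder whose stack holds only primitive type names."""

    def __init__(self, out, expected_types):
        self.out = out
        # The top of the stack is the end of the list, so the first
        # expected type must be pushed last.
        self.stack = list(reversed(expected_types))

    def advance(self, schema_type):
        if not self.stack or self.stack[-1] != schema_type:
            raise TypeError(
                f"expected {self.stack[-1:]}, got {schema_type!r}")
        self.stack.pop()

    def write_boolean(self, value):
        self.advance("boolean")
        self.out.write("true" if value else "false")

    def write_int(self, value):
        self.advance("int")
        self.out.write(str(int(value)))


out = io.StringIO()
enc = PrimitiveEncoder(out, ["boolean"])
enc.write_boolean(True)
```

Calling write_int() on an encoder that expects a boolean raises TypeError in 
this sketch, which corresponds to the schema mismatch case.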

The advance() method looks at the top of the stack. If the top of the stack is 
a SCHEMA marker and the schema matches the type passed to advance(), it simply 
pops the top element and returns. If the top of the stack is a SCHEMA marker 
but the schema is a compound type (such as a record, map or array), it 
"expands" the top element (see below). If the top element is a SCHEMA marker, 
the schema is a non-compound type and it does not match the argument of 
advance(), it is an error. If the top element is not a SCHEMA marker, 
advance() inserts the appropriate text into the output stream. For example, 
for a RECORD_START or MAP_START an opening brace is written. Similarly, for an 
ARRAY_START an opening square bracket is written. For a FIELD marker, the 
field name associated with that field is written, followed by a colon.
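
The cases above could be sketched roughly as follows. Markers are modelled as 
(kind, payload) tuples, which is my assumption. I am also assuming that 
advance() keeps looping until a matching SCHEMA marker is popped, and that the 
*_END markers write the closing bracket. Expansion is passed in as a callable, 
and REPEATER handling and commas are left out:

```python
import io
import json

PUNCTUATION = {
    "RECORD_START": "{", "RECORD_END": "}",
    "MAP_START": "{", "MAP_END": "}",
    "ARRAY_START": "[", "ARRAY_END": "]",
}


def advance(stack, out, schema_type, expand):
    """Consume markers until a SCHEMA marker matching schema_type is popped."""
    while True:
        kind, payload = stack[-1]
        if kind == "SCHEMA":
            if payload == schema_type:
                stack.pop()
                return
            if isinstance(payload, str):
                # Primitive schema that does not match the argument.
                raise TypeError(
                    f"schema expects {payload!r}, got {schema_type!r}")
            # Compound schema: replace the top with its expansion.
            expand(stack)
        elif kind == "FIELD":
            stack.pop()
            name, _number = payload
            out.write(json.dumps(name) + ":")
        else:
            stack.pop()
            out.write(PUNCTUATION[kind])


def no_expand(stack):
    raise NotImplementedError("expansion is out of scope for this sketch")


# An already expanded record with one int field; the top is the last element.
out = io.StringIO()
stack = [("RECORD_END", None), ("SCHEMA", "int"),
         ("FIELD", ("id", 0)), ("RECORD_START", None)]
advance(stack, out, "int", no_expand)
out.write("7")
```

After this call the RECORD_END marker is still on the stack, so a real 
encoder would need some way to drain trailing markers once the last value has 
been written.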

The expand() operation pops the top of the stack and replaces it with the 
expansion of that marker. Only SCHEMA markers with compound schema types or 
REPEATER markers get expanded. The record SCHEMA marker gets expanded to the 
sequence {RECORD_START, <FIELD, SCHEMA>*, RECORD_END}. The number of FIELD, 
SCHEMA pairs is the same as the number of fields of the record. The expanded 
sequence is pushed in reverse order, so that RECORD_START is at the top of the 
stack after expansion. The array SCHEMA marker gets expanded to {ARRAY_START, 
REPEATER, ARRAY_END}. The REPEATER has the schema of the element type of the 
array. The map SCHEMA marker gets expanded to {MAP_START, REPEATER, MAP_END}; 
the REPEATER has a string schema (for the key) and the schema of the map's 
values.
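
A minimal sketch of this expansion, assuming schemas are plain dicts in the 
Avro JSON schema form and markers are (kind, payload) tuples (both are 
assumptions for illustration; the function names are hypothetical):

```python
def expansion(schema):
    """Return the marker sequence for a compound schema, first marker first."""
    kind = schema["type"]
    if kind == "record":
        seq = [("RECORD_START", None)]
        for number, field in enumerate(schema["fields"]):
            seq.append(("FIELD", (field["name"], number)))
            seq.append(("SCHEMA", field["type"]))
        seq.append(("RECORD_END", None))
        return seq
    if kind == "array":
        return [("ARRAY_START", None),
                ("REPEATER", (schema["items"],)),
                ("ARRAY_END", None)]
    if kind == "map":
        return [("MAP_START", None),
                ("REPEATER", ("string", schema["values"])),
                ("MAP_END", None)]
    raise ValueError(f"not a compound schema: {kind!r}")


def expand(stack):
    """Pop the top SCHEMA marker and push its expansion in reverse order."""
    kind, schema = stack.pop()
    if kind != "SCHEMA":
        raise ValueError(f"cannot expand a {kind} marker here")
    stack.extend(reversed(expansion(schema)))


record = {"type": "record", "name": "Point",
          "fields": [{"name": "x", "type": "int"},
                     {"name": "y", "type": "int"}]}
stack = [("SCHEMA", record)]
expand(stack)
```

Because the list end is the top of the stack, extending with the reversed 
sequence leaves RECORD_START on top and RECORD_END at the bottom.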

Expanding a union is somewhat different. It replaces the union SCHEMA marker 
with a SCHEMA marker for the appropriate branch. The REPEATER marker is 
expanded to {SCHEMA, REPEATER} or {SCHEMA, SCHEMA, REPEATER}, where the 
SCHEMAs are the contents of the REPEATER. On reaching the end of an array/map, 
the REPEATER marker at the top of the stack gets discarded.
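
These two cases might look roughly like this. How the branch of a union is 
chosen is not settled here, so the sketch simply assumes the caller supplies a 
branch index. The function names and the (kind, payload) tuple representation 
are hypothetical:

```python
def expand_union(stack, branch_index):
    """Replace a union SCHEMA marker with the SCHEMA of the chosen branch."""
    kind, branches = stack.pop()
    if kind != "SCHEMA" or not isinstance(branches, list):
        raise ValueError("top of stack is not a union SCHEMA marker")
    stack.append(("SCHEMA", branches[branch_index]))


def expand_repeater(stack):
    """Expand REPEATER to its SCHEMA marker(s) followed by itself again."""
    kind, schemas = stack.pop()
    if kind != "REPEATER":
        raise ValueError("top of stack is not a REPEATER marker")
    seq = [("SCHEMA", s) for s in schemas] + [("REPEATER", schemas)]
    # Push in reverse so the first SCHEMA ends up on top.
    stack.extend(reversed(seq))


def end_repeater(stack):
    """Discard the REPEATER when the array/map has no more items."""
    kind, _schemas = stack.pop()
    if kind != "REPEATER":
        raise ValueError("top of stack is not a REPEATER marker")


union_stack = [("SCHEMA", ["null", "string"])]
expand_union(union_stack, 1)

map_stack = [("MAP_END", None), ("REPEATER", ("string", "int"))]
expand_repeater(map_stack)
```

For the map case the stack now holds a string SCHEMA on top, then an int 
SCHEMA, then the REPEATER again, so each entry consumes a key and a value 
before the next expansion.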

The above should take care of all aspects of JSON encoding except the commas 
that should appear between fields in a record, or between elements in an 
array/map. The field number stored in the FIELD marker can be used to decide 
if a comma needs to be inserted. Some additional information can be kept in 
the REPEATER to decide if a comma is needed in arrays/maps.
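
One possible way to do the comma bookkeeping, as a sketch only: a field gets a 
leading comma when its number is greater than zero, and the REPEATER carries a 
counter of items written so far. The names field_prefix and RepeaterState are 
hypothetical:

```python
import json


def field_prefix(name, number):
    """Text written for a FIELD marker, with a comma for all but the first."""
    return ("," if number > 0 else "") + json.dumps(name) + ":"


class RepeaterState:
    """Extra state a REPEATER could carry to place commas between items."""

    def __init__(self):
        self.items_written = 0

    def separator(self):
        sep = "," if self.items_written > 0 else ""
        self.items_written += 1
        return sep


record_text = ("{" + field_prefix("a", 0) + "1"
               + field_prefix("b", 1) + "2" + "}")

rep = RepeaterState()
array_text = "[" + "".join(rep.separator() + str(n) for n in (10, 20, 30)) + "]"
```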

> add json codec in python
> ------------------------
>
>                 Key: AVRO-91
>                 URL: https://issues.apache.org/jira/browse/AVRO-91
>             Project: Avro
>          Issue Type: New Feature
>          Components: python
>            Reporter: Doug Cutting
>            Assignee: Ravi Gummadi
>
> Now that AVRO-50 is complete, it would be good to have a Json encoder and 
> decoders in Python.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
