[ 
https://issues.apache.org/jira/browse/AVRO-672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doug Cutting updated AVRO-672:
------------------------------

    Attachment: AVRO-672.patch

Providing two different JSON encodings for Avro data might be confusing.
The encoding in your patch is simpler, but it can lose information.
For example, Jackson would assume that a string that looks like
base64-encoded binary data is binary data, which might not always be the case.
This encoding also does not support schemas that include fixed or enum
values, nor many unions.
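
To make the information-loss point concrete, here is a minimal sketch of the ambiguity. It uses Python's standard base64 module as a stand-in; it does not reproduce Jackson's actual decoding logic, and the helper name looks_like_base64 is made up for illustration.

```python
import base64
import binascii


def looks_like_base64(text: str) -> bool:
    """Return True if text decodes cleanly as standard base64."""
    try:
        base64.b64decode(text, validate=True)
        return True
    except (binascii.Error, ValueError):
        return False


# All three values arrive as plain JSON strings. Without a schema, a reader
# cannot tell whether the first two were meant as text or as encoded bytes.
for candidate in ["aGVsbG8=", "abcd", "hello, there!!"]:
    print(repr(candidate), looks_like_base64(candidate))
```

An ordinary word such as "abcd" passes the check just as well as real encoded bytes do, so any guess made without the schema can be wrong in either direction.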

If reading and writing arbitrary JSON is a priority, then the approach taken in 
AVRO-251 might be of interest.  Here's a patch that provides a DatumReader and 
DatumWriter for Jackson's JsonNode.  This uses a schema that permits arbitrary 
JSON data.  Would this be useful to you?  If so, we could provide it as a tool.
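
As a rough illustration of what a schema that permits arbitrary JSON could look like, the sketch below uses a recursive record wrapping a union of every JSON value kind. The schema text, the record name Json, and the helper branch_for are assumptions for this example; they are not copied from the attached patch or from AVRO-251.

```python
import json

# Hypothetical recursive schema: one record whose single field is a union
# covering every kind of JSON value, with arrays and maps referring back to it.
JSON_SCHEMA = {
    "type": "record",
    "name": "Json",
    "fields": [{
        "name": "value",
        "type": [
            "long", "double", "string", "boolean", "null",
            {"type": "array", "items": "Json"},
            {"type": "map", "values": "Json"},
        ],
    }],
}


def branch_for(value):
    """Name the union branch that a parsed JSON value would select."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # checked before int: bool subclasses int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "map"
    raise TypeError(f"not a JSON value: {value!r}")


doc = json.loads('{"intval": -73, "tags": [1.5, null, true]}')
print(branch_for(doc))
print([branch_for(item) for item in doc["tags"]])
```

Because every JSON value maps to exactly one branch, nothing has to be guessed from the shape of a string, at the cost of losing any record-specific field typing.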

> Convert JSON Text Input to Avro Tool
> ------------------------------------
>
>                 Key: AVRO-672
>                 URL: https://issues.apache.org/jira/browse/AVRO-672
>             Project: Avro
>          Issue Type: New Feature
>            Reporter: Ron Bodkin
>         Attachments: AVRO-672.patch, AVRO-672.patch
>
>
> The attached patch reads a JSON-formatted text file and converts it to a
> conforming Avro text file, emitting one record per line. For example, it
> can read this input file:
> {"intval":12}
> {"intval":-73,"strval":"hello, there!!"}
> with this schema:
> { "type":"record", "name":"TestRecord", "fields": [ 
> {"name":"intval","type":"int"}, {"name":"strval","type":["string", "null"]}]}
> returning valid Avro. This differs from the DataFileWriteTool, which
> would read in the following internal encoding:
> {"intval":12,"strval":null}
> {"intval":-73,"strval":{"string":"hello, there!!"}}
> In general, the internal encodings used by Avro aren't natural when reading
> in JSON text that appears in the wild. Likewise, this utility can replace
> characters that are invalid in Avro identifiers with an underscore, again to
> tolerate JSON that wasn't designed to be readable by Avro.
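
The conversion described in the quoted text can be sketched as follows. This is a minimal illustration, not the code in the patch: the function names are hypothetical, only a few primitive branches are handled, and prefixing an underscore to a name that starts with a digit is an assumption the description does not mention.

```python
import json
import re

SCHEMA = {
    "type": "record", "name": "TestRecord",
    "fields": [
        {"name": "intval", "type": "int"},
        {"name": "strval", "type": ["string", "null"]},
    ],
}

# Python type -> Avro branch names to try, in order (sketch only).
CANDIDATES = [(bool, ["boolean"]), (int, ["int", "long"]),
              (float, ["double", "float"]), (str, ["string"])]


def sanitize_name(name):
    """Replace characters not allowed in an Avro name with an underscore."""
    cleaned = re.sub(r"[^A-Za-z0-9_]", "_", name)
    # Assumption: names that are empty or start with a digit get a "_" prefix.
    return cleaned if re.match(r"[A-Za-z_]", cleaned) else "_" + cleaned


def wrap_union(value, branches):
    """Tag a plain value with its branch name, as the internal encoding does."""
    if value is None:
        if "null" not in branches:
            raise ValueError("null is not a branch of this union")
        return None
    for py_type, avro_names in CANDIDATES:
        if type(value) is py_type:
            for avro_name in avro_names:
                if avro_name in branches:
                    return {avro_name: value}
    raise ValueError(f"no union branch for {value!r}")


def to_avro_json(record, schema):
    """Convert one natural JSON record to the union-tagged form."""
    cleaned = {sanitize_name(key): val for key, val in record.items()}
    out = {}
    for field in schema["fields"]:
        value = cleaned.get(field["name"])  # a missing field is read as null
        if isinstance(field["type"], list):
            out[field["name"]] = wrap_union(value, field["type"])
        elif value is None:
            raise ValueError(f"missing required field {field['name']}")
        else:
            out[field["name"]] = value
    return out


for line in ['{"intval":12}', '{"intval":-73,"strval":"hello, there!!"}']:
    print(json.dumps(to_avro_json(json.loads(line), SCHEMA)))
```

Run against the two input lines from the description, this produces records with the same content as the internal encoding shown above: a null strval for the first and a {"string": ...} wrapper for the second.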

