[ 
https://issues.apache.org/jira/browse/AVRO-2128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thiruvalluvan M. G. updated AVRO-2128:
--------------------------------------
    Component/s: java

> Schema parsing in the Java library is more permissive than the C 
> implementation or the JSON specification
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: AVRO-2128
>                 URL: https://issues.apache.org/jira/browse/AVRO-2128
>             Project: Apache Avro
>          Issue Type: Bug
>          Components: java
>            Reporter: Zoltan Ivanfi
>            Priority: Major
>
> When parsing schemas, the Java library accepts C-style comments (which are 
> forbidden in JSON) and is unaffected by trailing garbage (parsing stops as 
> soon as it reaches the end of the JSON structure).
> In the C library, however, comments and trailing whitespace cause an error.
> If a schema is accepted by one language binding, it should be accepted by the 
> other as well. The schema should also be valid JSON. The Java library is the 
> one that fails to enforce this, being more permissive than it should be, so it 
> seems that the Java implementation should be changed. However, we must also 
> consider whether making the Java library stricter at this point would make 
> any existing data unreadable.
> Fortunately, the schema that is written in the data files themselves is 
> always valid JSON, even if it is based on a non-JSON-conformant schema. The 
> reason is that the Java library parses the schema, builds an in-memory 
> representation, and then reserializes it, thereby removing comments and 
> trailing garbage. So existing data files are not affected, only user-supplied 
> schemas. These can be manually updated (unlike existing data files).
> The real-world use case where this discrepancy causes problems is Hive-Impala 
> interaction. Users can create tables in Hive by supplying an Avro schema. 
> That schema is associated with the whole table by being saved in the 
> Hive metastore. Impala also consults this metadata when accessing the table, 
> and a non-conformant schema then causes an error in the Avro C library that 
> Impala uses. This is detailed in IMPALA-1024. In particular, [this 
> comment|https://issues.apache.org/jira/browse/IMPALA-1024?focusedCommentId=16261702&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16261702]
>  contains a lot of relevant information.
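
Since user-supplied schemas can be updated by hand, it may help to check them before they are saved in the Hive metastore. The following is a minimal sketch, not part of Avro, Hive, or Impala: the helper name find_json_error and the sample schema are hypothetical, and it relies only on Python's standard json module, which rejects both comments and trailing non-whitespace text. It is not a full conformance check (for example, the json module accepts NaN and Infinity by default), and it does not validate Avro semantics at all.

```python
import json


def find_json_error(schema_text):
    """Return None if schema_text parses as JSON, else a short error message.

    Hypothetical helper for illustration only. It checks JSON syntax with the
    standard library parser; it does not check that the text is a valid Avro
    schema.
    """
    try:
        json.loads(schema_text)
    except json.JSONDecodeError as exc:
        return "line %d, column %d: %s" % (exc.lineno, exc.colno, exc.msg)
    return None


# Hypothetical sample schema, used only to exercise the helper.
clean = (
    '{"type": "record", "name": "User", '
    '"fields": [{"name": "id", "type": "long"}]}'
)
with_comment = "/* user record */ " + clean   # C-style comment before the schema
with_trailing = clean + " garbage"            # text after the JSON structure ends

for label, text in [
    ("clean", clean),
    ("with comment", with_comment),
    ("with trailing garbage", with_trailing),
]:
    print(label, "->", find_json_error(text))
```

Only the first input is expected to pass; the other two mirror the two kinds of input described in this issue that the Java library accepts.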



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
