[ https://issues.apache.org/jira/browse/AVRO-2128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Thiruvalluvan M. G. updated AVRO-2128:
--------------------------------------
Component/s: java
> Schema parsing in the Java library is more permissive than the C
> implementation or the JSON specification
> ---------------------------------------------------------------------------------------------------------
>
> Key: AVRO-2128
> URL: https://issues.apache.org/jira/browse/AVRO-2128
> Project: Apache Avro
> Issue Type: Bug
> Components: java
> Reporter: Zoltan Ivanfi
> Priority: Major
>
> When parsing schemas, the Java library accepts C-style comments (which are
> forbidden in JSON) and is unaffected by trailing garbage (parsing stops as
> soon as it reaches the end of the JSON structure).
> In the C library, however, comments and trailing whitespace cause an error.
> If a schema is accepted by one language binding, it should be accepted by the
> other as well. The schema should also be valid JSON. The Java library does
> not enforce this and is more permissive than it should be, so it seems that
> the Java implementation is the one to change. However, we must also
> consider whether making the Java library stricter at this point would make
> any existing data unreadable.
> Fortunately, the schema that is written in the data files themselves is
> always valid JSON, even if it is based on a non-JSON-conformant schema. The
> reason for this is that the Java library parses the schema, builds an in-memory
> representation, and then reserializes it, thereby removing comments and
> trailing garbage. So existing data files are not affected, only user-supplied
> schemas. These can be manually updated (unlike existing data files).
> The real-world use case where this discrepancy causes problems is Hive-Impala
> interaction. Users can create tables in Hive by supplying an Avro schema.
> That schema will be associated with the whole table by getting saved in the
> Hive metastore. Impala also consults this metadata when accessing the table,
> and a schema with comments or trailing characters then causes an error in the
> Avro C library that Impala uses. This is
> detailed in IMPALA-1024. In particular, [this
> comment|https://issues.apache.org/jira/browse/IMPALA-1024?focusedCommentId=16261702&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16261702]
> contains a lot of relevant information.
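
The two kinds of non-conformant input described above (C-style comments and trailing garbage) can be illustrated with a strict JSON parser. The following is only a rough sketch using Python's standard json module, not the Avro Java or C API. The helper name check_strict_json and the sample schema strings are hypothetical, and the check covers JSON syntax only, not Avro schema validity.

```python
import json
from typing import Optional


def check_strict_json(schema_text: str) -> Optional[str]:
    """Return None if schema_text is strictly valid JSON, else an error message.

    Hypothetical helper for illustration; it does not validate Avro semantics.
    """
    try:
        json.loads(schema_text)
    except json.JSONDecodeError as exc:
        return f"{exc.msg} at line {exc.lineno}, column {exc.colno}"
    return None


# Sample inputs (assumed for illustration, not taken from the issue).
CLEAN = '{"type": "record", "name": "User", "fields": []}'
WITH_COMMENT = "/* user record */ " + CLEAN   # C-style comment, forbidden in JSON
WITH_TRAILING = CLEAN + " garbage"            # extra text after the JSON value

if __name__ == "__main__":
    samples = [("clean", CLEAN), ("comment", WITH_COMMENT), ("trailing", WITH_TRAILING)]
    for label, text in samples:
        problem = check_strict_json(text)
        print(label, "->", "ok" if problem is None else problem)
```

A check along these lines could be run on user-supplied schemas before they are stored in the Hive metastore, so that a schema is rejected early instead of failing later in a stricter reader.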