[ 
https://issues.apache.org/jira/browse/AVRO-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14050739#comment-14050739
 ] 

Doug Cutting commented on AVRO-1533:
------------------------------------

The conversion won't generate runtime errors for invalid UTF-8; instead it 
replaces malformed sequences with the replacement character "�" (U+FFFD):

http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#String(byte[],%20java.nio.charset.Charset)
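
For illustration, here is a minimal sketch of that String constructor's 
behavior using only the JDK. The class and method names are made up for this 
example and are not part of Avro or the patch:

```java
import java.nio.charset.StandardCharsets;

public class LenientUtf8Decode {
    // Decode bytes as UTF-8 via new String(byte[], Charset).
    // Malformed input is replaced with U+FFFD rather than reported.
    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] valid = "abc".getBytes(StandardCharsets.UTF_8);
        // The byte 0xFF can never occur in well-formed UTF-8.
        byte[] invalid = { 'a', (byte) 0xFF, 'b' };

        System.out.println(decode(valid));
        // No exception is thrown; the bad byte becomes U+FFFD.
        System.out.println(decode(invalid).equals("a\uFFFDb"));
    }
}
```

No exception is thrown for the invalid input; the offending byte simply 
shows up as U+FFFD in the resulting string.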

I think this can be considered a compatible change, since it won't break 
existing applications.  Today, attempts to switch a field from bytes to 
string fail.  I suppose an application could currently rely on such 
failures, but I consider that unlikely enough that I'm willing to ignore it.  
Do others disagree?

We could:
 # revert this change entirely, declaring it incompatible
 # revert just the change to the specification, so that Avro Java is more 
lenient in what conversions it permits than the specification (following 
Postel's law)
 # file issues to update the AVRO-1315 schema validation to permit such 
conversions
 - also file issues for C, C++ and C# to update their schema resolution to 
support these conversions

Thoughts?

> permit promotions between string and bytes
> ------------------------------------------
>
>                 Key: AVRO-1533
>                 URL: https://issues.apache.org/jira/browse/AVRO-1533
>             Project: Avro
>          Issue Type: New Feature
>          Components: java
>            Reporter: Doug Cutting
>            Assignee: Doug Cutting
>             Fix For: 1.7.7
>
>         Attachments: AVRO-1533.patch, AVRO-1533.patch
>
>
> Avro strings are a subset of bytes, so promoting from string to bytes is 
> lossless and should be possible.  Promotion from bytes to strings may cause 
> problems, as not all byte strings are valid UTF8, but it also might be useful.
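
The asymmetry described in the issue can be sketched with plain JDK calls. 
This is only an illustration under the assumption that strings are encoded 
as UTF-8; the class and method names are hypothetical and do not come from 
Avro or the attached patch:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class PromotionRoundTrip {
    // string -> bytes -> string: returns true when the text survives unchanged.
    static boolean roundTrips(String s) {
        byte[] encoded = s.getBytes(StandardCharsets.UTF_8);
        return new String(encoded, StandardCharsets.UTF_8).equals(s);
    }

    // Strict check: true only if the bytes are well-formed UTF-8.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrips("h\u00e9llo"));
        // 0xC3 starts a two-byte sequence, so on its own it is malformed.
        System.out.println(isValidUtf8(new byte[] { (byte) 0xC3 }));
    }
}
```

An application that wants to detect bad input when reading bytes as a string 
could run a strict check like `isValidUtf8` itself, rather than relying on 
the conversion to fail.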



--
This message was sent by Atlassian JIRA
(v6.2#6252)
