[ 
https://issues.apache.org/jira/browse/AVRO-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13648076#comment-13648076
 ] 

Micah Huff commented on AVRO-1316:
----------------------------------

That makes sense about needing to accommodate 4-byte Unicode characters. In 
making this change I found an interesting edge case in this patch. Because I am 
splitting at an arbitrary byte count, the current patch can produce output like 
this (note that I changed partX to _partX to denote a field that shouldn't be 
referenced directly):
{code}
public static final String _part1 = "....\";
public static final String _part2 = ""SOME ESCAPED VALUE\"";
{code}

Here the split landed between the backslash and the quote it escapes, so 
neither literal compiles. I need to modify this patch to detect an escape 
character as the last character of a part and shift the split point 
accordingly. I probably also need to detect whether, for 4-byte Unicode 
characters (or even 2-byte characters), I chopped the byte array off part-way 
through a multi-byte sequence. I'll work on this addendum to the patch 
tonight / tomorrow.
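
A minimal sketch of the boundary check described above, not the actual patch. 
It assumes the input is the already-escaped literal text, that the only escapes 
present are backslash plus one character or backslash-u plus four hex digits, 
and that sizes are counted per char the way class files store strings (modified 
UTF-8, so a surrogate pair counts as 6 bytes). Counting the escaped text 
overestimates the stored size, which errs on the safe side. The names 
LiteralSplitter, split, unitEnd and modifiedUtf8Length are hypothetical and do 
not come from Avro.

```java
import java.util.ArrayList;
import java.util.List;

public class LiteralSplitter {

    /** Bytes one UTF-16 char occupies in modified UTF-8. */
    static int modifiedUtf8Length(char c) {
        if (c >= 0x0001 && c <= 0x007F) return 1;
        if (c <= 0x07FF) return 2;   // includes U+0000
        return 3;                    // includes each half of a surrogate pair
    }

    /** Index just past the indivisible unit that starts at index i. */
    static int unitEnd(String s, int i) {
        char c = s.charAt(i);
        if (c == '\\' && i + 1 < s.length()) {
            // A backslash-u escape spans 6 chars; any other escape spans 2.
            int len = (s.charAt(i + 1) == 'u') ? 6 : 2;
            return Math.min(s.length(), i + len);
        }
        if (Character.isHighSurrogate(c) && i + 1 < s.length()
                && Character.isLowSurrogate(s.charAt(i + 1))) {
            return i + 2;            // keep the surrogate pair together
        }
        return i + 1;
    }

    /**
     * Splits escaped literal text into parts of at most maxBytes each,
     * never cutting inside an escape sequence or a surrogate pair.
     * A single unit larger than maxBytes still gets its own part.
     */
    static List<String> split(String escaped, int maxBytes) {
        List<String> parts = new ArrayList<>();
        int partStart = 0;
        int partBytes = 0;
        int i = 0;
        while (i < escaped.length()) {
            int end = unitEnd(escaped, i);
            int unitBytes = 0;
            for (int k = i; k < end; k++) {
                unitBytes += modifiedUtf8Length(escaped.charAt(k));
            }
            // Close the current part before a unit that would overflow it.
            if (partBytes + unitBytes > maxBytes && i > partStart) {
                parts.add(escaped.substring(partStart, i));
                partStart = i;
                partBytes = 0;
            }
            partBytes += unitBytes;
            i = end;
        }
        if (partStart < escaped.length()) {
            parts.add(escaped.substring(partStart));
        }
        return parts;
    }
}
```

With a split of this shape, the backslash in the example above would move to 
the start of _part2 along with the quote it escapes, instead of ending _part1.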
                
> IDL code-generation generates too-long literals for very large schemas
> ----------------------------------------------------------------------
>
>                 Key: AVRO-1316
>                 URL: https://issues.apache.org/jira/browse/AVRO-1316
>             Project: Avro
>          Issue Type: Bug
>          Components: java
>    Affects Versions: 1.7.5
>            Reporter: Jeremy Kahn
>            Priority: Minor
>              Labels: patch
>         Attachments: AVRO-1316.patch
>
>
> When I work from a very large IDL schema, the Java code generated includes a 
> schema JSON literal that exceeds the length of the maximum allowed literal 
> string ([65535 
> characters|http://stackoverflow.com/questions/8323082/size-of-initialisation-string-in-java]).
>   
> This creates weird Maven errors like: {{[ERROR] ...FooProtocol.java:[13,89] 
> constant string too long}}.
> It might seem a little crazy, but a 64-kilobyte JSON protocol isn't 
> outrageous at all for some of the more involved data structures, especially 
> if we're including documentation strings etc.
> I believe the fix should be a bit more sensitivity to the length of the JSON 
> literal (and a willingness to split it into more than one literal, joined by 
> {{+}}), but I haven't figured out where that change needs to go. Has anyone 
> else encountered this problem?

