[jira] [Commented] (AVRO-1316) IDL code-generation generates too-long literals for very large schemas

Doug Cutting (JIRA) Mon, 06 May 2013 11:56:17 -0700

    [ 
https://issues.apache.org/jira/browse/AVRO-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649984#comment-13649984
 ]


Doug Cutting commented on AVRO-1316:
------------------------------------

We might optimize the parse methods a bit so that:
 - calls with a single string don't copy that string;
 - calls with multiple strings are not quadratic.

This might look something like:
{code}
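public Schema parse(String json, String... moreJson) {
  // Single-string calls skip this branch, so the string is not copied.
  if (moreJson.length > 0) {
    // One StringBuilder keeps concatenation linear in the total length.
    StringBuilder b = new StringBuilder(json);
    for (String s : moreJson)
      b.append(s);
    json = b.toString();
  }
  ...
}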
{code}
and similarly for Protocol.

Also, with varargs we can get rid of the _part variables: the chunks are 
passed as separate arguments and joined at runtime, so the full string is 
never a compile-time constant. The template could then contain just something 
like:

{code}
public static final org.apache.avro.Schema SCHEMA$
  = new org.apache.avro.Schema.Parser().parse(${this.javaSplit($schema.toString())});
{code}

Where javaSplit is defined to split, add escapes to each chunk, then insert 
commas & quotes.  That would minimize template logic, making it simpler for 
folks who have alternate templates.

The javaSplit logic will be simpler if we split before escaping.  We just need 
to be sure to split into chunks small enough that escapes, UTF-8 encoding, 
etc. won't push any of them past the 64k limit.  We might choose something as 
low as 1k to be safe.  We can then loop calling substring() to break out 
chunks.  As we create each chunk we can append it to a StringBuilder that's 
initialized with the opening quote, inserting quote-comma-quote between chunks 
and adding a final quote at the end.
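
A rough sketch of what javaSplit might look like, following the steps above. 
This is an illustration only, not the committed implementation: the class name 
JavaSplitSketch, the CHUNK_SIZE constant, the two-argument overload and the 
escape() helper are all hypothetical. The hand-rolled escaping is an assumption 
too; an existing escaping utility would do just as well.

```java
public class JavaSplitSketch {
  // Hypothetical chunk size, following the "as low as 1k" suggestion.
  static final int CHUNK_SIZE = 1024;

  /** Returns Java source text for one or more comma-separated string literals. */
  public static String javaSplit(String s) {
    return javaSplit(s, CHUNK_SIZE);
  }

  public static String javaSplit(String s, int chunkSize) {
    StringBuilder b = new StringBuilder("\"");        // opening quote
    for (int i = 0; i < s.length(); i += chunkSize) {
      if (i != 0)
        b.append("\",\"");                            // quote-comma-quote between chunks
      int end = Math.min(s.length(), i + chunkSize);
      b.append(escape(s.substring(i, end)));          // split first, then escape
    }
    return b.append("\"").toString();                 // final quote
  }

  // Hypothetical helper: escapes one chunk for use inside a Java string literal.
  static String escape(String chunk) {
    StringBuilder b = new StringBuilder(chunk.length());
    for (int i = 0; i < chunk.length(); i++) {
      char c = chunk.charAt(i);
      switch (c) {
        case '"':  b.append("\\\""); break;
        case '\\': b.append("\\\\"); break;
        case '\n': b.append("\\n");  break;
        case '\r': b.append("\\r");  break;
        default:
          if (c < 0x20 || c > 0x7e)
            b.append(String.format("\\u%04x", (int) c)); // keep the source ASCII-only
          else
            b.append(c);
      }
    }
    return b.toString();
  }
}
```

Under these assumptions, javaSplit("abcdef", 4) yields the text "abcd","ef", 
which the template can drop straight into the parse(...) call. An empty input 
yields a single empty literal, so the required first argument is always 
present. With 1k-character chunks, each literal stays far below the limit even 
if every character needs three bytes in the class file.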
                
> IDL code-generation generates too-long literals for very large schemas
> ----------------------------------------------------------------------
>
>                 Key: AVRO-1316
>                 URL: https://issues.apache.org/jira/browse/AVRO-1316
>             Project: Avro
>          Issue Type: Bug
>          Components: java
>    Affects Versions: 1.7.5
>            Reporter: Jeremy Kahn
>            Priority: Minor
>              Labels: patch
>         Attachments: AVRO-1316.patch, AVRO-1316.patch, AVRO-1316.patch
>
>
> When I work from a very large IDL schema, the Java code generated includes a 
> schema JSON literal that exceeds the length of the maximum allowed literal 
> string ([65535 
> characters|http://stackoverflow.com/questions/8323082/size-of-initialisation-string-in-java]).
>   
> This creates weird Maven errors like: {{[ERROR] ...FooProtocol.java:[13,89] 
> constant string too long}}.
> It might seem a little crazy, but a 64-kilobyte JSON protocol isn't 
> outrageous at all for some of the more involved data structures, especially 
> if we're including documentation strings etc.
> I believe the fix should be a bit more sensitivity to the length of the JSON 
> literal (and a willingness to split it into more than one literal, joined by 
> {{+}}), but I haven't figured out where that change needs to go. Has anyone 
> else encountered this problem?

