Steve Lawrence created DAFFODIL-2851:
----------------------------------------

             Summary: Excessive allocations in StringOfSpecifiedLengthMixin
                 Key: DAFFODIL-2851
                 URL: https://issues.apache.org/jira/browse/DAFFODIL-2851
             Project: Daffodil
          Issue Type: Bug
          Components: Back End
            Reporter: Steve Lawrence


The StringOfSpecifiedLengthMixin passes in the value of the 
"maximumSimpleElementSizeInCharacters" tunable to the getSomeString function:

https://github.com/apache/daffodil/blob/main/daffodil-runtime1/src/main/scala/org/apache/daffodil/runtime1/processors/parsers/StringLengthParsers.scala#L89-L94

The getSomeString function calls withLocalCharBuffer, which allocates a char 
buffer of that size to decode the string into. The tunable currently defaults 
to 1MB, which is large enough to be a noticeable contributor to allocations 
and CPU usage when profiling.

Fortunately, the allocated char buffer is cached and reused during the parse 
(though each parse allocates a new one), so it's only a one-time penalty per 
parse. But most files are not going to have a single string anywhere near that 
large, so this allocation is mostly wasted.

We should consider ways to reduce this allocation. Maybe simply decrease the 
tunable? Or maybe change the logic so StringOfSpecifiedLength allocates a much 
smaller buffer and grows it if needed, maybe taking bitLimit into account? Or 
maybe share the buffer among different parses via a ThreadLocal, so we still 
allocate a large buffer, but the penalty is paid once per thread instead of 
once per parse? There are likely other options...
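
As a rough illustration of the grow-if-needed and ThreadLocal ideas, here is a 
minimal sketch. It is not Daffodil's actual code: GrowableCharBuffer, 
SharedCharBuffer, InitialSize and MaxSize are hypothetical names, and MaxSize 
only stands in for the tunable's value. It uses nothing beyond 
java.nio.CharBuffer and java.lang.ThreadLocal.

```scala
import java.nio.CharBuffer

// Hypothetical helper, not part of Daffodil: a reusable CharBuffer that
// starts small and only grows when a caller asks for more characters.
final class GrowableCharBuffer(initialSize: Int, maxSize: Int) {
  require(initialSize > 0 && initialSize <= maxSize)

  private var buf: CharBuffer = CharBuffer.allocate(initialSize)

  def capacity: Int = buf.capacity

  // Returns a cleared buffer whose limit is `needed` chars, with `needed`
  // clamped to the range [0, maxSize].
  def get(needed: Int): CharBuffer = {
    val n = math.min(math.max(needed, 0), maxSize)
    if (n > buf.capacity) {
      // Double the capacity until it is large enough, never exceeding maxSize.
      var newCap = buf.capacity.toLong
      while (newCap < n) newCap *= 2
      buf = CharBuffer.allocate(math.min(newCap, maxSize.toLong).toInt)
    }
    buf.clear()
    buf.limit(n)
    buf
  }
}

// Hypothetical per-thread holder, so the buffer is shared by every parse
// that runs on the same thread instead of being allocated once per parse.
object SharedCharBuffer {
  val InitialSize = 1024      // assumed starting size, chosen arbitrarily
  val MaxSize = 1024 * 1024   // stands in for the tunable's value

  private val local = new ThreadLocal[GrowableCharBuffer] {
    override protected def initialValue(): GrowableCharBuffer =
      new GrowableCharBuffer(InitialSize, MaxSize)
  }

  def get(needed: Int): CharBuffer = local.get().get(needed)
  def capacity: Int = local.get().capacity
}
```

A caller would request only what it needs, for example the smaller of the 
specified length and whatever bitLimit allows, rather than always passing the 
tunable. One caveat: a single shared buffer like this is only safe if uses are 
not nested on the same thread, so a real change would need to preserve 
whatever guarantees withLocalCharBuffer provides today.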



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
