[ 
https://issues.apache.org/jira/browse/DAFFODIL-2851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17776188#comment-17776188
 ] 

Steve Lawrence commented on DAFFODIL-2851:
------------------------------------------

As evidence that this is a real problem, I reduced the
"maximumSimpleElementSizeInCharacters" tunable from 1048576 to 50 and saw a
performance increase of 20% or more. Granted, the file I am testing is close to
a best case for this improvement: it is a very small file (66 bytes) that is
all integers except for a single 7-character string. So parsing this file
allocates a huge char buffer once per parse and never uses it again. In a more
complex file with many more strings, this buffer would be reused, but I imagine
we would still rarely need a char buffer of 1048576 characters. With this
change, the char buffer allocation no longer stands out in profiling and is
lost in the noise.
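
For a rough sense of the sizes involved, here is a minimal sketch. It assumes
the buffer is a plain heap java.nio.CharBuffer, which may not match exactly
what withLocalCharBuffer allocates. CharBufferSizes and approxBytes are
hypothetical names used only for illustration.

```scala
import java.nio.CharBuffer

// Hypothetical helper for illustration only; not part of Daffodil.
object CharBufferSizes {
  // A JVM char is 2 bytes, so the backing array of a heap CharBuffer
  // takes roughly 2 * capacity bytes (ignoring object headers).
  def approxBytes(capacityInChars: Int): Long =
    capacityInChars.toLong * java.lang.Character.BYTES

  def main(args: Array[String]): Unit = {
    // Assumption: the tunable value is used directly as the buffer capacity.
    val defaultBuf = CharBuffer.allocate(1048576)
    val reducedBuf = CharBuffer.allocate(50)
    println(s"default tunable: ${defaultBuf.capacity} chars, about ${approxBytes(defaultBuf.capacity)} bytes")
    println(s"reduced tunable: ${reducedBuf.capacity} chars, about ${approxBytes(reducedBuf.capacity)} bytes")
  }
}
```

Under that assumption, the default setting costs on the order of 2 MB of heap
per parse, compared with about 100 bytes at a setting of 50.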

> Excessive allocations in StringOfSpecifiedLengthMixin
> -----------------------------------------------------
>
>                 Key: DAFFODIL-2851
>                 URL: https://issues.apache.org/jira/browse/DAFFODIL-2851
>             Project: Daffodil
>          Issue Type: Bug
>          Components: Back End
>            Reporter: Steve Lawrence
>            Priority: Major
>
> The StringOfSpecifiedLengthMixin passes in the value of the 
> "maximumSimpleElementSizeInCharacters" tunable to the getSomeString function:
> https://github.com/apache/daffodil/blob/main/daffodil-runtime1/src/main/scala/org/apache/daffodil/runtime1/processors/parsers/StringLengthParsers.scala#L89-L94
> The getSomeString function calls withLocalCharBuffer, which allocates a char
> buffer of that size into which it decodes the string. Currently, the tunable
> defaults to 1048576 characters (1M), which is large enough to be a noticeable
> contributor to allocations and CPU usage when profiling.
> Fortunately, the allocated char buffer is cached and reused during the parse
> (though each parse allocates a new one), so it's only a one-time penalty per
> parse. But most files are not going to have single strings nearly that large,
> so this large allocation is just a waste.
> We should consider ways to reduce this allocation. Maybe simply decrease the 
> tunable? Or maybe change the logic so StringOfSpecifiedLength allocates a 
> much smaller amount, and grows the buffer if needed, maybe taking into 
> account bitLimit? Or maybe the buffer is shared among different parses in a 
> ThreadLocal, so we still allocate a large buffer, but the penalty is only 
> once per thread instead of once per parse? Likely other options...
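
Two of the options above (start small and grow on demand, and share the buffer
per thread) could be combined roughly as in the sketch below. This is only an
illustration under stated assumptions, not how Daffodil implements
withLocalCharBuffer: GrowableCharBuffer, get, and forCurrentThread are
hypothetical names, the initial capacity of 64 is arbitrary, and bitLimit is
not taken into account.

```scala
import java.nio.CharBuffer

// Hypothetical helper, not part of Daffodil: hands out a reusable char buffer
// that starts small and grows only when a larger string is requested.
final class GrowableCharBuffer(initialCapacity: Int, maxCapacity: Int) {
  require(initialCapacity > 0 && initialCapacity <= maxCapacity)

  private var buf: CharBuffer = CharBuffer.allocate(initialCapacity)

  def capacity: Int = buf.capacity

  // Returns the cached buffer, cleared and limited to `needed` chars.
  // If it is too small, a new one is allocated by doubling the capacity,
  // capped at maxCapacity (which would play the role of the tunable).
  def get(needed: Int): CharBuffer = {
    require(needed >= 0 && needed <= maxCapacity,
      s"requested $needed chars, maximum is $maxCapacity")
    if (needed > buf.capacity) {
      var newCap = buf.capacity
      while (newCap < needed)
        newCap = math.min(newCap.toLong * 2, maxCapacity.toLong).toInt
      buf = CharBuffer.allocate(newCap)
    }
    buf.clear()
    buf.limit(needed)
    buf
  }
}

object GrowableCharBuffer {
  // One instance per thread, so any allocation cost is paid once per thread
  // rather than once per parse. The sizes here are assumptions.
  private val perThread = new ThreadLocal[GrowableCharBuffer] {
    override def initialValue(): GrowableCharBuffer =
      new GrowableCharBuffer(64, 1048576)
  }

  def forCurrentThread: GrowableCharBuffer = perThread.get()
}
```

A caller would ask for GrowableCharBuffer.forCurrentThread.get(n) with the
number of characters it expects to decode. With this design the tunable still
acts as an upper bound, but a file containing only short strings never pays
for the full-size buffer.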



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
