stevedlawrence opened a new pull request, #1134:
URL: https://github.com/apache/daffodil/pull/1134

   The current algorithm to remove right padding of left justified strings 
first reverses the String, removes leading pad characters using dropWhile, and 
then reverses the result. Each of the two reverses is linear in the length of 
the String and requires allocating a new String instance and copying 
characters from one to the other. This is done regardless of how many, if 
any, pad chars exist in the String. The logic is very clear but fairly 
inefficient, enough to show up while profiling.
   
   To improve performance, this rewrites the algorithm to scan through the 
String in reverse to find the index where the trailing pad characters begin, 
and then uses the substring() function to create a new String with those pad 
characters removed. The scan is now linear in the number of trailing pad 
characters in a String instead of the full length of the String. Additionally, 
the use of substring() avoids character copies, since it just allocates a new 
String using the same underlying String value but with different indices.
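
   As a rough illustration of the two approaches described above, here is a 
minimal sketch. The object and method names are hypothetical and this is not 
the actual Daffodil code; it only assumes a single pad character.

```scala
// Hypothetical sketch, not the actual Daffodil implementation.
object RightPaddingSketch {

  // Approach described as the current algorithm:
  // reverse, drop the (now leading) pad chars, reverse back.
  def removeRightPaddingOld(str: String, padChar: Char): String =
    str.reverse.dropWhile(_ == padChar).reverse

  // Approach described as the rewrite: scan backward past trailing pad
  // chars, then take one substring. The scan touches only the trailing
  // pad chars plus one non-pad char.
  def removeRightPadding(str: String, padChar: Char): String = {
    var end = str.length
    while (end > 0 && str.charAt(end - 1) == padChar) end -= 1
    // Return the original String untouched when there is nothing to trim.
    if (end == str.length) str else str.substring(0, end)
  }
}
```

   Both functions should return the same result for any input; only the 
amount of work differs.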
   
   I have not looked in detail at how Scala implements dropWhile() for Strings 
(skimming the code, it looks like it will allocate a new String and copy 
characters), but for consistency and maximum performance, this also updates the 
algorithm that removes left padding of right justified strings to use logic 
similar to the new right padding algorithm. By using substring() we should 
avoid possible copies.
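
   The left padding case can be sketched the same way. Again, the names 
below are hypothetical and this is only an illustration under the assumption 
of a single pad character, not the code in this pull request.

```scala
// Hypothetical sketch, not the actual Daffodil implementation.
object LeftPaddingSketch {

  // Scan forward past leading pad chars, then take one substring,
  // instead of calling dropWhile on the String.
  def removeLeftPadding(str: String, padChar: Char): String = {
    var start = 0
    while (start < str.length && str.charAt(start) == padChar) start += 1
    // Return the original String untouched when there is nothing to trim.
    if (start == 0) str else str.substring(start)
  }
}
```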
   
   In one test with lots of left justified strings, many of which are padded, 
this showed about a 15% improvement in parse times (excluding infoset creation 
by using the null infoset outputter), and padding removal no longer shows up 
while profiling.
   
   DAFFODIL-2868

