[GitHub] [jena] afs commented on issue #1324: Abysmal performance when parsing huge literal in query (e.g. 100MB)

GitBox Thu, 19 May 2022 12:39:51 -0700


afs commented on issue #1324:
URL: https://github.com/apache/jena/issues/1324#issuecomment-1132126541


   I don't see a suitable PR on the JavaCC issue.
   
   There is an interesting suggestion about lexical states. ARQ only parses 
from strings, not streams, and only from data already converted from UTF-8. 
Access to the input would enable slicing literals directly out of the string.
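   
   A minimal sketch of what "slicing directly out of the string" could look 
like, assuming the parser holds the whole query as a Java `String` and knows the 
offset of the opening quote. `LiteralSlicer`, `findClosingQuote` and 
`sliceLiteral` are hypothetical names, not ARQ or JavaCC APIs, and the sketch 
only covers single-character quotes (no `"""` long strings):

```java
public final class LiteralSlicer {
    private LiteralSlicer() {}

    /**
     * Returns the index of the closing quote for the literal whose opening
     * quote is at openQuote, or -1 if the literal is unterminated.
     * A backslash skips the character that follows it.
     */
    static int findClosingQuote(String input, int openQuote) {
        char quote = input.charAt(openQuote);
        int i = openQuote + 1;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == '\\') {
                i += 2;          // skip the escaped character
                continue;
            }
            if (c == quote) {
                return i;
            }
            i++;
        }
        return -1;
    }

    /**
     * Returns the raw text between the quotes in one substring call,
     * instead of appending to a token buffer character by character.
     * Escape sequences are left unprocessed.
     */
    static String sliceLiteral(String input, int openQuote) {
        int close = findClosingQuote(input, openQuote);
        if (close < 0) {
            throw new IllegalArgumentException("Unterminated literal at offset " + openQuote);
        }
        return input.substring(openQuote + 1, close);
    }
}
```

   The result is the raw lexical form; any escape processing would still have 
to happen afterwards.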
   
   Rather than disrupt the existing processing, it could be done with a new 
token e.g. `X"...."`.
   
   USER_CHAR_STREAM is also an option.
   
   There is some investigation to do, such as updating for JavaCC 7.0 (the Jena 
codebase files were produced with JavaCC 6.0); see #1328.
   
   FYI: The different parsers use different techniques to handle Unicode, and 
this shows up in some tests about surrogate pairs.
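   
   For illustration only, a small sketch of why surrogate pairs matter when 
working with `char` offsets in a Java string: a code point outside the BMP takes 
two `char`s, so an offset can land in the middle of a pair. `SurrogatePairDemo` 
and its methods are hypothetical helpers, not part of Jena:

```java
public final class SurrogatePairDemo {
    private SurrogatePairDemo() {}

    /** Number of Unicode code points in s (not the number of chars). */
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    /** True if a cut at index would separate a high surrogate from its low surrogate. */
    static boolean splitsPair(String s, int index) {
        return index > 0
            && index < s.length()
            && Character.isHighSurrogate(s.charAt(index - 1))
            && Character.isLowSurrogate(s.charAt(index));
    }

    public static void main(String[] args) {
        String s = "a\uD83D\uDE00b";            // 'a', U+1F600, 'b'
        System.out.println(s.length());          // char count
        System.out.println(codePoints(s));       // code point count
        System.out.println(splitsPair(s, 2));    // is offset 2 inside the pair?
    }
}
```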
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


