Mark Jarvin created SPARK-47404:
-----------------------------------

             Summary: Add hooks to release the ANTLR DFA cache after parsing SQL
                 Key: SPARK-47404
                 URL: https://issues.apache.org/jira/browse/SPARK-47404
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 4.0.0
            Reporter: Mark Jarvin


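ANTLR builds a DFA cache while parsing to speed up parsing of similar future 
inputs. However, this cache is never cleared automatically and can only grow: 
ANTLR-generated Java parsers keep it in static fields shared by every parser 
instance, so it lives as long as the parser class is loaded. Extremely large 
SQL inputs can therefore produce very large DFA caches (>20 GiB in one extreme 
case I've seen).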
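To illustrate why the cache only grows, here is a toy model in plain Java (not ANTLR code, so it runs without the ANTLR runtime). `ToyParser`, `DECISION_TO_DFA`, `cachedStates` and `clearDFA` are hypothetical names; the static per-decision array loosely mirrors the static `_decisionToDFA` array in a generated parser. In the ANTLR Java runtime, the real reset is `ParserATNSimulator.clearDFA()`, reachable as `parser.getInterpreter().clearDFA()`.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model (hypothetical, not ANTLR): one cache per decision point, held in a
// static array so every parser instance in the JVM shares the same entries.
final class ToyParser {
    static final Map<String, Integer>[] DECISION_TO_DFA = newCache(4);

    @SuppressWarnings("unchecked")
    private static Map<String, Integer>[] newCache(int decisions) {
        Map<String, Integer>[] dfa = new Map[decisions];
        for (int d = 0; d < decisions; d++) {
            dfa[d] = new HashMap<>();
        }
        return dfa;
    }

    // "Parse" a whitespace-separated input; every unseen token adds an entry.
    int parse(String sql) {
        int predictions = 0;
        for (String token : sql.trim().split("\\s+")) {
            int decision = Math.floorMod(token.hashCode(), DECISION_TO_DFA.length);
            // computeIfAbsent only ever adds; nothing in parse() removes entries.
            DECISION_TO_DFA[decision].computeIfAbsent(token, t -> t.length() % 2);
            predictions++;
        }
        return predictions;
    }

    // Total number of cached entries across all decisions.
    static int cachedStates() {
        int total = 0;
        for (Map<String, Integer> dfa : DECISION_TO_DFA) {
            total += dfa.size();
        }
        return total;
    }

    // Similar in spirit to ParserATNSimulator.clearDFA(): replace each
    // per-decision cache with a fresh, empty one.
    static void clearDFA() {
        for (int d = 0; d < DECISION_TO_DFA.length; d++) {
            DECISION_TO_DFA[d] = new HashMap<>();
        }
    }
}
```

Parsing `SELECT a FROM t` from two different `ToyParser` instances leaves four shared entries; a new statement such as `SELECT b FROM u` adds two more, and only an explicit `clearDFA()` brings the count back to zero.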

Spark’s ANTLR SQL parser is derived from the Presto (now Trino) ANTLR SQL 
parser, and Presto has added hooks to clear this DFA cache. I think Spark 
should have similar hooks.
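One possible shape for such a hook, sketched in plain Java so it runs without ANTLR: a callback that runs after every parse and releases a shared cache once it passes a size threshold. `HookedSqlParser`, `addPostParseHook` and `clearWhenLargerThan` are hypothetical names, not Spark's or Presto's actual API; in a real parser the hook body would call `parser.getInterpreter().clearDFA()` from the ANTLR Java runtime instead of clearing a map.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical parser wrapper with post-parse hooks (not a real Spark API).
final class HookedSqlParser {
    // Static cache standing in for the DFA state shared by all parser instances.
    static final Map<String, Integer> SHARED_CACHE = new HashMap<>();

    private final List<Consumer<HookedSqlParser>> postParseHooks = new ArrayList<>();

    // Register a callback that runs after every parse, successful or not.
    HookedSqlParser addPostParseHook(Consumer<HookedSqlParser> hook) {
        postParseHooks.add(hook);
        return this;
    }

    int cacheSize() {
        return SHARED_CACHE.size();
    }

    void clearCache() {
        SHARED_CACHE.clear();
    }

    // "Parse" a whitespace-separated input, caching each token it sees.
    List<String> parse(String sql) {
        try {
            List<String> tokens = new ArrayList<>();
            for (String token : sql.trim().split("\\s+")) {
                SHARED_CACHE.putIfAbsent(token, token.length());
                tokens.add(token);
            }
            return tokens;
        } finally {
            // The finally block runs hooks even if parsing throws, so a failed
            // huge statement cannot leave its cache entries pinned.
            for (Consumer<HookedSqlParser> hook : postParseHooks) {
                hook.accept(this);
            }
        }
    }

    // Example hook: release the cache once it grows past maxEntries.
    static Consumer<HookedSqlParser> clearWhenLargerThan(int maxEntries) {
        return p -> {
            if (p.cacheSize() > maxEntries) {
                p.clearCache();
            }
        };
    }
}
```

With `clearWhenLargerThan(5)` registered, parsing `SELECT a FROM t` leaves four cached entries; a follow-up statement that pushes the count past five triggers the hook and empties the cache. A threshold keeps the speedup for workloads of small, repetitive queries while bounding memory after an outlier; clearing after every parse is simpler but discards the cache each time.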

References:
 * [https://github.com/antlr/antlr4/blob/f08a19bbb202b02a521f84d99e661e386bea8625/runtime/Java/src/org/antlr/v4/runtime/atn/ParserATNSimulator.java#L163-L171]
 * [https://stackoverflow.com/questions/28017135/why-antlr4-parsers-accumulates-atnconfig-objects?rq=2]
 * [https://github.com/antlr/antlr4/issues/499]
 * [https://github.com/trinodb/trino/pull/3186/files#diff-75b81ed5837578d1af42fcc91e4094a247138e5da6edb9d9e4b67d53247b8ca9]




--
This message was sent by Atlassian Jira
(v8.20.10#820010)
