Mark Jarvin created SPARK-47404:
-----------------------------------

             Summary: Add hooks to release the ANTLR DFA cache after parsing SQL
                 Key: SPARK-47404
                 URL: https://issues.apache.org/jira/browse/SPARK-47404
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 4.0.0
            Reporter: Mark Jarvin

ANTLR builds a DFA cache while parsing to speed up parsing of similar future inputs. However, this cache is never cleared and can only grow. Extremely large SQL inputs can lead to very large DFA caches (>20 GiB in one extreme case I've seen).

Spark’s ANTLR SQL parser is derived from the Presto ANTLR SQL parser, and Presto has added hooks to be able to clear this DFA cache. I think Spark should have similar hooks.

References:
* [https://github.com/antlr/antlr4/blob/f08a19bbb202b02a521f84d99e661e386bea8625/runtime/Java/src/org/antlr/v4/runtime/atn/ParserATNSimulator.java#L163-L171]
* [https://stackoverflow.com/questions/28017135/why-antlr4-parsers-accumulates-atnconfig-objects?rq=2]
* [https://github.com/antlr/antlr4/issues/499]
* [https://github.com/trinodb/trino/pull/3186/files#diff-75b81ed5837578d1af42fcc91e4094a247138e5da6edb9d9e4b67d53247b8ca9]
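
To illustrate the kind of hook being asked for, here is a toy sketch in plain Java. It is not Spark's, Presto's, or ANTLR's code: `CachingParserSketch`, `parseAndRelease`, and the `maxEntries` threshold are hypothetical names invented for this example, and the "parser" only splits on whitespace. It models a cache that is shared across parses and only grows, plus a hook that releases it after a parse once it exceeds a limit. In ANTLR's Java runtime the real operation would be `clearDFA()` on the parser's interpreter (`parser.getInterpreter().clearDFA()`), rather than the map clear shown here.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Toy model only: a parser whose shared cache grows and is never evicted. */
final class CachingParserSketch {
    // Shared by every parse call, standing in for a per-grammar DFA cache
    // that outlives individual parser instances (assumption for illustration).
    private static final Map<String, Integer> SHARED_CACHE = new ConcurrentHashMap<>();

    private CachingParserSketch() {}

    /** "Parses" by splitting on whitespace and memoizing every distinct token. */
    static int parse(String sql) {
        int tokens = 0;
        for (String tok : sql.trim().split("\\s+")) {
            if (tok.isEmpty()) {
                continue; // empty input yields a single empty string from split
            }
            SHARED_CACHE.merge(tok, 1, Integer::sum);
            tokens++;
        }
        return tokens;
    }

    /** Number of distinct entries currently held by the shared cache. */
    static int cacheSize() {
        return SHARED_CACHE.size();
    }

    /** The release hook: drops everything the cache has accumulated. */
    static void clearCache() {
        SHARED_CACHE.clear();
    }

    /** Parses, then releases the cache if it has grown beyond maxEntries. */
    static int parseAndRelease(String sql, int maxEntries) {
        try {
            return parse(sql);
        } finally {
            if (cacheSize() > maxEntries) {
                clearCache();
            }
        }
    }

    public static void main(String[] args) {
        parse("SELECT a FROM t");
        parse("SELECT b FROM t");
        System.out.println("entries before hook: " + cacheSize());
        parseAndRelease("SELECT c FROM u", 6);
        System.out.println("entries after hook: " + cacheSize());
    }
}
```

Clearing only past a threshold keeps the speed-up for ordinary queries and bounds memory after an unusually large input. Whether Spark should clear unconditionally, on a threshold, or only on explicit request is a design decision this sketch does not settle.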