[ https://issues.apache.org/jira/browse/IMPALA-4551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16907119#comment-16907119 ]
ASF subversion and git services commented on IMPALA-4551:
---------------------------------------------------------

Commit 1908e44c3c9faac8c7bf09422ca4c5ec598ffd58 in impala's branch refs/heads/master from Joe McDonnell
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=1908e44 ]

IMPALA-4551: Limit the size of SQL statements

Various BI tools generate and run SQL. When used incorrectly or
misconfigured, the tools can generate extremely large SQLs. Some of
these SQL statements reach 10s of megabytes. Large SQL statements
impose costs throughout execution, including statement rewrite logic
in the frontend and codegen in the backend. The resource usage of
these statements can impact the stability of the system or the
ability to run other SQL statements.

This implements two new query options that provide controls to
reject large SQL statements.
 - The first, MAX_STATEMENT_LENGTH_BYTES is a cap on the total size
   of the SQL statement (in bytes). It is applied before any parsing
   or analysis. It uses a default value of 16MB.
 - The second, STATEMENT_EXPRESSION_LIMIT, is a limit on the total
   number of expressions in a statement or any views that it
   references. The limit is applied upon the first round of analysis,
   but it is not reapplied when statement rewrite rules are applied.
   Certain expressions such as literals in IN lists or VALUES clauses
   are not analyzed and do not count towards the limit. It uses a
   default value of 250,000.

The two are complementary. Since enforcing the statement expression
limit requires parsing and analyzing the statement, the
MAX_STATEMENT_LENGTH_BYTES sets an upper bound on the size of
statement that needs to be parsed and analyzed. Testing confirms that
even statements approaching 16MB get through the first round of
analysis within a few seconds and then are rejected.

This also changes the logging in tests/common/impala_connection.py
to limit the total SQL size that it will print to 128KB.
This prevents the JUnitXML (which includes this logging) from being
too large. Existing tests do not run SQL larger than about 80KB, so
this only applies to tests added in this change that run multi-MB
SQLs to verify limits.

Testing:
 - This adds frontend tests that verify the low-level semantics of
   how expressions are counted and verify that the expression limits
   are enforced.
 - This adds end-to-end tests that verify both the
   MAX_STATEMENT_LENGTH_BYTES and STATEMENT_EXPRESSION_LIMIT at
   their default values.
 - There is also an end-to-end test that runs in exhaustive mode
   that runs a SQL statement with close to 250,000 expressions.

Change-Id: I5675fb4a08c1dc51ae5bcf467cbb969cc064602c
Reviewed-on: http://gerrit.cloudera.org:8080/14012
Reviewed-by: Joe McDonnell <joemcdonn...@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>

> Set limits on size of expression trees
> --------------------------------------
>
>                 Key: IMPALA-4551
>                 URL: https://issues.apache.org/jira/browse/IMPALA-4551
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 2.8.0
>            Reporter: Tim Armstrong
>            Assignee: Joe McDonnell
>            Priority: Major
>         Attachments: huge_case.patch
>
>
> Very large expression trees can cause havoc in various Impala components. I
> have been experimenting with the attached test that generates large case
> statements of varying depths and widths, and have been able to hit limits in
> the frontend (Java OOM) and caused various runaway memory usage problems in
> the backend (thrift structures, LLVM IR, codegen, etc).
> We should set some kind of limit here, either on the number of nodes in the
> expression trees, or on the size of the query text, and then make sure that
> we can execute queries of the maximum size end-to-end.
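
For readers who want a concrete picture of the two limits described in the
commit message, here is a minimal Python sketch. It is not Impala's
implementation. The Expr class, the StatementRejected exception, and the
function names are hypothetical, 16MB is assumed to mean 16 * 1024 * 1024
bytes, the text is assumed to be measured as UTF-8, and the choice to reject
only when a value is strictly greater than its limit is also an assumption.
Only the option names and default values come from the message above.

```python
from dataclasses import dataclass, field
from typing import List

# Defaults quoted in the commit message. Treating 16MB as
# 16 * 1024 * 1024 bytes is an assumption of this sketch.
MAX_STATEMENT_LENGTH_BYTES = 16 * 1024 * 1024
STATEMENT_EXPRESSION_LIMIT = 250_000


class StatementRejected(Exception):
    """Hypothetical error raised when a statement exceeds a limit."""


@dataclass
class Expr:
    """Toy expression node for illustration; not Impala's Expr class."""
    label: str
    children: List["Expr"] = field(default_factory=list)


def check_length(sql: str, limit: int = MAX_STATEMENT_LENGTH_BYTES) -> None:
    """Reject on byte size (UTF-8 assumed) before any parsing happens."""
    size = len(sql.encode("utf-8"))
    if size > limit:
        raise StatementRejected(f"statement is {size} bytes, limit is {limit}")


def count_exprs(root: Expr) -> int:
    """Count every node in a tree, iteratively so deep trees are safe."""
    total, stack = 0, [root]
    while stack:
        node = stack.pop()
        total += 1
        stack.extend(node.children)
    return total


def check_expr_count(roots: List[Expr],
                     limit: int = STATEMENT_EXPRESSION_LIMIT) -> None:
    """Reject when the expression count across all trees exceeds the limit."""
    total = sum(count_exprs(r) for r in roots)
    if total > limit:
        raise StatementRejected(f"{total} expressions, limit is {limit}")
```

The ordering mirrors the commit message: the cheap byte-length check runs
first, so the expression count, which needs a parsed and analyzed statement,
is only ever computed for text of bounded size.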