[
https://issues.apache.org/jira/browse/SPARK-20248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
shaolinliu updated SPARK-20248:
-------------------------------
Description:
When using the Thrift server, it is difficult to constrain the SQL statements
that users submit.
When a user queries a large table without a LIMIT clause, the Thrift server
process's memory usage rises and the service becomes unstable.
In general, such a query indicates incorrect usage, because a user who really
needs the whole table has better options:
1. If the data is input to a computation, complete the computation in the
cluster and return only the result.
2. If the data itself is needed, store it in HDFS.
For this scenario, we recommend adding a
"spark.sql.thriftserver.retainedResults" parameter:
1. When it is 0, the user's queries are not restricted.
2. When it is greater than 0 and the query specifies a limit, the user's limit
is used; otherwise this value is used to limit the query's result.
The user's limit takes priority because a user who specifies a limit is
generally aware of the exact meaning of the query.
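The proposed semantics can be sketched as follows. This is only an
illustration under stated assumptions, not the Thrift server's implementation:
the function name apply_retained_results is hypothetical, and the sketch
detects a trailing LIMIT clause with a regular expression on the SQL text,
whereas a real change would more likely inspect the parsed query plan.

```python
import re

# Hypothetical helper pattern: a trailing "LIMIT <n>" clause, optionally
# followed by a semicolon. Text matching is an assumption made for brevity.
_TRAILING_LIMIT = re.compile(r"\blimit\s+\d+\s*;?\s*$", re.IGNORECASE)


def apply_retained_results(sql: str, retained_results: int) -> str:
    """Return the statement to run under the proposed parameter.

    retained_results stands in for the proposed
    "spark.sql.thriftserver.retainedResults" value.
    """
    if retained_results < 0:
        raise ValueError("retained_results must be >= 0")
    if retained_results == 0:
        # 0 means the user's query is not restricted.
        return sql
    if _TRAILING_LIMIT.search(sql):
        # The user's own limit takes priority.
        return sql
    # No user limit: cap the result with the configured value.
    stripped = sql.rstrip().rstrip(";").rstrip()
    return f"{stripped} LIMIT {retained_results}"
```

With a value of 1000, "SELECT * FROM t" would be run as
"SELECT * FROM t LIMIT 1000", while "select * from t limit 10" would be left
unchanged.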
was:
When using the Thrift server, it is difficult to constrain the SQL statements
that users submit.
When a user queries a large table without a LIMIT clause, the Thrift server
process's memory usage rises and the service becomes unstable.
In general, such a query indicates incorrect usage, because a user who really
needs the whole table has better options:
1. If the data is input to a computation, complete the computation in the
cluster and return only the result.
2. If the data itself is needed, store it in HDFS.
For this scenario, we recommend adding a
"spark.sql.thriftServer.retainedResults" parameter:
1. When it is 0, the user's queries are not restricted.
2. When it is greater than 0 and the query specifies a limit, the user's limit
is used; otherwise this value is used to limit the query's result.
The user's limit takes priority because a user who specifies a limit is
generally aware of the exact meaning of the query.
> Spark SQL add limit parameter to enhance the reliability.
> ---------------------------------------------------------
>
> Key: SPARK-20248
> URL: https://issues.apache.org/jira/browse/SPARK-20248
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.1.0
> Environment: 2.1.0
> Reporter: shaolinliu
> Priority: Minor