[jira] [Updated] (SPARK-20248) Spark SQL add limit parameter to enhance the reliability.

shaolinliu (JIRA) Fri, 07 Apr 2017 02:40:29 -0700

     [ 
https://issues.apache.org/jira/browse/SPARK-20248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


shaolinliu updated SPARK-20248:
-------------------------------
    Description: 
  When using the Thrift server, it is difficult to constrain the SQL 
statements that users submit.
  When a user queries a large table without a LIMIT clause, the Thrift server 
process's memory usage grows, which makes the service unstable.
  In general, such a query is a misuse, because if you really need the whole 
table:
  1. if you use the data for a computation, you can complete the computation 
in the cluster and then return only the result;
  2. if you want to obtain the data itself, you can store it in HDFS.

  For this scenario, it is recommended to add a 
"spark.sql.thriftserver.retainedResults" parameter:
  1. when it is 0, we do not restrict the user's operation;
  2. when it is greater than 0, if the user's query has a limit, we use the 
user's limit; if not, we use this value to limit the query's result.
  The user's limit takes priority because a user who specifies a limit is, in 
general, aware of the exact meaning of the query.

  was:
  When using the Thrift server, it is difficult to constrain the SQL 
statements that users submit.
  When a user queries a large table without a LIMIT clause, the Thrift server 
process's memory usage grows, which makes the service unstable.
  In general, such a query is a misuse, because if you really need the whole 
table:
  1. if you use the data for a computation, you can complete the computation 
in the cluster and then return only the result;
  2. if you want to obtain the data itself, you can store it in HDFS.

  For this scenario, it is recommended to add a 
"spark.sql.thriftServer.retainedResults" parameter:
  1. when it is 0, we do not restrict the user's operation;
  2. when it is greater than 0, if the user's query has a limit, we use the 
user's limit; if not, we use this value to limit the query's result.
  The user's limit takes priority because a user who specifies a limit is, in 
general, aware of the exact meaning of the query.


> Spark SQL add limit parameter to enhance the reliability.
> ---------------------------------------------------------
>
>                 Key: SPARK-20248
>                 URL: https://issues.apache.org/jira/browse/SPARK-20248
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.1.0
>         Environment: 2.1.0
>            Reporter: shaolinliu
>            Priority: Minor
>
>   When using the Thrift server, it is difficult to constrain the SQL 
> statements that users submit.
>   When a user queries a large table without a LIMIT clause, the Thrift 
> server process's memory usage grows, which makes the service unstable.
>   In general, such a query is a misuse, because if you really need the whole 
> table:
>   1. if you use the data for a computation, you can complete the computation 
> in the cluster and then return only the result;
>   2. if you want to obtain the data itself, you can store it in HDFS.
>   For this scenario, it is recommended to add a 
> "spark.sql.thriftserver.retainedResults" parameter:
>   1. when it is 0, we do not restrict the user's operation;
>   2. when it is greater than 0, if the user's query has a limit, we use the 
> user's limit; if not, we use this value to limit the query's result.
>   The user's limit takes priority because a user who specifies a limit is, 
> in general, aware of the exact meaning of the query.
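
The precedence rule proposed in the description can be sketched as follows. 
This is only an illustration in Python, not Spark code: the function name 
effective_limit and its arguments are hypothetical, and None stands for "no 
limit is applied". It assumes the user's limit wins even when it is larger 
than the configured value, which is how the description reads.

```python
from typing import Optional


def effective_limit(user_limit: Optional[int],
                    retained_results: int) -> Optional[int]:
    """Return the row limit to apply to a query, or None for no limit.

    user_limit:       the LIMIT written in the user's query, or None if absent.
    retained_results: the proposed "spark.sql.thriftserver.retainedResults"
                      value (0 means "do not restrict").
    """
    if retained_results < 0:
        # Assumption: negative values are treated as a configuration error.
        raise ValueError("retained_results must be >= 0")
    if user_limit is not None:
        # The user's explicit limit always takes priority.
        return user_limit
    if retained_results == 0:
        # 0 disables the restriction entirely.
        return None
    # No user limit: fall back to the configured cap.
    return retained_results
```

For example, with the parameter set to 1000, a query without a limit would be 
capped at 1000 rows, while a query that asks for 50 rows keeps its own limit.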



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

