FrankChen021 opened a new issue, #17769:
URL: https://github.com/apache/druid/issues/17769

   ## Motivation
   
   Query context is part of query request. Under current implementation, query 
context and SQL are seperated. It makes sense for native query, where query and 
query context are kept in separated fields. 
   However, for SQL, such design imposes complexity of SQL request -- we have 
to write SQL in JSON way. 
   ```json
   {
     "query":"SELECT page, COUNT(*) AS Edits FROM wikipedia WHERE \"__time\" 
BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00' 
GROUP BY page ORDER BY Edits DESC LIMIT 10",
     "context":  {
       "enableParallelMerge": false
     }
   }
   ```
   
   This is NOT the straightforward way to use SQL.
   
   Under the web-cosole, the console helps us encapsulate the query into the 
JSON, however, there're still problems:
   1. After writting SQL, we have to use 'Edit Context' feature on web-console 
to customize settings. 
   2. query context is query level instead of client side application level. 
web-console remembers the edited query context for furture queries, which might 
not be what we expect. Sometimes we forget to reset the query context, or we 
have to delete the query context manually. 
   
   Another probloem is that, there's no validation of the query context items. 
   We can put ANYTHING in the query context, if there's typo of query context 
attribute name, Druid DOES NOT tell us about it.
   
   Last but not the least, we DON'T know which query context properties are 
supported by Druid, we have to read through different documents to know what 
query context properties are supported, such as:
   - https://druid.apache.org/docs/latest/querying/searchquery#query-context
   - https://druid.apache.org/docs/latest/querying/query-context
   - 
https://druid.apache.org/docs/latest/querying/groupbyquery#advanced-configurations
   
   ## Proposal
   
   Let's solve these problem together. 
   
   ### Firstly, let's introduce a `SETTINGS` subclause in the SQL statement. 
   This subclause accepts a list of key-value pair, where each key is the 
support query context property while the value is the corresponding value of 
that property. For example:
   
   ```sql
   SELECT * FROM wikipedia
   SETTINGS enableParallelMerge = false, sqlOuterLimit = 10
   ```
   
   Since query context now is part of SQL, it's naturally for users to 
add/append query context properties per query as they want.
   
   Some other databases solves this problem in different ways.
   
   - For OLTP database like MySQL, it provides `SET` statement to allow users 
to change session level variables. Since Druid has no 'session' concept because 
queries are executed on HTTP connection, such alternative is NOT applicable for 
Druid
   - Some databases like StarRocks, allows users customize variables in SQL 
hint, like:
   ```sql
   SELECT /*+ SET_VAR(query_mem_limit = 8589934592) */ name FROM people ORDER 
BY name;
   ```
   This does not require changes of SQL parser, but the biggest disadvantage is 
it's not user friendly.
   - SQL Server provides a [`OPTION` 
subclause](https://learn.microsoft.com/en-us/sql/t-sql/queries/option-clause-transact-sql)
 as query hint, which is similar to the proposal
   ```sql
   SELECT * FROM FactResellerSales
   OPTION (LABEL = 'q17');
   ```
   
   The proposed change is not easy in Druid as it requires us to customize 
Calcite by editing the file `sql/src/main/codegen/config.fmpp`
   What the parser does is converting the settings clause into a `QueryContext` 
object internally.
   
   ### Secondly, let's improve the `/druid/v2/sql` endpoint by allowing Druid 
accept raw text SQL instead of only JSON format.
   
   If the Content-Type is given as `application/json`, which is current 
behaviour, Druid treats the input as JSON, or it treats the entire input as raw 
SQL text.
   
   Under this mode, we can send SQLs to Druid in much simpler way:
   
   ```text
   curl -X 'POST' -d 'SELECT * FROM wikipedia SETTINGS enableParallelMerge = 
false, sqlOuterLimit = 10'  http://localhost:8888/druid/v2/sql 
   ```
   
   ### Thirdly, inside the Druid, let's define a `sys.settings` system table to 
hold all query context properties.
   
   We should put all query context properties together and register them into 
this table so that query context properties can be managed in a single place.
   
   The schema of this should be sth as follows:
   
   |  Column Name | Type | Description |
   |----------------|------|------------|
   | name | String | query context property name |
   | type | String | type of this property |
   | default_value | String | The default value of this property is it's not 
given in user's query |
   | description | String | The description of this property |
   
   With this table:
   - it's very easy for users to know how many properties/what kind of 
properties are supported in the query context. No need to check documents as 
the default document pages matches the latest the version which might be 
different from the version users are using. Querying from sys.settings table 
always tell them which properties are supported
   - web-console can also use this system table for better code completion and 
user experience
   
   ### Forthly, Druid MUST verify if query context properties given by user 
queries are valid
   
   When a query comes into Druid, it should verifies if given query context 
properties are pre-defined and valid. It MUST reject any queries with bad query 
context settings.
   
   
   The above changes 1,2,3 are independent(so they can be done separately) 
while the validation of query context attributes might share the same internal 
data structure of `sys.settings` table.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org

Reply via email to