[GitHub] [druid] jihoonson commented on a change in pull request #10503: Additional documentation for query caching

GitBox Fri, 16 Oct 2020 14:59:57 -0700


jihoonson commented on a change in pull request #10503:
URL: https://github.com/apache/druid/pull/10503#discussion_r506718820




##########
File path: docs/querying/caching.md
##########
@@ -82,3 +82,19 @@ Note that the task executor processes only support caches 
that keep their data l
 This restriction exists because the cache stores results at the level of 
intermediate partial segments generated by the
 ingestion tasks. These intermediate partial segments will not necessarily be 
identical across task replicas, so
 remote cache types such as `memcached` will be ignored by task executor 
processes.
+
+## Unsupported queries
+
+Query caching is not available for following:
+- Queries, that involve a `union` datasource, do not support result-level 
caching. Refer to the 
+[related github issue](https://github.com/apache/druid/issues/8713) for 
details. Top level union SQL queries can still 

Review comment:
       ```
       ../docs/querying/caching.md
          90 | [related github issue](https://github.com/apa 
   >> 1 spelling error found in 167 files
   ```
   
   The CI is failing because of this line. Please add a suppression in 
`website/.spelling`. BTW, I think it should be `GitHub`. 

##########
File path: docs/querying/caching.md
##########
@@ -82,3 +82,12 @@ Note that the task executor processes only support caches 
that keep their data l
 This restriction exists because the cache stores results at the level of 
intermediate partial segments generated by the
 ingestion tasks. These intermediate partial segments will not necessarily be 
identical across task replicas, so
 remote cache types such as `memcached` will be ignored by task executor 
processes.
+
+## Unsupported queries
+
+Query caching is not available for following
+- queries, that have a union operation, do not support result-level caching - 
[More details](https://github.com/apache/druid/issues/8713)

Review comment:
       > I was deliberate in avoiding datasource term since SQL users don't 
define `datasource` as such. For them, its just union operator.
   
   Even if they don't define a datasource themselves, their query will be 
translated into native queries, which will determine whether it is cached 
or not. I think it would be better to be precise so that users don't get 
confused.
   
   > Though I think Top Level Union queries may still be cached since they are 
not translated into a Union datasource.
   
   Good point, though I'm not sure what you mean by "Top Level Union queries". 
In SQL, the union operator can be translated by either 
`DruidUnionDataSourceRule` or `DruidUnionRule`. The former is converted to a 
`union` datasource while the latter is executed sequentially by the SQL layer. 
AFAICT, the former can be used when it's a `UNION ALL` of flat scan subqueries. 
The latter can be used otherwise (still only for `UNION ALL`). So, the 
result-level cache cannot be used for the former, but can be for the latter. 
Maybe it could say, "Queries that have a `union` datasource do not support 
result-level caching. For SQL, a union SQL query can be translated to a native 
query with a `union` datasource when it is a `UNION ALL` of flat scan 
subqueries. These queries cannot be cached at the result level."
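
   To make the distinction concrete, here is a rough Python sketch of the check 
described above, applied to a native query expressed as a dict. The function 
names are hypothetical and this is not Druid's actual implementation. It only 
models the `union` datasource rule discussed in this thread; other conditions 
that affect caching are ignored. Descending into nested `query` and `join` 
datasources is my assumption about how a nested union would behave.

```python
from typing import Any, Mapping, Union

# A native datasource is either a plain table name or a JSON object with a "type".
DataSource = Union[str, Mapping[str, Any]]


def has_union_datasource(data_source: DataSource) -> bool:
    """Return True if the datasource is, or contains, a `union` datasource.

    Hypothetical helper for illustration only.
    """
    if isinstance(data_source, str):
        # Shorthand for a table datasource.
        return False
    ds_type = data_source.get("type")
    if ds_type == "union":
        return True
    if ds_type == "query":
        # Subquery datasource: inspect the inner query's datasource (assumption).
        return has_union_datasource(data_source["query"]["dataSource"])
    if ds_type == "join":
        # Join datasource: inspect both sides (assumption).
        return (has_union_datasource(data_source["left"])
                or has_union_datasource(data_source["right"]))
    return False


def may_use_result_level_cache(native_query: Mapping[str, Any]) -> bool:
    """Model only the rule from this thread: no `union` datasource."""
    return not has_union_datasource(native_query["dataSource"])


# A UNION ALL of flat scans planned via DruidUnionDataSourceRule would end up
# with a datasource shaped like this one.
union_query = {
    "queryType": "scan",
    "dataSource": {"type": "union", "dataSources": ["table_a", "table_b"]},
}
plain_query = {"queryType": "scan", "dataSource": "table_a"}
```

   With this sketch, `may_use_result_level_cache(union_query)` is false and 
`may_use_result_level_cache(plain_query)` is true. A union handled by 
`DruidUnionRule` never produces a `union` datasource, so each of its native 
queries would look like `plain_query`.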




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



