LakshSingla commented on code in PR #14890:
URL: https://github.com/apache/druid/pull/14890#discussion_r1401520569


##########
sql/src/main/java/org/apache/druid/sql/calcite/run/NativeQueryMaker.java:
##########
@@ -155,12 +163,47 @@ public QueryResponse<Object[]> runQuery(final DruidQuery druidQuery)
     );
   }
 
-  private List<Interval> findBaseDataSourceIntervals(Query<?> query)
+  private List<Interval> findMaxBaseDataSourceIntervals(DataSource dataSource, DataSourceAnalysis dsAnalysis, List<Interval> defaults)
   {
-    return query.getDataSource().getAnalysis()
-                .getBaseQuerySegmentSpec()
-                .map(QuerySegmentSpec::getIntervals)
-                .orElseGet(query::getIntervals);
+    Optional<List<Interval>> intervals = dsAnalysis.getBaseQuerySegmentSpec().map(QuerySegmentSpec::getIntervals);
+    if (intervals.isPresent()) {
+      return intervals.get();
+    }
+
+    // If dataSource is an instance of JoinDataSource, find the base intervals of the left and right data sources
+    if (dataSource instanceof JoinDataSource) {
+      JoinDataSource joinDataSource = (JoinDataSource) dataSource;
+      DataSource right = joinDataSource.getRight();
+      DataSource left = joinDataSource.getLeft();
+
+      // right data source has a subquery
+      if (right instanceof QueryDataSource) {
+        DataSource rightInner = ((QueryDataSource) right).getQuery().getDataSource();
+        DataSourceAnalysis rightAnalysisInner = rightInner.getAnalysis();
+        if (rightInner instanceof JoinDataSource) {
+          return findMaxBaseDataSourceIntervals(rightInner, rightAnalysisInner, defaults);
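The recursive unwrapping in the hunk above can be sketched with simplified stand-in types. Everything here (the mock `TableDataSource`/`QueryDataSource`/`JoinDataSource` classes and `findBaseIntervals`) is an illustrative assumption, not Druid's actual API; the traversal is reduced to its shape: unwrap subqueries, recurse into joins, and fall back to the defaults when no intervals are found.

```java
import java.util.List;

// Hypothetical, simplified stand-ins for Druid's DataSource hierarchy.
interface DataSource {}

final class TableDataSource implements DataSource {
  final List<String> intervals; // ISO-8601 interval strings, for the sketch
  TableDataSource(List<String> intervals) { this.intervals = intervals; }
}

final class QueryDataSource implements DataSource {
  final DataSource inner; // the subquery's own data source
  QueryDataSource(DataSource inner) { this.inner = inner; }
}

final class JoinDataSource implements DataSource {
  final DataSource left, right;
  JoinDataSource(DataSource left, DataSource right) { this.left = left; this.right = right; }
}

public class IntervalSketch {
  // Walk the data source tree: unwrap subqueries, recurse into joins,
  // and fall back to defaults for anything unrecognized. The patch
  // recurses into the right side when it wraps a subquery over a join;
  // here that is simplified to preferring the base (left) side.
  static List<String> findBaseIntervals(DataSource ds, List<String> defaults) {
    if (ds instanceof TableDataSource) {
      return ((TableDataSource) ds).intervals;
    }
    if (ds instanceof QueryDataSource) {
      return findBaseIntervals(((QueryDataSource) ds).inner, defaults);
    }
    if (ds instanceof JoinDataSource) {
      return findBaseIntervals(((JoinDataSource) ds).left, defaults);
    }
    return defaults;
  }

  public static void main(String[] args) {
    DataSource join = new JoinDataSource(
        new TableDataSource(List.of("2020-01-01/2020-02-01")),
        new QueryDataSource(new TableDataSource(List.of("2020-01-15/2020-01-20"))));
    System.out.println(findBaseIntervals(join, List.of("eternity")));
    // prints [2020-01-01/2020-02-01]
  }
}
```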

Review Comment:
   Also, what's the purpose of the `requireTimeCondition`? If it is meant to prevent the query from hitting all of the segments on the historicals, then it probably doesn't do a very good job of it, and perhaps should be changed. The problems I see with it:
   1. It doesn't take the data source type into account. IMO it should only affect the table data source, since other data source types don't involve segments.
   2. Even if the SQL query has a time filter, it isn't guaranteed that the data source has the time filter as well. Ideally it should, but there was a bug in UNNEST (earlier) that didn't push the filter down to the data source, so the query would end up reading the whole data source. Other nuances in translation might also cause the underlying data source to lack a time filter.
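Point 1 above could be sketched as a type-aware check. Everything in this snippet (the `Ds` hierarchy, `violatesTimeCondition`, the `ETERNITY` marker string) is a hypothetical simplification, not Druid's API: the guard only fires for table data sources, since inline or lookup data sources never read segments.

```java
// Simplified stand-in types, not Druid's actual classes.
interface Ds {}
final class Table implements Ds {
  final String interval; // the interval this table would be scanned over
  Table(String interval) { this.interval = interval; }
}
final class Inline implements Ds {} // inline data: no segments involved
final class Join implements Ds {
  final Ds left, right;
  Join(Ds left, Ds right) { this.left = left; this.right = right; }
}

public class TimeConditionSketch {
  // Marker for "no time bound at all" in this sketch.
  static final String ETERNITY = "eternity";

  // True if some table in the tree would be scanned without a time bound.
  // Non-table leaves (e.g. inline data) are skipped, because they don't
  // involve segments on the historicals.
  static boolean violatesTimeCondition(Ds ds) {
    if (ds instanceof Table) {
      return ETERNITY.equals(((Table) ds).interval);
    }
    if (ds instanceof Join) {
      Join j = (Join) ds;
      return violatesTimeCondition(j.left) || violatesTimeCondition(j.right);
    }
    return false;
  }

  public static void main(String[] args) {
    Ds bounded = new Join(new Table("2020/2021"), new Inline());
    Ds unbounded = new Join(new Table(ETERNITY), new Inline());
    System.out.println(violatesTimeCondition(bounded));   // prints false
    System.out.println(violatesTimeCondition(unbounded)); // prints true
  }
}
```

Under this framing, a join of a bounded table with inline data passes the check, while the same join over an unbounded table fails it, which is closer to what the reviewer argues `requireTimeCondition` ought to enforce.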



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

