ccaominh commented on a change in pull request #9235: Add join-related DataSource types, and analysis functionality.
URL: https://github.com/apache/druid/pull/9235#discussion_r369317286
 
 

 ##########
 File path: processing/src/main/java/org/apache/druid/query/planning/DataSourceAnalysis.java
 ##########
 @@ -0,0 +1,282 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.planning;
+
+import org.apache.druid.java.util.common.IAE;
+import org.apache.druid.java.util.common.Pair;
+import org.apache.druid.query.BaseQuery;
+import org.apache.druid.query.DataSource;
+import org.apache.druid.query.JoinDataSource;
+import org.apache.druid.query.Query;
+import org.apache.druid.query.QueryDataSource;
+import org.apache.druid.query.TableDataSource;
+import org.apache.druid.query.UnionDataSource;
+import org.apache.druid.query.spec.QuerySegmentSpec;
+
+import javax.annotation.Nullable;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.List;
+import java.util.Objects;
+import java.util.Optional;
+
+/**
+ * Analysis of a datasource for purposes of deciding how to execute a particular query.
+ *
+ * The analysis breaks a datasource down in the following way:
+ *
+ * <pre>
+ *
+ *                             Q  <-- Possible outer query datasource(s) [may be multiple stacked]
+ *                             |
+ *                             J  <-- Possible join tree, expected to be left-leaning
+ *                            / \
+ *                           J  Dj <--  Other leaf datasources
+ *   Base datasource        / \         which will be joined
+ *  (bottom-leftmost) -->  Db Dj  <---- into the base datasource
+ *
+ * </pre>
+ *
+ * The base datasource (Db) is returned by {@link #getBaseDataSource()}. The other leaf datasources are returned by
+ * {@link #getPreJoinableClauses()}. The outer query datasources are available as part of {@link #getDataSource()},
+ * which just returns the original datasource that was provided for analysis.
+ *
+ * The base datasource (Db) will never be a join, but it can be any other type of datasource (table, query, etc).
+ * Note that join trees are only flattened if they occur at the top of the overall tree (or underneath an outer query),
+ * and that join trees are only flattened to the degree that they are left-leaning. Due to these facts, it is possible
+ * for the base or leaf datasources to include additional joins.
+ *
+ * The base datasource is the one that will be considered by the core Druid query stack for scanning via
+ * {@link org.apache.druid.segment.Segment} and {@link org.apache.druid.segment.StorageAdapter}. The other leaf
+ * datasources must be joinable onto the base data.
+ *
+ * The idea here is to keep things simple and dumb. So we focus only on identifying left-leaning join trees, which map
+ * neatly onto a series of hash table lookups at query time. The user/system generating the queries, e.g. the druid-sql
+ * layer (or the end user in the case of native queries), is responsible for containing the smarts to structure the
+ * tree in a way that will lead to optimal execution.
+ */
 
 Review comment:
   Nice javadoc!
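
For anyone reading along, the flattening the javadoc describes can be sketched roughly as below. This is only an illustration, not the code in this PR: `Source`, `Table`, `Join`, `Flattened`, and `flatten` are hypothetical stand-ins rather than Druid's `DataSource` classes, and join conditions and outer queries are left out. It walks down the left spine of the join tree, collects each right-hand side as a clause, and stops at the first non-join datasource, which becomes the base.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class JoinFlattenSketch {
    // Hypothetical stand-ins for illustration only, not Druid's real DataSource types.
    interface Source {}

    static final class Table implements Source {
        final String name;
        Table(String name) { this.name = name; }
        @Override public String toString() { return name; }
    }

    static final class Join implements Source {
        final Source left;
        final Source right;
        Join(Source left, Source right) { this.left = left; this.right = right; }
        @Override public String toString() { return "(" + left + " JOIN " + right + ")"; }
    }

    // Result of the analysis: the bottom-leftmost datasource plus the
    // right-hand sides that get joined onto it, in join order.
    static final class Flattened {
        final Source base;
        final List<Source> clauses;
        Flattened(Source base, List<Source> clauses) {
            this.base = base;
            this.clauses = clauses;
        }
    }

    static Flattened flatten(Source root) {
        List<Source> clauses = new ArrayList<>();
        Source current = root;
        // Only the left spine is followed, so the base is never a Join.
        // A right-hand side that is itself a Join is kept as one opaque clause.
        while (current instanceof Join) {
            Join join = (Join) current;
            clauses.add(join.right);
            current = join.left;
        }
        // Clauses were collected top-down; reverse them into bottom-up join order.
        Collections.reverse(clauses);
        return new Flattened(current, clauses);
    }

    public static void main(String[] args) {
        Source tree = new Join(new Join(new Table("Db"), new Table("Dj1")), new Table("Dj2"));
        Flattened f = flatten(tree);
        System.out.println(f.base + " " + f.clauses);
    }
}
```

In this sketch a right-leaning tree such as `Join(Db, Join(A, B))` yields base `Db` with the single clause `(A JOIN B)`, which matches the note that join trees are only flattened to the degree that they are left-leaning.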

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
