imply-cheddar commented on code in PR #13085:
URL: https://github.com/apache/druid/pull/13085#discussion_r983131194
##########
processing/src/main/java/org/apache/druid/query/Queries.java:
##########
@@ -191,11 +191,19 @@ public static <T> Query<T> withBaseDataSource(final
Query<T> query, final DataSo
{
final Query<T> retVal;
- if (query.getDataSource() instanceof QueryDataSource) {
- final Query<?> subQuery = ((QueryDataSource)
query.getDataSource()).getQuery();
+ /*
+ * Currently, this method is implemented in terms of a static walk doing a
bunch of instanceof checks.
+ * We should likely look into moving this functionality into the
DataSource object itself so that they
+ * can walk and create new objects on their own. This will be necessary
as we expand the set of DataSources
+ * that do actual work, as each of them will need to show up in this
if/then waterfall.
+ */
Review Comment:
I'm pretty sure it should be safe to move something logically equivalent to
this method into a method on `DataSource` itself. Let's do that inside of this
refactor, so that we leave this refactor with hopefully never needing to have
random methods of `instanceof` checks again.
##########
processing/src/main/java/org/apache/druid/query/JoinDataSource.java:
##########
@@ -127,12 +155,22 @@ public static JoinDataSource create(
final String rightPrefix,
final JoinConditionAnalysis conditionAnalysis,
final JoinType joinType,
- final DimFilter leftFilter
+ final DimFilter leftFilter,
+ @Nullable @JacksonInject final JoinableFactoryWrapper
joinableFactoryWrapper
Review Comment:
Having `@JacksonInject` on a method that isn't used by Jackson can be
confusing (it will make future developers try to figure out how Jackson uses
this method). I don't think it needs to be there.
##########
extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/querykit/BaseLeafFrameProcessor.java:
##########
@@ -152,13 +153,23 @@ private boolean initializeSegmentMapFn(final IntSet
readableInputs)
segmentMapFn = Function.identity();
return true;
} else {
- final boolean retVal =
broadcastJoinHelper.buildBroadcastTablesIncrementally(readableInputs);
-
- if (retVal) {
- segmentMapFn = broadcastJoinHelper.makeSegmentMapFn(query);
+ if (query.getDataSource() instanceof InputNumberDataSource) {
+ final boolean retVal =
broadcastJoinHelper.buildBroadcastTablesIncrementally(readableInputs);
+ if (retVal) {
+ InputNumberDataSource inputNumberDataSource =
(InputNumberDataSource) query.getDataSource();
+ // The InputNumberData source was going through the
broadcastJoinHelper which
+ // was using the JoinableFactoryWrapper to create segment map
function.
+ // After refactoring, the segment map function creation is moved to
data source
+ // Hence for InputNumberDataSource we are setting the broadcast join
helper for the data source
+ // and moving the segment map function creation there
Review Comment:
Sorry to pick on a comment, but this comment does what most comments in code
do: it talks from the context of the developer who is writing the code. Any
new developer who comes and reads the code for the first time doesn't even know
that this code was refactored, let alone know why it was refactored or what the
state of the code was before the refactor. Instead of explain the change of
the refactor, the comment should attempt to help explain what the current,
post-refactor code is attempting to accomplish. I.e. something like
```
// The InputNumberDataSource requires a BroadcastJoinHelper to be able to
create its
// segment map function. It would be a lot better if the
InputNumberDataSource actually
// had a way to get that injected into it on its own, but the relationship
between these objects
// was figured out during a refactor and using a setter here seemed like the
least-bad way to
// make progress on the refactor without breaking functionality. Hopefully,
some future
// developer will move this away from a setter.
```
Or something like that.
##########
extensions-core/multi-stage-query/src/test/java/org/apache/druid/msq/test/MSQTestBase.java:
##########
@@ -565,6 +572,8 @@ private ObjectMapper setupObjectMapper(Injector injector)
"compaction"
)
).registerSubtypes(ExternalDataSource.class));
+ DruidSecondaryModule.setupJackson(injector, mapper);
+/*
Review Comment:
Delete the code instead of comment it out please.
##########
server/src/main/java/org/apache/druid/server/LocalQuerySegmentWalker.java:
##########
@@ -94,12 +93,12 @@ public <T> QueryRunner<T> getQueryRunnerForIntervals(final
Query<T> query, final
final AtomicLong cpuAccumulator = new AtomicLong(0L);
- final Function<SegmentReference, SegmentReference> segmentMapFn =
joinableFactoryWrapper.createSegmentMapFn(
- analysis.getJoinBaseTableFilter().map(Filters::toFilter).orElse(null),
- analysis.getPreJoinableClauses(),
- cpuAccumulator,
- analysis.getBaseQuery().orElse(query)
- );
+ Function<SegmentReference, SegmentReference> segmentMapFn =
analysis.getDataSource()
+
.createSegmentMapFunction(
+
query,
+
cpuAccumulator
+ );
+
Review Comment:
Now that there's a lot less arguments here, I'm pretty sure this formatting
is no longer the best way to represent this code. Please condense things.
##########
processing/src/main/java/org/apache/druid/query/InlineDataSource.java:
##########
@@ -102,6 +105,55 @@ public static InlineDataSource fromIterable(
return new InlineDataSource(rows, signature);
}
+ /**
+ * A very zealous equality checker for "rows" that respects deep equality of
arrays, but nevertheless refrains
+ * from materializing things needlessly. Useful for unit tests that want to
compare equality of different
+ * InlineDataSource instances.
+ */
+ private static boolean rowsEqual(final Iterable<Object[]> rowsA, final
Iterable<Object[]> rowsB)
Review Comment:
These private methods seem to have moved above some public methods, was it
just your IDE doing stuff or was there a specific reason? Generally speaking,
the flow is always public methods first.
##########
processing/src/main/java/org/apache/druid/segment/join/JoinableFactoryWrapper.java:
##########
@@ -340,12 +192,60 @@ static JoinClauseToFilterConversion convertJoinToFilter(
return new JoinClauseToFilterConversion(null, false);
}
+ public JoinableFactory getJoinableFactory()
+ {
+ return joinableFactory;
+ }
+
+ /**
+ * Compute a cache key prefix for a join data source. This includes the data
sources that participate in the RHS of a
+ * join as well as any query specific constructs associated with join data
source such as base table filter. This key prefix
+ * can be used in segment level cache or result level cache. The function
can return following wrapped in an
Review Comment:
A bunch of this code looks like it changed, but I think it just had methods
re-ordered? Was that intentional?
##########
processing/src/main/java/org/apache/druid/segment/join/JoinableFactoryWrapper.java:
##########
@@ -340,12 +192,60 @@ static JoinClauseToFilterConversion convertJoinToFilter(
return new JoinClauseToFilterConversion(null, false);
}
+ public JoinableFactory getJoinableFactory()
+ {
+ return joinableFactory;
+ }
+
+ /**
+ * Compute a cache key prefix for a join data source. This includes the data
sources that participate in the RHS of a
+ * join as well as any query specific constructs associated with join data
source such as base table filter. This key prefix
+ * can be used in segment level cache or result level cache. The function
can return following wrapped in an
+ * Optional
+ * - Non-empty byte array - If there is join datasource involved and caching
is possible. The result includes
+ * join condition expression, join type and cache key returned by joinable
factory for each {@link PreJoinableClause}
+ * - NULL - There is a join but caching is not possible. It may happen if
one of the participating datasource
+ * in the JOIN is not cacheable.
+ *
+ * @param dataSourceAnalysis for the join datasource
+ * @return the optional cache key to be used as part of query cache key
+ * @throws {@link IAE} if this operation is called on a non-join data source
+ */
+ public Optional<byte[]> computeJoinDataSourceCacheKey(
Review Comment:
The existence of this method is another point where things seem a little
suspect. I think the `DataSource` object needs a `public byte[] getCacheKey()`
method. Most of the current implementations would just return `new byte[]{}`,
but the join one should return this (or null if not cacheable). The unnest
implementation will need to override the method as well.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]