clintropolis commented on a change in pull request #12185:
URL: https://github.com/apache/druid/pull/12185#discussion_r798137344
##########
File path:
processing/src/main/java/org/apache/druid/segment/incremental/IncrementalIndex.java
##########
@@ -258,7 +259,8 @@ public ColumnCapabilities getColumnCapabilities(String
columnName)
protected IncrementalIndex(
final IncrementalIndexSchema incrementalIndexSchema,
final boolean deserializeComplexMetrics,
- final boolean concurrentEventAdd
+ final boolean concurrentEventAdd,
+ final boolean preserveExistingMetrics
Review comment:
I feel like this definitely needs javadocs/comments to make clear that
nothing should ever set this except `DruidInputSource`, since that is the only
case where we should have existing metrics.
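
For example, something along these lines (the wording is just a sketch of the kind of comment I mean, not the final text):

```java
/**
 * Whether to fold pre-aggregated metric values that already exist in the
 * input rows into the index, instead of aggregating only from raw input.
 *
 * This should only ever be set to true for DruidInputSource-based
 * reingestion, since that is the only case where the input already
 * contains existing metric columns; every other code path must pass false.
 */
```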
##########
File path:
processing/src/test/java/org/apache/druid/segment/data/IncrementalIndexTest.java
##########
@@ -725,4 +728,171 @@ public void testDynamicSchemaRollup() throws
IndexSizeExceededException
Assert.assertEquals(2, index.size());
}
+
Review comment:
Do we need sad-path tests here? For example, is it possible to end up with a
mix of aggregators that are not the same type if the ingestion schema changed
between segment creation? What if non-rolled-up data has columns with the same
name as aggregators in the rolled-up data? Should these be an error? Are there
external safeguards, outside this PR, that prevent these tasks from ever being
started with segment schemas that are incompatible with this mode?
Also, what about more extreme cases, like sketch aggregators, which have a lot
more variety between their building aggregators and combining aggregators?
Basically my concern here is making sure there are no scenarios where we
silently do the wrong thing and write incorrect segments, instead of loudly
failing when something isn't well supported.
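
To make the "fail loudly" expectation concrete, here is a minimal self-contained sketch (not Druid code — `combineExistingLong` and the exception wording are invented for illustration) of the kind of type check I'd hope for when an existing metric column's value doesn't match the aggregator's expected intermediate type:

```java
// Hypothetical sketch: validate that a pre-aggregated column value has the
// type the combining side expects, and throw rather than silently writing
// an incorrect segment.
public class CombineCheck
{
  // Combine an existing metric value into a running long sum; the numeric
  // check stands in for a real intermediate-type compatibility check.
  static long combineExistingLong(Object existingValue, long runningSum)
  {
    if (!(existingValue instanceof Number)) {
      throw new IllegalStateException(
          "Existing metric value has type ["
          + (existingValue == null ? "null" : existingValue.getClass().getName())
          + "] but a numeric intermediate was expected; refusing to silently "
          + "write an incorrect segment"
      );
    }
    return runningSum + ((Number) existingValue).longValue();
  }

  public static void main(String[] args)
  {
    // Happy path: a numeric existing value combines fine.
    System.out.println(combineExistingLong(3L, 4L)); // prints 7

    // Sad path: a string column that happens to share the metric's name.
    try {
      combineExistingLong("not-a-number", 4L);
    }
    catch (IllegalStateException e) {
      System.out.println("failed loudly: " + e.getMessage());
    }
  }
}
```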
##########
File path:
processing/src/main/java/org/apache/druid/segment/incremental/OnheapIncrementalIndex.java
##########
@@ -89,10 +93,11 @@
boolean concurrentEventAdd,
boolean sortFacts,
int maxRowCount,
- long maxBytesInMemory
+ long maxBytesInMemory,
+ boolean preserveExistingMetrics
Review comment:
Same comment about javadoc/comments regarding reingestion.
##########
File path:
processing/src/main/java/org/apache/druid/segment/incremental/OnheapIncrementalIndex.java
##########
@@ -157,6 +163,16 @@ protected void initAggs(
concurrentEventAdd
)
);
+ if (isPreserveExistingMetrics()) {
Review comment:
nit: since this happens basically once per column per row,
`isPreserveExistingMetrics()` should probably be saved in a final field, or
just made `protected` in `IncrementalIndex` and used directly. A field read
probably has a better chance of being optimized by the JVM than a method call
everywhere (though the JVM may be smart enough that it doesn't matter, it
never usually hurts to help it out by being explicit and cutting down method
invocations). The same applies to the other uses of
`isPreserveExistingMetrics()`.
Side note: it might be worth sanity-check measuring ingestion performance
before and after this change, to make sure there is little/no impact when this
mode is on.
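
The field-caching suggestion looks something like this (a generic sketch, not the actual Druid classes; `preserveExistingMetrics` is the only name taken from the diff):

```java
// Hypothetical sketch: cache a per-index flag in a final field so the
// per-row hot path reads a field instead of invoking a getter each time.
public class FlagCaching
{
  static class Index
  {
    // Read once at construction; the hot loop uses the field directly.
    protected final boolean preserveExistingMetrics;

    Index(boolean preserveExistingMetrics)
    {
      this.preserveExistingMetrics = preserveExistingMetrics;
    }

    long addRows(long[] existing, long[] raw)
    {
      long total = 0;
      for (int i = 0; i < raw.length; i++) {
        // Field read on the hot path, not a method call per row.
        total += preserveExistingMetrics ? existing[i] : raw[i];
      }
      return total;
    }
  }

  public static void main(String[] args)
  {
    Index index = new Index(true);
    System.out.println(index.addRows(new long[]{1, 2}, new long[]{10, 20})); // prints 3
  }
}
```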
##########
File path:
processing/src/main/java/org/apache/druid/segment/incremental/OnheapIncrementalIndex.java
##########
@@ -435,6 +490,18 @@ public boolean isNull(int rowOffset, int aggOffset)
}
}
+ private Object getCombinedMetric(AggregatorFactory[] metrics, Aggregator[]
aggs, int aggOffset)
Review comment:
This seems worthy of javadoc/comments.
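
Something roughly like this, perhaps (my guess at the intent from the method name and this PR's context; the actual wording should come from the implementation):

```java
/**
 * Returns the value at {@code aggOffset}, combining the aggregator that
 * folded in pre-existing metric values with the aggregator over raw input
 * rows, when both are present for this mode.
 */
```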
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]