clintropolis commented on a change in pull request #12185:
URL: https://github.com/apache/druid/pull/12185#discussion_r798137344



##########
File path: 
processing/src/main/java/org/apache/druid/segment/incremental/IncrementalIndex.java
##########
@@ -258,7 +259,8 @@ public ColumnCapabilities getColumnCapabilities(String 
columnName)
   protected IncrementalIndex(
       final IncrementalIndexSchema incrementalIndexSchema,
       final boolean deserializeComplexMetrics,
-      final boolean concurrentEventAdd
+      final boolean concurrentEventAdd,
+      final boolean preserveExistingMetrics

Review comment:
       i feel like this definitely needs javadocs/comments to ensure that 
nothing ever sets this except DruidInputSource, since that is the only case 
where we should have existing metrics

##########
File path: 
processing/src/test/java/org/apache/druid/segment/data/IncrementalIndexTest.java
##########
@@ -725,4 +728,171 @@ public void testDynamicSchemaRollup() throws 
IndexSizeExceededException
 
     Assert.assertEquals(2, index.size());
   }
+

Review comment:
       do we need sad path tests here? like is it possible to end up with a mix 
of aggregators that are not the same type if the ingestion schema changed 
between segment creation? what if non-rolled up has columns with the same name 
as aggregators in rolled up? should these be an error? is there some external 
safeguards that prevent these tasks from ever getting started due to 
incompatible segment schemas to use this mode that is external to this PR?
   
   also what about more extreme cases, like sketch aggregators, which have a 
lot more variety between their building aggregators and combining aggregators?
   
   basically my concern here is around making sure there are no scenarios where 
we are silently doing the wrong thing, writing incorrect segments, instead of 
loudly failing when it isn't well supported.

##########
File path: 
processing/src/main/java/org/apache/druid/segment/incremental/OnheapIncrementalIndex.java
##########
@@ -89,10 +93,11 @@
       boolean concurrentEventAdd,
       boolean sortFacts,
       int maxRowCount,
-      long maxBytesInMemory
+      long maxBytesInMemory,
+      boolean preserveExistingMetrics

Review comment:
       same comment about javadoc and comments about reingestion

##########
File path: 
processing/src/main/java/org/apache/druid/segment/incremental/OnheapIncrementalIndex.java
##########
@@ -157,6 +163,16 @@ protected void initAggs(
               concurrentEventAdd
           )
       );
+      if (isPreserveExistingMetrics()) {

Review comment:
       nit: since this is happening basically per column per row, 
`isPreserveExistingMetrics()` should probably be saved in a final field or just 
made `protected` in `IncrementalIndex`, and use that directly which probably 
has a better chance of being optimized by jvm than this being a method call 
everywhere (though maybe jvm is cool enough it never usually hurts to help it 
out by being explicit and cutting down method invocations). same applies to 
other uses of `isPreserveExistingMetrics()`
   
   side note: it might be worth sanity check measuring performance of ingestion 
before and after this change to make sure there is little/no impact when this 
mode is on

##########
File path: 
processing/src/main/java/org/apache/druid/segment/incremental/OnheapIncrementalIndex.java
##########
@@ -435,6 +490,18 @@ public boolean isNull(int rowOffset, int aggOffset)
     }
   }
 
+  private Object getCombinedMetric(AggregatorFactory[] metrics, Aggregator[] 
aggs, int aggOffset)

Review comment:
       this seems worthy of javadoc/comments




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to