leventov commented on a change in pull request #7293: AggregatorFactory:
Clarify methods that return other AggregatorFactories.
URL: https://github.com/apache/incubator-druid/pull/7293#discussion_r267131588
##########
File path:
processing/src/main/java/org/apache/druid/query/aggregation/AggregatorFactory.java
##########
@@ -94,20 +94,44 @@ public AggregateCombiner makeNullableAggregateCombiner()
}
/**
- * Returns an AggregatorFactory that can be used to combine the output of
aggregators from this factory. This
- * generally amounts to simply creating a new factory that is the same as
the current except with its input
- * column renamed to the same as the output column.
+ * Returns an AggregatorFactory that can be used to combine the output of
aggregators from this factory. It is used
+ * when we know we have some values that were produced with this aggregator
factory, and want to do some additional
+ * combining of them. This happens, for example, when merging query results
from two different segments, or two
+ * different servers.
+ *
+ * The combining factory is often computed by simply creating a new factory
that is the same as the current, except
+ * with its input column renamed to the same as the output column. For
example, this aggregator:
+ *
+ * {@code {"type": "longSum", "fieldName": "foo", "name": "bar"}}
+ *
+ * Would become:
+ *
+ * {@code {"type": "longSum", "fieldName": "bar", "name": "bar"}}
+ *
+ * Somtimes, the type or other parameters of the aggregator will change. For
example, a "count" aggregator's
+ * getCombiningFactory method will return a "longSum" aggregator, because
counts are combined by summing.
+ *
+ * No matter what, `foo.getCombiningFactory()` and
`foo.getCombiningFactory().getCombiningFactory()` should return
+ * the same result.
*
* @return a new Factory that can be used for operations on top of data
output from the current factory.
*/
public abstract AggregatorFactory getCombiningFactory();
/**
- * Returns an AggregatorFactory that can be used to merge the output of
aggregators from this factory and
- * other factory.
- * This method is relevant only for AggregatorFactory which can be used at
ingestion time.
+ * Returns an AggregatorFactory that can be used to combine the output of
aggregators from this factory and
+ * another factory. It is used when we have some values produced by this
aggregator factory, and some values produced
+ * by the "other" aggregator factory, and we want to do some additional
combining of them. This happens, for example,
+ * when compacting two segments together that both have a metric column with
the same name. (Even though the name of
+ * the column is the same, the aggregator factory used to create it may be
different from segment to segment.)
+ *
+ * This method may throw {@link AggregatorFactoryNotMergeableException},
meaning that "this" and "other" are not
+ * compatible and values from one cannot sensibly be combined with values
from the other.
*
* @return a new Factory that can be used for merging the output of
aggregators from this factory and other.
+ *
+ * @see #getCombiningFactory(), which is equivalent to {@code
foo.getMergingFactory(foo)} (when "this" and "other"
+ * are the same instance).
*/
public AggregatorFactory getMergingFactory(AggregatorFactory other) throws
AggregatorFactoryNotMergeableException
Review comment:
Maybe there is a better name for this method? "mergingFactoryWith"? So that
it doesn't imply that the merging factory "belongs" to the receiver factory (on
which `getMergingFactory()` is called), but has more symmetric nature.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]