siddharthteotia commented on a change in pull request #4790: Support ORDER BY
for DISTINCT queries
URL: https://github.com/apache/incubator-pinot/pull/4790#discussion_r342690096
##########
File path:
pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/DistinctTable.java
##########
@@ -39,85 +42,44 @@
* (2) The same object is serialized by the server inside the data table
* for sending the results to broker. Broker deserializes it.
*/
-public class DistinctTable {
+public class DistinctTable extends SimpleIndexedTable {
private static final double LOAD_FACTOR = 0.75;
private static final int MAX_INITIAL_CAPACITY = 64 * 1024;
- private FieldSpec.DataType[] _columnTypes;
- private String[] _columnNames;
- private Set<Key> _table;
- /**
- * Add a row to hash table
- * @param key multi-column key to add
- */
- public void addKey(final Key key) {
- _table.add(key);
- }
-
- public DistinctTable(int limit) {
+ public DistinctTable(DataSchema dataSchema, List<SelectionSort> orderBy, int
limit) {
Review comment:
> Why were we not able to use SimpleIndexTable `as-is`? If there are
> issues/limitations in that, then let's call them out, and discuss. Ideally, we
> want to make the base implementation as generic as possible, so we don't need
> to extend for specific cases. The fact that you needed to do so implies that
> the abstraction for IndexTable is not generic enough to capture all use cases.

@mayankshriv, I initially implemented the solution by using SimpleIndexedTable
directly, which meant getting rid of the DistinctTable interface. That approach
had two important consequences:

(1) Distinct no longer behaves consistently with the other aggregation
functions. In other words, DistinctAggregationFunction no longer returns its
result as a single serializable object, since the IntermediateResult now becomes
a SimpleIndexedTable.

(2) This is the most important one. Once the CombineOperator has merged the
intermediate results from the indexed tables of all segments, it has to send
back the merged IntermediateResultBlock. We already have a constructor for
returning an IndexedTable as part of IntermediateResultBlock. However, the
merged indexed table sits inside the function, and AggregationFunction currently
has no interface for returning the table.

As a result, the code inside CombineOperator became a bit ugly, with special
casing for Distinct, and it also required introducing a bogus getIndexedTable()
API on the AggregationFunction interface. While (1) is not a strong reason to
avoid this route, (2) made the API and code a mess.

The approach implemented in this PR doesn't have these concerns. Code in
CombineOperator is untouched.
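
To make the trade-off concrete, here is a minimal, self-contained sketch of the shape this PR takes. Every class and method name below (`SimpleTableSketch`, `DistinctTableSketch`, `AggregationFunctionSketch`, `CombineSketch`, `upsert`, `result`) is a hypothetical stand-in, not the real Pinot API, and ORDER BY handling is deliberately left out. It only illustrates that when the distinct table extends the generic table and remains the function's single intermediate result, the combine step can stay generic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a generic indexed table (NOT Pinot's SimpleIndexedTable).
class SimpleTableSketch {
  // List<Object> has value-based equals/hashCode, so the set de-duplicates rows.
  protected final Set<List<Object>> rows = new LinkedHashSet<>();

  public void upsert(List<Object> row) {
    rows.add(row);
  }

  public void merge(SimpleTableSketch other) {
    rows.addAll(other.rows);
  }

  public int size() {
    return rows.size();
  }
}

// Hypothetical stand-in for DistinctTable: it reuses the base table's storage
// and merging, and adds only the DISTINCT-specific LIMIT handling.
class DistinctTableSketch extends SimpleTableSketch {
  private final int limit;

  DistinctTableSketch(int limit) {
    this.limit = limit;
  }

  // Returns at most `limit` distinct rows, in insertion order.
  public List<List<Object>> result() {
    List<List<Object>> out = new ArrayList<>();
    for (List<Object> row : rows) {
      if (out.size() == limit) {
        break;
      }
      out.add(row);
    }
    return out;
  }
}

// Hypothetical stand-in for the aggregation function contract: the
// intermediate result is one object of type T, merged pairwise.
interface AggregationFunctionSketch<T> {
  T merge(T left, T right);
}

class DistinctFunctionSketch implements AggregationFunctionSketch<DistinctTableSketch> {
  @Override
  public DistinctTableSketch merge(DistinctTableSketch left, DistinctTableSketch right) {
    left.merge(right);
    return left;
  }
}

// Hypothetical stand-in for the combine step: no DISTINCT special case and
// no extra getIndexedTable()-style accessor on the function interface.
class CombineSketch {
  static <T> T combine(AggregationFunctionSketch<T> fn, List<T> perSegment) {
    T merged = perSegment.get(0);
    for (int i = 1; i < perSegment.size(); i++) {
      merged = fn.merge(merged, perSegment.get(i));
    }
    return merged;
  }
}

class DistinctCombineDemo {
  public static void main(String[] args) {
    DistinctTableSketch seg1 = new DistinctTableSketch(2);
    seg1.upsert(Arrays.<Object>asList("x", 1));
    seg1.upsert(Arrays.<Object>asList("y", 2));

    DistinctTableSketch seg2 = new DistinctTableSketch(2);
    seg2.upsert(Arrays.<Object>asList("y", 2)); // duplicate across segments
    seg2.upsert(Arrays.<Object>asList("z", 3));

    DistinctTableSketch merged =
        CombineSketch.combine(new DistinctFunctionSketch(), Arrays.asList(seg1, seg2));
    System.out.println(merged.result()); // prints [[x, 1], [y, 2]]
  }
}
```

In the rejected alternative, the function's intermediate result would have been the generic table itself, so the combine step would need a Distinct-specific branch plus an accessor to pull the merged table back out of the function.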