kgyrtkirk commented on code in PR #16517:
URL: https://github.com/apache/druid/pull/16517#discussion_r1627164179
##########
processing/src/main/java/org/apache/druid/segment/DictionaryEncodedColumnMerger.java:
##########
@@ -322,7 +322,7 @@ public void processMergedRow(ColumnValueSelector selector)
throws IOException
if (encodedValueSerializer instanceof ColumnarMultiIntsSerializer) {
((ColumnarMultiIntsSerializer) encodedValueSerializer).addValues(row);
} else {
- int value = row.size() == 0 ? 0 : row.get(0);
+ int value = rowSize == 0 ? 0 : row.get(0);
Review Comment:
oh - that's true; and its being used as well around here!
but how it works now is that it will call `getRow` for every `size` and
`get(i)` invocations as well [see
here](https://github.com/apache/druid/blob/30c59042e0cde3c76392afa3ba73d0830b11b0a5/processing/src/main/java/org/apache/druid/segment/DictionaryEncodedColumnIndexer.java#L161)
the costly process is to produce the `row` itself; currently built by:
[StringDimensionIndexer$IndexerDimensionSelector#getRow](https://github.com/apache/druid/blob/30c59042e0cde3c76392afa3ba73d0830b11b0a5/processing/src/main/java/org/apache/druid/segment/StringDimensionIndexer.java#L304)
- if that could be cached somehow - that would result in the same benefits.
There are better access to internals there; I believe if the "same" entry is
being used (`currEntry.get()`) then it might be ok to return the previously
built row.
Although the
[IncrementalIndexRow](https://github.com/apache/druid/blob/30c59042e0cde3c76392afa3ba73d0830b11b0a5/processing/src/main/java/org/apache/druid/segment/incremental/IncrementalIndexRow.java#L110)
looks partially mutable; but the since the `timestamp` can't be set: I think
its not intended to be reused - so I think you could put a caching logic there
based on whether `currEntry.get()` is the same
to ensure that I'll don't send you on a wild goose chase I'll ask
@clintropolis to possibly doublecheck the above thinking :)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]