clintropolis commented on a change in pull request #7116: segment metadata
fallback analysis if no bitmaps
URL: https://github.com/apache/incubator-druid/pull/7116#discussion_r258749692
##########
File path:
processing/src/main/java/org/apache/druid/query/metadata/SegmentAnalyzer.java
##########
@@ -198,26 +199,42 @@ private ColumnAnalysis analyzeStringColumn(
Comparable min = null;
Comparable max = null;
-
- if (!capabilities.hasBitmapIndexes()) {
- return ColumnAnalysis.error("string_no_bitmap");
- }
-
- final BitmapIndex bitmapIndex = columnHolder.getBitmapIndex();
- final int cardinality = bitmapIndex.getCardinality();
-
- if (analyzingSize()) {
- for (int i = 0; i < cardinality; ++i) {
- String value = bitmapIndex.getValue(i);
- if (value != null) {
- size += StringUtils.estimatedBinaryLengthAsUTF8(value) * bitmapIndex.getBitmap(bitmapIndex.getIndex(value)).size();
+ int cardinality = 0;
+ if (capabilities.hasBitmapIndexes()) {
+ final BitmapIndex bitmapIndex = columnHolder.getBitmapIndex();
+ cardinality = bitmapIndex.getCardinality();
+
+ if (analyzingSize()) {
+ for (int i = 0; i < cardinality; ++i) {
+ String value = bitmapIndex.getValue(i);
+ if (value != null) {
+ size += StringUtils.estimatedBinaryLengthAsUTF8(value) * bitmapIndex.getBitmap(bitmapIndex.getIndex(value))
+ .size();
+ }
}
}
- }
- if (analyzingMinMax() && cardinality > 0) {
- min = NullHandling.nullToEmptyIfNeeded(bitmapIndex.getValue(0));
- max = NullHandling.nullToEmptyIfNeeded(bitmapIndex.getValue(cardinality - 1));
+ if (analyzingMinMax() && cardinality > 0) {
+ min = NullHandling.nullToEmptyIfNeeded(bitmapIndex.getValue(0));
+ max = NullHandling.nullToEmptyIfNeeded(bitmapIndex.getValue(cardinality - 1));
+ }
+ } else if (capabilities.isDictionaryEncoded()) {
+ // fallback if no bitmap index
Review comment:
Agreed. I think it would be interesting for size to represent the
"estimated size in memory" for an incremental index, or the size in bytes of
the column in the mapped segment file, but what it computes here seems like
basically nonsense for all column types.
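
For reference, a minimal sketch of what the size loop in the hunk above adds up for a string column: for each non-null dictionary value, the UTF-8 length of the value multiplied by the number of rows that contain it. The class and method names below are hypothetical, a plain map stands in for the bitmap index, and `String.getBytes` stands in for `StringUtils.estimatedBinaryLengthAsUTF8`; none of this is Druid's actual code.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Hypothetical stand-in for illustration only; not Druid's SegmentAnalyzer.
final class StringColumnSizeSketch
{
  // rowsPerValue maps each dictionary value to the number of rows containing it
  // (the role played by bitmapIndex.getBitmap(...).size() in the diff).
  static long estimateSize(Map<String, Integer> rowsPerValue)
  {
    long size = 0;
    for (Map.Entry<String, Integer> entry : rowsPerValue.entrySet()) {
      String value = entry.getKey();
      if (value != null) {
        // Assumption: exact UTF-8 byte length used in place of Druid's estimate.
        long utf8Length = value.getBytes(StandardCharsets.UTF_8).length;
        size += utf8Length * entry.getValue();
      }
    }
    return size;
  }
}
```

Under these assumptions the result is the total byte count of the values as if each row stored its string uncompressed, which is neither of the two sizes described in the comment above.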
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services