gianm commented on a change in pull request #9731:
URL: https://github.com/apache/druid/pull/9731#discussion_r420318956
##########
File path:
processing/src/main/java/org/apache/druid/segment/DimensionIndexer.java
##########
@@ -127,6 +127,19 @@
*/
EncodedKeyComponentType
processRowValsToUnsortedEncodedKeyComponent(@Nullable Object dimValues, boolean
reportParseExceptions);
+ /**
+ * This method will be called whenever a known dimension is absent in a row
to allow an indexer
Review comment:
1. This javadoc doesn't make a ton of sense to me. Specifically it's not
clear to me what "known dimension is absent in a row" means or what "account
for any missing rows" might refer to. I wonder if you could find an improved
wording.
2. IMO it'd be better to not have this method be `default`. It makes it too
easy to forget to override it when necessary.
##########
File path:
processing/src/main/java/org/apache/druid/segment/column/ColumnCapabilities.java
##########
@@ -57,6 +50,21 @@ public boolean isTrue()
return this == TRUE;
}
+ public boolean isMaybeTrue()
+ {
+ return isTrue() || isUnknown();
+ }
+
+ public boolean isUnknown()
+ {
+ return this == UNKNOWN;
+ }
+
+ public Capable complete(boolean convertUnknownToTrue)
Review comment:
The boolean here is tough for me to wrap my head around: when looking at
method calls like `.complete(true)` and `.complete(false)`, it looks like it's
describing whether or not things are complete. I think it would be better to
use a two-valued enum, so the call would be `.makeComplete(UNKNOWN_TO_FALSE)`
or `.makeComplete(UNKNOWN_TO_TRUE)`.
##########
File path:
processing/src/main/java/org/apache/druid/segment/column/ColumnCapabilitiesImpl.java
##########
@@ -31,23 +31,66 @@
*/
public class ColumnCapabilitiesImpl implements ColumnCapabilities
{
- public static ColumnCapabilitiesImpl copyOf(final ColumnCapabilities other)
+ public static ColumnCapabilitiesImpl copyOf(@Nullable final
ColumnCapabilities other)
{
final ColumnCapabilitiesImpl capabilities = new ColumnCapabilitiesImpl();
- capabilities.merge(other);
- capabilities.setFilterable(other.isFilterable());
- capabilities.setIsComplete(other.isComplete());
+ if (other != null) {
+ capabilities.type = other.getType();
+ capabilities.dictionaryEncoded = other.isDictionaryEncoded();
+ capabilities.runLengthEncoded = other.isRunLengthEncoded();
+ capabilities.hasInvertedIndexes = other.hasBitmapIndexes();
+ capabilities.hasSpatialIndexes = other.hasSpatialIndexes();
+ capabilities.hasMultipleValues = other.hasMultipleValues();
+ capabilities.dictionaryValuesSorted = other.areDictionaryValuesSorted();
+ capabilities.dictionaryValuesUnique = other.areDictionaryValuesUnique();
+ capabilities.filterable = other.isFilterable();
+ }
return capabilities;
}
+ /**
+ * Used at indexing time to finalize all {@link
ColumnCapabilities.Capable#UNKNOWN} values to
+ * {@link ColumnCapabilities.Capable#FALSE}, in order to present a snapshot
of the state of the this column
+ */
+ @Nullable
+ public static ColumnCapabilitiesImpl complete(
+ @Nullable final ColumnCapabilities capabilities,
+ boolean convertUnknownToTrue
+ )
+ {
+ if (capabilities == null) {
+ return null;
+ }
+ ColumnCapabilitiesImpl copy = copyOf(capabilities);
+ copy.hasMultipleValues =
copy.hasMultipleValues.complete(convertUnknownToTrue);
+ copy.dictionaryValuesSorted =
copy.dictionaryValuesSorted.complete(convertUnknownToTrue);
+ copy.dictionaryValuesUnique =
copy.dictionaryValuesUnique.complete(convertUnknownToTrue);
+ return copy;
+ }
+
+
+ /**
+ * Create a no frills, simple column with {@link ValueType} set and
everything else false
+ */
+ public static ColumnCapabilitiesImpl createSimpleNumericColumn(ValueType
valueType)
Review comment:
`createSimpleNumericCapabilities`?
##########
File path:
processing/src/main/java/org/apache/druid/segment/ColumnProcessors.java
##########
@@ -197,6 +197,6 @@
*/
private static boolean mayBeMultiValue(@Nullable final ColumnCapabilities
capabilities)
{
- return capabilities == null || !capabilities.isComplete() ||
capabilities.hasMultipleValues();
+ return capabilities == null ||
capabilities.hasMultipleValues().isMaybeTrue();
Review comment:
Nice.
##########
File path:
processing/src/main/java/org/apache/druid/segment/ColumnSelectorBitmapIndexSelector.java
##########
@@ -160,11 +160,11 @@ public void close() throws IOException
public boolean hasMultipleValues(final String dimension)
Review comment:
IMO, better if this returns a `Capable`. Then the caller can decide what
to do with unknowns.
##########
File path:
processing/src/main/java/org/apache/druid/segment/DimensionHandlerUtils.java
##########
@@ -60,7 +60,8 @@
.setDictionaryEncoded(false)
.setDictionaryValuesUnique(false)
.setDictionaryValuesSorted(false)
- .setHasBitmapIndexes(false);
+ .setHasBitmapIndexes(false)
+ .setHasMultipleValues(false);
Review comment:
Why this change?
As I understand it, the previous behavior is roughly equivalent to returning
unknown for HMV. (Because it would return false, but complete is also false.)
##########
File path:
processing/src/main/java/org/apache/druid/segment/incremental/IncrementalIndexStorageAdapter.java
##########
@@ -142,24 +141,8 @@ public Comparable getMaxValue(String column)
@Override
public ColumnCapabilities getColumnCapabilities(String column)
{
- // Different from index.getCapabilities because, in a way,
IncrementalIndex's string-typed dimensions
- // are always potentially multi-valued at query time. (Missing / null
values for a row can potentially be
- // represented by an empty array; see
StringDimensionIndexer.IndexerDimensionSelector's getRow method.)
- //
- // We don't want to represent this as having-multiple-values in
index.getCapabilities, because that's used
- // at index-persisting time to determine if we need a multi-value column
or not. However, that means we
- // need to tweak the capabilities here in the StorageAdapter (a query-time
construct), so at query time
- // they appear multi-valued.
-
- final ColumnCapabilities capabilitiesFromIndex =
index.getCapabilities(column);
- final IncrementalIndex.DimensionDesc dimensionDesc =
index.getDimension(column);
- if (dimensionDesc != null && dimensionDesc.getCapabilities().getType() ==
ValueType.STRING) {
- final ColumnCapabilitiesImpl retVal =
ColumnCapabilitiesImpl.copyOf(capabilitiesFromIndex);
- retVal.setHasMultipleValues(true);
- return retVal;
- } else {
- return capabilitiesFromIndex;
- }
+ // snapshot the current state
+ return ColumnCapabilitiesImpl.complete(index.getCapabilities(column),
false);
Review comment:
Why `false` here instead of `true`?
My understanding is that the HMV capability in the IncrementalIndex is now
either going to be UNKNOWN (where it starts) or TRUE (if there are actually
MVs). So UNKNOWN means we haven't seen any MV inputs yet.
But for the reason mentioned in the comment you deleted (the behavior of
`StringDimensionIndexer.IndexerDimensionSelector's getRow method`), can't we
get empty arrays from dimension selectors when querying IncrementalIndexes,
even if the capability is UNKNOWN? So shouldn't we treat an unknown here as
`true` rather than `false`?
I must be missing something, since the tests are passing, assuming we have
tests for this…
##########
File path:
processing/src/main/java/org/apache/druid/segment/column/ColumnCapabilitiesImpl.java
##########
@@ -31,23 +31,66 @@
*/
public class ColumnCapabilitiesImpl implements ColumnCapabilities
{
- public static ColumnCapabilitiesImpl copyOf(final ColumnCapabilities other)
+ public static ColumnCapabilitiesImpl copyOf(@Nullable final
ColumnCapabilities other)
{
final ColumnCapabilitiesImpl capabilities = new ColumnCapabilitiesImpl();
- capabilities.merge(other);
- capabilities.setFilterable(other.isFilterable());
- capabilities.setIsComplete(other.isComplete());
+ if (other != null) {
+ capabilities.type = other.getType();
+ capabilities.dictionaryEncoded = other.isDictionaryEncoded();
+ capabilities.runLengthEncoded = other.isRunLengthEncoded();
+ capabilities.hasInvertedIndexes = other.hasBitmapIndexes();
+ capabilities.hasSpatialIndexes = other.hasSpatialIndexes();
+ capabilities.hasMultipleValues = other.hasMultipleValues();
+ capabilities.dictionaryValuesSorted = other.areDictionaryValuesSorted();
+ capabilities.dictionaryValuesUnique = other.areDictionaryValuesUnique();
+ capabilities.filterable = other.isFilterable();
+ }
return capabilities;
}
+ /**
+ * Used at indexing time to finalize all {@link
ColumnCapabilities.Capable#UNKNOWN} values to
Review comment:
Is this ever called with `convertUnknownToTrue == true`? If not, it
could be named better.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]