clintropolis commented on a change in pull request #9731:
URL: https://github.com/apache/druid/pull/9731#discussion_r430688872
##########
File path:
processing/src/main/java/org/apache/druid/segment/column/ColumnCapabilities.java
##########
@@ -57,6 +50,21 @@ public boolean isTrue()
return this == TRUE;
}
+ public boolean isMaybeTrue()
+ {
+ return isTrue() || isUnknown();
+ }
+
+ public boolean isUnknown()
+ {
+ return this == UNKNOWN;
+ }
+
+ public Capable complete(boolean convertUnknownToTrue)
Review comment:
I don't think it needs another enum, if anything it could just accept a
`Capable` to use for the default value, though I should enforce that it cannot
be unknown. If i rename to `coerceIfUnknown` or something maybe it would also
be more obvious what the purpose is.
##########
File path:
processing/src/main/java/org/apache/druid/segment/incremental/IncrementalIndexStorageAdapter.java
##########
@@ -142,24 +141,8 @@ public Comparable getMaxValue(String column)
@Override
public ColumnCapabilities getColumnCapabilities(String column)
{
- // Different from index.getCapabilities because, in a way,
IncrementalIndex's string-typed dimensions
- // are always potentially multi-valued at query time. (Missing / null
values for a row can potentially be
- // represented by an empty array; see
StringDimensionIndexer.IndexerDimensionSelector's getRow method.)
- //
- // We don't want to represent this as having-multiple-values in
index.getCapabilities, because that's used
- // at index-persisting time to determine if we need a multi-value column
or not. However, that means we
- // need to tweak the capabilities here in the StorageAdapter (a query-time
construct), so at query time
- // they appear multi-valued.
-
- final ColumnCapabilities capabilitiesFromIndex =
index.getCapabilities(column);
- final IncrementalIndex.DimensionDesc dimensionDesc =
index.getDimension(column);
- if (dimensionDesc != null && dimensionDesc.getCapabilities().getType() ==
ValueType.STRING) {
- final ColumnCapabilitiesImpl retVal =
ColumnCapabilitiesImpl.copyOf(capabilitiesFromIndex);
- retVal.setHasMultipleValues(true);
- return retVal;
- } else {
- return capabilitiesFromIndex;
- }
+ // snapshot the current state
+ return ColumnCapabilitiesImpl.complete(index.getCapabilities(column),
false);
Review comment:
Ok, given the race condition I guess I have to revert this behavior so
queries use the fake capabilities and to fix the bug for segment metadata based
schema discovery to plumb special the actual capabilities to segment metadata
queries. Maybe longer term it would be better if we could snapshot at the time
we query and make a cursor to only read the rows that were available at the
time the capabilities were fetched so that we can have more performant queries
on incremental indexes at the cost of missing a few rows.
##########
File path:
processing/src/main/java/org/apache/druid/segment/incremental/IncrementalIndexStorageAdapter.java
##########
@@ -142,24 +141,8 @@ public Comparable getMaxValue(String column)
@Override
public ColumnCapabilities getColumnCapabilities(String column)
{
- // Different from index.getCapabilities because, in a way,
IncrementalIndex's string-typed dimensions
- // are always potentially multi-valued at query time. (Missing / null
values for a row can potentially be
- // represented by an empty array; see
StringDimensionIndexer.IndexerDimensionSelector's getRow method.)
- //
- // We don't want to represent this as having-multiple-values in
index.getCapabilities, because that's used
- // at index-persisting time to determine if we need a multi-value column
or not. However, that means we
- // need to tweak the capabilities here in the StorageAdapter (a query-time
construct), so at query time
- // they appear multi-valued.
-
- final ColumnCapabilities capabilitiesFromIndex =
index.getCapabilities(column);
- final IncrementalIndex.DimensionDesc dimensionDesc =
index.getDimension(column);
- if (dimensionDesc != null && dimensionDesc.getCapabilities().getType() ==
ValueType.STRING) {
- final ColumnCapabilitiesImpl retVal =
ColumnCapabilitiesImpl.copyOf(capabilitiesFromIndex);
- retVal.setHasMultipleValues(true);
- return retVal;
- } else {
- return capabilitiesFromIndex;
- }
+ // snapshot the current state
+ return ColumnCapabilitiesImpl.complete(index.getCapabilities(column),
false);
Review comment:
Ok, given the race condition I guess I have to revert this behavior so
queries use the fake capabilities and to fix the bug for segment metadata based
schema discovery to plumb special the actual capabilities to segment metadata
queries. Maybe longer term it would be better if we could snapshot the
capabilities at the time we query and make a cursor to only read the rows that
were available at the time the capabilities were fetched and give those to the
string indexer so that we can have more performant queries on incremental
indexes at the cost of missing a few rows.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]