jon-wei opened a new pull request #6340: Fix dictionary ID race condition in IncrementalIndexStorageAdapter
URL: https://github.com/apache/incubator-druid/pull/6340
 
 
   Possibly fixes https://github.com/apache/incubator-druid/issues/4937
   
   --------
   
   There is currently a race condition in IncrementalIndexStorageAdapter that can lead to exceptions like the following when running queries with filters on String dimensions against realtime tasks:
   
   ```
   org.apache.druid.java.util.common.ISE: id[5] >= maxId[5]
        at org.apache.druid.segment.StringDimensionIndexer$1IndexerDimensionSelector.lookupName(StringDimensionIndexer.java:591)
        at org.apache.druid.segment.StringDimensionIndexer$1IndexerDimensionSelector$2.matches(StringDimensionIndexer.java:562)
        at org.apache.druid.segment.incremental.IncrementalIndexStorageAdapter$IncrementalIndexCursor.advance(IncrementalIndexStorageAdapter.java:284)
   ```
   
   When the `filterMatcher` is created in the constructor of `IncrementalIndexStorageAdapter.IncrementalIndexCursor`, it eventually calls `StringDimensionIndexer.makeDimensionSelector`, which contains:
   
   ```
   final int maxId = getCardinality();
   ...

   @Override
   public int getCardinality()
   {
     return dimLookup.size();
   }
   ```
   
   So `maxId` is set to the size of the dictionary at the time the `filterMatcher` is created.
   
   However, the `maxRowIndex`, which is meant to prevent the Cursor from returning rows that were added after the Cursor was created (see https://github.com/apache/incubator-druid/pull/4049), is set after the `filterMatcher` is created.
   
   If rows with new dictionary values are added after the `filterMatcher` is 
created but before `maxRowIndex` is set, then it is possible for the Cursor to 
return rows that contain the new values, which will have `id >= maxId`.
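   The window can be sketched with a toy model (this is not Druid code; `dimLookup` is borrowed from `getCardinality()` above, everything else here is simplified for illustration):

   ```java
   import java.util.ArrayList;
   import java.util.List;

   // Toy model of the race (not Druid code). The matcher's maxId snapshot
   // is taken from the dictionary size, and a value ingested after that
   // snapshot receives an id equal to it.
   public class DictionaryRaceSketch
   {
     public static void main(String[] args)
     {
       List<String> dimLookup = new ArrayList<>(List.of("a", "b"));

       // Matcher creation: maxId snapshots the current dictionary size.
       final int maxId = dimLookup.size(); // 2

       // A row with a new value is ingested before maxRowIndex is set...
       dimLookup.add("c");
       final int newId = dimLookup.indexOf("c"); // 2

       // ...so if the cursor hands that row to the matcher, the
       // lookupName bounds check fails:
       if (newId >= maxId) {
         System.out.println("id[" + newId + "] >= maxId[" + maxId + "]");
       }
     }
   }
   ```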
   
   This PR sets `maxRowIndex` before creating the `filterMatcher` to prevent 
rows with unknown dictionary IDs from being passed to the `filterMatcher`.
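   Why the reordering closes the window can be sketched as follows (again a toy model under assumed, simplified names, not the actual patch to `IncrementalIndexCursor`):

   ```java
   import java.util.ArrayList;
   import java.util.List;

   // Toy model (not the actual patch). Freezing the row cutoff before the
   // matcher snapshots the dictionary size guarantees that every row the
   // cursor visits was ingested while its dictionary id was still < maxId.
   public class CursorOrderingSketch
   {
     public static void main(String[] args)
     {
       List<String> dictionary = new ArrayList<>(List.of("a", "b"));
       List<Integer> rowDictIds = new ArrayList<>(List.of(0, 1)); // one dictionary id per row

       // Fixed ordering: (1) freeze the row cutoff first...
       final int maxRowIndex = rowDictIds.size() - 1;
       // ...(2) then build the matcher, which snapshots the dictionary size.
       final int maxId = dictionary.size();

       // A concurrent ingest adds a row with a brand-new value.
       dictionary.add("c");
       rowDictIds.add(2);

       // The cursor never advances past maxRowIndex, so every id it passes
       // to the matcher satisfies id < maxId.
       for (int row = 0; row <= maxRowIndex; row++) {
         int id = rowDictIds.get(row);
         if (id >= maxId) {
           throw new IllegalStateException("id[" + id + "] >= maxId[" + maxId + "]");
         }
       }
       System.out.println("all visited ids < maxId=" + maxId);
     }
   }
   ```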
   
   -----------
   
   The included test triggers the error with a custom Filter + 
DruidPredicateFactory.
   
   The DimensionSelector for predicate-based filter matching is created here in 
`Filters.makeValueMatcher`:
   
   ```
     public static ValueMatcher makeValueMatcher(
         final ColumnSelectorFactory columnSelectorFactory,
         final String columnName,
         final DruidPredicateFactory predicateFactory
     )
     {
       final ColumnCapabilities capabilities = columnSelectorFactory.getColumnCapabilities(columnName);

       // This should be folded into the ValueMatcherColumnSelectorStrategy once that can handle LONG typed columns.
       if (capabilities != null && capabilities.getType() == ValueType.LONG) {
         return getLongPredicateMatcher(
             columnSelectorFactory.makeColumnValueSelector(columnName),
             predicateFactory.makeLongPredicate()
         );
       }

       final ColumnSelectorPlus<ValueMatcherColumnSelectorStrategy> selector =
           DimensionHandlerUtils.createColumnSelectorPlus(
               ValueMatcherColumnSelectorStrategyFactory.instance(),
               DefaultDimensionSpec.of(columnName),
               columnSelectorFactory
           );

       return selector.getColumnSelectorStrategy().makeValueMatcher(selector.getSelector(), predicateFactory);
     }
   ```
   
   The test's Filter adds a row to the IncrementalIndex when the predicateFactory creates a new String predicate, i.e. after the DimensionSelector has been created.
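   A rough stand-in for that trick (hypothetical names; the real test implements Druid's `Filter` and `DruidPredicateFactory` interfaces, which are not reproduced here):

   ```java
   import java.util.ArrayList;
   import java.util.List;
   import java.util.function.Predicate;

   // Rough stand-in (not the actual test code): the predicate-factory
   // callback that runs during matcher creation is used to ingest a row,
   // so the index grows after the DimensionSelector exists but before
   // maxRowIndex would have been set under the old ordering.
   public class MutatingPredicateFactorySketch
   {
     public static void main(String[] args)
     {
       final List<String> index = new ArrayList<>(List.of("a", "b"));

       // Matcher construction asks the factory for a String predicate; the
       // side effect simulates a row arriving mid-construction.
       Predicate<String> predicate = makeStringPredicate(index);

       System.out.println("rows after matcher creation: " + index.size()); // 3
       System.out.println("matches \"c\": " + predicate.test("c")); // true
     }

     // Stand-in for DruidPredicateFactory.makeStringPredicate().
     static Predicate<String> makeStringPredicate(List<String> index)
     {
       index.add("c"); // side effect: a new dictionary value is ingested
       return "c"::equals;
     }
   }
   ```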
   
