[GitHub] [incubator-druid] jon-wei opened pull request #6340: Fix dictionary ID race condition in IncrementalIndexStorageAdapter

GitHub Mon, 17 Sep 2018 17:48:48 -0700

Possibly fixes https://github.com/apache/incubator-druid/issues/4937


--------

There is currently a race condition in IncrementalIndexStorageAdapter that can 
lead to exceptions like the following, when running queries with filters on 
String dimensions that hit realtime tasks: 

```
org.apache.druid.java.util.common.ISE: id[5] >= maxId[5]
        at 
org.apache.druid.segment.StringDimensionIndexer$1IndexerDimensionSelector.lookupName(StringDimensionIndexer.java:591)
        at 
org.apache.druid.segment.StringDimensionIndexer$1IndexerDimensionSelector$2.matches(StringDimensionIndexer.java:562)
        at 
org.apache.druid.segment.incremental.IncrementalIndexStorageAdapter$IncrementalIndexCursor.advance(IncrementalIndexStorageAdapter.java:284)
```

When the `filterMatcher` is created in the constructor of 
`IncrementalIndexStorageAdapter.IncrementalIndexCursor`, 
`StringDimensionIndexer.makeDimensionSelector` gets called eventually, which 
calls:

```
final int maxId = getCardinality();
...

 @Override
  public int getCardinality()
  {
    return dimLookup.size();
  }
```

So `maxId` is set to the size of the dictionary at the time that the 
`filterMatcher` is created.

However, the `maxRowIndex` which is meant to prevent the Cursor from returning 
rows that were added after the Cursor was created (see 
https://github.com/apache/incubator-druid/pull/4049) is set after the 
`filterMatcher` is created.

If rows with new dictionary values are added after the `filterMatcher` is 
created but before `maxRowIndex` is set, then it is possible for the Cursor to 
return rows that contain the new values, which will have `id >= maxId`.

This PR sets `maxRowIndex` before creating the `filterMatcher` to prevent rows 
with unknown dictionary IDs from being passed to the `filterMatcher`.

-----------

The included test triggers the error with a custom Filter + 
DruidPredicateFactory.

The DimensionSelector for predicate-based filter matching is created here in 
`Filters.makeValueMatcher`:

```
  public static ValueMatcher makeValueMatcher(
      final ColumnSelectorFactory columnSelectorFactory,
      final String columnName,
      final DruidPredicateFactory predicateFactory
  )
  {
    final ColumnCapabilities capabilities = 
columnSelectorFactory.getColumnCapabilities(columnName);

    // This should be folded into the ValueMatcherColumnSelectorStrategy once 
that can handle LONG typed columns.
    if (capabilities != null && capabilities.getType() == ValueType.LONG) {
      return getLongPredicateMatcher(
          columnSelectorFactory.makeColumnValueSelector(columnName),
          predicateFactory.makeLongPredicate()
      );
    }

    final ColumnSelectorPlus<ValueMatcherColumnSelectorStrategy> selector =
        DimensionHandlerUtils.createColumnSelectorPlus(
            ValueMatcherColumnSelectorStrategyFactory.instance(),
            DefaultDimensionSpec.of(columnName),
            columnSelectorFactory
        );

    return 
selector.getColumnSelectorStrategy().makeValueMatcher(selector.getSelector(), 
predicateFactory);
  }
```

The test Filter adds a row to the IncrementalIndex in the test when the 
predicateFactory creates a new String predicate, after the DimensionSelector is 
created.


[ Full content available at: 
https://github.com/apache/incubator-druid/pull/6340 ]
This message was relayed via gitbox.apache.org for [email protected]

[GitHub] [incubator-druid] jon-wei opened pull request #6340: Fix dictionary ID race condition in IncrementalIndexStorageAdapter

Reply via email to