LakshSingla commented on code in PR #15020:
URL: https://github.com/apache/druid/pull/15020#discussion_r1339018552
##########
sql/src/main/java/org/apache/druid/sql/calcite/rel/DruidQuery.java:
##########
@@ -786,8 +789,44 @@ static Pair<DataSource, Filtration> getFiltration(
JoinableFactoryWrapper joinableFactoryWrapper
)
{
- if (!canUseIntervalFiltering(dataSource)) {
+ if (dataSource instanceof UnnestDataSource) {
+ // UnnestDataSource can have another unnest data source
+ // join datasource, filtered data source, etc as base
+ Pair<DataSource, Filtration> pair = getFiltration(
+ ((UnnestDataSource) dataSource).getBase(),
+ filter,
+ virtualColumnRegistry,
+ joinableFactoryWrapper
+ );
+ return Pair.of(dataSource, pair.rhs);
+ } else if (!canUseIntervalFiltering(dataSource)) {
return Pair.of(dataSource, toFiltration(filter,
virtualColumnRegistry.getFullRowSignature(), false));
+ } else if (dataSource instanceof FilteredDataSource) {
+ // A filteredDS is created only inside the rel for Unnest, ensuring it
only grabs the outermost filter
+ // and, if possible, pushes it down inside the data source.
+ // So a chain of Filter->Unnest->Filter is typically impossible when the
query is done through SQL.
+ // Also, Calcite has filter reduction rules that push filters deep into
base data sources for better query planning.
+ // Hence, the case that can be seen is a bunch of unnests followed by a
terminal filteredDS like Unnest->Unnest->FilteredDS.
+ // A base table with a chain of filters is synonymous with a filteredDS.
+ // In case there are filters present in the getFiltration call we still
update the interval by:
+ // 1) creating a filtration from the filteredDS's filter and
+ // 2) Updating the interval of the outer filter with the intervals in
step 1, and you'll see these 2 calls in the code
+ final FilteredDataSource filteredDataSource = (FilteredDataSource)
dataSource;
+ // Defensive check as in the base of a filter cannot be another filter
+ final DataSource baseOfFilterDataSource = filteredDataSource.getBase();
+ if (baseOfFilterDataSource instanceof FilteredDataSource) {
+ throw DruidException.defensive("Cannot create a filteredDataSource
using another filteredDataSource as a base");
+ }
Review Comment:
Instead of getBase(), we need to do a recursive check on the
`filteredDataSource.getChildren()`.
i.e. FilteredDS -> UnnestDS -> FilteredDS is disallowed ❌ .
We are only accounting for the top-level filtered DS, therefore any
intervals in any nested DS.
##########
sql/src/main/java/org/apache/druid/sql/calcite/rel/DruidQuery.java:
##########
@@ -786,8 +789,44 @@ static Pair<DataSource, Filtration> getFiltration(
JoinableFactoryWrapper joinableFactoryWrapper
)
{
- if (!canUseIntervalFiltering(dataSource)) {
+ if (dataSource instanceof UnnestDataSource) {
+ // UnnestDataSource can have another unnest data source
+ // join datasource, filtered data source, etc as base
+ Pair<DataSource, Filtration> pair = getFiltration(
+ ((UnnestDataSource) dataSource).getBase(),
+ filter,
+ virtualColumnRegistry,
+ joinableFactoryWrapper
+ );
+ return Pair.of(dataSource, pair.rhs);
+ } else if (!canUseIntervalFiltering(dataSource)) {
return Pair.of(dataSource, toFiltration(filter,
virtualColumnRegistry.getFullRowSignature(), false));
+ } else if (dataSource instanceof FilteredDataSource) {
+ // A filteredDS is created only inside the rel for Unnest, ensuring it
only grabs the outermost filter
+ // and, if possible, pushes it down inside the data source.
+ // So a chain of Filter->Unnest->Filter is typically impossible when the
query is done through SQL.
+ // Also, Calcite has filter reduction rules that push filters deep into
base data sources for better query planning.
+ // Hence, the case that can be seen is a bunch of unnests followed by a
terminal filteredDS like Unnest->Unnest->FilteredDS.
+ // A base table with a chain of filters is synonymous with a filteredDS.
+ // In case there are filters present in the getFiltration call we still
update the interval by:
+ // 1) creating a filtration from the filteredDS's filter and
+ // 2) Updating the interval of the outer filter with the intervals in
step 1, and you'll see these 2 calls in the code
+ final FilteredDataSource filteredDataSource = (FilteredDataSource)
dataSource;
+ // Defensive check as in the base of a filter cannot be another filter
+ final DataSource baseOfFilterDataSource = filteredDataSource.getBase();
+ if (baseOfFilterDataSource instanceof FilteredDataSource) {
+ throw DruidException.defensive("Cannot create a filteredDataSource
using another filteredDataSource as a base");
+ }
Review Comment:
Rethinking about it, I see that we are only handling the cases where
UnnestDS is at the top level. A query chain like QueryDS -> UnnestDS ->
FilteredDS -> .... won't set the intervals specified in the FilteredDS properly.
##########
sql/src/main/java/org/apache/druid/sql/calcite/rel/DruidQuery.java:
##########
@@ -786,8 +789,44 @@ static Pair<DataSource, Filtration> getFiltration(
JoinableFactoryWrapper joinableFactoryWrapper
)
{
- if (!canUseIntervalFiltering(dataSource)) {
+ if (dataSource instanceof UnnestDataSource) {
+ // UnnestDataSource can have another unnest data source
+ // join datasource, filtered data source, etc as base
+ Pair<DataSource, Filtration> pair = getFiltration(
+ ((UnnestDataSource) dataSource).getBase(),
+ filter,
+ virtualColumnRegistry,
+ joinableFactoryWrapper
+ );
+ return Pair.of(dataSource, pair.rhs);
+ } else if (!canUseIntervalFiltering(dataSource)) {
return Pair.of(dataSource, toFiltration(filter,
virtualColumnRegistry.getFullRowSignature(), false));
+ } else if (dataSource instanceof FilteredDataSource) {
+ // A filteredDS is created only inside the rel for Unnest, ensuring it
only grabs the outermost filter
+ // and, if possible, pushes it down inside the data source.
+ // So a chain of Filter->Unnest->Filter is typically impossible when the
query is done through SQL.
+ // Also, Calcite has filter reduction rules that push filters deep into
base data sources for better query planning.
+ // Hence, the case that can be seen is a bunch of unnests followed by a
terminal filteredDS like Unnest->Unnest->FilteredDS.
+ // A base table with a chain of filters is synonymous with a filteredDS.
+ // In case there are filters present in the getFiltration call we still
update the interval by:
+ // 1) creating a filtration from the filteredDS's filter and
+ // 2) Updating the interval of the outer filter with the intervals in
step 1, and you'll see these 2 calls in the code
Review Comment:
nit: push this branch above due to the same reason, perhaps we can move it
to another PR if there are no further comments.
##########
sql/src/main/java/org/apache/druid/sql/calcite/rel/DruidQuery.java:
##########
@@ -786,8 +789,44 @@ static Pair<DataSource, Filtration> getFiltration(
JoinableFactoryWrapper joinableFactoryWrapper
)
{
- if (!canUseIntervalFiltering(dataSource)) {
+ if (dataSource instanceof UnnestDataSource) {
+ // UnnestDataSource can have another unnest data source
+ // join datasource, filtered data source, etc as base
+ Pair<DataSource, Filtration> pair = getFiltration(
+ ((UnnestDataSource) dataSource).getBase(),
+ filter,
+ virtualColumnRegistry,
+ joinableFactoryWrapper
+ );
+ return Pair.of(dataSource, pair.rhs);
+ } else if (!canUseIntervalFiltering(dataSource)) {
return Pair.of(dataSource, toFiltration(filter,
virtualColumnRegistry.getFullRowSignature(), false));
+ } else if (dataSource instanceof FilteredDataSource) {
+ // A filteredDS is created only inside the rel for Unnest, ensuring it
only grabs the outermost filter
+ // and, if possible, pushes it down inside the data source.
+ // So a chain of Filter->Unnest->Filter is typically impossible when the
query is done through SQL.
+ // Also, Calcite has filter reduction rules that push filters deep into
base data sources for better query planning.
+ // Hence, the case that can be seen is a bunch of unnests followed by a
terminal filteredDS like Unnest->Unnest->FilteredDS.
+ // A base table with a chain of filters is synonymous with a filteredDS.
+ // In case there are filters present in the getFiltration call we still
update the interval by:
+ // 1) creating a filtration from the filteredDS's filter and
+ // 2) Updating the interval of the outer filter with the intervals in
step 1, and you'll see these 2 calls in the code
+ final FilteredDataSource filteredDataSource = (FilteredDataSource)
dataSource;
+ // Defensive check as in the base of a filter cannot be another filter
+ final DataSource baseOfFilterDataSource = filteredDataSource.getBase();
+ if (baseOfFilterDataSource instanceof FilteredDataSource) {
+ throw DruidException.defensive("Cannot create a filteredDataSource
using another filteredDataSource as a base");
+ }
Review Comment:
Instead of getBase(), we need to do a recursive check on the
`filteredDataSource.getChildren()`.
i.e. FilteredDS -> UnnestDS -> FilteredDS is disallowed ❌ .
We are only accounting for the top-level filtered DS, therefore any
intervals in any nested DS.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]