[
https://issues.apache.org/jira/browse/DERBY-4357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12749952#action_12749952
]
Rick Hillegas commented on DERBY-4357:
--------------------------------------
Thanks for the quick feedback, Knut.
>1) It wasn't entirely clear to me how the _nullEqualsNull field in
>Restriction.ColumnQualifier is supposed to be used. Is the user supposed to
>specify somehow that this VTI treats NULLs in a non-standard way? If so, how
>(and why)? Or is it meant to implement IS (NOT) NULL restrictions? Or
>something else?
It is meant to implement IS NULL. I will clarify this.
>2) The operator '<>' is not defined. Is that intentional?
Yes. The idea is to model the Qualifier functionality which the optimizer
already understands. The comparisons are the same ones found in
org.apache.derby.iapi.types.Orderable.
>3) The array passed to setMaterializedColumnNames() contains somewhat
>redundant information (both names and positions of the columns to
>materialize). Would an array of booleans suffice? Or could the extra
>information be used for something?
I agree that the information is redundant and my original suggestion was to use
a bitmap. However, Chris reported that this would be cumbersome to use in the
real world. I don't think that the redundancy is harmful. In the meantime, I
have warmed up to the redundancy. That's because I can see that it will make it
very easy to write a generic table function that wraps a SELECT against a
foreign table and allows you to push projections and restrictions into the
foreign query.
4) I cannot find that the spec says what happens if the restricted VTI returns
non-qualifying rows. Should the restrictions be reevaluated outside the VTI? (I
think I would prefer that the restrictions were only seen as hints, so that the
VTI doesn't have to check all of them. Then, if there's a restriction on a
column that doesn't really help the VTI retrieving the rows more efficiently,
it could just disregard that restriction and let the SQL do the filtering
instead.)
The intent is that the table function has signed up for a contract to filter
rows. If the table function doesn't fulfill its contract, then the table
function has a bug. I will clarify this in the spec. I do see your point that
redundant enforcement of the qualification in Derby could be useful to some
users. I'm prepared to revisit this part of the spec during implementation if
it turns out that the query processor already redundantly enforces restrictions
even after pushing them into the store layer.
The reason that I prefer to regard the filtering as a contract is that I think
it will make it easier to reason about these smart table functions when we make
them even smarter: if a table function claims that it can perform a piece of
processing, then Derby should be able to rely on that claim. I'm thinking that
at some point we will want to add a similar mechanism so that a table function
can declare that it can return rows in certain sort orders. I think that if
Derby asks a table function to sort the rows, then Derby doesn't want to
perform a redundant sort outside the table function.
Thanks!
> TableFunctions provide no information to limit underlying query
> ---------------------------------------------------------------
>
> Key: DERBY-4357
> URL: https://issues.apache.org/jira/browse/DERBY-4357
> Project: Derby
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 10.4.1.3, 10.4.2.0, 10.5.1.1, 10.5.2.0, 10.5.3.0
> Environment: ALL
> Reporter: Chris Goodacre
> Attachments: RestrictedTableFunctions.html
>
>
> The API specification for TableFunctions cannot provide information to the
> implementer of the TableFunction about the details of the query. For
> example:
> (a) I defined a table function named MyFunction with columns a,b, & c
> (b) I bind the table function properly using the CREATE FUNCTION SQL.
> User executes the following SQL:
> select a,b from table ( MyFunction() ) where c = 123
> Without passing the column list and/or where clause as arguments to the table
> function, my implementation can not know that it only needs two of the three
> columns, and only rows where c = 123.
> For TableFunctions that are built to integrate distant/legacy data, the cost
> of the query can be prohibitive. It would be better if information
> regarding the columns in the select and restrictions from the where clause
> could be passed to the developer.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.