On Sun, Apr 8, 2018 at 10:57 AM, Ted Dunning <ted.dunn...@gmail.com> wrote:
> I have been thinking about this email and I still don't understand some of
> the comments.
>
> On Fri, Apr 6, 2018 at 5:13 PM, Aman Sinha <amansi...@apache.org> wrote:
>
> > On the subject of CAST pushdown to Scans, there are potential drawbacks
> > ...
> >
> >    - In general, the planner will see a Scan-Project where the Project has
> >    CAST functions. But the Project can have arbitrary expressions, e.g.
> >    CAST(a as INT) * 5 or a combination of 2 CAST functions or non-CAST
> >    functions etc. It would be quite expensive to examine each expression
> >    (there could be hundreds) to determine whether it is eligible to be
> >    pushed to the Scan.
> >
>
> How is this different than filter and project pushdown? There could be
> hundreds of those and it could be difficult for Calcite to find appropriate
> pushdowns. But I have never heard of any problem.
>
> - the traversal of all expressions is already required and already done in
> order to find the set of columns that are being extracted. As such, cast
> pushdown can be done in the same motions as project pushdown.
>

It is true that the amount of work done by the planner would be about the
same as when determining projection pushdowns into the scan. In my mind I
was contrasting this with the pure DDL-based approach with an explicitly
specified schema (such as with a 'CREATE EXTERNAL TABLE ...' or with
per-query hints as Paul mentioned).

However, in the absence of those, I agree that it would be a win to do the
'simple' CAST pushdowns, keeping in mind that the same column may be
referenced in multiple ways: e.g. CAST(a as varchar(10)) and
CAST(a as varchar(20)) in the same query/view. In such cases, we would want
to either not do the pushdown or determine the highest common datatype and
push that down.

All of this, though, does not preclude the real need for the 'source of
truth' of the schema for the cases where data has already been explored and
curated. We do want to have a solution for that core issue.

-Aman
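
To make the "not push down, or push the highest common datatype" choice concrete, here is a rough sketch in Python. Everything in it is hypothetical: `CastType`, `INTEGER_RANK` and `common_pushdown_type` are illustrative names and are not part of Drill or Calcite, and the widening order covers only a few integer types as an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class CastType:
    """Hypothetical stand-in for a SQL type; not a Drill or Calcite class."""
    name: str                     # e.g. "VARCHAR", "INT"
    length: Optional[int] = None  # only meaningful for VARCHAR in this sketch


# Assumed widening order for a few integer types (illustrative only).
INTEGER_RANK = {"SMALLINT": 0, "INT": 1, "BIGINT": 2}


def common_pushdown_type(casts: Sequence[CastType]) -> Optional[CastType]:
    """Return one type wide enough for every CAST applied to a column,
    or None to signal that the pushdown should be skipped."""
    if not casts:
        return None
    names = {c.name for c in casts}
    if names == {"VARCHAR"}:
        # An unspecified length is treated as unbounded, so it wins.
        if any(c.length is None for c in casts):
            return CastType("VARCHAR")
        return CastType("VARCHAR", max(c.length for c in casts))
    if names.issubset(INTEGER_RANK):
        # Pick the widest integer type that appears among the CASTs.
        widest = max(names, key=lambda n: INTEGER_RANK[n])
        return CastType(widest)
    # Mixed type families (e.g. VARCHAR and INT): do not push down.
    return None
```

For the example above, `common_pushdown_type([CastType("VARCHAR", 10), CastType("VARCHAR", 20)])` yields `VARCHAR(20)`. The narrower `CAST(a as varchar(10))` would presumably still have to stay in the Project so that its truncation semantics are preserved; only the scan-side type is widened.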