[
https://issues.apache.org/jira/browse/DRILL-6312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16429636#comment-16429636
]
Paul Rogers commented on DRILL-6312:
------------------------------------
Drill supports maps and arrays (including arrays of maps). Drill's current
SELECT syntax does not support these constructs well. Suppose my pesky field is
nested inside map m:
{noformat}
{m: {a: null}}
{noformat}
I need to express the type of a. The following will not work in Drill today:
{noformat}
SELECT CAST(m.a AS VARCHAR) FROM ...
{noformat}
This sets the type of {{m.a}}, but it also puts {{m.a}} into the projection
list as a top-level column. That is, it destroys the map structure. One would
not even be able to do this if {{m}} were an array of maps.
This is where the cast idea really fails to be general: the syntax of SQL just
does not allow us to reach down inside a map.
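To make the flattening concrete, here is a toy Python model of what the projection does to a row. It is a sketch of the behavior described above, not Drill code: the function name {{project_cast_path}} and the output column name {{EXPR$0}} are assumptions made up for illustration.

```python
# Toy model of a SELECT projection; none of this is Drill's actual code.
row = {"m": {"a": None}}


def project_cast_path(row, path, out_name):
    """Model SELECT CAST(<path> AS VARCHAR): walk the dotted path,
    cast the leaf, and emit it as a new top-level column."""
    value = row
    for key in path.split("."):
        value = value[key]
    casted = None if value is None else str(value)
    # The result holds only the projected column; the enclosing map is gone.
    return {out_name: casted}


flattened = project_cast_path(row, "m.a", "EXPR$0")
print(flattened)  # {'EXPR$0': None}
```

The output row no longer contains {{m}} at all, which is the loss of map structure described above.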
But, a separate hint does not have this problem. Using the made-up syntax from
above:
{noformat}
SELECT m FROM myFile WITH HINTS (m.a AS VARCHAR)
{noformat}
And, of course, a separate metadata hint file can be designed to handle any
kind of structure: maps, arrays, arrays of maps, and so on.
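As a rough illustration of why path-keyed hints generalize, the Python sketch below resolves a schema for a nested record while leaving its structure intact. Everything here is hypothetical: the {{infer_schema}} function, the {{"UNKNOWN"}} marker, the type labels, and the choice to address array elements by the array's own path are assumptions for the example, not a proposed Drill design.

```python
# Toy type labels for non-null leaves; an assumption for this sketch only.
TYPE_NAMES = {bool: "BIT", int: "BIGINT", float: "DOUBLE", str: "VARCHAR"}


def infer_schema(value, hints, path=""):
    """Return a schema with the same nesting as `value`.

    `hints` maps dotted paths (for example "m.a") to type names. A hint
    always wins; a null leaf without a hint is reported as "UNKNOWN".
    """
    if isinstance(value, dict):
        return {
            key: infer_schema(child, hints, f"{path}.{key}" if path else key)
            for key, child in value.items()
        }
    if isinstance(value, list):
        # Elements share the array's path; the first element stands in for all.
        if not value:
            return [hints.get(path, "UNKNOWN")]
        return [infer_schema(value[0], hints, path)]
    if path in hints:
        return hints[path]
    if value is None:
        return "UNKNOWN"
    return TYPE_NAMES.get(type(value), "UNKNOWN")


hints = {"m.a": "VARCHAR"}
map_schema = infer_schema({"m": {"a": None}}, hints)
array_schema = infer_schema({"m": [{"a": None}, {"a": None}]}, hints)
```

With the hint in place, {{map_schema}} is {{{'m': {'a': 'VARCHAR'}}}} and {{array_schema}} is {{{'m': [{'a': 'VARCHAR'}]}}}: the same hint covers both the map and the array-of-maps case, and {{m}} stays a single nested column.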
Conclusion: the cast mechanism is good and should be added. But, the hint or
metadata mechanism is still required in the general case.
> Enable pushing of cast expressions to the scanner for better schema discovery.
> ------------------------------------------------------------------------------
>
> Key: DRILL-6312
> URL: https://issues.apache.org/jira/browse/DRILL-6312
> Project: Apache Drill
> Issue Type: Bug
> Components: Execution - Relational Operators, Query Planning &
> Optimization
> Affects Versions: 1.13.0
> Reporter: Hanumath Rao Maduri
> Priority: Major
>
> Drill is a schema-less engine which tries to infer the schema from disparate
> sources at read time. Currently the scanners infer the schema for each
> batch depending upon the data for that column in the corresponding batch.
> This solves many use cases but can error out when the data differs too much
> between batches, for example int in one batch and array[int] in another
> (there are other cases as well; this is just one example).
> There is also a mechanism to create a view by type casting the columns to
> the appropriate type. This solves issues in some cases but fails in many
> other cases. This is because the cast expression is not pushed down to the
> scanner but stays in the project, filter, and other operators higher up the
> query plan.
> This JIRA is to fix this by propagating the type information embedded in the
> cast function to the scanners so that scanners can cast the incoming data
> appropriately.