[
https://issues.apache.org/jira/browse/DRILL-6829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16677572#comment-16677572
]
Paul Rogers commented on DRILL-6829:
------------------------------------
[~amansinha100], thanks for the explanation. A couple of observations. First,
Drill is a relational engine, and its clients are often JDBC or ODBC. Such
clients cannot handle a schema change. (Of course, the Drill client is more
flexible, so it certainly can handle schema changes.)
Second, the union type has never really worked. There is no support for it in
JDBC or ODBC. So, it would be a "Drill-client-only" solution. That may or may
not be bad depending on Drill's target user base.
There is now overwhelming evidence that, for non-Mongo data sources, there is
no way to achieve a reliable schema incrementally when data is delivered in
random order.
So, maybe divide the problem into two parts: the schema mechanism for those
users that use xDBC, and something clever like what is suggested here for those
users of Mongo that use the Drill client and can absorb varying schemas. (Other
DBs have this same property, including MapR DB JSON IIRC.)
My experience is with the uses and users of xDBC and similar interfaces. I
don't know of any users of the raw Drill client, but I suppose they could
exist...
In any event, rather than debate the topic to death, just go ahead and work out
what happens when there are many files, scanned on many nodes, in random order,
with each supported kind of schema change. It is very hard for any relational
engine to make sense of the data when the schema changes randomly across runs
(because of the random scan order). Work through those cases in detail and
you'll go into this with your eyes wide open about what can actually be done in
practice.
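
To make the scan-order point concrete, here is a minimal sketch in Python. It is
not Drill code: the file names, the column types, and the "first type seen wins"
inference rule are all assumptions chosen for illustration. It infers a schema
incrementally over every possible scan order of three files and shows that the
committed type of one column depends on which file happens to arrive first.

```python
from itertools import permutations

# Hypothetical per-file schemas (column name -> type seen in that file).
# The file names and types are illustrative, not taken from Drill.
FILES = {
    "a.json": {"id": "BIGINT", "price": "BIGINT"},
    "b.json": {"id": "BIGINT", "price": "FLOAT8"},
    "c.json": {"id": "BIGINT", "price": "VARCHAR", "note": "VARCHAR"},
}


def infer_incrementally(scan_order):
    """Infer a schema one file at a time, in the given order.

    Assumed rule: the first type seen for a column is committed, and any
    later file that disagrees is recorded as a conflict.
    """
    committed = {}
    conflicts = []
    for name in scan_order:
        for column, col_type in FILES[name].items():
            if column not in committed:
                committed[column] = col_type
            elif committed[column] != col_type:
                conflicts.append((name, column, committed[column], col_type))
    return committed, conflicts


def distinct_schemas():
    """Collect the distinct committed schemas over all scan orders."""
    seen = set()
    for order in permutations(sorted(FILES)):
        committed, _ = infer_incrementally(order)
        seen.add(tuple(sorted(committed.items())))
    return seen


if __name__ == "__main__":
    for order in permutations(sorted(FILES)):
        committed, conflicts = infer_incrementally(order)
        print(order, "price ->", committed["price"],
              "| conflicts:", len(conflicts))
```

Under these assumptions the same three files yield a different committed schema
depending on the order in which they are scanned, so two runs of the same query
could disagree. A real analysis would repeat this for each supported kind of
schema change and for scans spread across many nodes.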
> Handle schema change in ExternalSort
> ------------------------------------
>
> Key: DRILL-6829
> URL: https://issues.apache.org/jira/browse/DRILL-6829
> Project: Apache Drill
> Issue Type: New Feature
> Reporter: Aman Sinha
> Priority: Major
>
> While we continue to enhance the schema provision and metastore aspects in
> Drill, we also should explore what it means to be truly schema-less such that
> we can better handle {semi, un}structured data, data sitting in DBs that
> store JSON documents (e.g., Mongo, MapR-DB).
>
> The blocking operators are the main hurdles to this goal (other operators
> also need to be smarter about this but the problem is harder for the blocking
> operators). This Jira is specifically about ExternalSort.
>
>