[
https://issues.apache.org/jira/browse/DRILL-6829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16677245#comment-16677245
]
Aman Sinha commented on DRILL-6829:
-----------------------------------
Hi [~Paul.Rogers], the fundamental assumption you are making is that all input
types need to be converted to a common type in order to generate a global sort.
But let's drop the requirement of a global sort for the moment. If you
examine the approach I described, I am proposing that each distinct schema be
sorted independently and only compatible types need to be merged together to
produce a common output schema. Incompatible types just need a 'union'.
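
A minimal sketch of the per-schema idea above, in Python rather than Drill's Java, for illustration only. The helper names (`schema_key`, `sort_per_schema`) and the compatibility rule (int and float merge into one numeric run) are assumptions of this sketch, not Drill code:

```python
from collections import defaultdict


def schema_key(value):
    """Bucket a sort-key value by compatibility class (assumed rule)."""
    if value is None:
        return "null"
    if isinstance(value, bool):      # bool is a subclass of int, so check it first
        return "bool"
    if isinstance(value, (int, float)):
        return "numeric"             # compatible types are merged into one run
    return type(value).__name__      # everything else stays in its own run


def sort_per_schema(rows, key):
    """Sort each compatibility class independently; no global order is imposed."""
    groups = defaultdict(list)
    for row in rows:
        groups[schema_key(row[key])].append(row)
    runs = {}
    for name, group in groups.items():
        # Nulls have nothing to order; every other run is sorted on its own.
        runs[name] = group if name == "null" else sorted(group, key=lambda r: r[key])
    return runs                      # the output is a 'union' of sorted runs


rows = [{"a": 3}, {"a": "x"}, {"a": 1.5}, {"a": "b"}, {"a": 2}]
runs = sort_per_schema(rows, "a")
```

Here `runs["numeric"]` holds the int and float rows in one sorted run, while `runs["str"]` is sorted separately. Nothing compares a number against a string.
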
It just occurred to me to check MongoDB's sort behavior, and it does not seem
much different from what I am proposing. Please see [1]. So, if a user is
querying a NoSQL DB such as Mongo, their client applications must already be
aware of schema changes. How can we provide the same functionality through
SQL? Since Drill's key differentiator is providing a SQL capability over a
wide variety of data sources, this seems like a must-have capability.
Otherwise, why would a Mongo user migrate to Drill+Mongo, despite the enormous
benefits of using a SQL interface? (BTW, I am just using MongoDB as an example
here.)
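
To illustrate the kind of cross-type ordering that MongoDB documents in [1], here is a rough Python sketch. It covers only a simplified subset of that order (null, then numbers, then strings, then booleans). The names `TYPE_RANK`, `type_class` and `bracketed_key` are hypothetical, and this is not how MongoDB or Drill implements it:

```python
# Assumed, simplified subset of a BSON-like cross-type order.
TYPE_RANK = {"null": 0, "number": 1, "string": 2, "boolean": 3}


def type_class(value):
    """Map a Python value to one of the type brackets above."""
    if value is None:
        return "null"
    if isinstance(value, bool):      # must precede the int check
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    raise TypeError(f"unsupported type in this sketch: {type(value).__name__}")


def bracketed_key(value):
    """Order by type bracket first, then by value within the bracket."""
    return (TYPE_RANK[type_class(value)], 0 if value is None else value)


mixed = [True, "b", 3, None, 1.5, "a", False]
ordered = sorted(mixed, key=bracketed_key)
```

Values are compared with each other only when they fall in the same bracket, so a mixed-type column sorts without converting everything to a common type.
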
That said, I am under no illusions: this is a much more complex task to do
in Drill. But let's chip away at the layers and see what is feasible.
[1]
https://docs.mongodb.com/manual/reference/bson-type-comparison-order/#bson-types-comparison-order
> Handle schema change in ExternalSort
> ------------------------------------
>
> Key: DRILL-6829
> URL: https://issues.apache.org/jira/browse/DRILL-6829
> Project: Apache Drill
> Issue Type: New Feature
> Reporter: Aman Sinha
> Priority: Major
>
> While we continue to enhance the schema provision and metastore aspects in
> Drill, we should also explore what it means to be truly schema-less, so that
> we can better handle {semi, un}structured data and data sitting in DBs that
> store JSON documents (e.g. Mongo, MapR-DB).
>
> The blocking operators are the main hurdles to this goal (other operators
> also need to be smarter about this, but the problem is harder for the blocking
> operators). This Jira is specifically about ExternalSort.
>
>