[ https://issues.apache.org/jira/browse/DRILL-6829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16677245#comment-16677245 ]

Aman Sinha commented on DRILL-6829:
-----------------------------------

Hi [~Paul.Rogers], the fundamental assumption you are making is that all input
types need to be converted to a common type in order to generate a global sort.
But let's drop the requirement of a global sort for the moment. If you examine
the approach I described, I am proposing that each distinct schema be sorted
independently and that only compatible types be merged together to produce a
common output schema. Incompatible types just need a 'union'.
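A minimal Python sketch of the per-schema idea, for illustration only. It assumes a row's "schema" can be approximated by the Python type of its sort key, and that int and float are "compatible" and can share one run; `sort_per_schema` and `TYPE_FAMILY` are hypothetical names, not Drill APIs, and this is not how Drill's ExternalSort is implemented.

```python
from collections import defaultdict
from typing import Any

# Hypothetical grouping: types in the same family are treated as "compatible"
# and merged into one sorted run (here int and float share a numeric run).
TYPE_FAMILY = {int: "numeric", float: "numeric", str: "string", bool: "boolean"}


def sort_per_schema(rows: list[dict[str, Any]], key: str) -> dict[str, list[dict[str, Any]]]:
    """Sort each compatible-type group independently.

    No order is defined across groups: the result is the 'union' of the
    independently sorted runs, not a global sort. Raises KeyError for
    types not listed in TYPE_FAMILY.
    """
    runs: dict[str, list[dict[str, Any]]] = defaultdict(list)
    for row in rows:
        # type(True) is bool, so booleans land in their own family, not "numeric"
        runs[TYPE_FAMILY[type(row[key])]].append(row)
    return {family: sorted(run, key=lambda r: r[key]) for family, run in runs.items()}


rows = [{"a": 3}, {"a": "x"}, {"a": 1.5}, {"a": "b"}, {"a": 2}]
print(sort_per_schema(rows, "a"))
# → {'numeric': [{'a': 1.5}, {'a': 2}, {'a': 3}], 'string': [{'a': 'b'}, {'a': 'x'}]}
```

Each run is internally ordered and carries a single output schema; a consumer that wants one result set would see the runs one after another, which is the 'union' described above.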

It just occurred to me to check MongoDB's sort behavior, and it does not seem
much different from what I am proposing; please see [1]. So, if a user is
querying a NoSQL DB such as Mongo, their client applications must already be
aware of schema changes. How can we provide the same functionality through
SQL? Since Drill's key differentiator is providing SQL capability over a wide
variety of data sources, this seems like a must-have capability; otherwise,
why would a Mongo user migrate to Drill+Mongo despite the enormous benefits of
using a SQL interface? (BTW, I am just using MongoDB as an example here.)
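To make the MongoDB comparison concrete, here is a sketch of a cross-type sort key modeled on a small subset of the BSON comparison order in [1]: null sorts before numbers, numbers before strings, and strings before booleans. `type_rank_key` is a hypothetical helper; objects, arrays, dates and the other BSON types are deliberately left out, and string comparison here is plain Python code-point order rather than anything MongoDB-specific.

```python
from typing import Any


def type_rank_key(value: Any) -> tuple[int, Any]:
    """Sort key: first by type-family rank, then by value within the family.

    Ranks follow a subset of the documented BSON order:
    Null < Numbers < String < Boolean (other BSON types omitted).
    """
    if value is None:
        return (0, 0)            # all nulls compare equal
    if isinstance(value, bool):  # check bool before int: bool subclasses int
        return (3, value)
    if isinstance(value, (int, float)):
        return (1, value)        # ints and floats compare numerically, as one family
    if isinstance(value, str):
        return (2, value)
    raise TypeError(f"type not covered by this sketch: {type(value).__name__}")


values = [True, "b", 2, None, 1.5, "a", False]
print(sorted(values, key=type_rank_key))
# → [None, 1.5, 2, 'a', 'b', False, True]
```

Unlike per-schema sorting, a rank-based key like this yields a single global order, at the cost of fixing an arbitrary ranking between type families.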

That said, I am under no illusion: this is a much more complex task to do in
Drill. But let's chip away at the layers and see what is feasible.

[1] https://docs.mongodb.com/manual/reference/bson-type-comparison-order/#bson-types-comparison-order

> Handle schema change in ExternalSort
> ------------------------------------
>
>                 Key: DRILL-6829
>                 URL: https://issues.apache.org/jira/browse/DRILL-6829
>             Project: Apache Drill
>          Issue Type: New Feature
>            Reporter: Aman Sinha
>            Priority: Major
>
> While we continue to enhance the schema provision and metastore aspects in
> Drill, we should also explore what it means to be truly schema-less, so that
> we can better handle {semi, un}structured data, including data sitting in DBs
> that store JSON documents (e.g. Mongo, MapR-DB).
>  
> The blocking operators are the main hurdles to this goal (other operators
> also need to be smarter about this, but the problem is harder for the blocking
> operators). This Jira is specifically about ExternalSort.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
