[
https://issues.apache.org/jira/browse/DRILL-6829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16677656#comment-16677656
]
Aman Sinha commented on DRILL-6829:
-----------------------------------
[~Paul.Rogers] I want to clarify: by 'union' of two incompatible schemas I
did not mean using the union type. I meant the union-like operation that we
normally do for record batches. Step #7 in my first comment is about doing
this cross-schema-union. Suppose there are 3 record batches, each with a
different schema for the sort key. These will sit in separate internal
queues of the blocking operator, and each queue will be sorted individually.
The cross-schema-union will traverse these queues in a defined order (e.g.,
all Numeric types first, followed by all String types, followed by Date
types), consuming and emitting all batches from the first queue, then the
second queue, and so on.
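
A rough sketch of the idea in plain Java, only to illustrate the traversal
order. It is not Drill code: the names KeyFamily, SchemaQueue and
crossSchemaUnion are hypothetical, each "batch" is reduced to a list of sort
keys, and the Numeric, String, Date order is just the example order from
above.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.EnumMap;
import java.util.List;

public class CrossSchemaUnionSketch {

    /** Hypothetical type families; declaration order doubles as emit order. */
    enum KeyFamily { NUMERIC, STRING, DATE }

    /** One internal queue holding sort keys that all share one schema. */
    static final class SchemaQueue<T extends Comparable<? super T>> {
        private final List<T> keys = new ArrayList<>();

        void addBatch(List<T> batch) {
            keys.addAll(batch);
        }

        /** Sorts this queue on its own; keys are never compared across schemas. */
        List<T> sorted() {
            List<T> copy = new ArrayList<>(keys);
            Collections.sort(copy);
            return copy;
        }
    }

    /** Drains the queues in KeyFamily order, one whole queue at a time. */
    static List<Object> crossSchemaUnion(EnumMap<KeyFamily, SchemaQueue<?>> queues) {
        List<Object> out = new ArrayList<>();
        // EnumMap iterates in enum declaration order: NUMERIC, STRING, DATE.
        for (SchemaQueue<?> queue : queues.values()) {
            out.addAll(queue.sorted());
        }
        return out;
    }

    /** Three batches, each with a different schema for the sort key. */
    static List<Object> demo() {
        SchemaQueue<Integer> numeric = new SchemaQueue<>();
        numeric.addBatch(Arrays.asList(30, 10, 20));

        SchemaQueue<String> strings = new SchemaQueue<>();
        strings.addBatch(Arrays.asList("b", "a"));

        SchemaQueue<LocalDate> dates = new SchemaQueue<>();
        dates.addBatch(Arrays.asList(LocalDate.of(2018, 11, 7), LocalDate.of(2018, 1, 1)));

        // Arrival order of the schemas does not affect the emit order.
        EnumMap<KeyFamily, SchemaQueue<?>> queues = new EnumMap<>(KeyFamily.class);
        queues.put(KeyFamily.DATE, dates);
        queues.put(KeyFamily.STRING, strings);
        queues.put(KeyFamily.NUMERIC, numeric);
        return crossSchemaUnion(queues);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The output is sorted within each schema but not across schemas, which is the
point: no comparison between, say, a Numeric key and a String key is ever
needed.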
> Handle schema change in ExternalSort
> ------------------------------------
>
> Key: DRILL-6829
> URL: https://issues.apache.org/jira/browse/DRILL-6829
> Project: Apache Drill
> Issue Type: New Feature
> Reporter: Aman Sinha
> Priority: Major
>
> While we continue to enhance the schema provision and metastore aspects in
> Drill, we should also explore what it means to be truly schema-less, so that
> we can better handle {semi, un}structured data and data sitting in DBs that
> store JSON documents (e.g., Mongo, MapR-DB).
>
> The blocking operators are the main hurdles to this goal (other operators
> also need to be smarter about this, but the problem is harder for the
> blocking operators). This Jira is specifically about ExternalSort.
>
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)