[ 
https://issues.apache.org/jira/browse/DRILL-6223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16421091#comment-16421091
 ] 

Paul Rogers commented on DRILL-6223:
------------------------------------

[~sachouche], thanks for the explanation, very helpful. The tests will help 
clarify the original problem and the fix.

Looking at the code, it does appear we try to prune unused columns. (There are 
references to used columns, which I naively assumed meant we are separating the 
used from the unused; perhaps I'm wrong.)

If we cannot correctly handle a schema change (according to whatever semantics 
we decide we want), then we need to kill the query rather than produce invalid 
results.

On dynamically adding columns: a careful reading will show that the 
suggestion is to *preserve* columns, not create them. The discussion was around 
when we can preserve columns (columns appeared in the first batch, then 
disappeared) and when we can't (columns first appear in the second or a later 
batch).

This PR will be solid if we do three things:

* Avoid memory corruption (the primary goal here, and a good one)
* Add unit tests that verify the fix
* Avoid introducing new semantics (dropping columns) as that just digs us 
deeper into the schema-free mess. Instead, fail the query if we are given 
schemas we can't reconcile.
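
As a rough sketch of the policy described above (not Drill's actual code; the 
class and method names are hypothetical and chosen only for illustration), the 
reconcile step could look something like this. It assumes the first batch fixes 
the output schema and compares column names only, ignoring type changes:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Drill: illustrates the policy only.
final class SchemaReconciler {

  /**
   * Returns the output columns for a later batch, given the columns seen in
   * the first batch. Columns missing from the later batch are preserved (a
   * real operator would presumably fill them with nulls). Columns that first
   * appear in the later batch cannot be reconciled, so the query fails.
   */
  static List<String> reconcile(List<String> firstBatch, List<String> laterBatch) {
    // The first batch defines the schema; keep its column order.
    Set<String> known = new LinkedHashSet<>(firstBatch);
    for (String col : laterBatch) {
      if (!known.contains(col)) {
        // Fail rather than silently drop the column or emit invalid results.
        throw new IllegalStateException(
            "Unsupported schema change: new column '" + col + "'");
      }
    }
    return new ArrayList<>(known);
  }
}
```

With this sketch, a first batch of (a, b) followed by a batch with only (a) 
keeps both columns, while a later batch containing a new column c fails the 
query.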


> Drill fails on Schema changes 
> ------------------------------
>
>                 Key: DRILL-6223
>                 URL: https://issues.apache.org/jira/browse/DRILL-6223
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Execution - Relational Operators
>    Affects Versions: 1.10.0, 1.12.0
>            Reporter: salim achouche
>            Assignee: salim achouche
>            Priority: Major
>             Fix For: 1.14.0
>
>
> Drill query fails when selecting all columns from a complex nested data 
> file set (Parquet). There are differences in schema among the files:
>  * The Parquet files exhibit differences both at the first level and within 
> nested data types
>  * A select * will not cause an exception but using a limit clause will
>  * Note also this issue seems to happen only when multiple Drillbit minor 
> fragments are involved (concurrency higher than one)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
