GitHub user sachouche opened a pull request: https://github.com/apache/drill/pull/1170
DRILL-6223: Fixed several Drillbit failures due to schema changes

Fixed several issues due to schema changes:

1) Changes in complex data types

A Drill query fails when selecting all columns from a set of complex nested data files (Parquet). There are differences in schema among the files:
- The Parquet files exhibit differences both at the first level and within nested data types
- A select * will not cause an exception, but using a LIMIT clause will
- Note also that this issue seems to happen only when multiple Drillbit minor fragments are involved (concurrency higher than one)

2) Dangling columns (both simple and complex)

This situation can be easily reproduced when:
- Select STAR queries involve input data with different schemas
- LIMIT and/or PROJECT operators are used
- The data is read from more than one minor fragment

Details:
- This is because individual readers have logic to handle such use cases, but downstream operators do not
- So if reader-1 sends one batch with F1, F2, and F3
- And reader-2 then sends a batch with F2 and F3
- Then the LIMIT and PROJECT operators will fail to clean up the dangling column F1, which will cause failures when the downstream operators' copy logic attempts to copy the stale column F1
- This pull request adds logic to detect and eliminate dangling columns

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sachouche/drill DRILL-6223

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/1170.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1170

----
commit d986b6c7588c107bb7e49d2fc8eb3f25a60e1214
Author: Salim Achouche <sachouche2@...>
Date:   2018-02-21T02:17:14Z

    DRILL-6223: Fixed several Drillbit failures due to schema changes

----
---
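For readers unfamiliar with the dangling-column scenario in item 2, the idea can be illustrated with a minimal sketch. This is not the code from the pull request and does not use Drill's APIs: the class name DanglingColumnSketch, the method removeDanglingColumns, and the use of a plain map as a stand-in for an operator's outgoing columns are all hypothetical, chosen only to show the reader-1 (F1, F2, F3) followed by reader-2 (F2, F3) case.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class DanglingColumnSketch {

    // Hypothetical helper (not a Drill API): removes every column still held
    // by a downstream operator that is absent from the incoming batch, and
    // returns the names of the columns that were removed.
    static List<String> removeDanglingColumns(Map<String, int[]> outgoing,
                                              Set<String> incomingColumns) {
        List<String> removed = new ArrayList<>();
        Iterator<String> it = outgoing.keySet().iterator();
        while (it.hasNext()) {
            String name = it.next();
            if (!incomingColumns.contains(name)) {
                it.remove();          // drop the stale column before any copy step
                removed.add(name);
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        // State left behind after the batch from reader-1: F1, F2, F3.
        Map<String, int[]> outgoing = new LinkedHashMap<>();
        outgoing.put("F1", new int[] {1, 2});
        outgoing.put("F2", new int[] {3, 4});
        outgoing.put("F3", new int[] {5, 6});

        // The batch from reader-2 carries only F2 and F3.
        Set<String> incoming = new HashSet<>(Arrays.asList("F2", "F3"));

        List<String> removed = removeDanglingColumns(outgoing, incoming);
        System.out.println("removed=" + removed + " remaining=" + outgoing.keySet());
    }
}
```

In this sketch the cleanup runs when a new batch arrives, so a later copy step only ever sees columns present in the current schema. Where the actual fix performs this check inside Drill is not stated in the description above.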