[
https://issues.apache.org/jira/browse/DRILL-5304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15889294#comment-15889294
]
ASF GitHub Bot commented on DRILL-5304:
---------------------------------------
GitHub user ppadma opened a pull request:
https://github.com/apache/drill/pull/766
DRILL-5304: Queries fail intermittently when there is skew in data distribution
Change the assignment logic so that every node is first assigned up to
minCount before any node is taken up to maxCount.
Also fixed a small issue in the parallelization code: the number of fragments
to run on nodes with affinity was rounded down, so we sometimes scheduled
fewer fragments on nodes with affinity than on nodes without affinity.
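The two-phase idea above can be illustrated with a minimal sketch. This is
not the code in the pull request: the class name AssignmentSketch, the method
names, and the representation of work units as a list of preferred host names
are all assumptions made for illustration. It also assumes there is at least
one work unit per fragment, and it takes minCount and maxCount to be the floor
and ceiling of units divided by fragments.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch; names are not taken from the Drill code base.
final class AssignmentSketch {

    // Returns, for each fragment index, the indices of the work units assigned to it.
    static List<List<Integer>> assign(List<String> fragmentHosts, List<String> unitHosts) {
        int fragments = fragmentHosts.size();
        int units = unitHosts.size();
        if (fragments == 0 || units < fragments) {
            throw new IllegalArgumentException("need at least one work unit per fragment");
        }
        int minCount = units / fragments;                   // floor
        int maxCount = (units + fragments - 1) / fragments; // ceiling

        List<List<Integer>> result = new ArrayList<>();
        for (int f = 0; f < fragments; f++) {
            result.add(new ArrayList<>());
        }
        Deque<Integer> pending = new ArrayDeque<>();

        // Pass 1: honor locality, but stop at minCount per fragment.
        for (int u = 0; u < units; u++) {
            int f = firstWithRoom(result, fragmentHosts, unitHosts.get(u), minCount);
            if (f >= 0) {
                result.get(f).add(u);
            } else {
                pending.add(u);
            }
        }

        // Pass 2: bring every fragment up to minCount, ignoring locality.
        for (List<Integer> assigned : result) {
            while (assigned.size() < minCount) {
                assigned.add(pending.poll());
            }
        }

        // Pass 3: hand out the remainder, local first, never exceeding maxCount.
        while (!pending.isEmpty()) {
            int u = pending.poll();
            int f = firstWithRoom(result, fragmentHosts, unitHosts.get(u), maxCount);
            if (f < 0) {
                f = firstWithRoom(result, fragmentHosts, null, maxCount);
            }
            result.get(f).add(u);
        }
        return result;
    }

    // First fragment still below cap; host == null means any host is acceptable.
    private static int firstWithRoom(List<List<Integer>> result, List<String> fragmentHosts,
                                     String host, int cap) {
        for (int f = 0; f < result.size(); f++) {
            boolean hostOk = host == null || host.equals(fragmentHosts.get(f));
            if (hostOk && result.get(f).size() < cap) {
                return f;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Fully skewed input: all eight work units live on host "a".
        List<String> fragmentHosts = Arrays.asList("a", "b", "c", "d");
        List<String> unitHosts = Arrays.asList("a", "a", "a", "a", "a", "a", "a", "a");
        for (List<Integer> assigned : assign(fragmentHosts, unitHosts)) {
            System.out.println(assigned);
        }
    }
}
```

With the fully skewed input in main, every fragment still receives two work
units, so no fragment is left without read entries. Capping the locality pass
at minCount is what prevents the fragments on the data-heavy host from
absorbing work that the other fragments need.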
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/ppadma/drill DRILL-5304
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/drill/pull/766.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #766
----
commit 49cf9f0b54d8c0ea15c3d6a59f99b8e23870104e
Author: Padma Penumarthy <[email protected]>
Date: 2017-02-28T02:32:24Z
DRILL-5304: Queries fail intermittently when there is skew in data
distribution
----
> Queries fail intermittently when there is skew in data distribution
> -------------------------------------------------------------------
>
> Key: DRILL-5304
> URL: https://issues.apache.org/jira/browse/DRILL-5304
> Project: Apache Drill
> Issue Type: Bug
> Components: Query Planning & Optimization
> Affects Versions: 1.10.0
> Reporter: Abhishek Girish
> Assignee: Padma Penumarthy
> Attachments: query1_drillbit.log.txt, query2_drillbit.log.txt
>
>
> In a distributed environment, we've observed certain queries fail
> intermittently with an assignment logic error when the underlying data is
> unevenly distributed across nodes.
> For example, the TPC-H [query
> 7|https://github.com/mapr/drill-test-framework/blob/master/framework/resources/Advanced/tpch/tpch_sf100/parquet/07.q]
> failed with the following error:
> {code}
> java.sql.SQLException: SYSTEM ERROR: IllegalArgumentException:
> MinorFragmentId 105 has no read entries assigned
> ...
> (org.apache.drill.exec.work.foreman.ForemanException) Unexpected exception
> during fragment initialization: MinorFragmentId 105 has no read entries
> assigned
> org.apache.drill.exec.work.foreman.Foreman.run():281
> java.util.concurrent.ThreadPoolExecutor.runWorker():1145
> java.util.concurrent.ThreadPoolExecutor$Worker.run():615
> java.lang.Thread.run():744
> Caused By (java.lang.IllegalArgumentException) MinorFragmentId 105 has no
> read entries assigned
> {code}
> A log containing the full stack trace is attached.
> For this query, the underlying TPC-H SF100 Parquet dataset was observed to
> be located mostly on 2-3 nodes of an 8-node DFS environment. The data
> distribution skew on this cluster is most likely the trigger, as the same
> query on the same dataset does not show this failure on a different test
> cluster (with possibly different data distribution).
> Also, another
> [query|https://github.com/mapr/drill-test-framework/blob/master/framework/resources/Functional/limit0/window_functions/bugs/data/drill-3700.sql]
> failed with a similar error when slice target was set to 1.
> {code}
> Failed with exception
> java.sql.SQLException: SYSTEM ERROR: IllegalArgumentException:
> MinorFragmentId 66 has no read entries assigned
> ...
> (org.apache.drill.exec.work.foreman.ForemanException) Unexpected exception
> during fragment initialization: MinorFragmentId 66 has no read entries
> assigned
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)