Hi François

This is quite intriguing. I believe that one difference in execution conditions between the small and large queues is the amount memory that the query will be allowed. It could be interesting to see if the problem is connected to this difference by setting exec.queue.memory_ratio to 1.0. Enabling debug logging in logback.xml might also reveal more about what differs between these two executions of the same plan.

Regards
James

On 2022/08/25 18:13, François Méthot wrote:
Hi,

I am looking for an explanation to a workaround I have found to an issue
that has been bugging my team for the past weeks.

Last June we moved from Drill 1.12 to Drill 1.19... Long overdue upgrade!
Not long after we started getting the issue described  below.

We run a query daily on about 410GB of text data spread over  ~2200 files,
it has a cost of ~22 Billions queued as Large query
When the same query runs on 200MB spread over 130 files  (same data
format), with cost of 36 Millions queued as Large query, it never completes.

The small dataset query would stop doing any progress after a few minutes,
leaving on for hours, no progress, never complete.

The last running fragment is a HASH_PARTITION_SENDER showing 100% fragment
time.

After much shoot in the dark debugging session, analysing the src data etc.

We reviewed our cluster configuration,
when we changed
      exec.queue.threshold=30000000 to 60000000
to categorize our 200MB dataset as a small query,

The small dataset query started to work consistently in less than 10
seconds.

The physical plan is identical whether the query is Large or Small.

Is there a difference internally in drill execution whether the query is
small or large?
Would you be able to provide an explanation why this workaround works?

Cluster detail:
8 drillbits running in kubernetes
16GB direct mem
4GB Heap
data stored on a very efficient nfs, exposed as k8s pv/pvc to drillbit pods.

Thanks for any insight you can provide on that setting or in regards to our
initial problem.
François


Reply via email to