[ https://issues.apache.org/jira/browse/DRILL-7675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17073275#comment-17073275 ]

Paul Rogers edited comment on DRILL-7675 at 4/2/20, 1:11 AM:
-------------------------------------------------------------

On a Drill server, increase parallelism to 2 and query time decreases to 3 seconds:

{code:sql}
ALTER SYSTEM SET `planner.width.max_per_node`=2
{code}
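
To confirm the setting took effect, you can query the system options table (a quick check; the exact column set of sys.options varies by Drill version, so I select everything here):

{code:sql}
-- Check the effective value and scope of the parallelism cap
SELECT *
FROM sys.options
WHERE name = 'planner.width.max_per_node';
{code}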

Note that, according to the query profile, we don't actually create more 
fragments.

The hash partition senders now consume 145 MB each, so two threads use nearly 
300 MB in total: far larger than the actual data size.

At a parallelism of 3, we hit the following error:

{noformat}
org.apache.drill.common.exceptions.UserRemoteException: RESOURCE ERROR:
Not enough memory for internal partitioning and fallback mechanism for HashJoin 
to use unbounded memory is disabled. Either enable fallback config 
drill.exec.hashjoin.fallback.enabled using 
Alter session/system command or increase memory limit for Drillbit

Fragment: 3:1

...
 
org.apache.drill.exec.physical.impl.join.HashJoinBatch.disableSpilling(HashJoinBatch.java:997)
        at org.apache.drill.exec.physical.impl.join.HashJoinBatch.partitionNumTuning(HashJoinBatch.java:968)
        at org.apache.drill.exec.physical.impl.join.HashJoinBatch.executeBuildPhase(HashJoinBatch.java:1057)
        at org.apache.drill.exec.physical.impl.join.HashJoinBatch.innerNext(HashJoinBatch.java:560)
...
{noformat}

The hash join does some memory calculations and vastly overestimates the memory 
required, resulting in the following warning in the log file:

{noformat}
When using the minimum number of partitions 2 we require 713 MB memory
but only have 682 MB available. Forcing legacy behavior of using
unbounded memory in order to prevent regressions.
{noformat}
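
As a sanity check on the 682 MB figure: it is what you would get by splitting the default 2 GB per-query memory cap (planner.memory.max_query_memory_per_node) across three fragments. That split is my assumption, not something I pulled from the code, but the arithmetic lines up:

{code:sql}
-- Back-of-the-envelope check (assumption, not taken from the code):
-- default 2 GB query memory / 3 fragments, expressed in MB
SELECT 2147483648 / 3 / 1048576 AS mb_per_fragment;  -- yields 682
{code}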

The hash join then checks the drill.exec.hashjoin.fallback.enabled option and emits the message shown above.

So, we have a number of problems:

1. The hash join vastly overestimates memory needs.
2. The problem occurs only as we increase parallelism, even though the planner 
did not actually create extra fragments.

Let's do what the error message suggests:

{code:sql}
ALTER SESSION SET `drill.exec.hashjoin.fallback.enabled`=true
{code}

The query now passes in the debug environment with parallelism of 3. In the 
test case, I was able to push parallelism to 8 (on an 8-core, 16-thread 
machine). Query run time decreases to 30 seconds.

On a running Drillbit, parallelism of 3 reduces run time to 2.6 seconds, and 
parallelism of 4 gives 2.5 seconds. Increasing parallelism to 8 provides no 
further reduction in run time.
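
For completeness, the error message's other suggestion is to raise the memory limit rather than allow unbounded use. A sketch of that alternative; the 4 GB value here is purely illustrative:

{code:sql}
-- Alternative to the unbounded fallback: raise the per-query memory cap.
-- 4294967296 bytes (4 GB) is an illustrative value, not a recommendation.
ALTER SYSTEM SET `planner.memory.max_query_memory_per_node` = 4294967296;
{code}

Unlike the fallback, this should keep the hash join's bounded-memory accounting (and its spill-to-disk path) intact.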



> Very slow performance and Memory exhaustion while querying on very small 
> dataset of parquet files
> -------------------------------------------------------------------------------------------------
>
>                 Key: DRILL-7675
>                 URL: https://issues.apache.org/jira/browse/DRILL-7675
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization, Storage - Parquet
>    Affects Versions: 1.18.0
>         Environment: [^sample-dataset.zip]
>            Reporter: Idan Sheinberg
>            Assignee: Paul Rogers
>            Priority: Critical
>         Attachments: sample-dataset.zip
>
>
> Per our discussion in Slack/Dev-list Here are all details and sample data-set 
> to recreate problematic query behavior:
>  * We are using Drill 1.18.0-SNAPSHOT built on March 6
>  * We are joining on two small Parquet datasets residing on S3 using the 
> following query:
> {code:sql}
> SELECT 
>  CASE
>  WHEN tbl1.`timestamp` IS NULL THEN tbl2.`timestamp`
>  ELSE tbl1.`timestamp`
>  END AS ts, *
>  FROM `s3-store.state`.`/164` AS tbl1
>  FULL OUTER JOIN `s3-store.result`.`/164` AS tbl2
>  ON tbl1.`timestamp`*10 = tbl2.`timestamp`
>  ORDER BY ts ASC
>  LIMIT 500 OFFSET 0 ROWS
> {code}
>  * We are running drill in a single node setup on a 16 core, 64GB ram 
> machine. Drill heap size is set to 16GB, while max direct memory is set to 
> 32GB.
>  * As the dataset consists of really small files, Drill has been configured 
> to parallelize on a small item count by tweaking the following variables:
> {code:java}
> planner.slice_target = 25
> planner.width.max_per_node = 16 (to match the core count){code}
>  * Without the above parallelization, query speeds on Parquet files are super 
> slow (tens of seconds).
>  * While queries do work, we are seeing non-proportional direct memory/heap 
> utilization (up to 20 GB of direct memory used, a minimum of 12 GB of heap 
> required).
>  * We're still encountering the occasional out-of-memory error (we're also 
> seeing heap exhaustion, but I guess that's another indication of the same 
> problem). Reducing the node parallelization width to, say, 8 reduces memory 
> contention, though it still reaches 8 GB of direct memory:
> {code:java}
> User Error Occurred: One or more nodes ran out of memory while executing the 
> query. (null)
>  org.apache.drill.common.exceptions.UserException: RESOURCE ERROR: One or 
> more nodes ran out of memory while executing the query. null
>  [Error Id: 67b61fc9-320f-47a1-8718-813843a10ecc ]
>  at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:657)
>  at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:338)
>  at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
>  Caused by: org.apache.drill.exec.exception.OutOfMemoryException: null
>  at org.apache.drill.exec.vector.complex.AbstractContainerVector.allocateNew(AbstractContainerVector.java:59)
>  at org.apache.drill.exec.test.generated.PartitionerGen5$OutgoingRecordBatch.allocateOutgoingRecordBatch(PartitionerTemplate.java:380)
>  at org.apache.drill.exec.test.generated.PartitionerGen5$OutgoingRecordBatch.initializeBatch(PartitionerTemplate.java:400)
>  at org.apache.drill.exec.test.generated.PartitionerGen5.setup(PartitionerTemplate.java:126)
>  at org.apache.drill.exec.physical.impl.partitionsender.PartitionSenderRootExec.createClassInstances(PartitionSenderRootExec.java:263)
>  at org.apache.drill.exec.physical.impl.partitionsender.PartitionSenderRootExec.createPartitioner(PartitionSenderRootExec.java:218)
>  at org.apache.drill.exec.physical.impl.partitionsender.PartitionSenderRootExec.innerNext(PartitionSenderRootExec.java:188)
>  at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:93)
>  at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:323)
>  at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:310)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>  at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:310)
>  ... 4 common frames omitted{code}
> I've attached a (real!) sample dataset to match the query above. That same 
> dataset recreates the aforementioned memory behavior.
> Help, please.
> Idan
>  


