[ https://issues.apache.org/jira/browse/DRILL-6453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16575357#comment-16575357 ]

ASF GitHub Bot commented on DRILL-6453:
---------------------------------------

ilooner commented on a change in pull request #1408: DRILL-6453: Resolve 
deadlock when reading from build and probe sides simultaneously in HashJoin
URL: https://github.com/apache/drill/pull/1408#discussion_r209066765
 
 

 ##########
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/HashJoinMemoryCalculatorImpl.java
 ##########
 @@ -420,31 +357,32 @@ public long getMaxReservedMemory() {
     private void calculateMemoryUsage()
     {
       // Adjust based on number of records
-      maxBuildBatchSize = computeMaxBatchSizeNoHash(buildBatchSize, buildNumRecords,
-        maxBatchNumRecordsBuild, fragmentationFactor, safetyFactor);
-      maxProbeBatchSize = computeMaxBatchSizeNoHash(probeBatchSize, probeNumRecords,
-        maxBatchNumRecordsProbe, fragmentationFactor, safetyFactor);
-
-      // Safety factor can be multiplied at the end since these batches are coming from exchange operators, so no excess value vector doubling
-      partitionBuildBatchSize = computeMaxBatchSize(buildBatchSize,
-        buildNumRecords,
-        recordsPerPartitionBatchBuild,
-        fragmentationFactor,
-        safetyFactor,
-        reserveHash);
+      maxBuildBatchSize = buildSizePredictor.predictBatchSize(maxBatchNumRecordsBuild, false);
 
-      // Safety factor can be multiplied at the end since these batches are coming from exchange operators, so no excess value vector doubling
-      partitionProbeBatchSize = computeMaxBatchSize(
-        probeBatchSize,
-        probeNumRecords,
-        recordsPerPartitionBatchProbe,
-        fragmentationFactor,
-        safetyFactor,
-        reserveHash);
+      if (probeSizePredictor.hasData()) {
+        // We have probe data and we can compute the max incoming size.
+        maxProbeBatchSize = probeSizePredictor.predictBatchSize(maxBatchNumRecordsProbe, false);
+      } else {
+        // We don't have probe data
+        if (probeEmpty) {
+          // We know the probe has no data, so we don't need to reserve any space for the incoming probe
+          maxProbeBatchSize = 0;
+        } else {
+          // The probe side may have data, so assume it is the max incoming batch size. This assumption
+          // can fail in some cases since the batch sizing project is incomplete.
+          maxProbeBatchSize = maxIncomingBatchSize;
+        }
+      }
+
+      partitionBuildBatchSize = buildSizePredictor.predictBatchSize(recordsPerPartitionBatchBuild, reserveHash);
+
+      if (probeSizePredictor.hasData()) {
+        partitionProbeBatchSize = probeSizePredictor.predictBatchSize(recordsPerPartitionBatchProbe, reserveHash);
+      }
 
       maxOutputBatchSize = (long) ((double)outputBatchSize * fragmentationFactor * safetyFactor);
 
-      long probeReservedMemory;
+      long probeReservedMemory = -1;
 
 Review comment:
   Removed -1
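
For context on why dropping the dummy initializer helps: without "= -1", javac's definite-assignment check requires every branch to assign probeReservedMemory before it is read, so a missed branch becomes a compile error instead of a silently propagated sentinel value. A minimal sketch of the effect (hypothetical class and values, not the PR's code):

{noformat}
public class DefiniteAssignmentDemo {
  static long reserve(boolean probeHasData) {
    long probeReservedMemory;  // intentionally no initializer

    if (probeHasData) {
      probeReservedMemory = 64L * 1024 * 1024;  // stand-in for a computed reservation
    } else {
      probeReservedMemory = 0;                  // every branch must assign the local
    }
    // If the else branch above were deleted, this read would fail to compile with
    // "variable probeReservedMemory might not have been initialized"; with a
    // "= -1" initializer it would compile and silently return the sentinel.
    return probeReservedMemory;
  }

  public static void main(String[] args) {
    System.out.println(reserve(true));  // prints 67108864
  }
}
{noformat}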

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


> TPC-DS query 72 has regressed
> -----------------------------
>
>                 Key: DRILL-6453
>                 URL: https://issues.apache.org/jira/browse/DRILL-6453
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Flow
>    Affects Versions: 1.14.0
>            Reporter: Khurram Faraaz
>            Assignee: Timothy Farkas
>            Priority: Blocker
>             Fix For: 1.15.0
>
>         Attachments: 24f75b18-014a-fb58-21d2-baeab5c3352c.sys.drill, 
> jstack_29173_June_10_2018.txt, jstack_29173_June_10_2018_b.txt, 
> jstack_29173_June_10_2018_c.txt, jstack_29173_June_10_2018_d.txt, 
> jstack_29173_June_10_2018_e.txt
>
>
> TPC-DS query 72 seems to have regressed; the query profile for the run that was 
> canceled after 2 hours on Drill 1.14.0 is attached here.
> {noformat}
> On Drill 1.14.0-SNAPSHOT 
> commit : 931b43e (TPC-DS query 72 executed successfully on this commit, took 
> around 55 seconds to execute)
> SF1 parquet data on 4 nodes; 
> planner.memory.max_query_memory_per_node = 10737418240. 
> drill.exec.hashagg.fallback.enabled = true
> TPC-DS query 72 executed successfully & took 47 seconds to complete execution.
> {noformat}
> {noformat}
> TPC-DS data in the run below has date values stored as the DATE datatype, not 
> as VARCHAR
> On Drill 1.14.0-SNAPSHOT
> commit : 82e1a12
> SF1 parquet data on 4 nodes; 
> planner.memory.max_query_memory_per_node = 10737418240. 
> drill.exec.hashagg.fallback.enabled = true
> and
> alter system set `exec.hashjoin.num_partitions` = 1;
> TPC-DS query 72 executed for 2 hrs and 11 mins and did not complete; I had to 
> cancel it by stopping the Foreman drillbit.
> As a result, several minor fragments are reported to be in the 
> CANCELLATION_REQUESTED state on the UI.
> {noformat}
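
The options quoted in the reproductions above can also be applied programmatically before rerunning the query. Below is a minimal sketch using Drill's JDBC driver; the drillbit host and port are assumptions, while the option names and values are copied from the report:

{noformat}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class Repro72Options {
  public static void main(String[] args) throws Exception {
    // Direct drillbit connection; localhost:31010 is an assumed host/port.
    try (Connection conn =
             DriverManager.getConnection("jdbc:drill:drillbit=localhost:31010");
         Statement stmt = conn.createStatement()) {
      // Option names and values are taken from the JIRA reproduction above.
      stmt.execute("ALTER SYSTEM SET `planner.memory.max_query_memory_per_node` = 10737418240");
      stmt.execute("ALTER SYSTEM SET `drill.exec.hashagg.fallback.enabled` = true");
      stmt.execute("ALTER SYSTEM SET `exec.hashjoin.num_partitions` = 1");
    }
  }
}
{noformat}

This sketch assumes the drill-jdbc-all jar is on the classpath; the same ALTER SYSTEM statements can equally be run from sqlline or the Drill web UI.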



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
