[
https://issues.apache.org/jira/browse/DRILL-6566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16586708#comment-16586708
]
Boaz Ben-Zvi commented on DRILL-6566:
-------------------------------------
Looks like as batch-sizing makes the #rows smaller, the estimate (used to
determine when to spill) still uses the original estimate (based on 64K rows
per batch). Need to check if this estimate needs to be updated.
> Jenkins Regression: TPCDS query 66 fails with RESOURCE ERROR: One or more
> nodes ran out of memory while executing the query. AGGR OOM at First Phase.
> ------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: DRILL-6566
> URL: https://issues.apache.org/jira/browse/DRILL-6566
> Project: Apache Drill
> Issue Type: Bug
> Components: Execution - Relational Operators
> Affects Versions: 1.14.0
> Reporter: Robert Hou
> Assignee: Boaz Ben-Zvi
> Priority: Critical
> Fix For: 1.15.0
>
> Attachments: drillbit.log.6566
>
>
> This is TPCDS Query 66.
> Query: tpcds/tpcds_sf1/hive-generated-parquet/hive1_native/query66.sql
> SELECT w_warehouse_name,
> w_warehouse_sq_ft,
> w_city,
> w_county,
> w_state,
> w_country,
> ship_carriers,
> year1,
> Sum(jan_sales) AS jan_sales,
> Sum(feb_sales) AS feb_sales,
> Sum(mar_sales) AS mar_sales,
> Sum(apr_sales) AS apr_sales,
> Sum(may_sales) AS may_sales,
> Sum(jun_sales) AS jun_sales,
> Sum(jul_sales) AS jul_sales,
> Sum(aug_sales) AS aug_sales,
> Sum(sep_sales) AS sep_sales,
> Sum(oct_sales) AS oct_sales,
> Sum(nov_sales) AS nov_sales,
> Sum(dec_sales) AS dec_sales,
> Sum(jan_sales / w_warehouse_sq_ft) AS jan_sales_per_sq_foot,
> Sum(feb_sales / w_warehouse_sq_ft) AS feb_sales_per_sq_foot,
> Sum(mar_sales / w_warehouse_sq_ft) AS mar_sales_per_sq_foot,
> Sum(apr_sales / w_warehouse_sq_ft) AS apr_sales_per_sq_foot,
> Sum(may_sales / w_warehouse_sq_ft) AS may_sales_per_sq_foot,
> Sum(jun_sales / w_warehouse_sq_ft) AS jun_sales_per_sq_foot,
> Sum(jul_sales / w_warehouse_sq_ft) AS jul_sales_per_sq_foot,
> Sum(aug_sales / w_warehouse_sq_ft) AS aug_sales_per_sq_foot,
> Sum(sep_sales / w_warehouse_sq_ft) AS sep_sales_per_sq_foot,
> Sum(oct_sales / w_warehouse_sq_ft) AS oct_sales_per_sq_foot,
> Sum(nov_sales / w_warehouse_sq_ft) AS nov_sales_per_sq_foot,
> Sum(dec_sales / w_warehouse_sq_ft) AS dec_sales_per_sq_foot,
> Sum(jan_net) AS jan_net,
> Sum(feb_net) AS feb_net,
> Sum(mar_net) AS mar_net,
> Sum(apr_net) AS apr_net,
> Sum(may_net) AS may_net,
> Sum(jun_net) AS jun_net,
> Sum(jul_net) AS jul_net,
> Sum(aug_net) AS aug_net,
> Sum(sep_net) AS sep_net,
> Sum(oct_net) AS oct_net,
> Sum(nov_net) AS nov_net,
> Sum(dec_net) AS dec_net
> FROM (SELECT w_warehouse_name,
> w_warehouse_sq_ft,
> w_city,
> w_county,
> w_state,
> w_country,
> 'ZOUROS'
> \|\| ','
> \|\| 'ZHOU' AS ship_carriers,
> d_year AS year1,
> Sum(CASE
> WHEN d_moy = 1 THEN ws_ext_sales_price * ws_quantity
> ELSE 0
> END) AS jan_sales,
> Sum(CASE
> WHEN d_moy = 2 THEN ws_ext_sales_price * ws_quantity
> ELSE 0
> END) AS feb_sales,
> Sum(CASE
> WHEN d_moy = 3 THEN ws_ext_sales_price * ws_quantity
> ELSE 0
> END) AS mar_sales,
> Sum(CASE
> WHEN d_moy = 4 THEN ws_ext_sales_price * ws_quantity
> ELSE 0
> END) AS apr_sales,
> Sum(CASE
> WHEN d_moy = 5 THEN ws_ext_sales_price * ws_quantity
> ELSE 0
> END) AS may_sales,
> Sum(CASE
> WHEN d_moy = 6 THEN ws_ext_sales_price * ws_quantity
> ELSE 0
> END) AS jun_sales,
> Sum(CASE
> WHEN d_moy = 7 THEN ws_ext_sales_price * ws_quantity
> ELSE 0
> END) AS jul_sales,
> Sum(CASE
> WHEN d_moy = 8 THEN ws_ext_sales_price * ws_quantity
> ELSE 0
> END) AS aug_sales,
> Sum(CASE
> WHEN d_moy = 9 THEN ws_ext_sales_price * ws_quantity
> ELSE 0
> END) AS sep_sales,
> Sum(CASE
> WHEN d_moy = 10 THEN ws_ext_sales_price * ws_quantity
> ELSE 0
> END) AS oct_sales,
> Sum(CASE
> WHEN d_moy = 11 THEN ws_ext_sales_price * ws_quantity
> ELSE 0
> END) AS nov_sales,
> Sum(CASE
> WHEN d_moy = 12 THEN ws_ext_sales_price * ws_quantity
> ELSE 0
> END) AS dec_sales,
> Sum(CASE
> WHEN d_moy = 1 THEN ws_net_paid_inc_ship * ws_quantity
> ELSE 0
> END) AS jan_net,
> Sum(CASE
> WHEN d_moy = 2 THEN ws_net_paid_inc_ship * ws_quantity
> ELSE 0
> END) AS feb_net,
> Sum(CASE
> WHEN d_moy = 3 THEN ws_net_paid_inc_ship * ws_quantity
> ELSE 0
> END) AS mar_net,
> Sum(CASE
> WHEN d_moy = 4 THEN ws_net_paid_inc_ship * ws_quantity
> ELSE 0
> END) AS apr_net,
> Sum(CASE
> WHEN d_moy = 5 THEN ws_net_paid_inc_ship * ws_quantity
> ELSE 0
> END) AS may_net,
> Sum(CASE
> WHEN d_moy = 6 THEN ws_net_paid_inc_ship * ws_quantity
> ELSE 0
> END) AS jun_net,
> Sum(CASE
> WHEN d_moy = 7 THEN ws_net_paid_inc_ship * ws_quantity
> ELSE 0
> END) AS jul_net,
> Sum(CASE
> WHEN d_moy = 8 THEN ws_net_paid_inc_ship * ws_quantity
> ELSE 0
> END) AS aug_net,
> Sum(CASE
> WHEN d_moy = 9 THEN ws_net_paid_inc_ship * ws_quantity
> ELSE 0
> END) AS sep_net,
> Sum(CASE
> WHEN d_moy = 10 THEN ws_net_paid_inc_ship * ws_quantity
> ELSE 0
> END) AS oct_net,
> Sum(CASE
> WHEN d_moy = 11 THEN ws_net_paid_inc_ship * ws_quantity
> ELSE 0
> END) AS nov_net,
> Sum(CASE
> WHEN d_moy = 12 THEN ws_net_paid_inc_ship * ws_quantity
> ELSE 0
> END) AS dec_net
> FROM web_sales,
> warehouse,
> date_dim,
> time_dim,
> ship_mode
> WHERE ws_warehouse_sk = w_warehouse_sk
> AND ws_sold_date_sk = d_date_sk
> AND ws_sold_time_sk = t_time_sk
> AND ws_ship_mode_sk = sm_ship_mode_sk
> AND d_year = 1998
> AND t_time BETWEEN 7249 AND 7249 + 28800
> AND sm_carrier IN ( 'ZOUROS', 'ZHOU' )
> GROUP BY w_warehouse_name,
> w_warehouse_sq_ft,
> w_city,
> w_county,
> w_state,
> w_country,
> d_year
> UNION ALL
> SELECT w_warehouse_name,
> w_warehouse_sq_ft,
> w_city,
> w_county,
> w_state,
> w_country,
> 'ZOUROS'
> \|\| ','
> \|\| 'ZHOU' AS ship_carriers,
> d_year AS year1,
> Sum(CASE
> WHEN d_moy = 1 THEN cs_ext_sales_price * cs_quantity
> ELSE 0
> END) AS jan_sales,
> Sum(CASE
> WHEN d_moy = 2 THEN cs_ext_sales_price * cs_quantity
> ELSE 0
> END) AS feb_sales,
> Sum(CASE
> WHEN d_moy = 3 THEN cs_ext_sales_price * cs_quantity
> ELSE 0
> END) AS mar_sales,
> Sum(CASE
> WHEN d_moy = 4 THEN cs_ext_sales_price * cs_quantity
> ELSE 0
> END) AS apr_sales,
> Sum(CASE
> WHEN d_moy = 5 THEN cs_ext_sales_price * cs_quantity
> ELSE 0
> END) AS may_sales,
> Sum(CASE
> WHEN d_moy = 6 THEN cs_ext_sales_price * cs_quantity
> ELSE 0
> END) AS jun_sales,
> Sum(CASE
> WHEN d_moy = 7 THEN cs_ext_sales_price * cs_quantity
> ELSE 0
> END) AS jul_sales,
> Sum(CASE
> WHEN d_moy = 8 THEN cs_ext_sales_price * cs_quantity
> ELSE 0
> END) AS aug_sales,
> Sum(CASE
> WHEN d_moy = 9 THEN cs_ext_sales_price * cs_quantity
> ELSE 0
> END) AS sep_sales,
> Sum(CASE
> WHEN d_moy = 10 THEN cs_ext_sales_price * cs_quantity
> ELSE 0
> END) AS oct_sales,
> Sum(CASE
> WHEN d_moy = 11 THEN cs_ext_sales_price * cs_quantity
> ELSE 0
> END) AS nov_sales,
> Sum(CASE
> WHEN d_moy = 12 THEN cs_ext_sales_price * cs_quantity
> ELSE 0
> END) AS dec_sales,
> Sum(CASE
> WHEN d_moy = 1 THEN cs_net_paid * cs_quantity
> ELSE 0
> END) AS jan_net,
> Sum(CASE
> WHEN d_moy = 2 THEN cs_net_paid * cs_quantity
> ELSE 0
> END) AS feb_net,
> Sum(CASE
> WHEN d_moy = 3 THEN cs_net_paid * cs_quantity
> ELSE 0
> END) AS mar_net,
> Sum(CASE
> WHEN d_moy = 4 THEN cs_net_paid * cs_quantity
> ELSE 0
> END) AS apr_net,
> Sum(CASE
> WHEN d_moy = 5 THEN cs_net_paid * cs_quantity
> ELSE 0
> END) AS may_net,
> Sum(CASE
> WHEN d_moy = 6 THEN cs_net_paid * cs_quantity
> ELSE 0
> END) AS jun_net,
> Sum(CASE
> WHEN d_moy = 7 THEN cs_net_paid * cs_quantity
> ELSE 0
> END) AS jul_net,
> Sum(CASE
> WHEN d_moy = 8 THEN cs_net_paid * cs_quantity
> ELSE 0
> END) AS aug_net,
> Sum(CASE
> WHEN d_moy = 9 THEN cs_net_paid * cs_quantity
> ELSE 0
> END) AS sep_net,
> Sum(CASE
> WHEN d_moy = 10 THEN cs_net_paid * cs_quantity
> ELSE 0
> END) AS oct_net,
> Sum(CASE
> WHEN d_moy = 11 THEN cs_net_paid * cs_quantity
> ELSE 0
> END) AS nov_net,
> Sum(CASE
> WHEN d_moy = 12 THEN cs_net_paid * cs_quantity
> ELSE 0
> END) AS dec_net
> FROM catalog_sales,
> warehouse,
> date_dim,
> time_dim,
> ship_mode
> WHERE cs_warehouse_sk = w_warehouse_sk
> AND cs_sold_date_sk = d_date_sk
> AND cs_sold_time_sk = t_time_sk
> AND cs_ship_mode_sk = sm_ship_mode_sk
> AND d_year = 1998
> AND t_time BETWEEN 7249 AND 7249 + 28800
> AND sm_carrier IN ( 'ZOUROS', 'ZHOU' )
> GROUP BY w_warehouse_name,
> w_warehouse_sq_ft,
> w_city,
> w_county,
> w_state,
> w_country,
> d_year) x
> GROUP BY w_warehouse_name,
> w_warehouse_sq_ft,
> w_city,
> w_county,
> w_state,
> w_country,
> ship_carriers,
> year1
> ORDER BY w_warehouse_name
> LIMIT 100;
> Here is the error message with stack trace:
> java.sql.SQLException: RESOURCE ERROR: One or more nodes ran out of memory
> while executing the query.
> AGGR OOM at First Phase. Partitions: 1. Estimated batch size: 31260672.
> values size: 25165824. Output alloc size: 25165824. Planned batches: 1 Memory
> limit: 2302755 so far allocated: 262144.
> Fragment 6:1
> [Error Id: 660fd3f9-bb90-4245-b63c-208fc37c0199 on atsqa6c82.qa.lab:31010]
> (org.apache.drill.exec.exception.OutOfMemoryException) AGGR OOM at First
> Phase. Partitions: 1. Estimated batch size: 31260672. values size: 25165824.
> Output alloc size: 25165824. Planned batches: 1 Memory limit: 2302755 so far
> allocated: 262144.
>
> org.apache.drill.exec.test.generated.HashAggregatorGen15454.spillIfNeeded():1468
> org.apache.drill.exec.test.generated.HashAggregatorGen15454.doSpill():1430
>
> org.apache.drill.exec.test.generated.HashAggregatorGen15454.checkGroupAndAggrValues():1353
> org.apache.drill.exec.test.generated.HashAggregatorGen15454.doWork():587
> org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.innerNext():184
> org.apache.drill.exec.record.AbstractRecordBatch.next():172
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
>
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():147
> org.apache.drill.exec.record.AbstractRecordBatch.next():172
> org.apache.drill.exec.physical.impl.BaseRootExec.next():103
>
> org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext():93
> org.apache.drill.exec.physical.impl.BaseRootExec.next():93
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():294
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():281
> java.security.AccessController.doPrivileged():-2
> javax.security.auth.Subject.doAs():422
> org.apache.hadoop.security.UserGroupInformation.doAs():1595
> org.apache.drill.exec.work.fragment.FragmentExecutor.run():281
> org.apache.drill.common.SelfCleaningRunnable.run():38
> java.util.concurrent.ThreadPoolExecutor.runWorker():1149
> java.util.concurrent.ThreadPoolExecutor$Worker.run():624
> java.lang.Thread.run():748
> at
> org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:528)
> at org.apache.drill.jdbc.impl.DrillCursor.next(DrillCursor.java:632)
> at
> oadd.org.apache.calcite.avatica.AvaticaResultSet.next(AvaticaResultSet.java:207)
> at
> org.apache.drill.jdbc.impl.DrillResultSetImpl.next(DrillResultSetImpl.java:153)
> at
> org.apache.drill.test.framework.DrillTestJdbc.executeQuery(DrillTestJdbc.java:253)
> at
> org.apache.drill.test.framework.DrillTestJdbc.run(DrillTestJdbc.java:115)
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: oadd.org.apache.drill.common.exceptions.UserRemoteException:
> RESOURCE ERROR: One or more nodes ran out of memory while executing the query.
> AGGR OOM at First Phase. Partitions: 1. Estimated batch size: 31260672.
> values size: 25165824. Output alloc size: 25165824. Planned batches: 1 Memory
> limit: 2302755 so far allocated: 262144.
> Fragment 6:1
> [Error Id: 660fd3f9-bb90-4245-b63c-208fc37c0199 on atsqa6c82.qa.lab:31010]
> (org.apache.drill.exec.exception.OutOfMemoryException) AGGR OOM at First
> Phase. Partitions: 1. Estimated batch size: 31260672. values size: 25165824.
> Output alloc size: 25165824. Planned batches: 1 Memory limit: 2302755 so far
> allocated: 262144.
>
> org.apache.drill.exec.test.generated.HashAggregatorGen15454.spillIfNeeded():1468
> org.apache.drill.exec.test.generated.HashAggregatorGen15454.doSpill():1430
>
> org.apache.drill.exec.test.generated.HashAggregatorGen15454.checkGroupAndAggrValues():1353
> org.apache.drill.exec.test.generated.HashAggregatorGen15454.doWork():587
> org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.innerNext():184
> org.apache.drill.exec.record.AbstractRecordBatch.next():172
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
>
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():147
> org.apache.drill.exec.record.AbstractRecordBatch.next():172
> org.apache.drill.exec.physical.impl.BaseRootExec.next():103
>
> org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext():93
> org.apache.drill.exec.physical.impl.BaseRootExec.next():93
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():294
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():281
> java.security.AccessController.doPrivileged():-2
> javax.security.auth.Subject.doAs():422
> org.apache.hadoop.security.UserGroupInformation.doAs():1595
> org.apache.drill.exec.work.fragment.FragmentExecutor.run():281
> org.apache.drill.common.SelfCleaningRunnable.run():38
> java.util.concurrent.ThreadPoolExecutor.runWorker():1149
> java.util.concurrent.ThreadPoolExecutor$Worker.run():624
> java.lang.Thread.run():748
> at
> oadd.org.apache.drill.exec.rpc.user.QueryResultHandler.resultArrived(QueryResultHandler.java:123)
> at
> oadd.org.apache.drill.exec.rpc.user.UserClient.handle(UserClient.java:422)
> at
> oadd.org.apache.drill.exec.rpc.user.UserClient.handle(UserClient.java:96)
> at
> oadd.org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:274)
> at
> oadd.org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:244)
> at
> oadd.io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:88)
> at
> oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
> at
> oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
> at
> oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
> at
> oadd.io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:287)
> at
> oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
> at
> oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
> at
> oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
> at
> oadd.io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102)
> at
> oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
> at
> oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
> at
> oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
> at
> oadd.io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:312)
> at
> oadd.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:286)
> at
> oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
> at
> oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
> at
> oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
> at
> oadd.io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
> at
> oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
> at
> oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
> at
> oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
> at
> oadd.io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1294)
> at
> oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
> at
> oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
> at
> oadd.io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:911)
> at
> oadd.io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
> at
> oadd.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645)
> at
> oadd.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
> at
> oadd.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
> at oadd.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
> at
> oadd.io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131)
> ... 1 more
> Here is the commit id:
> 1.14.0-SNAPSHOT 140d09e69b65ac2cb1bed09a37fa5861d39a99b3 DRILL-6539:
> Record count not set for this vector container error 28.06.2018 @
> 16:13:20 PDT Unknown 28.06.2018 @ 16:21:42 PDT
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)