Rajesh Balamohan created HIVE-27309:
---------------------------------------

             Summary: Large number of partitions and small files causes OOM in 
query coordinator
                 Key: HIVE-27309
                 URL: https://issues.apache.org/jira/browse/HIVE-27309
             Project: Hive
          Issue Type: Improvement
          Components: Iceberg integration
            Reporter: Rajesh Balamohan


 When a large number of nested partitions (with small files) is read, the 
query AM fails with an OOM.



{noformat}
CREATE EXTERNAL TABLE `store_sales_delete_6`(
  `ss_sold_time_sk` int,
  `ss_item_sk` int,
  `ss_customer_sk` int,
  `ss_cdemo_sk` int,
  `ss_hdemo_sk` int,
  `ss_addr_sk` int,
  `ss_store_sk` int,
  `ss_promo_sk` int,
  `ss_ticket_number` bigint,
  `ss_quantity` int,
  `ss_wholesale_cost` decimal(7,2),
  `ss_list_price` decimal(7,2),
  `ss_sales_price` decimal(7,2),
  `ss_ext_discount_amt` decimal(7,2),
  `ss_ext_sales_price` decimal(7,2),
  `ss_ext_wholesale_cost` decimal(7,2),
  `ss_ext_list_price` decimal(7,2),
  `ss_ext_tax` decimal(7,2),
  `ss_coupon_amt` decimal(7,2),
  `ss_net_paid` decimal(7,2),
  `ss_net_paid_inc_tax` decimal(7,2),
  `ss_net_profit` decimal(7,2),
  `ss_sold_date_sk` int)
PARTITIONED BY SPEC (
ss_store_sk, ss_promo_sk, ss_sold_date_sk) STORED by iceberg LOCATION 
's3a://blah/blah/tablespace/external/hive/blah.db/store_sales_delete_6';
alter table store_sales_delete_6 set tblproperties('format'='iceberg/parquet');
alter table store_sales_delete_6 set tblproperties('format-version'='2');
insert into store_sales_delete_6 select * from tpcds_1000_update.ssv limit 100000;


select count(*) from store_sales_delete_6;

{noformat}

The select count query then throws an OOM in the query AM. The query generates 
100,000 splits, which are grouped into 41 splits, but streaming these and 
sending them as events still runs out of memory.
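
To illustrate the arithmetic, here is a minimal Python sketch of grouping 100,000 file splits into 41 groups. It is not Hive or Tez code: `group_splits` is a hypothetical helper, and the round-robin assignment is a simplification, since real split grouping also weighs locality and sizes. It rests on an assumption the report does not confirm, namely that each grouped split still carries one entry per underlying file, so grouping lowers the number of splits without lowering the total number of entries the AM has to hold and send.

```python
# Simplified illustration only (not Hive/Tez code): files are dealt out
# round-robin, whereas real split grouping also considers locality and size.
def group_splits(num_file_splits: int, num_groups: int) -> list[list[int]]:
    """Assign each per-file split id to one of num_groups grouped splits."""
    groups: list[list[int]] = [[] for _ in range(num_groups)]
    for split_id in range(num_file_splits):
        groups[split_id % num_groups].append(split_id)
    return groups


# Numbers taken from the report: 100,000 splits grouped into 41.
groups = group_splits(100_000, 41)

# Assumption: every grouped split keeps one entry per underlying file.
total_entries = sum(len(g) for g in groups)
largest_group = max(len(g) for g in groups)

print(f"grouped splits:    {len(groups)}")
print(f"entries in total:  {total_entries}")
print(f"largest group has: {largest_group} entries")
```

Under that assumption, each of the 41 grouped splits would still describe roughly 2,400 files, which would be consistent with memory pressure when the splits are streamed and sent as events.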





--
This message was sent by Atlassian Jira
(v8.20.10#820010)
