You can try adjusting async_pending_load_task_pool_size = 250 and async_loading_load_task_pool_size = 250 in fe.conf and see whether it helps.
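For example, a minimal fe.conf excerpt (both options size FE-side thread pools for pending/loading load tasks; fe.conf is only read at startup, so restart the FE after editing):

    # fe.conf -- enlarge the FE load task thread pools
    async_pending_load_task_pool_size = 250
    async_loading_load_task_pool_size = 250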
On Fri, Mar 10, 2023 at 8:49 PM Nikita Sakharin <nikitasa1...@gmail.com> wrote:
> Hi, Apache Doris Community!
>
> I'm a beginner user of Doris and I'm experiencing performance issues.
>
> I have *4 hosts for BE: Intel Xeon 80 cores, 768GB RAM each.* The frontend
> and broker were deployed on a different machine with the same configuration.
> All backends have "*mix*" as their *Node Role*.
> The Doris version is doris-1.2.2-rc01.
> The Apache Hive cluster has HDFS throughput of about 1 GB/sec. The
> replication factor is 3.
> The TPC-DS benchmark <https://github.com/databricks/spark-sql-perf> was used:
> 330 GB of Parquet data with Snappy compression. The total size (with
> replication) is 1 TB.
>
> The default configuration of Apache Doris was used except for these
> parameters:
> ADMIN SET FRONTEND CONFIG('broker_load_default_timeout_second' = '21600');
> ADMIN SET FRONTEND CONFIG('default_db_max_running_txn_num' = '320');
> ADMIN SET FRONTEND CONFIG('default_load_parallelism' = '80');
> ADMIN SET FRONTEND CONFIG('default_max_query_instances' = '640');
> ADMIN SET FRONTEND CONFIG('disable_balance' = 'false');
> ADMIN SET FRONTEND CONFIG('disable_disk_balance' = 'false');
> ADMIN SET FRONTEND CONFIG('max_balancing_tablets' = '320');
> ADMIN SET FRONTEND CONFIG('max_broker_concurrency' = '80');
> ADMIN SET FRONTEND CONFIG('max_bytes_per_broker_scanner' = '68719476736');
> ADMIN SET FRONTEND CONFIG('max_routine_load_job_num' = '320');
> ADMIN SET FRONTEND CONFIG('max_routine_load_task_num_per_be' = '80');
> ADMIN SET FRONTEND CONFIG('max_routine_load_task_concurrent_num' = '320');
> ADMIN SET FRONTEND CONFIG('max_running_txn_num_per_db' = '320');
> ADMIN SET FRONTEND CONFIG('max_scheduling_tablets' = '20480');
> ADMIN SET FRONTEND CONFIG('min_bytes_per_broker_scanner' = '1073741824');
> ADMIN SET FRONTEND CONFIG('remote_fragment_exec_timeout_ms' = '60000');
> ADMIN SET FRONTEND CONFIG('tablet_create_timeout_second' = '60');
> ADMIN SET FRONTEND CONFIG('tablet_delete_timeout_second' = '60');
>
> SET GLOBAL cpu_resource_limit = 80;
> SET GLOBAL enable_exchange_node_parallel_merge = true;
> SET GLOBAL enable_profile = true;
> SET GLOBAL enable_vectorized_engine = true;
> SET GLOBAL exec_mem_limit = 768G;
> SET GLOBAL parallel_exchange_instance_num = 80;
> SET GLOBAL parallel_fragment_exec_instance_num = 80;
> SET GLOBAL query_timeout = 86400;
> SET GLOBAL send_batch_parallelism = 80;
>
> SET PROPERTY 'max_query_instances' = '640';
> SET PROPERTY 'max_user_connections' = '1024';
>
> External tables were created in Apache Doris using commands like:
> CREATE EXTERNAL TABLE store_returns(
>     sr_return_time_sk INT,
>     sr_item_sk INT,
>     ...
>     sr_net_loss DECIMAL(7,2)
> ) ENGINE=HIVE
> COMMENT 'HIVE'
> PROPERTIES(
>     'hive.metastore.uris' = 'thrift://xyz.com:9085',
>     'database' = 'tpcds_1000',
>     'table' = 'store_returns'
> );
> The table in Apache Hive is partitioned by the sr_returned_date_sk field.
>
> Importing directly from HDFS utilizes *259 instances on four BEs*:
> 66 on the first BE,
> 65 on the second BE,
> 64 and 64 on the third and fourth BEs.
> Query:
> INSERT INTO tpcds.store_returns
> SELECT NULL, *
> FROM HDFS(
>     'uri' = 'hdfs://lang33.ca.sbrf.ru:8020/tmp/tpcds_1000/store_returns/sr_returned_date_sk=__HIVE_DEFAULT_PARTITION__/*',
>     'fs.defaultFS' = 'hdfs://lang33.ca.sbrf.ru:8020',
>     'hadoop.username' = 'hdfs',
>     'format' = 'parquet'
> );
>
> Importing from/querying Apache Hive utilizes *only 5 instances on four BEs*:
> 2 on the first BE,
> 1 each on the second, third, and fourth BEs.
> Query:
> INSERT INTO tpcds.store_returns
> SELECT
>     sr_returned_date_sk,
>     sr_customer_sk,
>     sr_store_sk,
>     sr_return_time_sk,
>     sr_item_sk,
>     sr_cdemo_sk,
>     sr_hdemo_sk,
>     sr_addr_sk,
>     sr_reason_sk,
>     sr_ticket_number,
>     sr_return_quantity,
>     sr_return_amt,
>     sr_return_tax,
>     sr_return_amt_inc_tax,
>     sr_fee,
>     sr_return_ship_cost,
>     sr_refunded_cash,
>     sr_reversed_charge,
>     sr_store_credit,
>     sr_net_loss
> FROM tpcds_10000.store_returns;
>
> How can I make Apache Doris utilize more instances for importing from and
> querying external Hive tables? I would kindly ask for your advice on
> parameter/setting values.
>
> Kind regards,
> Nikita
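If it helps, a quick sketch for verifying the values actually in effect, from any MySQL client connected to the FE (assuming the standard ADMIN SHOW FRONTEND CONFIG and SHOW VARIABLES statements):

    -- FE configs (pattern matching is supported)
    ADMIN SHOW FRONTEND CONFIG LIKE '%load_task_pool_size%';
    -- the session/global variable that drives per-BE instance counts
    SHOW VARIABLES LIKE 'parallel_fragment_exec_instance_num';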