[ https://issues.apache.org/jira/browse/HIVE-27898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17819475#comment-17819475 ]
yongzhi.shao edited comment on HIVE-27898 at 2/22/24 6:03 AM: -------------------------------------------------------------- [~zhangbutao] I tried the master branch, using the latest updated code today, and the problem persists. Please note that in the statistics of Tez, the anomaly in the statistics of the Submit Plan metric has been fixed. This proves that I am using the latest version of the code. Also, I noticed in your previous test that one of the tables is not an ICEBERG table. ( testdb.test_data_02 ) This SQL executes normal internal tables without problems. It is the ICEBERG table that has problems executing. {code:java} 0: jdbc:hive2://smaster01:10000> select version(); INFO : Compiling command(queryId=hive_20240222134713_2aaf182e-be55-4946-9a7c-c19d4a047e13): select version() INFO : Semantic Analysis Completed (retrial = false) INFO : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:_c0, type:string, comment:null)], properties:null) INFO : Completed compiling command(queryId=hive_20240222134713_2aaf182e-be55-4946-9a7c-c19d4a047e13); Time taken: 1.974 seconds INFO : Concurrency mode is disabled, not creating a lock manager INFO : Executing command(queryId=hive_20240222134713_2aaf182e-be55-4946-9a7c-c19d4a047e13): select version() INFO : Completed executing command(queryId=hive_20240222134713_2aaf182e-be55-4946-9a7c-c19d4a047e13); Time taken: 0.004 seconds _c0 4.0.0-beta-2-SNAPSHOT r6e061e6559522c8a060c1b55439ada0001bf5e5d 0: jdbc:hive2://smaster01:10000> select * from (select * from iceberg_dwd.test_data_04 limit 10) s1; INFO : Compiling command(queryId=hive_20240222134526_87055172-e808-4726-acdf-89f6b8b815e6): select * from (select * from iceberg_dwd.test_data_04 limit 10) s1 INFO : Semantic Analysis Completed (retrial = false) INFO : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:s1.id, type:string, comment:null), FieldSchema(name:s1.name, type:string, comment:null)], properties:null) INFO : Completed compiling command(queryId=hive_20240222134526_87055172-e808-4726-acdf-89f6b8b815e6); Time taken: 1.007 seconds INFO : Concurrency mode is disabled, not creating a lock manager INFO : Executing command(queryId=hive_20240222134526_87055172-e808-4726-acdf-89f6b8b815e6): select * from (select * from iceberg_dwd.test_data_04 limit 10) s1 INFO : Query ID = hive_20240222134526_87055172-e808-4726-acdf-89f6b8b815e6 INFO : Total jobs = 1 INFO : Launching Job 1 out of 1 INFO : Starting task [Stage-1:MAPRED] in serial mode INFO : Subscribed to counters: [] for queryId: hive_20240222134526_87055172-e808-4726-acdf-89f6b8b815e6 INFO : Tez session hasn't been created yet. Opening session INFO : Dag name: select * from (select......a_04 limit 10) s1 (Stage-1) INFO : HS2 Host: [smaster01], Query ID: [hive_20240222134526_87055172-e808-4726-acdf-89f6b8b815e6], Dag ID: [dag_1706163520799_39421_1], DAG Session ID: [application_1706163520799_39421] INFO : Status: Running (Executing on YARN cluster with App id application_1706163520799_39421)---------------------------------------------------------------------------------------------- VERTICES MODE STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED ---------------------------------------------------------------------------------------------- Map 1 .......... container SUCCEEDED 1 1 0 0 0 0 Reducer 2 ...... container SUCCEEDED 1 1 0 0 0 0 ---------------------------------------------------------------------------------------------- VERTICES: 02/02 [==========================>>] 100% ELAPSED TIME: 20.10 s ---------------------------------------------------------------------------------------------- INFO : Status: DAG finished successfully in 19.90 seconds INFO : DAG ID: dag_1706163520799_39421_1 INFO : INFO : Query Execution Summary INFO : ---------------------------------------------------------------------------------------------- INFO : OPERATION DURATION INFO : ---------------------------------------------------------------------------------------------- INFO : Compile Query 1.01s INFO : Prepare Plan 16.62s INFO : Get Query Coordinator (AM) 0.03s INFO : Submit Plan 0.54s INFO : Start DAG 0.07s INFO : Run DAG 19.90s INFO : ---------------------------------------------------------------------------------------------- INFO : INFO : Task Execution Summary INFO : ---------------------------------------------------------------------------------------------- INFO : VERTICES DURATION(ms) CPU_TIME(ms) GC_TIME(ms) INPUT_RECORDS OUTPUT_RECORDS INFO : ---------------------------------------------------------------------------------------------- INFO : Map 1 0.00 10,290 290 2 2 INFO : Reducer 2 0.00 1,020 0 2 0 INFO : ---------------------------------------------------------------------------------------------- INFO : INFO : org.apache.tez.common.counters.DAGCounter: INFO : NUM_SUCCEEDED_TASKS: 2 INFO : TOTAL_LAUNCHED_TASKS: 2 INFO : RACK_LOCAL_TASKS: 1 INFO : AM_CPU_MILLISECONDS: 5430 INFO : WALL_CLOCK_MILLIS: 9971 INFO : AM_GC_TIME_MILLIS: 51 INFO : INITIAL_HELD_CONTAINERS: 0 INFO : TOTAL_CONTAINERS_USED: 1 INFO : TOTAL_CONTAINER_ALLOCATION_COUNT: 2 INFO : TOTAL_CONTAINER_LAUNCH_COUNT: 1 INFO : TOTAL_CONTAINER_REUSE_COUNT: 1 INFO : File System Counters: INFO : FILE_BYTES_READ: 0 INFO : FILE_BYTES_WRITTEN: 0 INFO : FILE_READ_OPS: 0 INFO : FILE_LARGE_READ_OPS: 0 INFO : FILE_WRITE_OPS: 0 INFO : HDFS_BYTES_READ: 1210 INFO : HDFS_BYTES_WRITTEN: 121 INFO : HDFS_READ_OPS: 8 INFO : HDFS_LARGE_READ_OPS: 0 INFO : HDFS_WRITE_OPS: 2 INFO : org.apache.tez.common.counters.TaskCounter: INFO : SPILLED_RECORDS: 0 INFO : NUM_SHUFFLED_INPUTS: 0 INFO : NUM_FAILED_SHUFFLE_INPUTS: 0 INFO : GC_TIME_MILLIS: 290 INFO : CPU_MILLISECONDS: 11310 INFO : WALL_CLOCK_MILLISECONDS: 9561 INFO : PHYSICAL_MEMORY_BYTES: 3435134976 INFO : VIRTUAL_MEMORY_BYTES: 7557877760 INFO : COMMITTED_HEAP_BYTES: 3435134976 INFO : INPUT_RECORDS_PROCESSED: 4 INFO : INPUT_SPLIT_LENGTH_BYTES: 880 INFO : OUTPUT_RECORDS: 2 INFO : APPROXIMATE_INPUT_RECORDS: 2 INFO : OUTPUT_LARGE_RECORDS: 0 INFO : OUTPUT_BYTES: 8 INFO : OUTPUT_BYTES_WITH_OVERHEAD: 18 INFO : OUTPUT_BYTES_PHYSICAL: 46 INFO : ADDITIONAL_SPILLS_BYTES_WRITTEN: 0 INFO : ADDITIONAL_SPILLS_BYTES_READ: 0 INFO : ADDITIONAL_SPILL_COUNT: 0 INFO : SHUFFLE_BYTES: 0 INFO : SHUFFLE_BYTES_DECOMPRESSED: 0 INFO : SHUFFLE_BYTES_TO_MEM: 0 INFO : SHUFFLE_BYTES_TO_DISK: 0 INFO : SHUFFLE_BYTES_DISK_DIRECT: 0 INFO : SHUFFLE_PHASE_TIME: 158 INFO : FIRST_EVENT_RECEIVED: 157 INFO : LAST_EVENT_RECEIVED: 157 INFO : DATA_BYTES_VIA_EVENT: 22 INFO : HIVE: INFO : CREATED_FILES: 1 INFO : DESERIALIZE_ERRORS: 0 INFO : RECORDS_IN_Map_1: 2 INFO : RECORDS_OUT_0: 2 INFO : RECORDS_OUT_INTERMEDIATE_Map_1: 2 INFO : RECORDS_OUT_INTERMEDIATE_Reducer_2: 0 INFO : RECORDS_OUT_OPERATOR_FS_13: 2 INFO : RECORDS_OUT_OPERATOR_LIM_11: 2 INFO : RECORDS_OUT_OPERATOR_LIM_8: 2 INFO : RECORDS_OUT_OPERATOR_MAP_0: 0 INFO : RECORDS_OUT_OPERATOR_RS_10: 2 INFO : RECORDS_OUT_OPERATOR_SEL_12: 2 INFO : RECORDS_OUT_OPERATOR_SEL_9: 2 INFO : RECORDS_OUT_OPERATOR_TS_0: 2 INFO : org.apache.hadoop.hive.ql.exec.tez.HiveInputCounters: INFO : GROUPED_INPUT_SPLITS_Map_1: 1 INFO : INPUT_DIRECTORIES_Map_1: 1 INFO : INPUT_FILES_Map_1: 1 INFO : RAW_INPUT_SPLITS_Map_1: 1 INFO : Completed executing command(queryId=hive_20240222134526_87055172-e808-4726-acdf-89f6b8b815e6); Time taken: 37.363 seconds s1.id NULL s1.name a s1.id NULL s1.name b 2 rows selected (38.804 seconds) {code} What I want to know is why the same SQL produces different execution plans? Do I need to provide you with my HIVE related configuration? was (Author: lisoda): [~zhangbutao] I tried the master branch, using the latest updated code today, and the problem persists. Please note that in the statistics of Tez, the anomaly in the statistics of the Submit Plan metric has been fixed. This proves that I am using the latest version of the code. Also, I noticed in your previous test that one of the tables is not an ICEBERG table. This SQL executes normal internal tables without problems. It is the ICEBERG table that has problems executing. {code:java} 0: jdbc:hive2://smaster01:10000> select version(); INFO : Compiling command(queryId=hive_20240222134713_2aaf182e-be55-4946-9a7c-c19d4a047e13): select version() INFO : Semantic Analysis Completed (retrial = false) INFO : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:_c0, type:string, comment:null)], properties:null) INFO : Completed compiling command(queryId=hive_20240222134713_2aaf182e-be55-4946-9a7c-c19d4a047e13); Time taken: 1.974 seconds INFO : Concurrency mode is disabled, not creating a lock manager INFO : Executing command(queryId=hive_20240222134713_2aaf182e-be55-4946-9a7c-c19d4a047e13): select version() INFO : Completed executing command(queryId=hive_20240222134713_2aaf182e-be55-4946-9a7c-c19d4a047e13); Time taken: 0.004 seconds _c0 4.0.0-beta-2-SNAPSHOT r6e061e6559522c8a060c1b55439ada0001bf5e5d 0: jdbc:hive2://smaster01:10000> select * from (select * from iceberg_dwd.test_data_04 limit 10) s1; INFO : Compiling command(queryId=hive_20240222134526_87055172-e808-4726-acdf-89f6b8b815e6): select * from (select * from iceberg_dwd.test_data_04 limit 10) s1 INFO : Semantic Analysis Completed (retrial = false) INFO : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:s1.id, type:string, comment:null), FieldSchema(name:s1.name, type:string, comment:null)], properties:null) INFO : Completed compiling command(queryId=hive_20240222134526_87055172-e808-4726-acdf-89f6b8b815e6); Time taken: 1.007 seconds INFO : Concurrency mode is disabled, not creating a lock manager INFO : Executing command(queryId=hive_20240222134526_87055172-e808-4726-acdf-89f6b8b815e6): select * from (select * from iceberg_dwd.test_data_04 limit 10) s1 INFO : Query ID = hive_20240222134526_87055172-e808-4726-acdf-89f6b8b815e6 INFO : Total jobs = 1 INFO : Launching Job 1 out of 1 INFO : Starting task [Stage-1:MAPRED] in serial mode INFO : Subscribed to counters: [] for queryId: hive_20240222134526_87055172-e808-4726-acdf-89f6b8b815e6 INFO : Tez session hasn't been created yet. Opening session INFO : Dag name: select * from (select......a_04 limit 10) s1 (Stage-1) INFO : HS2 Host: [smaster01], Query ID: [hive_20240222134526_87055172-e808-4726-acdf-89f6b8b815e6], Dag ID: [dag_1706163520799_39421_1], DAG Session ID: [application_1706163520799_39421] INFO : Status: Running (Executing on YARN cluster with App id application_1706163520799_39421)---------------------------------------------------------------------------------------------- VERTICES MODE STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED ---------------------------------------------------------------------------------------------- Map 1 .......... container SUCCEEDED 1 1 0 0 0 0 Reducer 2 ...... container SUCCEEDED 1 1 0 0 0 0 ---------------------------------------------------------------------------------------------- VERTICES: 02/02 [==========================>>] 100% ELAPSED TIME: 20.10 s ---------------------------------------------------------------------------------------------- INFO : Status: DAG finished successfully in 19.90 seconds INFO : DAG ID: dag_1706163520799_39421_1 INFO : INFO : Query Execution Summary INFO : ---------------------------------------------------------------------------------------------- INFO : OPERATION DURATION INFO : ---------------------------------------------------------------------------------------------- INFO : Compile Query 1.01s INFO : Prepare Plan 16.62s INFO : Get Query Coordinator (AM) 0.03s INFO : Submit Plan 0.54s INFO : Start DAG 0.07s INFO : Run DAG 19.90s INFO : ---------------------------------------------------------------------------------------------- INFO : INFO : Task Execution Summary INFO : ---------------------------------------------------------------------------------------------- INFO : VERTICES DURATION(ms) CPU_TIME(ms) GC_TIME(ms) INPUT_RECORDS OUTPUT_RECORDS INFO : ---------------------------------------------------------------------------------------------- INFO : Map 1 0.00 10,290 290 2 2 INFO : Reducer 2 0.00 1,020 0 2 0 INFO : ---------------------------------------------------------------------------------------------- INFO : INFO : org.apache.tez.common.counters.DAGCounter: INFO : NUM_SUCCEEDED_TASKS: 2 INFO : TOTAL_LAUNCHED_TASKS: 2 INFO : RACK_LOCAL_TASKS: 1 INFO : AM_CPU_MILLISECONDS: 5430 INFO : WALL_CLOCK_MILLIS: 9971 INFO : AM_GC_TIME_MILLIS: 51 INFO : INITIAL_HELD_CONTAINERS: 0 INFO : TOTAL_CONTAINERS_USED: 1 INFO : TOTAL_CONTAINER_ALLOCATION_COUNT: 2 INFO : TOTAL_CONTAINER_LAUNCH_COUNT: 1 INFO : TOTAL_CONTAINER_REUSE_COUNT: 1 INFO : File System Counters: INFO : FILE_BYTES_READ: 0 INFO : FILE_BYTES_WRITTEN: 0 INFO : FILE_READ_OPS: 0 INFO : FILE_LARGE_READ_OPS: 0 INFO : FILE_WRITE_OPS: 0 INFO : HDFS_BYTES_READ: 1210 INFO : HDFS_BYTES_WRITTEN: 121 INFO : HDFS_READ_OPS: 8 INFO : HDFS_LARGE_READ_OPS: 0 INFO : HDFS_WRITE_OPS: 2 INFO : org.apache.tez.common.counters.TaskCounter: INFO : SPILLED_RECORDS: 0 INFO : NUM_SHUFFLED_INPUTS: 0 INFO : NUM_FAILED_SHUFFLE_INPUTS: 0 INFO : GC_TIME_MILLIS: 290 INFO : CPU_MILLISECONDS: 11310 INFO : WALL_CLOCK_MILLISECONDS: 9561 INFO : PHYSICAL_MEMORY_BYTES: 3435134976 INFO : VIRTUAL_MEMORY_BYTES: 7557877760 INFO : COMMITTED_HEAP_BYTES: 3435134976 INFO : INPUT_RECORDS_PROCESSED: 4 INFO : INPUT_SPLIT_LENGTH_BYTES: 880 INFO : OUTPUT_RECORDS: 2 INFO : APPROXIMATE_INPUT_RECORDS: 2 INFO : OUTPUT_LARGE_RECORDS: 0 INFO : OUTPUT_BYTES: 8 INFO : OUTPUT_BYTES_WITH_OVERHEAD: 18 INFO : OUTPUT_BYTES_PHYSICAL: 46 INFO : ADDITIONAL_SPILLS_BYTES_WRITTEN: 0 INFO : ADDITIONAL_SPILLS_BYTES_READ: 0 INFO : ADDITIONAL_SPILL_COUNT: 0 INFO : SHUFFLE_BYTES: 0 INFO : SHUFFLE_BYTES_DECOMPRESSED: 0 INFO : SHUFFLE_BYTES_TO_MEM: 0 INFO : SHUFFLE_BYTES_TO_DISK: 0 INFO : SHUFFLE_BYTES_DISK_DIRECT: 0 INFO : SHUFFLE_PHASE_TIME: 158 INFO : FIRST_EVENT_RECEIVED: 157 INFO : LAST_EVENT_RECEIVED: 157 INFO : DATA_BYTES_VIA_EVENT: 22 INFO : HIVE: INFO : CREATED_FILES: 1 INFO : DESERIALIZE_ERRORS: 0 INFO : RECORDS_IN_Map_1: 2 INFO : RECORDS_OUT_0: 2 INFO : RECORDS_OUT_INTERMEDIATE_Map_1: 2 INFO : RECORDS_OUT_INTERMEDIATE_Reducer_2: 0 INFO : RECORDS_OUT_OPERATOR_FS_13: 2 INFO : RECORDS_OUT_OPERATOR_LIM_11: 2 INFO : RECORDS_OUT_OPERATOR_LIM_8: 2 INFO : RECORDS_OUT_OPERATOR_MAP_0: 0 INFO : RECORDS_OUT_OPERATOR_RS_10: 2 INFO : RECORDS_OUT_OPERATOR_SEL_12: 2 INFO : RECORDS_OUT_OPERATOR_SEL_9: 2 INFO : RECORDS_OUT_OPERATOR_TS_0: 2 INFO : org.apache.hadoop.hive.ql.exec.tez.HiveInputCounters: INFO : GROUPED_INPUT_SPLITS_Map_1: 1 INFO : INPUT_DIRECTORIES_Map_1: 1 INFO : INPUT_FILES_Map_1: 1 INFO : RAW_INPUT_SPLITS_Map_1: 1 INFO : Completed executing command(queryId=hive_20240222134526_87055172-e808-4726-acdf-89f6b8b815e6); Time taken: 37.363 seconds s1.id NULL s1.name a s1.id NULL s1.name b 2 rows selected (38.804 seconds) {code} What I want to know is why the same SQL produces different execution plans? Do I need to provide you with my HIVE related configuration? > HIVE4 can't use ICEBERG table in subqueries > ------------------------------------------- > > Key: HIVE-27898 > URL: https://issues.apache.org/jira/browse/HIVE-27898 > Project: Hive > Issue Type: Improvement > Components: Iceberg integration > Affects Versions: 4.0.0-beta-1 > Reporter: yongzhi.shao > Priority: Critical > Attachments: hive-site.xml > > > Currently, we found that when using HIVE4-BETA1 version, if we use ICEBERG > table in the subquery, we can't get any data in the end. > I have used HIVE3-TEZ for cross validation and HIVE3 does not have this > problem when querying ICEBERG. > {code:java} > --spark3.4.1+iceberg 1.4.2 > CREATE TABLE datacenter.dwd.b_std_trade ( > uni_order_id STRING, > data_from BIGINT, > partner STRING, > plat_code STRING, > order_id STRING, > uni_shop_id STRING, > uni_id STRING, > guide_id STRING, > shop_id STRING, > plat_account STRING, > total_fee DOUBLE, > item_discount_fee DOUBLE, > trade_discount_fee DOUBLE, > adjust_fee DOUBLE, > post_fee DOUBLE, > discount_rate DOUBLE, > payment_no_postfee DOUBLE, > payment DOUBLE, > pay_time STRING, > product_num BIGINT, > order_status STRING, > is_refund STRING, > refund_fee DOUBLE, > insert_time STRING, > created STRING, > endtime STRING, > modified STRING, > trade_type STRING, > receiver_name STRING, > receiver_country STRING, > receiver_state STRING, > receiver_city STRING, > receiver_district STRING, > receiver_town STRING, > receiver_address STRING, > receiver_mobile STRING, > trade_source STRING, > delivery_type STRING, > consign_time STRING, > orders_num BIGINT, > is_presale BIGINT, > presale_status STRING, > first_fee_paytime STRING, > last_fee_paytime STRING, > first_paid_fee DOUBLE, > tenant STRING, > tidb_modified STRING, > step_paid_fee DOUBLE, > seller_flag STRING, > is_used_store_card BIGINT, > store_card_used DOUBLE, > store_card_basic_used DOUBLE, > store_card_expand_used DOUBLE, > order_promotion_num BIGINT, > item_promotion_num BIGINT, > buyer_remark STRING, > seller_remark STRING, > trade_business_type STRING) > USING iceberg > PARTITIONED BY (uni_shop_id, truncate(4, created)) > LOCATION '/iceberg-catalog/warehouse/dwd/b_std_trade' > TBLPROPERTIES ( > 'current-snapshot-id' = '7217819472703702905', > 'format' = 'iceberg/orc', > 'format-version' = '1', > 'hive.stored-as' = 'iceberg', > 'read.orc.vectorization.enabled' = 'true', > 'sort-order' = 'uni_shop_id ASC NULLS FIRST, created ASC NULLS FIRST', > 'write.distribution-mode' = 'hash', > 'write.format.default' = 'orc', > 'write.metadata.delete-after-commit.enabled' = 'true', > 'write.metadata.previous-versions-max' = '3', > 'write.orc.bloom.filter.columns' = 'order_id', > 'write.orc.compression-codec' = 'zstd') > --hive-iceberg > CREATE EXTERNAL TABLE iceberg_dwd.b_std_trade > STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' > LOCATION 'hdfs://xxxx/iceberg-catalog/warehouse/dwd/b_std_trade' > TBLPROPERTIES > ('iceberg.catalog'='location_based_table','engine.hive.enabled'='true'); > select * from iceberg_dwd.b_std_trade > where uni_shop_id = 'TEST|11111' limit 10 --10 rows > select * > from ( > select * from iceberg_dwd.b_std_trade > where uni_shop_id = 'TEST|11111' limit 10 > ) t1; --10 rows > select uni_shop_id > from ( > select * from iceberg_dwd.b_std_trade > where uni_shop_id = 'TEST|11111' limit 10 > ) t1; --0 rows > select uni_shop_id > from ( > select uni_shop_id as uni_shop_id from iceberg_dwd.b_std_trade > where uni_shop_id = 'TEST|11111' limit 10 > ) t1; --0 rows > --hive-orc > select uni_shop_id > from ( > select * from iceberg_dwd.trade_test > where uni_shop_id = 'TEST|11111' limit 10 > ) t1; --10 ROWS{code} > -- This message was sent by Atlassian Jira (v8.20.10#820010)