[
https://issues.apache.org/jira/browse/HIVE-27898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17819475#comment-17819475
]
yongzhi.shao edited comment on HIVE-27898 at 2/22/24 5:51 AM:
--------------------------------------------------------------
[~zhangbutao]
I tried the master branch, using the latest updated code today, and the problem
persists.
Please note that in the statistics of Tez, the anomaly in the statistics of the
Submit Plan metric has been fixed. This proves that I am using the latest
version of the code.
{code:java}
0: jdbc:hive2://smaster01:10000> select version();
INFO : Compiling
command(queryId=hive_20240222134713_2aaf182e-be55-4946-9a7c-c19d4a047e13):
select version()
INFO : Semantic Analysis Completed (retrial = false)
INFO : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:_c0,
type:string, comment:null)], properties:null)
INFO : Completed compiling
command(queryId=hive_20240222134713_2aaf182e-be55-4946-9a7c-c19d4a047e13); Time
taken: 1.974 seconds
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : Executing
command(queryId=hive_20240222134713_2aaf182e-be55-4946-9a7c-c19d4a047e13):
select version()
INFO : Completed executing
command(queryId=hive_20240222134713_2aaf182e-be55-4946-9a7c-c19d4a047e13); Time
taken: 0.004 seconds
_c0 4.0.0-beta-2-SNAPSHOT r6e061e6559522c8a060c1b55439ada0001bf5e5d
0: jdbc:hive2://smaster01:10000> select * from (select * from
iceberg_dwd.test_data_04 limit 10) s1;
INFO : Compiling
command(queryId=hive_20240222134526_87055172-e808-4726-acdf-89f6b8b815e6):
select * from (select * from iceberg_dwd.test_data_04 limit 10) s1
INFO : Semantic Analysis Completed (retrial = false)
INFO : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:s1.id,
type:string, comment:null), FieldSchema(name:s1.name, type:string,
comment:null)], properties:null)
INFO : Completed compiling
command(queryId=hive_20240222134526_87055172-e808-4726-acdf-89f6b8b815e6); Time
taken: 1.007 seconds
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : Executing
command(queryId=hive_20240222134526_87055172-e808-4726-acdf-89f6b8b815e6):
select * from (select * from iceberg_dwd.test_data_04 limit 10) s1
INFO : Query ID = hive_20240222134526_87055172-e808-4726-acdf-89f6b8b815e6
INFO : Total jobs = 1
INFO : Launching Job 1 out of 1
INFO : Starting task [Stage-1:MAPRED] in serial mode
INFO : Subscribed to counters: [] for queryId:
hive_20240222134526_87055172-e808-4726-acdf-89f6b8b815e6
INFO : Tez session hasn't been created yet. Opening session
INFO : Dag name: select * from (select......a_04 limit 10) s1 (Stage-1)
INFO : HS2 Host: [smaster01], Query ID:
[hive_20240222134526_87055172-e808-4726-acdf-89f6b8b815e6], Dag ID:
[dag_1706163520799_39421_1], DAG Session ID: [application_1706163520799_39421]
INFO : Status: Running (Executing on YARN cluster with App id
application_1706163520799_39421)----------------------------------------------------------------------------------------------
VERTICES MODE STATUS TOTAL COMPLETED RUNNING PENDING
FAILED KILLED
----------------------------------------------------------------------------------------------
Map 1 .......... container SUCCEEDED 1 1 0 0
0 0
Reducer 2 ...... container SUCCEEDED 1 1 0 0
0 0
----------------------------------------------------------------------------------------------
VERTICES: 02/02 [==========================>>] 100% ELAPSED TIME: 20.10 s
----------------------------------------------------------------------------------------------
INFO : Status: DAG finished successfully in 19.90 seconds
INFO : DAG ID: dag_1706163520799_39421_1
INFO :
INFO : Query Execution Summary
INFO :
----------------------------------------------------------------------------------------------
INFO : OPERATION DURATION
INFO :
----------------------------------------------------------------------------------------------
INFO : Compile Query 1.01s
INFO : Prepare Plan 16.62s
INFO : Get Query Coordinator (AM) 0.03s
INFO : Submit Plan 0.54s
INFO : Start DAG 0.07s
INFO : Run DAG 19.90s
INFO :
----------------------------------------------------------------------------------------------
INFO :
INFO : Task Execution Summary
INFO :
----------------------------------------------------------------------------------------------
INFO : VERTICES DURATION(ms) CPU_TIME(ms) GC_TIME(ms)
INPUT_RECORDS OUTPUT_RECORDS
INFO :
----------------------------------------------------------------------------------------------
INFO : Map 1 0.00 10,290 290
2 2
INFO : Reducer 2 0.00 1,020 0
2 0
INFO :
----------------------------------------------------------------------------------------------
INFO :
INFO : org.apache.tez.common.counters.DAGCounter:
INFO : NUM_SUCCEEDED_TASKS: 2
INFO : TOTAL_LAUNCHED_TASKS: 2
INFO : RACK_LOCAL_TASKS: 1
INFO : AM_CPU_MILLISECONDS: 5430
INFO : WALL_CLOCK_MILLIS: 9971
INFO : AM_GC_TIME_MILLIS: 51
INFO : INITIAL_HELD_CONTAINERS: 0
INFO : TOTAL_CONTAINERS_USED: 1
INFO : TOTAL_CONTAINER_ALLOCATION_COUNT: 2
INFO : TOTAL_CONTAINER_LAUNCH_COUNT: 1
INFO : TOTAL_CONTAINER_REUSE_COUNT: 1
INFO : File System Counters:
INFO : FILE_BYTES_READ: 0
INFO : FILE_BYTES_WRITTEN: 0
INFO : FILE_READ_OPS: 0
INFO : FILE_LARGE_READ_OPS: 0
INFO : FILE_WRITE_OPS: 0
INFO : HDFS_BYTES_READ: 1210
INFO : HDFS_BYTES_WRITTEN: 121
INFO : HDFS_READ_OPS: 8
INFO : HDFS_LARGE_READ_OPS: 0
INFO : HDFS_WRITE_OPS: 2
INFO : org.apache.tez.common.counters.TaskCounter:
INFO : SPILLED_RECORDS: 0
INFO : NUM_SHUFFLED_INPUTS: 0
INFO : NUM_FAILED_SHUFFLE_INPUTS: 0
INFO : GC_TIME_MILLIS: 290
INFO : CPU_MILLISECONDS: 11310
INFO : WALL_CLOCK_MILLISECONDS: 9561
INFO : PHYSICAL_MEMORY_BYTES: 3435134976
INFO : VIRTUAL_MEMORY_BYTES: 7557877760
INFO : COMMITTED_HEAP_BYTES: 3435134976
INFO : INPUT_RECORDS_PROCESSED: 4
INFO : INPUT_SPLIT_LENGTH_BYTES: 880
INFO : OUTPUT_RECORDS: 2
INFO : APPROXIMATE_INPUT_RECORDS: 2
INFO : OUTPUT_LARGE_RECORDS: 0
INFO : OUTPUT_BYTES: 8
INFO : OUTPUT_BYTES_WITH_OVERHEAD: 18
INFO : OUTPUT_BYTES_PHYSICAL: 46
INFO : ADDITIONAL_SPILLS_BYTES_WRITTEN: 0
INFO : ADDITIONAL_SPILLS_BYTES_READ: 0
INFO : ADDITIONAL_SPILL_COUNT: 0
INFO : SHUFFLE_BYTES: 0
INFO : SHUFFLE_BYTES_DECOMPRESSED: 0
INFO : SHUFFLE_BYTES_TO_MEM: 0
INFO : SHUFFLE_BYTES_TO_DISK: 0
INFO : SHUFFLE_BYTES_DISK_DIRECT: 0
INFO : SHUFFLE_PHASE_TIME: 158
INFO : FIRST_EVENT_RECEIVED: 157
INFO : LAST_EVENT_RECEIVED: 157
INFO : DATA_BYTES_VIA_EVENT: 22
INFO : HIVE:
INFO : CREATED_FILES: 1
INFO : DESERIALIZE_ERRORS: 0
INFO : RECORDS_IN_Map_1: 2
INFO : RECORDS_OUT_0: 2
INFO : RECORDS_OUT_INTERMEDIATE_Map_1: 2
INFO : RECORDS_OUT_INTERMEDIATE_Reducer_2: 0
INFO : RECORDS_OUT_OPERATOR_FS_13: 2
INFO : RECORDS_OUT_OPERATOR_LIM_11: 2
INFO : RECORDS_OUT_OPERATOR_LIM_8: 2
INFO : RECORDS_OUT_OPERATOR_MAP_0: 0
INFO : RECORDS_OUT_OPERATOR_RS_10: 2
INFO : RECORDS_OUT_OPERATOR_SEL_12: 2
INFO : RECORDS_OUT_OPERATOR_SEL_9: 2
INFO : RECORDS_OUT_OPERATOR_TS_0: 2
INFO : org.apache.hadoop.hive.ql.exec.tez.HiveInputCounters:
INFO : GROUPED_INPUT_SPLITS_Map_1: 1
INFO : INPUT_DIRECTORIES_Map_1: 1
INFO : INPUT_FILES_Map_1: 1
INFO : RAW_INPUT_SPLITS_Map_1: 1
INFO : Completed executing
command(queryId=hive_20240222134526_87055172-e808-4726-acdf-89f6b8b815e6); Time
taken: 37.363 seconds
s1.id NULL
s1.name a
s1.id NULL
s1.name b
2 rows selected (38.804 seconds) {code}
What I want to know is why the same SQL produces different execution plans? Do
I need to provide you with my HIVE related configuration?
was (Author: lisoda):
[~zhangbutao]
I tried the master branch, using the latest updated code today, and the problem
persists.
{code:java}
0: jdbc:hive2://smaster01:10000> select version();
INFO : Compiling
command(queryId=hive_20240222134713_2aaf182e-be55-4946-9a7c-c19d4a047e13):
select version()
INFO : Semantic Analysis Completed (retrial = false)
INFO : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:_c0,
type:string, comment:null)], properties:null)
INFO : Completed compiling
command(queryId=hive_20240222134713_2aaf182e-be55-4946-9a7c-c19d4a047e13); Time
taken: 1.974 seconds
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : Executing
command(queryId=hive_20240222134713_2aaf182e-be55-4946-9a7c-c19d4a047e13):
select version()
INFO : Completed executing
command(queryId=hive_20240222134713_2aaf182e-be55-4946-9a7c-c19d4a047e13); Time
taken: 0.004 seconds
_c0 4.0.0-beta-2-SNAPSHOT r6e061e6559522c8a060c1b55439ada0001bf5e5d
0: jdbc:hive2://smaster01:10000> select * from (select * from
iceberg_dwd.test_data_04 limit 10) s1;
INFO : Compiling
command(queryId=hive_20240222134526_87055172-e808-4726-acdf-89f6b8b815e6):
select * from (select * from iceberg_dwd.test_data_04 limit 10) s1
INFO : Semantic Analysis Completed (retrial = false)
INFO : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:s1.id,
type:string, comment:null), FieldSchema(name:s1.name, type:string,
comment:null)], properties:null)
INFO : Completed compiling
command(queryId=hive_20240222134526_87055172-e808-4726-acdf-89f6b8b815e6); Time
taken: 1.007 seconds
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : Executing
command(queryId=hive_20240222134526_87055172-e808-4726-acdf-89f6b8b815e6):
select * from (select * from iceberg_dwd.test_data_04 limit 10) s1
INFO : Query ID = hive_20240222134526_87055172-e808-4726-acdf-89f6b8b815e6
INFO : Total jobs = 1
INFO : Launching Job 1 out of 1
INFO : Starting task [Stage-1:MAPRED] in serial mode
INFO : Subscribed to counters: [] for queryId:
hive_20240222134526_87055172-e808-4726-acdf-89f6b8b815e6
INFO : Tez session hasn't been created yet. Opening session
INFO : Dag name: select * from (select......a_04 limit 10) s1 (Stage-1)
INFO : HS2 Host: [smaster01], Query ID:
[hive_20240222134526_87055172-e808-4726-acdf-89f6b8b815e6], Dag ID:
[dag_1706163520799_39421_1], DAG Session ID: [application_1706163520799_39421]
INFO : Status: Running (Executing on YARN cluster with App id
application_1706163520799_39421)----------------------------------------------------------------------------------------------
VERTICES MODE STATUS TOTAL COMPLETED RUNNING PENDING
FAILED KILLED
----------------------------------------------------------------------------------------------
Map 1 .......... container SUCCEEDED 1 1 0 0
0 0
Reducer 2 ...... container SUCCEEDED 1 1 0 0
0 0
----------------------------------------------------------------------------------------------
VERTICES: 02/02 [==========================>>] 100% ELAPSED TIME: 20.10 s
----------------------------------------------------------------------------------------------
INFO : Status: DAG finished successfully in 19.90 seconds
INFO : DAG ID: dag_1706163520799_39421_1
INFO :
INFO : Query Execution Summary
INFO :
----------------------------------------------------------------------------------------------
INFO : OPERATION DURATION
INFO :
----------------------------------------------------------------------------------------------
INFO : Compile Query 1.01s
INFO : Prepare Plan 16.62s
INFO : Get Query Coordinator (AM) 0.03s
INFO : Submit Plan 0.54s
INFO : Start DAG 0.07s
INFO : Run DAG 19.90s
INFO :
----------------------------------------------------------------------------------------------
INFO :
INFO : Task Execution Summary
INFO :
----------------------------------------------------------------------------------------------
INFO : VERTICES DURATION(ms) CPU_TIME(ms) GC_TIME(ms)
INPUT_RECORDS OUTPUT_RECORDS
INFO :
----------------------------------------------------------------------------------------------
INFO : Map 1 0.00 10,290 290
2 2
INFO : Reducer 2 0.00 1,020 0
2 0
INFO :
----------------------------------------------------------------------------------------------
INFO :
INFO : org.apache.tez.common.counters.DAGCounter:
INFO : NUM_SUCCEEDED_TASKS: 2
INFO : TOTAL_LAUNCHED_TASKS: 2
INFO : RACK_LOCAL_TASKS: 1
INFO : AM_CPU_MILLISECONDS: 5430
INFO : WALL_CLOCK_MILLIS: 9971
INFO : AM_GC_TIME_MILLIS: 51
INFO : INITIAL_HELD_CONTAINERS: 0
INFO : TOTAL_CONTAINERS_USED: 1
INFO : TOTAL_CONTAINER_ALLOCATION_COUNT: 2
INFO : TOTAL_CONTAINER_LAUNCH_COUNT: 1
INFO : TOTAL_CONTAINER_REUSE_COUNT: 1
INFO : File System Counters:
INFO : FILE_BYTES_READ: 0
INFO : FILE_BYTES_WRITTEN: 0
INFO : FILE_READ_OPS: 0
INFO : FILE_LARGE_READ_OPS: 0
INFO : FILE_WRITE_OPS: 0
INFO : HDFS_BYTES_READ: 1210
INFO : HDFS_BYTES_WRITTEN: 121
INFO : HDFS_READ_OPS: 8
INFO : HDFS_LARGE_READ_OPS: 0
INFO : HDFS_WRITE_OPS: 2
INFO : org.apache.tez.common.counters.TaskCounter:
INFO : SPILLED_RECORDS: 0
INFO : NUM_SHUFFLED_INPUTS: 0
INFO : NUM_FAILED_SHUFFLE_INPUTS: 0
INFO : GC_TIME_MILLIS: 290
INFO : CPU_MILLISECONDS: 11310
INFO : WALL_CLOCK_MILLISECONDS: 9561
INFO : PHYSICAL_MEMORY_BYTES: 3435134976
INFO : VIRTUAL_MEMORY_BYTES: 7557877760
INFO : COMMITTED_HEAP_BYTES: 3435134976
INFO : INPUT_RECORDS_PROCESSED: 4
INFO : INPUT_SPLIT_LENGTH_BYTES: 880
INFO : OUTPUT_RECORDS: 2
INFO : APPROXIMATE_INPUT_RECORDS: 2
INFO : OUTPUT_LARGE_RECORDS: 0
INFO : OUTPUT_BYTES: 8
INFO : OUTPUT_BYTES_WITH_OVERHEAD: 18
INFO : OUTPUT_BYTES_PHYSICAL: 46
INFO : ADDITIONAL_SPILLS_BYTES_WRITTEN: 0
INFO : ADDITIONAL_SPILLS_BYTES_READ: 0
INFO : ADDITIONAL_SPILL_COUNT: 0
INFO : SHUFFLE_BYTES: 0
INFO : SHUFFLE_BYTES_DECOMPRESSED: 0
INFO : SHUFFLE_BYTES_TO_MEM: 0
INFO : SHUFFLE_BYTES_TO_DISK: 0
INFO : SHUFFLE_BYTES_DISK_DIRECT: 0
INFO : SHUFFLE_PHASE_TIME: 158
INFO : FIRST_EVENT_RECEIVED: 157
INFO : LAST_EVENT_RECEIVED: 157
INFO : DATA_BYTES_VIA_EVENT: 22
INFO : HIVE:
INFO : CREATED_FILES: 1
INFO : DESERIALIZE_ERRORS: 0
INFO : RECORDS_IN_Map_1: 2
INFO : RECORDS_OUT_0: 2
INFO : RECORDS_OUT_INTERMEDIATE_Map_1: 2
INFO : RECORDS_OUT_INTERMEDIATE_Reducer_2: 0
INFO : RECORDS_OUT_OPERATOR_FS_13: 2
INFO : RECORDS_OUT_OPERATOR_LIM_11: 2
INFO : RECORDS_OUT_OPERATOR_LIM_8: 2
INFO : RECORDS_OUT_OPERATOR_MAP_0: 0
INFO : RECORDS_OUT_OPERATOR_RS_10: 2
INFO : RECORDS_OUT_OPERATOR_SEL_12: 2
INFO : RECORDS_OUT_OPERATOR_SEL_9: 2
INFO : RECORDS_OUT_OPERATOR_TS_0: 2
INFO : org.apache.hadoop.hive.ql.exec.tez.HiveInputCounters:
INFO : GROUPED_INPUT_SPLITS_Map_1: 1
INFO : INPUT_DIRECTORIES_Map_1: 1
INFO : INPUT_FILES_Map_1: 1
INFO : RAW_INPUT_SPLITS_Map_1: 1
INFO : Completed executing
command(queryId=hive_20240222134526_87055172-e808-4726-acdf-89f6b8b815e6); Time
taken: 37.363 seconds
s1.id NULL
s1.name a
s1.id NULL
s1.name b
2 rows selected (38.804 seconds) {code}
What I want to know is why the same SQL produces different execution plans? Do
I need to provide you with my HIVE related configuration?
> HIVE4 can't use ICEBERG table in subqueries
> -------------------------------------------
>
> Key: HIVE-27898
> URL: https://issues.apache.org/jira/browse/HIVE-27898
> Project: Hive
> Issue Type: Improvement
> Components: Iceberg integration
> Affects Versions: 4.0.0-beta-1
> Reporter: yongzhi.shao
> Priority: Critical
>
> Currently, we found that when using HIVE4-BETA1 version, if we use ICEBERG
> table in the subquery, we can't get any data in the end.
> I have used HIVE3-TEZ for cross validation and HIVE3 does not have this
> problem when querying ICEBERG.
> {code:java}
> --spark3.4.1+iceberg 1.4.2
> CREATE TABLE datacenter.dwd.b_std_trade (
> uni_order_id STRING,
> data_from BIGINT,
> partner STRING,
> plat_code STRING,
> order_id STRING,
> uni_shop_id STRING,
> uni_id STRING,
> guide_id STRING,
> shop_id STRING,
> plat_account STRING,
> total_fee DOUBLE,
> item_discount_fee DOUBLE,
> trade_discount_fee DOUBLE,
> adjust_fee DOUBLE,
> post_fee DOUBLE,
> discount_rate DOUBLE,
> payment_no_postfee DOUBLE,
> payment DOUBLE,
> pay_time STRING,
> product_num BIGINT,
> order_status STRING,
> is_refund STRING,
> refund_fee DOUBLE,
> insert_time STRING,
> created STRING,
> endtime STRING,
> modified STRING,
> trade_type STRING,
> receiver_name STRING,
> receiver_country STRING,
> receiver_state STRING,
> receiver_city STRING,
> receiver_district STRING,
> receiver_town STRING,
> receiver_address STRING,
> receiver_mobile STRING,
> trade_source STRING,
> delivery_type STRING,
> consign_time STRING,
> orders_num BIGINT,
> is_presale BIGINT,
> presale_status STRING,
> first_fee_paytime STRING,
> last_fee_paytime STRING,
> first_paid_fee DOUBLE,
> tenant STRING,
> tidb_modified STRING,
> step_paid_fee DOUBLE,
> seller_flag STRING,
> is_used_store_card BIGINT,
> store_card_used DOUBLE,
> store_card_basic_used DOUBLE,
> store_card_expand_used DOUBLE,
> order_promotion_num BIGINT,
> item_promotion_num BIGINT,
> buyer_remark STRING,
> seller_remark STRING,
> trade_business_type STRING)
> USING iceberg
> PARTITIONED BY (uni_shop_id, truncate(4, created))
> LOCATION '/iceberg-catalog/warehouse/dwd/b_std_trade'
> TBLPROPERTIES (
> 'current-snapshot-id' = '7217819472703702905',
> 'format' = 'iceberg/orc',
> 'format-version' = '1',
> 'hive.stored-as' = 'iceberg',
> 'read.orc.vectorization.enabled' = 'true',
> 'sort-order' = 'uni_shop_id ASC NULLS FIRST, created ASC NULLS FIRST',
> 'write.distribution-mode' = 'hash',
> 'write.format.default' = 'orc',
> 'write.metadata.delete-after-commit.enabled' = 'true',
> 'write.metadata.previous-versions-max' = '3',
> 'write.orc.bloom.filter.columns' = 'order_id',
> 'write.orc.compression-codec' = 'zstd')
> --hive-iceberg
> CREATE EXTERNAL TABLE iceberg_dwd.b_std_trade
> STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'
> LOCATION 'hdfs://xxxx/iceberg-catalog/warehouse/dwd/b_std_trade'
> TBLPROPERTIES
> ('iceberg.catalog'='location_based_table','engine.hive.enabled'='true');
> select * from iceberg_dwd.b_std_trade
> where uni_shop_id = 'TEST|11111' limit 10 --10 rows
> select *
> from (
> select * from iceberg_dwd.b_std_trade
> where uni_shop_id = 'TEST|11111' limit 10
> ) t1; --10 rows
> select uni_shop_id
> from (
> select * from iceberg_dwd.b_std_trade
> where uni_shop_id = 'TEST|11111' limit 10
> ) t1; --0 rows
> select uni_shop_id
> from (
> select uni_shop_id as uni_shop_id from iceberg_dwd.b_std_trade
> where uni_shop_id = 'TEST|11111' limit 10
> ) t1; --0 rows
> --hive-orc
> select uni_shop_id
> from (
> select * from iceberg_dwd.trade_test
> where uni_shop_id = 'TEST|11111' limit 10
> ) t1; --10 ROWS{code}
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)