[
https://issues.apache.org/jira/browse/HIVE-27898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17819475#comment-17819475
]
yongzhi.shao edited comment on HIVE-27898 at 2/22/24 5:49 AM:
--------------------------------------------------------------
[~zhangbutao]
I tried the master branch, built from the latest code as of today, and the
problem persists.
{code:java}
0: jdbc:hive2://smaster01:10000> select version();
INFO : Compiling
command(queryId=hive_20240222134713_2aaf182e-be55-4946-9a7c-c19d4a047e13):
select version()
INFO : Semantic Analysis Completed (retrial = false)
INFO : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:_c0,
type:string, comment:null)], properties:null)
INFO : Completed compiling
command(queryId=hive_20240222134713_2aaf182e-be55-4946-9a7c-c19d4a047e13); Time
taken: 1.974 seconds
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : Executing
command(queryId=hive_20240222134713_2aaf182e-be55-4946-9a7c-c19d4a047e13):
select version()
INFO : Completed executing
command(queryId=hive_20240222134713_2aaf182e-be55-4946-9a7c-c19d4a047e13); Time
taken: 0.004 seconds
_c0 4.0.0-beta-2-SNAPSHOT r6e061e6559522c8a060c1b55439ada0001bf5e5d
0: jdbc:hive2://smaster01:10000> select * from (select * from
iceberg_dwd.test_data_04 limit 10) s1;
INFO : Compiling
command(queryId=hive_20240222134526_87055172-e808-4726-acdf-89f6b8b815e6):
select * from (select * from iceberg_dwd.test_data_04 limit 10) s1
INFO : Semantic Analysis Completed (retrial = false)
INFO : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:s1.id,
type:string, comment:null), FieldSchema(name:s1.name, type:string,
comment:null)], properties:null)
INFO : Completed compiling
command(queryId=hive_20240222134526_87055172-e808-4726-acdf-89f6b8b815e6); Time
taken: 1.007 seconds
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : Executing
command(queryId=hive_20240222134526_87055172-e808-4726-acdf-89f6b8b815e6):
select * from (select * from iceberg_dwd.test_data_04 limit 10) s1
INFO : Query ID = hive_20240222134526_87055172-e808-4726-acdf-89f6b8b815e6
INFO : Total jobs = 1
INFO : Launching Job 1 out of 1
INFO : Starting task [Stage-1:MAPRED] in serial mode
INFO : Subscribed to counters: [] for queryId:
hive_20240222134526_87055172-e808-4726-acdf-89f6b8b815e6
INFO : Tez session hasn't been created yet. Opening session
INFO : Dag name: select * from (select......a_04 limit 10) s1 (Stage-1)
INFO : HS2 Host: [smaster01], Query ID:
[hive_20240222134526_87055172-e808-4726-acdf-89f6b8b815e6], Dag ID:
[dag_1706163520799_39421_1], DAG Session ID: [application_1706163520799_39421]
INFO : Status: Running (Executing on YARN cluster with App id
application_1706163520799_39421)
----------------------------------------------------------------------------------------------
VERTICES MODE STATUS TOTAL COMPLETED RUNNING PENDING
FAILED KILLED
----------------------------------------------------------------------------------------------
Map 1 .......... container SUCCEEDED 1 1 0 0
0 0
Reducer 2 ...... container SUCCEEDED 1 1 0 0
0 0
----------------------------------------------------------------------------------------------
VERTICES: 02/02 [==========================>>] 100% ELAPSED TIME: 20.10 s
----------------------------------------------------------------------------------------------
INFO : Status: DAG finished successfully in 19.90 seconds
INFO : DAG ID: dag_1706163520799_39421_1
INFO :
INFO : Query Execution Summary
INFO :
----------------------------------------------------------------------------------------------
INFO : OPERATION DURATION
INFO :
----------------------------------------------------------------------------------------------
INFO : Compile Query 1.01s
INFO : Prepare Plan 16.62s
INFO : Get Query Coordinator (AM) 0.03s
INFO : Submit Plan 0.54s
INFO : Start DAG 0.07s
INFO : Run DAG 19.90s
INFO :
----------------------------------------------------------------------------------------------
INFO :
INFO : Task Execution Summary
INFO :
----------------------------------------------------------------------------------------------
INFO : VERTICES DURATION(ms) CPU_TIME(ms) GC_TIME(ms)
INPUT_RECORDS OUTPUT_RECORDS
INFO :
----------------------------------------------------------------------------------------------
INFO : Map 1 0.00 10,290 290
2 2
INFO : Reducer 2 0.00 1,020 0
2 0
INFO :
----------------------------------------------------------------------------------------------
INFO :
INFO : org.apache.tez.common.counters.DAGCounter:
INFO : NUM_SUCCEEDED_TASKS: 2
INFO : TOTAL_LAUNCHED_TASKS: 2
INFO : RACK_LOCAL_TASKS: 1
INFO : AM_CPU_MILLISECONDS: 5430
INFO : WALL_CLOCK_MILLIS: 9971
INFO : AM_GC_TIME_MILLIS: 51
INFO : INITIAL_HELD_CONTAINERS: 0
INFO : TOTAL_CONTAINERS_USED: 1
INFO : TOTAL_CONTAINER_ALLOCATION_COUNT: 2
INFO : TOTAL_CONTAINER_LAUNCH_COUNT: 1
INFO : TOTAL_CONTAINER_REUSE_COUNT: 1
INFO : File System Counters:
INFO : FILE_BYTES_READ: 0
INFO : FILE_BYTES_WRITTEN: 0
INFO : FILE_READ_OPS: 0
INFO : FILE_LARGE_READ_OPS: 0
INFO : FILE_WRITE_OPS: 0
INFO : HDFS_BYTES_READ: 1210
INFO : HDFS_BYTES_WRITTEN: 121
INFO : HDFS_READ_OPS: 8
INFO : HDFS_LARGE_READ_OPS: 0
INFO : HDFS_WRITE_OPS: 2
INFO : org.apache.tez.common.counters.TaskCounter:
INFO : SPILLED_RECORDS: 0
INFO : NUM_SHUFFLED_INPUTS: 0
INFO : NUM_FAILED_SHUFFLE_INPUTS: 0
INFO : GC_TIME_MILLIS: 290
INFO : CPU_MILLISECONDS: 11310
INFO : WALL_CLOCK_MILLISECONDS: 9561
INFO : PHYSICAL_MEMORY_BYTES: 3435134976
INFO : VIRTUAL_MEMORY_BYTES: 7557877760
INFO : COMMITTED_HEAP_BYTES: 3435134976
INFO : INPUT_RECORDS_PROCESSED: 4
INFO : INPUT_SPLIT_LENGTH_BYTES: 880
INFO : OUTPUT_RECORDS: 2
INFO : APPROXIMATE_INPUT_RECORDS: 2
INFO : OUTPUT_LARGE_RECORDS: 0
INFO : OUTPUT_BYTES: 8
INFO : OUTPUT_BYTES_WITH_OVERHEAD: 18
INFO : OUTPUT_BYTES_PHYSICAL: 46
INFO : ADDITIONAL_SPILLS_BYTES_WRITTEN: 0
INFO : ADDITIONAL_SPILLS_BYTES_READ: 0
INFO : ADDITIONAL_SPILL_COUNT: 0
INFO : SHUFFLE_BYTES: 0
INFO : SHUFFLE_BYTES_DECOMPRESSED: 0
INFO : SHUFFLE_BYTES_TO_MEM: 0
INFO : SHUFFLE_BYTES_TO_DISK: 0
INFO : SHUFFLE_BYTES_DISK_DIRECT: 0
INFO : SHUFFLE_PHASE_TIME: 158
INFO : FIRST_EVENT_RECEIVED: 157
INFO : LAST_EVENT_RECEIVED: 157
INFO : DATA_BYTES_VIA_EVENT: 22
INFO : HIVE:
INFO : CREATED_FILES: 1
INFO : DESERIALIZE_ERRORS: 0
INFO : RECORDS_IN_Map_1: 2
INFO : RECORDS_OUT_0: 2
INFO : RECORDS_OUT_INTERMEDIATE_Map_1: 2
INFO : RECORDS_OUT_INTERMEDIATE_Reducer_2: 0
INFO : RECORDS_OUT_OPERATOR_FS_13: 2
INFO : RECORDS_OUT_OPERATOR_LIM_11: 2
INFO : RECORDS_OUT_OPERATOR_LIM_8: 2
INFO : RECORDS_OUT_OPERATOR_MAP_0: 0
INFO : RECORDS_OUT_OPERATOR_RS_10: 2
INFO : RECORDS_OUT_OPERATOR_SEL_12: 2
INFO : RECORDS_OUT_OPERATOR_SEL_9: 2
INFO : RECORDS_OUT_OPERATOR_TS_0: 2
INFO : org.apache.hadoop.hive.ql.exec.tez.HiveInputCounters:
INFO : GROUPED_INPUT_SPLITS_Map_1: 1
INFO : INPUT_DIRECTORIES_Map_1: 1
INFO : INPUT_FILES_Map_1: 1
INFO : RAW_INPUT_SPLITS_Map_1: 1
INFO : Completed executing
command(queryId=hive_20240222134526_87055172-e808-4726-acdf-89f6b8b815e6); Time
taken: 37.363 seconds
s1.id NULL
s1.name a
s1.id NULL
s1.name b
2 rows selected (38.804 seconds) {code}
What I want to know is: why does the same SQL produce different execution plans? Do you need me to provide my Hive-related configuration settings?
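One way to see where the plans diverge (a sketch; it reuses the table from the failing query above and assumes the same beeline session) is to capture and diff the EXPLAIN output of the two query shapes:

{code:sql}
-- Hypothetical diagnostic: compare the plan of the working query (select *)
-- against the failing one (single-column projection) on the same Iceberg table.
EXPLAIN
select * from (select * from iceberg_dwd.test_data_04 limit 10) s1;

EXPLAIN
select id from (select * from iceberg_dwd.test_data_04 limit 10) s1;
{code}

If the two plans differ (for example in the projected columns pushed into the Iceberg TableScan), that difference should point at where the rows are being lost.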
> HIVE4 can't use ICEBERG table in subqueries
> -------------------------------------------
>
> Key: HIVE-27898
> URL: https://issues.apache.org/jira/browse/HIVE-27898
> Project: Hive
> Issue Type: Improvement
> Components: Iceberg integration
> Affects Versions: 4.0.0-beta-1
> Reporter: yongzhi.shao
> Priority: Critical
>
> We found that with HIVE4-BETA1, if we use an ICEBERG table in a subquery, we
> get no data back.
> I used HIVE3-TEZ for cross-validation, and HIVE3 does not have this problem
> when querying ICEBERG.
> {code:java}
> --spark3.4.1+iceberg 1.4.2
> CREATE TABLE datacenter.dwd.b_std_trade (
> uni_order_id STRING,
> data_from BIGINT,
> partner STRING,
> plat_code STRING,
> order_id STRING,
> uni_shop_id STRING,
> uni_id STRING,
> guide_id STRING,
> shop_id STRING,
> plat_account STRING,
> total_fee DOUBLE,
> item_discount_fee DOUBLE,
> trade_discount_fee DOUBLE,
> adjust_fee DOUBLE,
> post_fee DOUBLE,
> discount_rate DOUBLE,
> payment_no_postfee DOUBLE,
> payment DOUBLE,
> pay_time STRING,
> product_num BIGINT,
> order_status STRING,
> is_refund STRING,
> refund_fee DOUBLE,
> insert_time STRING,
> created STRING,
> endtime STRING,
> modified STRING,
> trade_type STRING,
> receiver_name STRING,
> receiver_country STRING,
> receiver_state STRING,
> receiver_city STRING,
> receiver_district STRING,
> receiver_town STRING,
> receiver_address STRING,
> receiver_mobile STRING,
> trade_source STRING,
> delivery_type STRING,
> consign_time STRING,
> orders_num BIGINT,
> is_presale BIGINT,
> presale_status STRING,
> first_fee_paytime STRING,
> last_fee_paytime STRING,
> first_paid_fee DOUBLE,
> tenant STRING,
> tidb_modified STRING,
> step_paid_fee DOUBLE,
> seller_flag STRING,
> is_used_store_card BIGINT,
> store_card_used DOUBLE,
> store_card_basic_used DOUBLE,
> store_card_expand_used DOUBLE,
> order_promotion_num BIGINT,
> item_promotion_num BIGINT,
> buyer_remark STRING,
> seller_remark STRING,
> trade_business_type STRING)
> USING iceberg
> PARTITIONED BY (uni_shop_id, truncate(4, created))
> LOCATION '/iceberg-catalog/warehouse/dwd/b_std_trade'
> TBLPROPERTIES (
> 'current-snapshot-id' = '7217819472703702905',
> 'format' = 'iceberg/orc',
> 'format-version' = '1',
> 'hive.stored-as' = 'iceberg',
> 'read.orc.vectorization.enabled' = 'true',
> 'sort-order' = 'uni_shop_id ASC NULLS FIRST, created ASC NULLS FIRST',
> 'write.distribution-mode' = 'hash',
> 'write.format.default' = 'orc',
> 'write.metadata.delete-after-commit.enabled' = 'true',
> 'write.metadata.previous-versions-max' = '3',
> 'write.orc.bloom.filter.columns' = 'order_id',
> 'write.orc.compression-codec' = 'zstd')
> --hive-iceberg
> CREATE EXTERNAL TABLE iceberg_dwd.b_std_trade
> STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'
> LOCATION 'hdfs://xxxx/iceberg-catalog/warehouse/dwd/b_std_trade'
> TBLPROPERTIES
> ('iceberg.catalog'='location_based_table','engine.hive.enabled'='true');
> select * from iceberg_dwd.b_std_trade
> where uni_shop_id = 'TEST|11111' limit 10 --10 rows
> select *
> from (
> select * from iceberg_dwd.b_std_trade
> where uni_shop_id = 'TEST|11111' limit 10
> ) t1; --10 rows
> select uni_shop_id
> from (
> select * from iceberg_dwd.b_std_trade
> where uni_shop_id = 'TEST|11111' limit 10
> ) t1; --0 rows
> select uni_shop_id
> from (
> select uni_shop_id as uni_shop_id from iceberg_dwd.b_std_trade
> where uni_shop_id = 'TEST|11111' limit 10
> ) t1; --0 rows
> --hive-orc
> select uni_shop_id
> from (
> select * from iceberg_dwd.trade_test
> where uni_shop_id = 'TEST|11111' limit 10
> ) t1; --10 ROWS{code}
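> As a diagnostic (an assumption, not a known fix), it may be worth checking
> whether the row loss follows the vectorized read path by re-running the
> failing query with vectorization disabled:
> {code:sql}
> -- Hypothetical diagnostic: turn off vectorized execution for this session
> -- and re-run the single-column subquery that returned 0 rows.
> set hive.vectorized.execution.enabled=false;
> select uni_shop_id
> from (
>     select * from iceberg_dwd.b_std_trade
>     where uni_shop_id = 'TEST|11111' limit 10
> ) t1;
> {code}
> If the rows come back, the problem is likely in the vectorized Iceberg/ORC
> reader rather than in the plan itself.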
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)