[
https://issues.apache.org/jira/browse/FLINK-32596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17817533#comment-17817533
]
Vallari Rastogi edited comment on FLINK-32596 at 2/15/24 11:16 AM:
-------------------------------------------------------------------
[~luoyuxia]
Hive Metastore (HMS) expects the partition columns to come last when inserting
data; see
[hiveql - Hive partition column - Stack Overflow|https://stackoverflow.com/questions/60510174/hive-partition-column].
So Flink uses the last 'n' columns as the partition columns, regardless of
which columns the user declares as partition columns:
[https://github.com/apache/flink/blob/403694e7b9c213386f3ed9cff21ce2664030ebc2/flink-connectors/flink-connector-hive/src/main/java/org/apache/flink/table/catalog/hive/util/HiveTableUtil.java#L515]
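To make the effect concrete, here is a minimal sketch of the "last 'n' columns" behaviour described above, applied to the table from the issue description. It is not the code in HiveTableUtil. The class and method names are hypothetical, and taking 'n' to be the number of declared partition keys is an assumption.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; this is NOT the code in HiveTableUtil.
// Class and method names are hypothetical.
class LastNPartitionColumnsSketch {

    // Returns the last n column names, where n is assumed to be the number
    // of declared partition keys. The declared key names are ignored.
    static List<String> lastNColumns(List<String> columns, List<String> declaredKeys) {
        int n = declaredKeys.size();
        return columns.subList(columns.size() - n, columns.size());
    }

    public static void main(String[] args) {
        // Schema and partition key from the reproduction in the issue:
        // create table t1(`date` string, `geo_altitude` FLOAT) partitioned by (`date`)
        List<String> columns = Arrays.asList("date", "geo_altitude");
        List<String> declaredKeys = Arrays.asList("date");

        // Prints [geo_altitude], not the declared [date].
        System.out.println(lastNColumns(columns, declaredKeys));
    }
}
```

With one declared key, the sketch picks the final column, so the result only matches the declaration when the partition column is already last in the schema.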
SELECT and INSERT queries follow the same logic and expect the partition
columns at the end.
As a test, I made a change here:
[https://github.com/apache/flink/commit/df47ceaba82a3d4f3392c1b53bb52f34d520cc3d]
Results:
!image-2024-02-15-03-05-22-541.png|width=600,height=106!
!image-2024-02-15-03-06-28-175.png|width=468,height=267!
Because of HMS, the partition columns will always come last. One option is to
use an INSERT statement like:
_INSERT INTO testHive2 PARTITION (ts='22:16:46', active='TRUE') SELECT 1, 46, 'false';_
_SELECT query output:_
_!image-2024-02-15-03-08-50-029.png|width=525,height=328!_
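The column order that HMS implies (regular columns first, declared partition columns last) can be sketched as below. This is a hedged illustration under stated assumptions, not what the linked commit does. The class and method names are hypothetical, and the example reuses the t1 schema from the issue description.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; names are hypothetical and this is not Flink code.
class PartitionsLastSketch {

    // Keeps non-partition columns in their original order and appends the
    // declared partition keys at the end, in the order they were declared.
    static List<String> partitionsLast(List<String> columns, List<String> partitionKeys) {
        List<String> ordered = new ArrayList<>();
        for (String column : columns) {
            if (!partitionKeys.contains(column)) {
                ordered.add(column);
            }
        }
        ordered.addAll(partitionKeys);
        return ordered;
    }

    public static void main(String[] args) {
        // t1(`date` string, `geo_altitude` FLOAT) partitioned by (`date`)
        List<String> columns = Arrays.asList("date", "geo_altitude");
        List<String> partitionKeys = Arrays.asList("date");

        // Prints [geo_altitude, date]
        System.out.println(partitionsLast(columns, partitionKeys));
    }
}
```

Once the schema is in this order, taking the last 'n' columns yields the declared partition keys.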
> The partition key will be wrong when use Flink dialect to create Hive table
> ---------------------------------------------------------------------------
>
> Key: FLINK-32596
> URL: https://issues.apache.org/jira/browse/FLINK-32596
> Project: Flink
> Issue Type: Bug
> Components: Connectors / Hive
> Affects Versions: 1.15.0, 1.16.0, 1.17.0
> Reporter: luoyuxia
> Assignee: Vallari Rastogi
> Priority: Major
> Attachments: image-2024-02-14-16-06-13-126.png,
> image-2024-02-15-03-05-22-541.png, image-2024-02-15-03-06-28-175.png,
> image-2024-02-15-03-08-50-029.png
>
>
> Can be reproduced by the following SQL:
>
> {code:java}
> tableEnv.getConfig().setSqlDialect(SqlDialect.DEFAULT);
> tableEnv.executeSql(
>         "create table t1(`date` string, `geo_altitude` FLOAT) partitioned by (`date`)"
>                 + " with ('connector' = 'hive',"
>                 + " 'sink.partition-commit.delay'='1 s',"
>                 + " 'sink.partition-commit.policy.kind'='metastore,success-file')");
> CatalogTable catalogTable =
>         (CatalogTable) hiveCatalog.getTable(ObjectPath.fromString("default.t1"));
> // the following assertion will fail
> assertThat(catalogTable.getPartitionKeys().toString()).isEqualTo("[date]");
> {code}
>
>