[
https://issues.apache.org/jira/browse/FLINK-31975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18042685#comment-18042685
]
huyuliang edited comment on FLINK-31975 at 12/4/25 2:36 AM:
------------------------------------------------------------
When the filesystem source is used, the path directory is traversed to discover
the partition structure. The logic is in the listStatusRecursively method of
PartitionPathUtils. This method does not skip hidden files, so when the path
directory contains a partition that has not been completely written (a
directory without the partition structure), the
FileSystemTableSource.toFullLinkedPartSpec method throws a TableException whose
message is built as:
    "Partition keys are: " + partitionKeys + ", incomplete partition spec: " + part
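To make the failure mode concrete, here is a minimal, self-contained sketch. It
is not Flink's code: the class PartitionSpecSketch, its two helper methods, the
use of IllegalStateException in place of TableException, and the example
directory name .staging_0 are all assumptions made for illustration. It only
shows how a path without key=value segments yields an empty partition spec,
which then fails a completeness check like the one described above.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; names and structure are hypothetical, not Flink's implementation.
final class PartitionSpecSketch {

    // Collects every "key=value" path segment into an insertion-ordered map.
    static LinkedHashMap<String, String> extractPartitionSpec(String path) {
        LinkedHashMap<String, String> spec = new LinkedHashMap<>();
        for (String segment : path.split("/")) {
            int eq = segment.indexOf('=');
            if (eq > 0) {
                spec.put(segment.substring(0, eq), segment.substring(eq + 1));
            }
        }
        return spec;
    }

    // Same shape as the check quoted in this comment: every partition key must be present.
    // IllegalStateException stands in for Flink's TableException to keep the sketch standalone.
    static LinkedHashMap<String, String> toFullSpec(
            List<String> partitionKeys, Map<String, String> part) {
        LinkedHashMap<String, String> full = new LinkedHashMap<>();
        for (String k : partitionKeys) {
            if (!part.containsKey(k)) {
                throw new IllegalStateException(
                        "Partition keys are: "
                                + partitionKeys
                                + ", incomplete partition spec: "
                                + part);
            }
            full.put(k, part.get(k));
        }
        return full;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("date", "country");

        // A completely written partition resolves to a full spec.
        System.out.println(
                toFullSpec(keys, extractPartitionSpec("t/date=2019-8-30/country=China/part-0")));

        // A hidden directory (hypothetical name) carries no key=value segments, so the spec is empty.
        try {
            toFullSpec(keys, extractPartitionSpec("t/.staging_0/part-0"));
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With the second path the sketch produces the message "Partition keys are:
[date, country], incomplete partition spec: {}", which has the same form as the
error reported in this issue.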
was (Author: JIRAUSER309438):
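When the filesystem source is used, the path directory is traversed to look for
the partition structure. The logic is in the listStatusRecursively method of
PartitionPathUtils. This method does not handle hidden files, so when the path
directory contains a partition that has not been fully written (the directory
has no partition structure), the FileSystemTableSource.toFullLinkedPartSpec
method throws: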
```java
for (String k : partitionKeys) {
    if (!part.containsKey(k)) {
        throw new TableException(
                "Partition keys are: "
                        + partitionKeys
                        + ", incomplete partition spec: "
                        + part);
    }
    map.put(k, part.get(k));
}
```
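One possible mitigation, sketched under assumptions rather than taken from
Flink: skip hidden entries while listing recursively, so that incompletely
written directories never reach the check above. The class
HiddenAwareListingSketch, its methods, and the convention that names starting
with "." or "_" are hidden are all assumptions here, and the sketch uses
java.nio.file on a local directory instead of Flink's FileSystem abstraction.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustrative sketch only; names are hypothetical and this is not Flink's listing code.
final class HiddenAwareListingSketch {

    // Assumed convention: names starting with '.' or '_' are hidden.
    static boolean isHidden(Path p) {
        Path name = p.getFileName();
        if (name == null) {
            return false;
        }
        String s = name.toString();
        return s.startsWith(".") || s.startsWith("_");
    }

    // Recursively lists regular files under root, skipping hidden files and hidden subtrees.
    static List<Path> listVisibleFiles(Path root) throws IOException {
        List<Path> files = new ArrayList<>();
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) {
                // The root itself is always visited; hidden subdirectories are pruned.
                if (!dir.equals(root) && isHidden(dir)) {
                    return FileVisitResult.SKIP_SUBTREE;
                }
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                if (!isHidden(file)) {
                    files.add(file);
                }
                return FileVisitResult.CONTINUE;
            }
        });
        Collections.sort(files);
        return files;
    }
}
```

Pruning at the directory level means nothing below a hidden directory is
inspected at all, so files inside it cannot contribute an empty partition spec.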
> default catalog failed to retrieve partition Spec
> -------------------------------------------------
>
> Key: FLINK-31975
> URL: https://issues.apache.org/jira/browse/FLINK-31975
> Project: Flink
> Issue Type: Bug
> Components: Table SQL / Client
> Affects Versions: 1.16.0
> Reporter: Samrat Deb
> Priority: Major
>
> Here is the repro for the error.
> - Flink 1.16.0 cluster
>
>
> {code:java}
> Flink SQL> show current catalog
> > ;
> +----------------------+
> | current catalog name |
> +----------------------+
> | default_catalog |
> +----------------------+
> 1 row in set
> Flink SQL> show tables;
> +-------------------+
> | table name |
> +-------------------+
> | country_page_view |
> | page_view_source |
> | part_table |
> +-------------------+
> 3 rows in set
> Flink SQL> drop table page_view_source;
> [INFO] Execute statement succeed.
> Flink SQL> drop table country_page_view;
> [INFO] Execute statement succeed.
> Flink SQL> CREATE TABLE page_view_source (`user` STRING, `cnt` INT, `date`
> STRING, `country` STRING)
> > WITH (
> > 'connector' = 'datagen', 'number-of-rows' = '10'
> > );
> [INFO] Execute statement succeed.
> Flink SQL> CREATE TABLE country_page_view (`user` STRING, `cnt` INT, `date`
> STRING, `country` STRING)
> > PARTITIONED BY (`date`, `country`)
> > WITH (
> >
> > 'format' = 'csv',
> > 'path' =
> > 's3://dbsamrat-emr-dev/glue-catalog/dbsamrat/country_page_view/',
> > 'connector' = 'filesystem'
> > )
> > ;
> [INFO] Execute statement succeed.
> Flink SQL> INSERT INTO country_page_view PARTITION (`date`='2019-8-30',
> `country`='China')
> > SELECT `user`, `cnt` FROM page_view_source;
> >
> [INFO] Submitting SQL update statement to the cluster...
> 2023-04-29 09:51:36,133 INFO
> org.apache.hadoop.yarn.client.DefaultNoHARMFailoverProxyProvider [] -
> Connecting to ResourceManager at
> ip-172-31-38-72.us-west-2.compute.internal/172.31.38.72:8032
> 2023-04-29 09:51:36,134 INFO org.apache.hadoop.yarn.client.AHSProxy
> [] - Connecting to Application History server at
> ip-172-31-38-72.us-west-2.compute.internal/172.31.38.72:10200
> 2023-04-29 09:51:36,135 INFO org.apache.flink.yarn.YarnClusterDescriptor
> [] - No path for the flink jar passed. Using the location of
> class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
> 2023-04-29 09:51:36,135 WARN org.apache.flink.yarn.YarnClusterDescriptor
> [] - Neither the HADOOP_CONF_DIR nor the YARN_CONF_DIR
> environment variable is set.The Flink YARN Client needs one of these to be
> set to properly load the Hadoop configuration for accessing YARN.
> 2023-04-29 09:51:36,149 INFO org.apache.flink.yarn.YarnClusterDescriptor
> [] - Found Web Interface
> ip-172-31-39-51.us-west-2.compute.internal:36583 of application
> 'application_1682266531513_0004'.
> [INFO] SQL update statement has been successfully submitted to the cluster:
> Job ID: 7c39db71be1f1b52e13a72831fed8105
> Flink SQL> EXECUTE INSERT INTO country_page_view PARTITION
> (`date`='2019-8-30', `country`='China')
> > SELECT `user`, `cnt` FROM page_view_source;
> [INFO] Submitting SQL update statement to the cluster...
> 2023-04-29 09:51:41,424 INFO
> org.apache.hadoop.yarn.client.DefaultNoHARMFailoverProxyProvider [] -
> Connecting to ResourceManager at
> ip-172-31-38-72.us-west-2.compute.internal/172.31.38.72:8032
> 2023-04-29 09:51:41,424 INFO org.apache.hadoop.yarn.client.AHSProxy
> [] - Connecting to Application History server at
> ip-172-31-38-72.us-west-2.compute.internal/172.31.38.72:10200
> 2023-04-29 09:51:41,424 INFO org.apache.flink.yarn.YarnClusterDescriptor
> [] - No path for the flink jar passed. Using the location of
> class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
> 2023-04-29 09:51:41,424 WARN org.apache.flink.yarn.YarnClusterDescriptor
> [] - Neither the HADOOP_CONF_DIR nor the YARN_CONF_DIR
> environment variable is set.The Flink YARN Client needs one of these to be
> set to properly load the Hadoop configuration for accessing YARN.
> 2023-04-29 09:51:41,427 INFO org.apache.flink.yarn.YarnClusterDescriptor
> [] - Found Web Interface
> ip-172-31-39-51.us-west-2.compute.internal:36583 of application
> 'application_1682266531513_0004'.
> [INFO] SQL update statement has been successfully submitted to the cluster:
> Job ID: 69e18cb23f505528948a6398390ad070
> Flink SQL> INSERT INTO country_page_view PARTITION (`date`='2019-8-30')
> > SELECT `user`, `cnt`, `country` FROM page_view_source;
> [INFO] Submitting SQL update statement to the cluster...
> 2023-04-29 09:51:47,509 INFO
> org.apache.hadoop.yarn.client.DefaultNoHARMFailoverProxyProvider [] -
> Connecting to ResourceManager at
> ip-172-31-38-72.us-west-2.compute.internal/172.31.38.72:8032
> 2023-04-29 09:51:47,509 INFO org.apache.hadoop.yarn.client.AHSProxy
> [] - Connecting to Application History server at
> ip-172-31-38-72.us-west-2.compute.internal/172.31.38.72:10200
> 2023-04-29 09:51:47,509 INFO org.apache.flink.yarn.YarnClusterDescriptor
> [] - No path for the flink jar passed. Using the location of
> class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
> 2023-04-29 09:51:47,510 WARN org.apache.flink.yarn.YarnClusterDescriptor
> [] - Neither the HADOOP_CONF_DIR nor the YARN_CONF_DIR
> environment variable is set.The Flink YARN Client needs one of these to be
> set to properly load the Hadoop configuration for accessing YARN.
> 2023-04-29 09:51:47,512 INFO org.apache.flink.yarn.YarnClusterDescriptor
> [] - Found Web Interface
> ip-172-31-39-51.us-west-2.compute.internal:36583 of application
> 'application_1682266531513_0004'.
> [INFO] SQL update statement has been successfully submitted to the cluster:
> Job ID: dc82613e0f2f8a2bafc61dcd35486f4e
> Flink SQL> INSERT OVERWRITE country_page_view PARTITION (`date`='2019-8-30',
> `country`='China')
> > SELECT `user`, `cnt` FROM page_view_source;
> [INFO] Submitting SQL update statement to the cluster...
> 2023-04-29 09:51:53,534 INFO
> org.apache.hadoop.yarn.client.DefaultNoHARMFailoverProxyProvider [] -
> Connecting to ResourceManager at
> ip-172-31-38-72.us-west-2.compute.internal/172.31.38.72:8032
> 2023-04-29 09:51:53,534 INFO org.apache.hadoop.yarn.client.AHSProxy
> [] - Connecting to Application History server at
> ip-172-31-38-72.us-west-2.compute.internal/172.31.38.72:10200
> 2023-04-29 09:51:53,535 INFO org.apache.flink.yarn.YarnClusterDescriptor
> [] - No path for the flink jar passed. Using the location of
> class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
> 2023-04-29 09:51:53,535 WARN org.apache.flink.yarn.YarnClusterDescriptor
> [] - Neither the HADOOP_CONF_DIR nor the YARN_CONF_DIR
> environment variable is set.The Flink YARN Client needs one of these to be
> set to properly load the Hadoop configuration for accessing YARN.
> 2023-04-29 09:51:53,542 INFO org.apache.flink.yarn.YarnClusterDescriptor
> [] - Found Web Interface
> ip-172-31-39-51.us-west-2.compute.internal:36583 of application
> 'application_1682266531513_0004'.
> [INFO] SQL update statement has been successfully submitted to the cluster:
> Job ID: 117900654da5a89ce517d85383d4fe4a
> Flink SQL> INSERT OVERWRITE country_page_view PARTITION (`date`='2019-8-30')
> > SELECT `user`, `cnt`, `country` FROM page_view_source;
> [INFO] Submitting SQL update statement to the cluster...
> 2023-04-29 09:51:58,834 INFO
> org.apache.hadoop.yarn.client.DefaultNoHARMFailoverProxyProvider [] -
> Connecting to ResourceManager at
> ip-172-31-38-72.us-west-2.compute.internal/172.31.38.72:8032
> 2023-04-29 09:51:58,834 INFO org.apache.hadoop.yarn.client.AHSProxy
> [] - Connecting to Application History server at
> ip-172-31-38-72.us-west-2.compute.internal/172.31.38.72:10200
> 2023-04-29 09:51:58,834 INFO org.apache.flink.yarn.YarnClusterDescriptor
> [] - No path for the flink jar passed. Using the location of
> class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
> 2023-04-29 09:51:58,835 WARN org.apache.flink.yarn.YarnClusterDescriptor
> [] - Neither the HADOOP_CONF_DIR nor the YARN_CONF_DIR
> environment variable is set.The Flink YARN Client needs one of these to be
> set to properly load the Hadoop configuration for accessing YARN.
> 2023-04-29 09:51:58,838 INFO org.apache.flink.yarn.YarnClusterDescriptor
> [] - Found Web Interface
> ip-172-31-39-51.us-west-2.compute.internal:36583 of application
> 'application_1682266531513_0004'.
> [INFO] SQL update statement has been successfully submitted to the cluster:
> Job ID: ca63640e867b9309b8c69d4dba7d94b1
> Flink SQL> INSERT INTO country_page_view PARTITION (`date`='2019-8-30',
> `country`='China') (`user`)
> > SELECT user FROM page_view_source;
> [INFO] Submitting SQL update statement to the cluster...
> 2023-04-29 09:52:04,467 INFO
> org.apache.hadoop.yarn.client.DefaultNoHARMFailoverProxyProvider [] -
> Connecting to ResourceManager at
> ip-172-31-38-72.us-west-2.compute.internal/172.31.38.72:8032
> 2023-04-29 09:52:04,469 INFO org.apache.hadoop.yarn.client.AHSProxy
> [] - Connecting to Application History server at
> ip-172-31-38-72.us-west-2.compute.internal/172.31.38.72:10200
> 2023-04-29 09:52:04,470 INFO org.apache.flink.yarn.YarnClusterDescriptor
> [] - No path for the flink jar passed. Using the location of
> class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
> 2023-04-29 09:52:04,470 WARN org.apache.flink.yarn.YarnClusterDescriptor
> [] - Neither the HADOOP_CONF_DIR nor the YARN_CONF_DIR
> environment variable is set.The Flink YARN Client needs one of these to be
> set to properly load the Hadoop configuration for accessing YARN.
> 2023-04-29 09:52:04,474 INFO org.apache.flink.yarn.YarnClusterDescriptor
> [] - Found Web Interface
> ip-172-31-39-51.us-west-2.compute.internal:36583 of application
> 'application_1682266531513_0004'.
> [INFO] SQL update statement has been successfully submitted to the cluster:
> Job ID: 8bca09468a1193f47500ab3eadf04375
> {code}
>
> Finally, selecting rows from the table throws the following error:
> {code:java}
> Flink SQL> select * from country_page_view;
> [ERROR] Could not execute SQL statement. Reason:
> org.apache.flink.table.api.TableException: Partition keys are: [date,
> country], incomplete partition spec: {}
> Flink SQL>
> {code}
>
>