[ 
https://issues.apache.org/jira/browse/FLINK-31975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18042685#comment-18042685
 ] 

huyuliang edited comment on FLINK-31975 at 12/4/25 2:36 AM:
------------------------------------------------------------

When the filesystem source is used, the path directory is traversed to 
discover the partition structure. The logic is in the listStatusRecursively 
method of PartitionPathUtils. That method does not filter out hidden files, so 
when the path directory contains partitions that have not been completely 
written (directories that do not follow the partition structure), 
FileSystemTableSource.toFullLinkedPartSpec throws a TableException with the 
message:
"Partition keys are: " + partitionKeys + ", incomplete partition spec: " + part
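
To illustrate the idea, here is a minimal, self-contained sketch of skipping 
hidden directories before partition specs are built. It is not Flink's actual 
code: the class PartitionSpecSketch, its methods, the directory names, and the 
rule that names starting with "_" or "." count as hidden are all assumptions 
made for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names come from Flink.
class PartitionSpecSketch {

    /** Assumed convention: names starting with '_' or '.' are hidden. */
    static boolean isHidden(String name) {
        return name.startsWith("_") || name.startsWith(".");
    }

    /**
     * Parses a relative directory such as "date=2019-8-30/country=China"
     * into an ordered partition spec. Returns null if any segment is
     * hidden or is not of the form key=value.
     */
    static Map<String, String> parseSpec(String relativeDir) {
        Map<String, String> spec = new LinkedHashMap<>();
        for (String segment : relativeDir.split("/")) {
            if (segment.isEmpty()) {
                continue;
            }
            int eq = segment.indexOf('=');
            if (isHidden(segment) || eq <= 0) {
                return null;
            }
            spec.put(segment.substring(0, eq), segment.substring(eq + 1));
        }
        return spec;
    }

    /** Keeps only directories whose spec covers every partition key. */
    static List<Map<String, String>> completeSpecs(
            List<String> dirs, List<String> partitionKeys) {
        List<Map<String, String>> result = new ArrayList<>();
        for (String dir : dirs) {
            Map<String, String> spec = parseSpec(dir);
            if (spec != null && spec.keySet().containsAll(partitionKeys)) {
                result.add(spec);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("date", "country");
        List<String> dirs = Arrays.asList(
                "date=2019-8-30/country=China",
                ".staging_1682761896133", // hypothetical in-progress directory
                "_temporary/0");          // hypothetical in-progress directory
        // prints [{date=2019-8-30, country=China}]
        System.out.println(completeSpecs(dirs, keys));
    }
}
```

In this sketch, hidden or malformed directories are dropped instead of being 
turned into an empty spec, which is the "{}" case reported in the error 
message.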


was (Author: JIRAUSER309438):
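When using the filesystem source, the path directory is traversed to find the 
partition structure. The logic is in the listStatusRecursively method in 
PartitionPathUtils. This method does not handle hidden files, so when the path 
directory contains partitions that are not completely written (directories 
without a partition structure), an exception is thrown in the 
FileSystemTableSource.toFullLinkedPartSpec method: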
```java
for (String k : partitionKeys) {
    // Every partition key must be present in the spec; otherwise fail.
    if (!part.containsKey(k)) {
        throw new TableException(
                "Partition keys are: "
                        + partitionKeys
                        + ", incomplete partition spec: "
                        + part);
    }
    map.put(k, part.get(k));
}
```

> default catalog failed to retrieve partition Spec
> -------------------------------------------------
>
>                 Key: FLINK-31975
>                 URL: https://issues.apache.org/jira/browse/FLINK-31975
>             Project: Flink
>          Issue Type: Bug
>          Components: Table SQL / Client
>    Affects Versions: 1.16.0
>            Reporter: Samrat Deb
>            Priority: Major
>
> Here is the repro for the error.
> - Flink 1.16.0 cluster
>  
>  
> {code:java}
> Flink SQL> show current catalog
> > ;
> +----------------------+
> | current catalog name |
> +----------------------+
> |      default_catalog |
> +----------------------+
> 1 row in set
> Flink SQL> show tables;
> +-------------------+
> |        table name |
> +-------------------+
> | country_page_view |
> |  page_view_source |
> |        part_table |
> +-------------------+
> 3 rows in set
> Flink SQL> drop table page_view_source;
> [INFO] Execute statement succeed.
> Flink SQL> drop table country_page_view;
> [INFO] Execute statement succeed.
> Flink SQL> CREATE TABLE  page_view_source (`user` STRING, `cnt` INT, `date` 
> STRING, `country` STRING)
> > WITH (
> >   'connector' = 'datagen',  'number-of-rows' = '10'
> > );
> [INFO] Execute statement succeed.
> Flink SQL> CREATE TABLE country_page_view (`user` STRING, `cnt` INT, `date` 
> STRING, `country` STRING)
> > PARTITIONED BY (`date`, `country`)
> > WITH (
> >
> >    'format' = 'csv',
> >    'path' = 
> > 's3://dbsamrat-emr-dev/glue-catalog/dbsamrat/country_page_view/',
> >    'connector' = 'filesystem'
> > )
> > ;
> [INFO] Execute statement succeed.
> Flink SQL> INSERT INTO country_page_view PARTITION (`date`='2019-8-30', 
> `country`='China')
> >   SELECT `user`, `cnt` FROM page_view_source;
> >
> [INFO] Submitting SQL update statement to the cluster...
> 2023-04-29 09:51:36,133 INFO  
> org.apache.hadoop.yarn.client.DefaultNoHARMFailoverProxyProvider [] - 
> Connecting to ResourceManager at 
> ip-172-31-38-72.us-west-2.compute.internal/172.31.38.72:8032
> 2023-04-29 09:51:36,134 INFO  org.apache.hadoop.yarn.client.AHSProxy          
>              [] - Connecting to Application History server at 
> ip-172-31-38-72.us-west-2.compute.internal/172.31.38.72:10200
> 2023-04-29 09:51:36,135 INFO  org.apache.flink.yarn.YarnClusterDescriptor     
>              [] - No path for the flink jar passed. Using the location of 
> class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
> 2023-04-29 09:51:36,135 WARN  org.apache.flink.yarn.YarnClusterDescriptor     
>              [] - Neither the HADOOP_CONF_DIR nor the YARN_CONF_DIR 
> environment variable is set.The Flink YARN Client needs one of these to be 
> set to properly load the Hadoop configuration for accessing YARN.
> 2023-04-29 09:51:36,149 INFO  org.apache.flink.yarn.YarnClusterDescriptor     
>              [] - Found Web Interface 
> ip-172-31-39-51.us-west-2.compute.internal:36583 of application 
> 'application_1682266531513_0004'.
> [INFO] SQL update statement has been successfully submitted to the cluster:
> Job ID: 7c39db71be1f1b52e13a72831fed8105
> Flink SQL> EXECUTE INSERT INTO country_page_view PARTITION 
> (`date`='2019-8-30', `country`='China')
> >   SELECT `user`, `cnt` FROM page_view_source;
> [INFO] Submitting SQL update statement to the cluster...
> 2023-04-29 09:51:41,424 INFO  
> org.apache.hadoop.yarn.client.DefaultNoHARMFailoverProxyProvider [] - 
> Connecting to ResourceManager at 
> ip-172-31-38-72.us-west-2.compute.internal/172.31.38.72:8032
> 2023-04-29 09:51:41,424 INFO  org.apache.hadoop.yarn.client.AHSProxy          
>              [] - Connecting to Application History server at 
> ip-172-31-38-72.us-west-2.compute.internal/172.31.38.72:10200
> 2023-04-29 09:51:41,424 INFO  org.apache.flink.yarn.YarnClusterDescriptor     
>              [] - No path for the flink jar passed. Using the location of 
> class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
> 2023-04-29 09:51:41,424 WARN  org.apache.flink.yarn.YarnClusterDescriptor     
>              [] - Neither the HADOOP_CONF_DIR nor the YARN_CONF_DIR 
> environment variable is set.The Flink YARN Client needs one of these to be 
> set to properly load the Hadoop configuration for accessing YARN.
> 2023-04-29 09:51:41,427 INFO  org.apache.flink.yarn.YarnClusterDescriptor     
>              [] - Found Web Interface 
> ip-172-31-39-51.us-west-2.compute.internal:36583 of application 
> 'application_1682266531513_0004'.
> [INFO] SQL update statement has been successfully submitted to the cluster:
> Job ID: 69e18cb23f505528948a6398390ad070
> Flink SQL> INSERT INTO country_page_view PARTITION (`date`='2019-8-30')
> >   SELECT `user`, `cnt`, `country` FROM page_view_source;
> [INFO] Submitting SQL update statement to the cluster...
> 2023-04-29 09:51:47,509 INFO  
> org.apache.hadoop.yarn.client.DefaultNoHARMFailoverProxyProvider [] - 
> Connecting to ResourceManager at 
> ip-172-31-38-72.us-west-2.compute.internal/172.31.38.72:8032
> 2023-04-29 09:51:47,509 INFO  org.apache.hadoop.yarn.client.AHSProxy          
>              [] - Connecting to Application History server at 
> ip-172-31-38-72.us-west-2.compute.internal/172.31.38.72:10200
> 2023-04-29 09:51:47,509 INFO  org.apache.flink.yarn.YarnClusterDescriptor     
>              [] - No path for the flink jar passed. Using the location of 
> class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
> 2023-04-29 09:51:47,510 WARN  org.apache.flink.yarn.YarnClusterDescriptor     
>              [] - Neither the HADOOP_CONF_DIR nor the YARN_CONF_DIR 
> environment variable is set.The Flink YARN Client needs one of these to be 
> set to properly load the Hadoop configuration for accessing YARN.
> 2023-04-29 09:51:47,512 INFO  org.apache.flink.yarn.YarnClusterDescriptor     
>              [] - Found Web Interface 
> ip-172-31-39-51.us-west-2.compute.internal:36583 of application 
> 'application_1682266531513_0004'.
> [INFO] SQL update statement has been successfully submitted to the cluster:
> Job ID: dc82613e0f2f8a2bafc61dcd35486f4e
> Flink SQL> INSERT OVERWRITE country_page_view PARTITION (`date`='2019-8-30', 
> `country`='China')
> >   SELECT `user`, `cnt` FROM page_view_source;
> [INFO] Submitting SQL update statement to the cluster...
> 2023-04-29 09:51:53,534 INFO  
> org.apache.hadoop.yarn.client.DefaultNoHARMFailoverProxyProvider [] - 
> Connecting to ResourceManager at 
> ip-172-31-38-72.us-west-2.compute.internal/172.31.38.72:8032
> 2023-04-29 09:51:53,534 INFO  org.apache.hadoop.yarn.client.AHSProxy          
>              [] - Connecting to Application History server at 
> ip-172-31-38-72.us-west-2.compute.internal/172.31.38.72:10200
> 2023-04-29 09:51:53,535 INFO  org.apache.flink.yarn.YarnClusterDescriptor     
>              [] - No path for the flink jar passed. Using the location of 
> class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
> 2023-04-29 09:51:53,535 WARN  org.apache.flink.yarn.YarnClusterDescriptor     
>              [] - Neither the HADOOP_CONF_DIR nor the YARN_CONF_DIR 
> environment variable is set.The Flink YARN Client needs one of these to be 
> set to properly load the Hadoop configuration for accessing YARN.
> 2023-04-29 09:51:53,542 INFO  org.apache.flink.yarn.YarnClusterDescriptor     
>              [] - Found Web Interface 
> ip-172-31-39-51.us-west-2.compute.internal:36583 of application 
> 'application_1682266531513_0004'.
> [INFO] SQL update statement has been successfully submitted to the cluster:
> Job ID: 117900654da5a89ce517d85383d4fe4a
> Flink SQL> INSERT OVERWRITE country_page_view PARTITION (`date`='2019-8-30')
> >   SELECT `user`, `cnt`, `country` FROM page_view_source;
> [INFO] Submitting SQL update statement to the cluster...
> 2023-04-29 09:51:58,834 INFO  
> org.apache.hadoop.yarn.client.DefaultNoHARMFailoverProxyProvider [] - 
> Connecting to ResourceManager at 
> ip-172-31-38-72.us-west-2.compute.internal/172.31.38.72:8032
> 2023-04-29 09:51:58,834 INFO  org.apache.hadoop.yarn.client.AHSProxy          
>              [] - Connecting to Application History server at 
> ip-172-31-38-72.us-west-2.compute.internal/172.31.38.72:10200
> 2023-04-29 09:51:58,834 INFO  org.apache.flink.yarn.YarnClusterDescriptor     
>              [] - No path for the flink jar passed. Using the location of 
> class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
> 2023-04-29 09:51:58,835 WARN  org.apache.flink.yarn.YarnClusterDescriptor     
>              [] - Neither the HADOOP_CONF_DIR nor the YARN_CONF_DIR 
> environment variable is set.The Flink YARN Client needs one of these to be 
> set to properly load the Hadoop configuration for accessing YARN.
> 2023-04-29 09:51:58,838 INFO  org.apache.flink.yarn.YarnClusterDescriptor     
>              [] - Found Web Interface 
> ip-172-31-39-51.us-west-2.compute.internal:36583 of application 
> 'application_1682266531513_0004'.
> [INFO] SQL update statement has been successfully submitted to the cluster:
> Job ID: ca63640e867b9309b8c69d4dba7d94b1
> Flink SQL> INSERT INTO country_page_view PARTITION (`date`='2019-8-30', 
> `country`='China') (`user`)
> >   SELECT user FROM page_view_source;
> [INFO] Submitting SQL update statement to the cluster...
> 2023-04-29 09:52:04,467 INFO  
> org.apache.hadoop.yarn.client.DefaultNoHARMFailoverProxyProvider [] - 
> Connecting to ResourceManager at 
> ip-172-31-38-72.us-west-2.compute.internal/172.31.38.72:8032
> 2023-04-29 09:52:04,469 INFO  org.apache.hadoop.yarn.client.AHSProxy          
>              [] - Connecting to Application History server at 
> ip-172-31-38-72.us-west-2.compute.internal/172.31.38.72:10200
> 2023-04-29 09:52:04,470 INFO  org.apache.flink.yarn.YarnClusterDescriptor     
>              [] - No path for the flink jar passed. Using the location of 
> class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
> 2023-04-29 09:52:04,470 WARN  org.apache.flink.yarn.YarnClusterDescriptor     
>              [] - Neither the HADOOP_CONF_DIR nor the YARN_CONF_DIR 
> environment variable is set.The Flink YARN Client needs one of these to be 
> set to properly load the Hadoop configuration for accessing YARN.
> 2023-04-29 09:52:04,474 INFO  org.apache.flink.yarn.YarnClusterDescriptor     
>              [] - Found Web Interface 
> ip-172-31-39-51.us-west-2.compute.internal:36583 of application 
> 'application_1682266531513_0004'.
> [INFO] SQL update statement has been successfully submitted to the cluster:
> Job ID: 8bca09468a1193f47500ab3eadf04375
> {code}
>  
> Finally, selecting rows from the table throws the following error:
> {code:java}
> Flink SQL> select * from country_page_view;
> [ERROR] Could not execute SQL statement. Reason:
> org.apache.flink.table.api.TableException: Partition keys are: [date, country], incomplete partition spec: {}
> Flink SQL>
> {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
