[
https://issues.apache.org/jira/browse/FLINK-33508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17796058#comment-17796058
]
Anika Kelhanka commented on FLINK-33508:
----------------------------------------
Approach to update Flink's History Server logic so that it can fetch logs and
data from multiple directories at once, using a path with wildcards (i.e., a
glob pattern) for HadoopFileSystem locations:
1. Flink's {{HistoryServerArchiveFetcher}} class currently uses the
HadoopFileSystem's {{listStatus}} API method, which does not resolve
patterns/wildcards in the history server file path.
2. Introduce a new method {{globStatus(Path pathPattern)}} in Flink's
{{FileSystem}} API.
3. Implement the new method in Flink's HadoopFileSystem class such that it
internally calls [Hadoop's globStatus
function|https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileSystem.java#L2217].
4. Finally, point {{HistoryServerArchiveFetcher}} to the new {{globStatus()}}
API instead of {{listStatus()}} for HadoopFileSystem.
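
To illustrate the difference between the two calls, here is a minimal JDK-only
sketch. It is not Flink or Hadoop code: the class {{GlobSketch}} and both of
its methods are hypothetical stand-ins that run against the local file system,
and the pattern syntax is that of {{java.nio}}'s glob matcher, which may differ
in details from Hadoop's. The list-style call treats its argument as a literal
directory, while the glob-style call expands a wildcard pattern into every
matching path.

```java
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration only; names mirror the proposal but this is not Flink's API.
class GlobSketch {

    // Analogue of listStatus: the argument is taken literally and its children are listed.
    // A path such as "<root>/*" is not expanded; it fails unless a directory named "*" exists.
    static List<Path> listStatus(Path dir) throws IOException {
        try (Stream<Path> children = Files.list(dir)) {
            return children.sorted().collect(Collectors.toList());
        }
    }

    // Analogue of globStatus: returns every path under root whose path
    // relative to root matches the glob pattern.
    static List<Path> globStatus(Path root, String pattern) throws IOException {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + pattern);
        try (Stream<Path> all = Files.walk(root)) {
            return all.filter(p -> !p.equals(root))
                      .filter(p -> matcher.matches(root.relativize(p)))
                      .sorted()
                      .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Simulate two ephemeral clusters that each wrote a job archive directory.
        Path root = Files.createTempDirectory("history");
        Files.createDirectories(root.resolve("cluster-a/archive"));
        Files.createDirectories(root.resolve("cluster-b/archive"));
        Files.createDirectories(root.resolve("cluster-b/tmp"));

        // One pattern resolves to both archive directories.
        System.out.println(globStatus(root, "*/archive"));
    }
}
```

In the actual change, the glob expansion would be delegated to Hadoop's
{{globStatus}} rather than implemented by walking the tree as done here.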
> Support for wildcard paths in Flink History Server for multi cluster
> environment
> --------------------------------------------------------------------------------
>
> Key: FLINK-33508
> URL: https://issues.apache.org/jira/browse/FLINK-33508
> Project: Flink
> Issue Type: Improvement
> Reporter: Jayadeep Jayaraman
> Assignee: Jayadeep Jayaraman
> Priority: Major
> Labels: pull-request-available
>
> In cloud environments, users typically create multiple ephemeral clusters
> and want a single history server to show their historical jobs.
> To support this, the history server needs to accept wildcard paths; this
> change adds that support.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)