ZhongLinLeo opened a new issue, #9953: URL: https://github.com/apache/seatunnel/issues/9953
### Search before asking

- [x] I had searched in the [issues](https://github.com/apache/seatunnel/issues?q=is%3Aissue+label%3A%22bug%22) and found no similar issues.

### What happened

The file-sync connectors list files through `org.apache.seatunnel.connectors.seatunnel.file.source.reader.AbstractReadStrategy.getFileNamesByPath`, which calls `FileStatus[] stats = hadoopFileSystemProxy.listStatus(path);`. There are two problems:

1. When a directory contains a large number of files, the `getWorkingDirectory` method is called. Every call creates a new connection and then fetches the `homeDir`. Could this value be cached in a class field instead, since the root directory does not change? The profiler screenshots failed to upload. The test sync job had 4000 files, and during the recorded period 92% of the overhead was spent here.
2. File filtering: if `file_filter_pattern` is configured, the filter should be applied at the very start of the listing to avoid the extra memory overhead.

### SeaTunnel Version

2.3.11

### SeaTunnel Config

```conf
ignore
```

### Running Command

```shell
ignore
```

### Error Exception

```log
none
```

### Zeta or Flink or Spark Version

_No response_

### Java or Scala Version

_No response_

### Screenshots

_No response_

### Are you willing to submit PR?

- [x] Yes I am willing to submit a PR!

### Code of Conduct

- [x] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
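
To illustrate the two suggestions under "What happened", here is a minimal, self-contained sketch. It is not SeaTunnel code: `ListingSketch`, `CachedValue`, and `listMatching` are hypothetical names, the working-directory lookup is simulated by a plain `Supplier`, and matching the pattern against bare file names is an assumption about how `file_filter_pattern` would be applied.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;
import java.util.regex.Pattern;

public class ListingSketch {

    /** Hypothetical helper: runs an expensive lookup once and reuses the result. */
    static final class CachedValue<T> implements Supplier<T> {
        private final Supplier<T> loader;
        private volatile T value;

        CachedValue(Supplier<T> loader) {
            this.loader = loader;
        }

        @Override
        public T get() {
            T v = value;
            if (v == null) {
                synchronized (this) {
                    v = value;
                    if (v == null) {
                        v = loader.get(); // the only place the expensive call happens
                        value = v;
                    }
                }
            }
            return v;
        }
    }

    /** Filters while collecting, so non-matching names are never stored. */
    static List<String> listMatching(Iterable<String> fileNames, Pattern filter) {
        List<String> kept = new ArrayList<>();
        for (String name : fileNames) {
            // A null filter means "no pattern configured": keep everything.
            if (filter == null || filter.matcher(name).matches()) {
                kept.add(name);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        AtomicInteger lookups = new AtomicInteger();
        Supplier<String> workingDir = new CachedValue<>(() -> {
            lookups.incrementAndGet(); // stands in for one remote round trip
            return "/user/demo";
        });

        List<String> names = List.of("a.csv", "b.tmp", "c.csv");
        List<String> fullPaths = new ArrayList<>();
        for (String name : listMatching(names, Pattern.compile(".*\\.csv"))) {
            fullPaths.add(workingDir.get() + "/" + name);
        }
        System.out.println(fullPaths + " lookups=" + lookups.get());
    }
}
```

The cached supplier resolves the directory once no matter how many files are listed, and filtering inside the collection loop keeps the intermediate list no larger than the final result. Whether either change fits the real `AbstractReadStrategy` depends on details not shown in this issue, such as whether the working directory can change during a job.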
