[I] [Bug] [connectors-sftp] Performance problems when syncing over SFTP from a directory that contains many files [seatunnel]

via GitHub Sat, 18 Oct 2025 05:28:42 -0700


ZhongLinLeo opened a new issue, #9953:
URL: https://github.com/apache/seatunnel/issues/9953


   ### Search before asking
   
   - [x] I had searched in the [issues](https://github.com/apache/seatunnel/issues?q=is%3Aissue+label%3A%22bug%22) and found no similar issues.
   
   
   ### What happened
   
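   In the file-sync connectors there is a method that lists files, `org.apache.seatunnel.connectors.seatunnel.file.source.reader.AbstractReadStrategy.getFileNamesByPath`, which calls:
   FileStatus[] stats = hadoopFileSystemProxy.listStatus(path);
   There are two problems:
   1. When the directory contains a large number of files, the `getWorkingDirectory` method gets called. Every call creates a new connection and then fetches `homeDir`. Could the value be kept in a class-level field instead? After all, the root directory does not change. The profiler screenshots were meant to go here, but the upload failed:
   
   <!-- Failed to upload "image.png" -->
   
   The sync job used for testing had 4000 files, and within the recorded period 92% of the overhead was spent here.
   2. File filtering: if `file_filter_pattern` is configured, the filter should be applied at the very start of the listing, to avoid a larger memory overhead.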
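To make the two suggestions concrete, here is a minimal sketch in plain Java. It deliberately does not touch the real SeaTunnel or Hadoop classes: `ListingSketch`, `CachedWorkingDirectory`, and `listMatching` are hypothetical names, and the `Supplier<String>` only stands in for whatever connect-and-read-home-directory round trip the SFTP file system performs. It also assumes that the home directory stays fixed for the lifetime of the object and that the pattern is matched against the whole entry string; the actual connector may differ on both points.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;
import java.util.regex.Pattern;

/** Hypothetical sketch; none of these names come from SeaTunnel or Hadoop. */
final class ListingSketch {

    /** Runs an expensive lookup at most once and reuses the result afterwards. */
    static final class CachedWorkingDirectory {
        // Stand-in for "open a connection and ask the server for the home directory".
        private final Supplier<String> remoteLookup;
        private volatile String cached;

        CachedWorkingDirectory(Supplier<String> remoteLookup) {
            this.remoteLookup = remoteLookup;
        }

        String get() {
            String dir = cached;
            if (dir == null) {
                synchronized (this) {
                    dir = cached;
                    if (dir == null) {
                        dir = remoteLookup.get(); // only the first caller pays for this
                        cached = dir;
                    }
                }
            }
            return dir;
        }
    }

    /**
     * Applies the filter while iterating over the listing, so entries that do
     * not match are never added to the result. A null pattern keeps everything.
     */
    static List<String> listMatching(Iterable<String> entries, String fileFilterPattern) {
        Pattern pattern = fileFilterPattern == null ? null : Pattern.compile(fileFilterPattern);
        List<String> result = new ArrayList<>();
        for (String entry : entries) {
            if (pattern == null || pattern.matcher(entry).matches()) {
                result.add(entry);
            }
        }
        return result;
    }
}
```

With something like this, resolving 4000 paths would trigger the remote lookup once instead of once per file, and only the entries that match the pattern would be held in memory. Whether a cached value is safe in the real connector depends on how the file system instance is shared, which I have not verified.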
   
   
   ### SeaTunnel Version
   
   2.3.11
   
   ### SeaTunnel Config
   
   ```conf
   ignore
   ```
   
   ### Running Command
   
   ```shell
   ignore
   ```
   
   ### Error Exception
   
   ```log
   none
   ```
   
   ### Zeta or Flink or Spark Version
   
   _No response_
   
   ### Java or Scala Version
   
   _No response_
   
   ### Screenshots
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [x] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   

