yangzhiyuss commented on code in PR #5428:
URL: https://github.com/apache/seatunnel/pull/5428#discussion_r1316574239
##########
seatunnel-connectors-v2/connector-file/connector-file-base-hadoop/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/hdfs/source/BaseHdfsFileSource.java:
##########
@@ -110,13 +112,26 @@ public void prepare(Config pluginConfig) throws PrepareFailException {
"SeaTunnel does not supported this file format");
}
} else {
- try {
- rowType = readStrategy.getSeaTunnelRowTypeInfo(hadoopConf, filePaths.get(0));
- } catch (FileConnectorException e) {
+ FileConnectorException fileConnectorException = null;
Review Comment:
During data migration, network problems or the Hadoop system itself sometimes
leave behind empty temporary files or dirty files, and Hadoop does not clean
them up on its own.
for example:


When a dirty or empty file is present, the HDFS file source fails to get the
row type, because the original code parses only the first file, which may be an
empty temporary file or a dirty file. These files have no impact on Hive,
however, and Hive can still query them.

With the modified code, these files are filtered out, the data can be pulled
successfully, and no data is lost.
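
To make the idea concrete, here is a minimal, self-contained sketch of the
"try each file until one parses" approach. It is not the code in this PR:
`SchemaProbe`, `firstReadable`, and the `reader` function are hypothetical
names used only for illustration, and `reader` stands in for whatever call
actually parses the row type from a file.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical helper, for illustration only; not part of SeaTunnel.
final class SchemaProbe {
    private SchemaProbe() {}

    /**
     * Returns the schema of the first file that the reader can parse.
     * Files for which the reader throws (e.g. empty or dirty files) are skipped.
     * If every file fails, the last failure is rethrown so the cause is not hidden.
     */
    static <S> S firstReadable(List<String> paths, Function<String, S> reader) {
        RuntimeException lastFailure = null;
        for (String path : paths) {
            try {
                return reader.apply(path); // first success wins
            } catch (RuntimeException e) {
                lastFailure = e; // remember the failure and try the next file
            }
        }
        if (lastFailure != null) {
            throw lastFailure;
        }
        throw new IllegalArgumentException("no files to probe");
    }
}
```

Rethrowing the last failure when no file is readable keeps the behaviour the
original code had for a directory that contains only bad files.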
