lightzhao commented on code in PR #5873:
URL: https://github.com/apache/seatunnel/pull/5873#discussion_r1399944673
##########
seatunnel-connectors-v2/connector-file/connector-file-base/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/source/reader/TextReadStrategy.java:
##########
@@ -94,32 +96,21 @@ public void read(String path, String tableId, Collector<SeaTunnelRow> output)
                 .forEach(
                         line -> {
                             try {
-                                SeaTunnelRow seaTunnelRow =
-                                        deserializationSchema.deserialize(line.getBytes());
-                                if (!readColumns.isEmpty()) {
-                                    // need column projection
-                                    Object[] fields;
-                                    if (isMergePartition) {
-                                        fields =
-                                                new Object
-                                                        [readColumns.size()
-                                                                + partitionsMap.size()];
-                                    } else {
-                                        fields = new Object[readColumns.size()];
-                                    }
-                                    for (int i = 0; i < indexes.length; i++) {
-                                        fields[i] = seaTunnelRow.getField(indexes[i]);
-                                    }
-                                    seaTunnelRow = new SeaTunnelRow(fields);
-                                }
-                                if (isMergePartition) {
-                                    int index = seaTunnelRowType.getTotalFields();
-                                    for (String value : partitionsMap.values()) {
-                                        seaTunnelRow.setField(index++, value);
+                                if (StringUtils.isBlank(rowDelimiter)) {
+                                    SeaTunnelRow seaTunnelRow =
+                                            deserializeRow(line, partitionsMap, tableId);
+                                    output.collect(seaTunnelRow);
+                                } else {
+                                    String[] rows = line.split(rowDelimiter, -1);
Review Comment:
If a single line in the file is 500M, the entire line will be read into
memory. `reader.lines()` may consume more resources when processing large
files; `reader.readLine()` should be more suitable.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]