Re: [PR] [Feature] [File Connector] Make File Source Connector support row delimiter. [seatunnel]

via GitHub Tue, 28 Nov 2023 22:27:38 -0800


lightzhao commented on code in PR #5873:
URL: https://github.com/apache/seatunnel/pull/5873#discussion_r1408804798



##########
seatunnel-connectors-v2/connector-file/connector-file-base/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/source/reader/TextReadStrategy.java:
##########
@@ -94,32 +96,21 @@ public void read(String path, String tableId, Collector<SeaTunnelRow> output)
                     .forEach(
                             line -> {
                                 try {
-                                    SeaTunnelRow seaTunnelRow =
-                                            deserializationSchema.deserialize(line.getBytes());
-                                    if (!readColumns.isEmpty()) {
-                                        // need column projection
-                                        Object[] fields;
-                                        if (isMergePartition) {
-                                            fields =
-                                                    new Object
-                                                            [readColumns.size()
-                                                                    + partitionsMap.size()];
-                                        } else {
-                                            fields = new Object[readColumns.size()];
-                                        }
-                                        for (int i = 0; i < indexes.length; i++) {
-                                            fields[i] = seaTunnelRow.getField(indexes[i]);
-                                        }
-                                        seaTunnelRow = new SeaTunnelRow(fields);
-                                    }
-                                    if (isMergePartition) {
-                                        int index = seaTunnelRowType.getTotalFields();
-                                        for (String value : partitionsMap.values()) {
-                                            seaTunnelRow.setField(index++, value);
+                                    if (StringUtils.isBlank(rowDelimiter)) {
+                                        SeaTunnelRow seaTunnelRow =
+                                                deserializeRow(line, partitionsMap, tableId);
+                                        output.collect(seaTunnelRow);
+                                    } else {
+                                        String[] rows = line.split(rowDelimiter, -1);

Review Comment:
   > > I don’t understand why it doesn’t work in most cases. The processing logic is to use the custom delimiter to split each line that is read. In the example data you gave, the delimiter needs to be escaped as "\|"; with that, there is no problem. Could you please tell me your processing logic?
   > 
   > For this case, the content will be split into `a,b,c,d`, `e,f`, `g,h`, `j,k,l,m`, am I right? But the correct content should be `a,b,c,d`, `e,f,g,h`, `j,k,l,m`: you still use `\n` to split the rows.
   
   You are right. I misunderstood the issue.
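The difference the reviewer points out can be sketched in plain Java. This is a minimal illustration, not the connector's actual code: the class `RowDelimiterSketch`, its two methods, and the sample content are hypothetical. The first method mimics the behavior under discussion (the file has already been broken into lines on `\n`, and each line is then split on the custom delimiter); the second lets the custom delimiter alone decide row boundaries. Because `String.split` interprets its argument as a regular expression, both methods wrap the delimiter in `Pattern.quote`, which for `|` has the same effect as writing the `\|` escape by hand.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class RowDelimiterSketch {

    // Hypothetical helper mimicking the approach under review: the content is
    // first split on '\n', then every line is split again on the custom delimiter.
    static List<String> splitLinesThenDelimiter(String content, String rowDelimiter) {
        List<String> rows = new ArrayList<>();
        for (String line : content.split("\n", -1)) {
            // Pattern.quote makes the delimiter literal, so "|" needs no manual escape.
            rows.addAll(Arrays.asList(line.split(Pattern.quote(rowDelimiter), -1)));
        }
        return rows;
    }

    // Hypothetical helper for what the reviewer describes: only the custom
    // delimiter separates rows, so a '\n' inside a row stays part of that row.
    static List<String> splitWholeContent(String content, String rowDelimiter) {
        return Arrays.asList(content.split(Pattern.quote(rowDelimiter), -1));
    }

    public static void main(String[] args) {
        // Hypothetical file content; '|' is the custom row delimiter.
        String content = "a,b,c,d|e,f\ng,h|j,k,l,m";
        System.out.println(splitLinesThenDelimiter(content, "|"));
        System.out.println(splitWholeContent(content, "|"));
    }
}
```

For the sample content, the first method yields four rows (`a,b,c,d`, `e,f`, `g,h`, `j,k,l,m`), while the second yields three, keeping the embedded newline inside the middle row. Splitting the whole content at once assumes the file fits in memory; a reader for large files would more likely scan the input stream for the delimiter incrementally. The `-1` limit also keeps a trailing empty string when the content ends with the delimiter, which a real reader would probably need to skip.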



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

