lightzhao commented on code in PR #5873:
URL: https://github.com/apache/seatunnel/pull/5873#discussion_r1398714607
##########
seatunnel-connectors-v2/connector-file/connector-file-base/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/source/reader/TextReadStrategy.java:
##########
@@ -94,32 +96,21 @@ public void read(String path, String tableId, Collector<SeaTunnelRow> output)
                     .forEach(
                             line -> {
                                 try {
-                                    SeaTunnelRow seaTunnelRow =
-                                            deserializationSchema.deserialize(line.getBytes());
-                                    if (!readColumns.isEmpty()) {
-                                        // need column projection
-                                        Object[] fields;
-                                        if (isMergePartition) {
-                                            fields =
-                                                    new Object
-                                                            [readColumns.size()
-                                                                    + partitionsMap.size()];
-                                        } else {
-                                            fields = new Object[readColumns.size()];
-                                        }
-                                        for (int i = 0; i < indexes.length; i++) {
-                                            fields[i] = seaTunnelRow.getField(indexes[i]);
-                                        }
-                                        seaTunnelRow = new SeaTunnelRow(fields);
-                                    }
-                                    if (isMergePartition) {
-                                        int index = seaTunnelRowType.getTotalFields();
-                                        for (String value : partitionsMap.values()) {
-                                            seaTunnelRow.setField(index++, value);
+                                    if (StringUtils.isBlank(rowDelimiter)) {
+                                        SeaTunnelRow seaTunnelRow =
+                                                deserializeRow(line, partitionsMap, tableId);
+                                        output.collect(seaTunnelRow);
+                                    } else {
+                                        String[] rows = line.split(rowDelimiter, -1);
Review Comment:
> You cannot handle rowDelimiter this way. It will not work in most cases,
> since not all of the data will be on one line. For example, if I use `|` as
> row_delimiter:
>
> ```
> a,b,c,d | e,f
> g,h | j,k,l,m
> ```
I don't understand why it wouldn't work in most cases. The processing logic
splits each line that is read using the custom delimiter. For the example data
you gave, the delimiter needs to be escaped as "\\\\|", and then there is no
problem. Could you please describe the processing logic you have in mind?
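To make the two readings of the example concrete, here is a minimal sketch that compares splitting each physical line on the delimiter (as the diff does) with treating the delimiter as the only row boundary across the whole content. The class and method names are hypothetical and exist only for illustration; this is not SeaTunnel code, and it assumes the file content is small enough to hold in a single string.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of the SeaTunnel code base.
class RowDelimiterSketch {

    // Mirrors the approach in the diff: every physical line is split on the delimiter,
    // so a line break always ends a row as well.
    static List<String> splitPerLine(String content, String rowDelimiter) {
        List<String> rows = new ArrayList<>();
        // Pattern.quote makes the delimiter a literal, so "|" needs no manual escaping.
        String regex = Pattern.quote(rowDelimiter);
        for (String line : content.split("\n", -1)) {
            rows.addAll(Arrays.asList(line.split(regex, -1)));
        }
        return rows;
    }

    // Alternative reading: the delimiter is the only row boundary,
    // so a row may span more than one physical line.
    static List<String> splitWholeContent(String content, String rowDelimiter) {
        return Arrays.asList(content.split(Pattern.quote(rowDelimiter), -1));
    }

    public static void main(String[] args) {
        String content = "a,b,c,d | e,f\ng,h | j,k,l,m";
        System.out.println(splitPerLine(content, "|").size());      // 4 rows
        System.out.println(splitWholeContent(content, "|").size()); // 3 rows
    }
}
```

With per-line splitting, `e,f` and `g,h` come out as two separate rows. With whole-content splitting they stay together in one row that still contains the line break. Which behavior is intended is exactly the open question in this thread.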
##########
seatunnel-connectors-v2/connector-file/connector-file-base/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/source/reader/TextReadStrategy.java:
##########
Review Comment:
The delimiter needs to be escaped as "\\\\|".