EricJoy2048 commented on code in PR #5873:
URL: https://github.com/apache/seatunnel/pull/5873#discussion_r1398817242
##########
seatunnel-connectors-v2/connector-file/connector-file-base/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/source/reader/TextReadStrategy.java:
##########
@@ -94,32 +96,21 @@ public void read(String path, String tableId, Collector<SeaTunnelRow> output)
                     .forEach(
                             line -> {
                                 try {
-                                    SeaTunnelRow seaTunnelRow =
-                                            deserializationSchema.deserialize(line.getBytes());
-                                    if (!readColumns.isEmpty()) {
-                                        // need column projection
-                                        Object[] fields;
-                                        if (isMergePartition) {
-                                            fields =
-                                                    new Object
-                                                            [readColumns.size()
-                                                                    + partitionsMap.size()];
-                                        } else {
-                                            fields = new Object[readColumns.size()];
-                                        }
-                                        for (int i = 0; i < indexes.length; i++) {
-                                            fields[i] = seaTunnelRow.getField(indexes[i]);
-                                        }
-                                        seaTunnelRow = new SeaTunnelRow(fields);
-                                    }
-                                    if (isMergePartition) {
-                                        int index = seaTunnelRowType.getTotalFields();
-                                        for (String value : partitionsMap.values()) {
-                                            seaTunnelRow.setField(index++, value);
+                                    if (StringUtils.isBlank(rowDelimiter)) {
+                                        SeaTunnelRow seaTunnelRow =
+                                                deserializeRow(line, partitionsMap, tableId);
+                                        output.collect(seaTunnelRow);
+                                    } else {
+                                        String[] rows = line.split(rowDelimiter, -1);
Review Comment:
```
BufferedReader reader =
        new BufferedReader(new InputStreamReader(inputStream, StandardCharsets.UTF_8));
reader.lines()
```
Actually, what worries me more is this part. If a 500 MB file uses `|` as the row
delimiter and contains no `\n` characters, won't `reader.lines()` read the entire
file into memory as a single line?
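
For context: `BufferedReader.lines()` only splits on line terminators (`\n`, `\r`, `\r\n`), so a file with none of them comes back as a single element. Below is a rough sketch of one possible alternative, not a proposal for the final implementation. It reads fixed-size chunks and emits each record as soon as its delimiter is seen. `DelimitedRecordReader` and `forEachRecord` are hypothetical names that do not exist in SeaTunnel, and the sketch assumes UTF-8 input and a literal (non-regex) delimiter.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;

// Hypothetical helper for illustration only; not part of SeaTunnel.
public class DelimitedRecordReader {

    /**
     * Streams records separated by a literal delimiter. Only the record that is
     * currently being assembled (plus one read buffer) is held in memory.
     * A trailing empty record is emitted, matching split(delimiter, -1).
     */
    public static void forEachRecord(InputStream in, String delimiter, Consumer<String> sink)
            throws IOException {
        if (delimiter == null || delimiter.isEmpty()) {
            throw new IllegalArgumentException("delimiter must not be empty");
        }
        try (Reader reader =
                new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            StringBuilder pending = new StringBuilder();
            char[] buf = new char[8192];
            int scanFrom = 0; // earliest index where an unseen delimiter could start
            int n;
            while ((n = reader.read(buf)) != -1) {
                pending.append(buf, 0, n);
                int start = 0; // start of the current record inside pending
                int idx;
                while ((idx = pending.indexOf(delimiter, Math.max(start, scanFrom))) >= 0) {
                    sink.accept(pending.substring(start, idx));
                    start = idx + delimiter.length();
                    scanFrom = start;
                }
                // Drop emitted records; keep only the unfinished tail.
                pending.delete(0, start);
                // A delimiter may straddle two chunks, so re-check the last few chars.
                scanFrom = Math.max(0, pending.length() - delimiter.length() + 1);
            }
            sink.accept(pending.toString());
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "r1|r2|r3".getBytes(StandardCharsets.UTF_8);
        forEachRecord(new ByteArrayInputStream(data), "|", System.out::println);
    }
}
```

Note that this treats the delimiter literally, whereas `String.split` interprets its argument as a regex, so `|` behaves differently there. A single very large record would still grow the buffer, so this bounds memory by record size, not file size.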