[ https://issues.apache.org/jira/browse/TAJO-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14232485#comment-14232485 ]
ASF GitHub Bot commented on TAJO-1222:
--------------------------------------
Github user jinossy commented on a diff in the pull request:
https://github.com/apache/tajo/pull/277#discussion_r21209793
--- Diff: tajo-storage/src/main/java/org/apache/tajo/storage/text/DelimitedTextFile.java ---
@@ -355,21 +368,58 @@ public float getProgress() {
     @Override
     public Tuple next() throws IOException {
+
+      if (!reader.isReadable()) {
+        return null;
+      }
+
+      if (targets.length == 0) {
+        return EmptyTuple.get();
+      }
+
+      VTuple tuple = new VTuple(schema.size());
+
       try {
-        if (!reader.isReadable()) return null;
-        ByteBuf buf = readLine();
-        if (buf == null) return null;
+        // this loop will continue until one tuple is build or EOS (end of stream).
+        do {
-        if (targets.length == 0) {
-          return EmptyTuple.get();
-        }
+          ByteBuf buf = readLine();
+          if (buf == null) {
+            return null;
+          }
+
+          try {
+
+            deserializer.deserialize(buf, tuple);
+
--- End diff --
Could you move the {{recordCount}} to this line?
> DelimitedTextFile should be tolerant against parsing errors.
> ------------------------------------------------------------
>
> Key: TAJO-1222
> URL: https://issues.apache.org/jira/browse/TAJO-1222
> Project: Tajo
> Issue Type: Bug
> Components: storage
> Reporter: Hyunsik Choi
> Assignee: Hyunsik Choi
> Fix For: 0.9.1
>
> Attachments: TAJO-1222.patch
>
>
> DelimitedTextFile is a base class for plain-text file formats like CSV or
> JSON. In practice, parsing errors are common for various reasons, but the
> current implementation does not tolerate any parsing error, which is
> inconvenient in many cases.
> The objective of this issue is to enable DelimitedTextFile to tolerate
> parsing errors up to a user-specified number.
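
The objective described in the issue can be sketched roughly as follows. This is only an illustration of the "skip bad lines until a user-given limit is exceeded" idea, not the code from the patch: the class TolerantLineParser, the LineDeserializer interface, the TooManyErrorsException type, and the maxErrors parameter are all hypothetical names, and the exact limit semantics (here, fail once the error count exceeds maxErrors) are an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these names come from Tajo.
public class TolerantLineParser {

  /** Stand-in for a deserializer that may reject a malformed line. */
  interface LineDeserializer<T> {
    T deserialize(String line) throws Exception;
  }

  /** Thrown once the number of bad lines exceeds the allowed limit. */
  static class TooManyErrorsException extends RuntimeException {
    TooManyErrorsException(String message, Throwable cause) {
      super(message, cause);
    }
  }

  /**
   * Parses every line, skipping lines that fail to deserialize.
   * Assumption: up to maxErrors bad lines are tolerated; the next
   * failure after that aborts the scan.
   */
  static <T> List<T> parseAll(List<String> lines,
                              LineDeserializer<T> deserializer,
                              int maxErrors) {
    List<T> records = new ArrayList<>();
    int errorCount = 0;
    for (String line : lines) {
      try {
        records.add(deserializer.deserialize(line));
      } catch (Exception e) {
        errorCount++;
        if (errorCount > maxErrors) {
          throw new TooManyErrorsException(
              "parse errors exceeded limit of " + maxErrors, e);
        }
        // Otherwise skip the bad line and continue with the next one.
      }
    }
    return records;
  }
}
```

With the input lines "1", "x", "2", "y", "3" and Integer::parseInt as the deserializer, a limit of 2 yields the three valid integers, while a limit of 1 aborts at the second bad line.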