Copilot commented on code in PR #6350:
URL: https://github.com/apache/paimon/pull/6350#discussion_r2387062661


##########
docs/content/concepts/spec/fileformat.md:
##########
@@ -419,6 +419,18 @@ Format Options:
       <td>String</td>
       <td>Null literal string that is interpreted as a null value (disabled by 
default).</td>
     </tr>
+    <tr>
+      <td><h5>csv.mode</h5></td>
+      <td style="word-wrap: break-word;"><code>PERMISSIVE</code></td>
+      <td>String</td>
+      <td>Allows a mode for dealing with corrupt records during reading. 
Currently supported values are <code>'PERMISSIVE'</code>, 
<code>'DROPMALFORMED'</code> and <code>'FAILFAST'</code>:
+      <ul>
+      <li>Option <code>'PERMISSIVE'</code> sets malformed fields to null.</li>
+      <li>Option <code>'DROP'</code> ignores the whole corrupted records.</li>
+      <li>Option <code>'LITERAL'</code> throws an exception when it meets 
corrupted records.</li>

Review Comment:
   The documentation contains incorrect mode names. Should be `'DROPMALFORMED'` 
and `'FAILFAST'` to match the actual enum values.
   ```suggestion
         <li>Option <code>'DROPMALFORMED'</code> ignores the whole corrupted 
records.</li>
         <li>Option <code>'FAILFAST'</code> throws an exception when it meets 
corrupted records.</li>
   ```



##########
paimon-format/src/main/java/org/apache/paimon/format/csv/CsvFileReader.java:
##########
@@ -98,8 +99,25 @@ protected void setupReading() throws IOException {
     }
 
     private class CsvRecordIterator extends BaseTextRecordIterator {
-        // Inherits all functionality from BaseTextRecordIterator
-        // No additional CSV-specific iterator logic needed
+        @Override
+        public InternalRow next() throws IOException {
+            while (true) {
+                if (readerClosed) {
+                    return null;
+                }
+                String nextLine = bufferedReader.readLine();
+                if (nextLine == null) {
+                    end = true;
+                    return null;
+                }
+
+                currentPosition++;
+                InternalRow row = parseLine(nextLine);
+                if (!row.equals(DROP_ROW)) {

Review Comment:
   Using equals() to compare InternalRow objects is unreliable as it may not 
work correctly for row comparison. Consider using reference equality (==) since 
DROP_ROW is a sentinel value, or implement a proper comparison method.
   ```suggestion
                   if (row != DROP_ROW) {
   ```



##########
paimon-format/src/main/java/org/apache/paimon/format/csv/CsvFileReader.java:
##########
@@ -148,8 +166,21 @@ private InternalRow parseCsvLine(String line, CsvSchema 
schema) throws IOExcepti
                 }
 
                 // Optimized field parsing with cached cast executors
-                projectedValues[i] =
-                        parseFieldOptimized(field.trim(), 
dataSchemaRowType.getTypeAt(readIndex));
+                try {
+                    projectedValues[i] =
+                            parseFieldOptimized(
+                                    field.trim(), 
dataSchemaRowType.getTypeAt(readIndex));
+                } catch (Exception e) {

Review Comment:
   Catching the generic Exception type is too broad and may hide unexpected 
errors. Consider catching more specific exceptions related to parsing failures, 
such as NumberFormatException or IllegalArgumentException.
   ```suggestion
                   } catch (NumberFormatException | IllegalArgumentException e) 
{
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to