CodiumAI-Agent commented on PR #9609: URL: https://github.com/apache/incubator-gluten/pull/9609#issuecomment-2875586965
## PR Code Suggestions ✨ <!-- 1a39d8a --> <table><thead><tr><td><strong>Category</strong></td><td align=left><strong>Suggestion </strong></td><td align=center><strong>Impact</strong></td></tr><tbody><tr><td rowspan=2>Possible issue</td> <td> <details><summary>Loop over actual row count</summary> ___ **<code>part_name_column_data.size()</code> reflects the cardinality dictionary size, not the <br>number of rows. Iterate up to <code>num_rows</code> so each position is handled correctly.** [cpp-ch/local-engine/Operator/FillingDeltaInternalRowDeletedStep.cpp [107-116]](https://github.com/apache/incubator-gluten/pull/9609/files#diff-f1fec2ab5d884a56f8b3bdfbe185d22c8237556fd1203a4dce2e6e738f97790fR107-R116) ```diff -for (size_t i = 0; i < part_name_column_data.size(); ++i) +for (size_t i = 0; i < num_rows; ++i) { std::string part_name = part_name_column_data.getDataAt(i).toString(); if (dv_map.contains(part_name)) { vec[i] = dv_map.at(part_name)->rb_contains(row_index_column_data.get64(i)); } else { vec[i] = false; } } ``` <details><summary>Suggestion importance[1-10]: 8</summary> __ Why: Iterating to `part_name_column_data.size()` will use dictionary cardinality instead of `num_rows`, risking skipped rows. Using `num_rows` ensures each row is processed correctly. </details></details></td><td align=center>Medium </td></tr><tr><td> <details><summary>Prevent missing‐tag crashes</summary> ___ **Accessing tags with <code>apply</code> may throw if a key is missing. Use <code>getOrElse</code> or pattern <br>matching to provide defaults or fail fast with a clear message.** [backends-clickhouse/src-delta-23/main/scala/org/apache/spark/sql/execution/datasources/v2/clickhouse/metadata/AddFileTags.scala [171-198]](https://github.com/apache/incubator-gluten/pull/9609/files#diff-b81a1776c3a0828e5e6e38757f6cfa76d7a2bfa99d744f6c3a2c16504ee7e79bR171-R198) ```diff new AddMergeTreeParts( - addFile.tags("database"), - addFile.tags("table"), - addFile.tags("engine"), - addFile.tags("path"), - addFile.tags("targetNode"), + addFile.tags.getOrElse("database", throw new IllegalArgumentException("missing database tag")), + addFile.tags.getOrElse("table", throw new IllegalArgumentException("missing table tag")), + addFile.tags.getOrElse("engine", "MergeTree"), + addFile.tags.getOrElse("path", ""), + addFile.tags.getOrElse("targetNode", ""), addFile.path, - addFile.tags("uuid"), - addFile.tags("rows").toLong, + addFile.tags.getOrElse("uuid", ""), + addFile.tags.get("rows").map(_.toLong).getOrElse(0L), addFile.size, - addFile.tags("dataCompressedBytes").toLong, - addFile.tags("dataUncompressedBytes").toLong, + addFile.tags.get("dataCompressedBytes").map(_.toLong).getOrElse(0L), + addFile.tags.get("dataUncompressedBytes").map(_.toLong).getOrElse(0L), addFile.modificationTime, - addFile.tags("partitionId"), - addFile.tags("minBlockNumber").toLong, - addFile.tags("maxBlockNumber").toLong, - addFile.tags("level").toInt, - addFile.tags("dataVersion").toLong, - addFile.tags("bucketNum"), - addFile.tags("dirName"), + addFile.tags.getOrElse("partitionId", ""), + addFile.tags.get("minBlockNumber").map(_.toLong).getOrElse(0L), + addFile.tags.get("maxBlockNumber").map(_.toLong).getOrElse(0L), + addFile.tags.get("level").map(_.toInt).getOrElse(0), + addFile.tags.get("dataVersion").map(_.toLong).getOrElse(0L), + addFile.tags.getOrElse("bucketNum", ""), + addFile.tags.getOrElse("dirName", ""), addFile.dataChange, - addFile.tags("partition"), - addFile.tags("defaultCompressionCodec"), + addFile.tags.getOrElse("partition", ""), + addFile.tags.getOrElse("defaultCompressionCodec", "LZ4"), addFile.stats, addFile.partitionValues, - marks = addFile.tags("marks").toLong, + marks = addFile.tags.get("marks").map(_.toLong).getOrElse(-1L), tags = addFile.tags, deletionVector = addFile.deletionVector ) ``` <details><summary>Suggestion importance[1-10]: 4</summary> __ Why: This change uses safe lookups on `tags` to avoid runtime `NoSuchElementException`, but it may mask configuration errors by defaulting rather than failing fast. It offers moderate safety but could hide missing‐tag bugs. </details></details></td><td align=center>Low </td></tr></tr></tbody></table> -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
