CodiumAI-Agent commented on PR #8847: URL: https://github.com/apache/incubator-gluten/pull/8847#issuecomment-2689541954
## PR Reviewer Guide ๐ Here are some key observations to aid the review process: <table> <tr><td> **๐ซ Ticket compliance analysis โ ** **[8846](https://github.com/apache/incubator-gluten/issues/8846) - Fully compliant** Compliant requirements: - Implementation of Iceberg equality delete file reader and related pipelines - New unit tests and benchmarks added </td></tr> <tr><td>โฑ๏ธ <strong>Estimated effort to review</strong>: 5 ๐ต๐ต๐ต๐ต๐ต</td></tr> <tr><td>๐งช <strong>PR contains tests</strong></td></tr> <tr><td>๐ <strong>No security concerns identified</strong></td></tr> <tr><td>โก <strong>Recommended focus areas for review</strong><br><br> <details><summary><a href='https://github.com/apache/incubator-gluten/pull/8847/files#diff-f4625e8060e15b27210f8d23ce44c6e214e6433d2151100419b64483f42205e0R95-R115'><strong>Code Clarity</strong></a> Consider adding more detailed comments and clarifications in the functions of the EqualityDeleteActionBuilder to explain the logic for constructing predicates from delete files. This will help future maintainers understand the mapping between the delete file content and the constructed filter expressions. </summary> ```c++ } void EqualityDeleteActionBuilder::notIn(Block deleteBlock, const std::string & column_name) { assert(deleteBlock.columns() == 1); const auto & elem_block = deleteBlock.getByPosition(0); const std::string notIn{"notIn"}; ActionsDAG::NodeRawConstPtrs args; const auto & colName = column_name.empty() ? elem_block.name : column_name; args.push_back(&actions.findInOutputs(colName)); PreparedSets prepared_sets; FutureSet::Hash emptyKey; auto future_set = prepared_sets.addFromTuple(emptyKey, nullptr, {elem_block}, context->getSettingsRef()); auto arg = ColumnSet::create(1, std::move(future_set)); args.emplace_back(&actions.addColumn(ColumnWithTypeAndName(std::move(arg), std::make_shared<DataTypeSet>(), "__set"))); auto function_builder = FunctionFactory::instance().get(notIn, context); andArgs.push_back(&addFunction(function_builder, std::move(args))); } ``` </details> <details><summary><a href='https://github.com/apache/incubator-gluten/pull/8847/files#diff-b74de69bb5b8b743a6a74fafe0bbb47238b7a7c0cc10c00a3f9e30977a5922ffR82-R92'><strong>Loop Behavior</strong></a> The doPull() method uses a while loop to repeatedly filter out deleted rows. It would be good to validate that this loop will not run indefinitely in scenarios where all rows are filtered or if the filtering predicate might never let any rows pass. Consider adding a safeguard or more comments. </summary> ```c++ while (true) { Chunk chunk = NormalFileReader::doPull(); if (chunk.getNumRows() == 0) return chunk; deleteRows(chunk); if (chunk.getNumRows() != 0) return chunk; } ``` </details> <details><summary><a href='https://github.com/apache/incubator-gluten/pull/8847/files#diff-b194484bbd791ddfd89db8780c3ee7aa15a95e5c0c9b293351ec3cc189c8a5a9R159-R167'><strong>Schema Casting</strong></a> The newly added castBlock function adjusts the input block to match the preferred schema. Please verify that this casting handles all edge cases (especially with virtual columns) and that no type mismatches can occur during writing. </summary> ```c++ DB::Block NormalFileWriter::castBlock(const DB::Block & block) const { if (!block) return block; Block res = block; /// In case input block didn't have the same types as the preferred schema, we cast the input block to the preferred schema. /// Notice that preferred_schema is the actual file schema, which is also the data schema of current inserted table. ``` </details> </td></tr> </table> -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
