CodiumAI-Agent commented on PR #8847:
URL: 
https://github.com/apache/incubator-gluten/pull/8847#issuecomment-2689541954

   ## PR Reviewer Guide ๐Ÿ”
   
   Here are some key observations to aid the review process:
   
   <table>
   <tr><td>
   
   **๐ŸŽซ Ticket compliance analysis โœ…**
   
   
   
   **[8846](https://github.com/apache/incubator-gluten/issues/8846) - Fully 
compliant**
   
   Compliant requirements:
   
   - Implementation of Iceberg equality delete file reader and related pipelines
   - New unit tests and benchmarks added
   
   
   
   </td></tr>
   <tr><td>โฑ๏ธ&nbsp;<strong>Estimated effort to review</strong>: 5 
๐Ÿ”ต๐Ÿ”ต๐Ÿ”ต๐Ÿ”ต๐Ÿ”ต</td></tr>
   <tr><td>๐Ÿงช&nbsp;<strong>PR contains tests</strong></td></tr>
   <tr><td>๐Ÿ”’&nbsp;<strong>No security concerns identified</strong></td></tr>
   <tr><td>โšก&nbsp;<strong>Recommended focus areas for review</strong><br><br>
   
   <details><summary><a 
href='https://github.com/apache/incubator-gluten/pull/8847/files#diff-f4625e8060e15b27210f8d23ce44c6e214e6433d2151100419b64483f42205e0R95-R115'><strong>Code
 Clarity</strong></a>
   
   Consider adding more detailed comments and clarifications in the functions 
of the EqualityDeleteActionBuilder to explain the logic for constructing 
predicates from delete files. This will help future maintainers understand the 
mapping between the delete file content and the constructed filter expressions.
   </summary>
   
   ```c++
   }
   
   void EqualityDeleteActionBuilder::notIn(Block deleteBlock, const std::string 
& column_name)
   {
       assert(deleteBlock.columns() == 1);
       const auto & elem_block = deleteBlock.getByPosition(0);
   
       const std::string notIn{"notIn"};
   
       ActionsDAG::NodeRawConstPtrs args;
       const auto & colName = column_name.empty() ? elem_block.name : 
column_name;
       args.push_back(&actions.findInOutputs(colName));
       PreparedSets prepared_sets;
       FutureSet::Hash emptyKey;
       auto future_set = prepared_sets.addFromTuple(emptyKey, nullptr, 
{elem_block}, context->getSettingsRef());
       auto arg = ColumnSet::create(1, std::move(future_set));
       
args.emplace_back(&actions.addColumn(ColumnWithTypeAndName(std::move(arg), 
std::make_shared<DataTypeSet>(), "__set")));
   
       auto function_builder = FunctionFactory::instance().get(notIn, context);
       andArgs.push_back(&addFunction(function_builder, std::move(args)));
   }
   ```
   
   </details>
   
   <details><summary><a 
href='https://github.com/apache/incubator-gluten/pull/8847/files#diff-b74de69bb5b8b743a6a74fafe0bbb47238b7a7c0cc10c00a3f9e30977a5922ffR82-R92'><strong>Loop
 Behavior</strong></a>
   
   The doPull() method uses a while loop to repeatedly filter out deleted rows. 
It would be good to validate that this loop will not run indefinitely in 
scenarios where all rows are filtered or if the filtering predicate might never 
let any rows pass. Consider adding a safeguard or more comments.
   </summary>
   
   ```c++
   while (true)
   {
       Chunk chunk = NormalFileReader::doPull();
       if (chunk.getNumRows() == 0)
           return chunk;
   
       deleteRows(chunk);
   
       if (chunk.getNumRows() != 0)
           return chunk;
   }
   ```
   
   </details>
   
   <details><summary><a 
href='https://github.com/apache/incubator-gluten/pull/8847/files#diff-b194484bbd791ddfd89db8780c3ee7aa15a95e5c0c9b293351ec3cc189c8a5a9R159-R167'><strong>Schema
 Casting</strong></a>
   
   The newly added castBlock function adjusts the input block to match the 
preferred schema. Please verify that this casting handles all edge cases 
(especially with virtual columns) and that no type mismatches can occur during 
writing.
   </summary>
   
   ```c++
   DB::Block NormalFileWriter::castBlock(const DB::Block & block) const
   {
       if (!block)
           return block;
   
       Block res = block;
   
       /// In case input block didn't have the same types as the preferred 
schema, we cast the input block to the preferred schema.
       /// Notice that preferred_schema is the actual file schema, which is 
also the data schema of current inserted table.
   ```
   
   </details>
   
   </td></tr>
   </table>
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to