CodiumAI-Agent commented on PR #9454: URL: https://github.com/apache/incubator-gluten/pull/9454#issuecomment-2849083267
## PR Reviewer Guide ๐ Here are some key observations to aid the review process: <table> <tr><td> **๐ซ Ticket compliance analysis ๐ถ** **[9453](https://github.com/apache/incubator-gluten/issues/9453) - Partially compliant** Compliant requirements: - Refactored suite classes to use composition and traits (e.g., `TPCHSaltedTable`, `withTPCHQuery`). - Removed redundant child classes and long inheritance chains. - Introduced reusable helper methods (`customCheck`, `setupTestCase`, `createTPCHTables`). - Standardized configuration management via `withSQLConf` wrappers. Non-compliant requirements: - None Requires further human verification: - Verify that the new `TPCHSaltedTable` trait actually writes salted/parquet data to disk as intended. </td></tr> <tr><td>โฑ๏ธ <strong>Estimated effort to review</strong>: 4 ๐ต๐ต๐ต๐ตโช</td></tr> <tr><td>๐งช <strong>PR contains tests</strong></td></tr> <tr><td>๐ <strong>No security concerns identified</strong></td></tr> <tr><td>โก <strong>Recommended focus areas for review</strong><br><br> <details><summary><a href='https://github.com/apache/incubator-gluten/pull/9454/files#diff-61645ebd024126f2ea0399957ed7e7ebdc07cec3865d2e8e7feaab5a67f5b2c3R36-R47'><strong>Data Salting Bug</strong></a> The `TPCHSaltedTable` trait processes each table using `map` (which builds a new collection) instead of `foreach` and never writes the salted DataFrame to `saltedTablesPath`, so the salted tables may not actually be created. </summary> ```scala trait TPCHSaltedTable extends TPCHDatabase { override protected def createTestTables(): Unit = { // first process the parquet data to: // 1. make every column nullable in schema (optional rather than required) // 2. salt some null values randomly val saltedTablesPath = s"$dataHome/tpch-salted" withSQLConf((GlutenConfig.GLUTEN_ENABLED.key, "false")) { tpchTables .map( tableName => { val originTablePath = s"$testParquetAbsolutePath/$tableName" ``` </details> <details><summary><a href='https://github.com/apache/incubator-gluten/pull/9454/files#diff-61645ebd024126f2ea0399957ed7e7ebdc07cec3865d2e8e7feaab5a67f5b2c3R49-R50'><strong>Unused Variable</strong></a> The local variable `salted_df` is assigned inside the loop but never used or persisted, indicating the salting implementation is incomplete. </summary> ```scala var salted_df: Option[DataFrame] = None for (c <- df.schema) { ``` </details> </td></tr> </table> -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
