CodiumAI-Agent commented on PR #9142: URL: https://github.com/apache/incubator-gluten/pull/9142#issuecomment-2756916910
## PR Reviewer Guide ๐ Here are some key observations to aid the review process: <table> <tr><td> **๐ซ Ticket compliance analysis ๐ถ** **[9055](https://github.com/apache/incubator-gluten/issues/9055) - Partially compliant** Compliant requirements: - Added a new test case covering multiple file formats for input_file_name() - Updated push down logic for Hive table scans to handle input file expressions Non-compliant requirements: - Requires further human verification: - Validate the behavior in an actual Hive environment if possible </td></tr> <tr><td>โฑ๏ธ <strong>Estimated effort to review</strong>: 3 ๐ต๐ต๐ตโชโช</td></tr> <tr><td>๐งช <strong>PR contains tests</strong></td></tr> <tr><td>๐ <strong>No security concerns identified</strong></td></tr> <tr><td>โก <strong>Recommended focus areas for review</strong><br><br> <details><summary><a href='https://github.com/apache/incubator-gluten/pull/9142/files#diff-7c3a2853de5904b6c488003be4b92b2ec77b067a550e5294fc42b9cf05d8b54aR1647-R1685'><strong>Table Naming</strong></a> The test creates table names using only the first two characters of the format, which could lead to collisions or ambiguous naming; consider making table names more unique. </summary> ```scala test("test input_file_name() in different formats") { val formats = Seq("textfile", "orc", "parquet") val tableNamePrefix = "sales_" formats.foreach { format => val tableName = s"$tableNamePrefix${format.take(2)}" val createTableSql = s""" |CREATE TABLE $tableName ( | product_id STRING, | quantity INT |) PARTITIONED BY (year STRING) |STORED AS $format |""".stripMargin val insertDataSql1 = s""" |INSERT INTO $tableName PARTITION(year='2001') |SELECT 'prod1', 100 |""".stripMargin val insertDataSql2 = s""" |INSERT INTO $tableName PARTITION(year='2002') |SELECT 'prod1', 200 |""".stripMargin val selectSql = s"SELECT input_file_name(), input_file_block_start(), " + s"input_file_block_length() FROM $tableName" val dropSql = s"DROP TABLE IF EXISTS $tableName" spark.sql(createTableSql) spark.sql(insertDataSql1) spark.sql(insertDataSql2) compareResultsAgainstVanillaSpark(selectSql, compareResult = true, _ => {}) spark.sql(dropSql) } } ``` </details> <details><summary><a href='https://github.com/apache/incubator-gluten/pull/9142/files#diff-d350dba2dc24e5c89b32766f8ab3c2f7eb4e3131b7625080a545d98f44f69ec3R114-R128'><strong>Session Handling</strong></a> The new branch for HiveTableScanExecTransformer uses child.session in the copy call; ensure that the session is appropriate and that no unintended side effects occur. </summary> ```scala object PostOffload extends Rule[SparkPlan] { override def apply(plan: SparkPlan): SparkPlan = plan.transformUp { case p @ ProjectExec(projectList, child: FileSourceScanExecTransformer) if projectList.exists(containsInputFileRelatedExpr) => child.copy(output = p.output) case p @ ProjectExec(projectList, child: HiveTableScanExecTransformer) if projectList.exists(containsInputFileRelatedExpr) => child.copy( requestedAttributes = p.output, relation = child.relation, partitionPruningPred = child.partitionPruningPred, prunedOutput = child.prunedOutput )(child.session) case p @ ProjectExec(projectList, child: BatchScanExecTransformer) ``` </details> </td></tr> </table> -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
