Re: [PR] [GLUTEN-9055][CH] Fix input_file_name diff [incubator-gluten]

via GitHub Wed, 26 Mar 2025 23:49:14 -0700


CodiumAI-Agent commented on PR #9142:
URL: 
https://github.com/apache/incubator-gluten/pull/9142#issuecomment-2756916910


   ## PR Reviewer Guide 🔍
   
   Here are some key observations to aid the review process:
   
   <table>
   <tr><td>
   
   **🎫 Ticket compliance analysis 🔶**
   
   
   
   **[9055](https://github.com/apache/incubator-gluten/issues/9055) - Partially 
compliant**
   
   Compliant requirements:
   
   - Added a new test case covering multiple file formats for input_file_name()
   - Updated push down logic for Hive table scans to handle input file 
expressions
   
   Non-compliant requirements:
   
   -
   
   Requires further human verification:
   
   - Validate the behavior in an actual Hive environment if possible
   
   
   
   </td></tr>
   <tr><td>⏱️&nbsp;<strong>Estimated effort to review</strong>: 3 
🔵🔵🔵⚪⚪</td></tr>
   <tr><td>🧪&nbsp;<strong>PR contains tests</strong></td></tr>
   <tr><td>🔒&nbsp;<strong>No security concerns identified</strong></td></tr>
   <tr><td>⚡&nbsp;<strong>Recommended focus areas for review</strong><br><br>
   
   <details><summary><a 
href='https://github.com/apache/incubator-gluten/pull/9142/files#diff-7c3a2853de5904b6c488003be4b92b2ec77b067a550e5294fc42b9cf05d8b54aR1647-R1685'><strong>Table
 Naming</strong></a>
   
   The test creates table names using only the first two characters of the 
format, which could lead to collisions or ambiguous naming; consider making 
table names more unique.
   </summary>
   
   ```scala
   test("test input_file_name() in different formats") {
     val formats = Seq("textfile", "orc", "parquet")
     val tableNamePrefix = "sales_"
   
     formats.foreach {
       format =>
         val tableName = s"$tableNamePrefix${format.take(2)}"
         val createTableSql =
           s"""
              |CREATE TABLE $tableName (
              |  product_id STRING,
              |  quantity INT
              |) PARTITIONED BY (year STRING)
              |STORED AS $format
              |""".stripMargin
   
         val insertDataSql1 =
           s"""
              |INSERT INTO $tableName PARTITION(year='2001')
              |SELECT 'prod1', 100
              |""".stripMargin
   
         val insertDataSql2 =
           s"""
              |INSERT INTO $tableName PARTITION(year='2002')
              |SELECT 'prod1', 200
              |""".stripMargin
   
         val selectSql = s"SELECT input_file_name(), input_file_block_start(), 
" +
           s"input_file_block_length() FROM $tableName"
         val dropSql = s"DROP TABLE IF EXISTS $tableName"
   
         spark.sql(createTableSql)
         spark.sql(insertDataSql1)
         spark.sql(insertDataSql2)
         compareResultsAgainstVanillaSpark(selectSql, compareResult = true, _ 
=> {})
         spark.sql(dropSql)
     }
   }
   ```
   
   </details>
   
   <details><summary><a 
href='https://github.com/apache/incubator-gluten/pull/9142/files#diff-d350dba2dc24e5c89b32766f8ab3c2f7eb4e3131b7625080a545d98f44f69ec3R114-R128'><strong>Session
 Handling</strong></a>
   
   The new branch for HiveTableScanExecTransformer uses child.session in the 
copy call; ensure that the session is appropriate and that no unintended side 
effects occur.
   </summary>
   
   ```scala
   
   object PostOffload extends Rule[SparkPlan] {
     override def apply(plan: SparkPlan): SparkPlan = plan.transformUp {
       case p @ ProjectExec(projectList, child: FileSourceScanExecTransformer)
           if projectList.exists(containsInputFileRelatedExpr) =>
         child.copy(output = p.output)
       case p @ ProjectExec(projectList, child: HiveTableScanExecTransformer)
           if projectList.exists(containsInputFileRelatedExpr) =>
         child.copy(
           requestedAttributes = p.output,
           relation = child.relation,
           partitionPruningPred = child.partitionPruningPred,
           prunedOutput = child.prunedOutput
         )(child.session)
       case p @ ProjectExec(projectList, child: BatchScanExecTransformer)
   ```
   
   </details>
   
   </td></tr>
   </table>
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] [GLUTEN-9055][CH] Fix input_file_name diff [incubator-gluten]

Reply via email to