Re: [PR] HIVE-29551: Avoid quadratic runtime in ColumnStatsSemanticAnalyzer#ge… [hive]

via GitHub Thu, 14 May 2026 00:25:56 -0700


Aggarwal-Raghav commented on PR #6443:
URL: https://github.com/apache/hive/pull/6443#issuecomment-4448600043


   @tanishq-chugh / @abstractdog , I have a question.
   
   1. In `validateSpecifiedColumnNames` we check whether the specified columns 
exist (1 HashMap).
   2. In `checkForPartitionColumns` we check for partition columns (1 HashSet).
   3. In `getFieldSchemasByColName` we get the types of the columns validated 
above (1 HashMap).
   
   Can't we do both 1 and 2 inside 3 while maintaining a single data 
structure? I think it should be possible.
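
A minimal sketch of what folding the three checks into one lookup could look like. This is not the code in the patch: `Column`, `resolve`, and the exception messages are hypothetical stand-ins, the sketch matches names exactly (no case folding), and it assumes that a partition column in the requested list should be rejected.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ColumnLookupSketch {

  // Hypothetical stand-in for a column's metadata; not Hive's FieldSchema.
  static final class Column {
    final String name;
    final String type;
    final boolean partition;

    Column(String name, String type, boolean partition) {
      this.name = name;
      this.type = type;
      this.partition = partition;
    }
  }

  // Builds one name -> Column map, then does existence check, partition check
  // and type lookup with a single map access per requested column.
  static List<Column> resolve(List<Column> tableColumns, List<String> requested) {
    Map<String, Column> byName = new HashMap<>();
    for (Column c : tableColumns) {
      byName.put(c.name, c);
    }

    List<Column> result = new ArrayList<>(requested.size());
    for (String name : requested) {
      Column c = byName.get(name);
      if (c == null) {
        // Step 1: the requested column does not exist in the table.
        throw new IllegalArgumentException("Unknown column: " + name);
      }
      if (c.partition) {
        // Step 2: assumed behaviour, partition columns are not accepted here.
        throw new IllegalArgumentException("Partition column: " + name);
      }
      // Step 3: the Column object already carries the type.
      result.add(c);
    }
    return result;
  }

  public static void main(String[] args) {
    List<Column> table = List.of(
        new Column("id", "int", false),
        new Column("name", "string", false),
        new Column("ds", "string", true));
    for (Column c : resolve(table, List.of("name", "id"))) {
      System.out.println(c.name + " : " + c.type);
    }
  }
}
```

With this shape the total work is one pass over the table columns plus one pass over the requested columns, i.e. O(N) expected time with a single map.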
   
   **The optimization + refactoring in this patch is good**. 
   Just thinking in terms of the math: ColumnStatsSemanticAnalyzer runs in the 
`Query Compilation` phase, so if my competitive coding concepts are correct, then:
   ```
   For 1000 columns, O(N^2) => 1 million, i.e. 10^6 operations, which a modern 
computer does within about 1 sec.
   ```  
   I guess the real benefit of this optimization only kicks in for more than 
3k columns or so.
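
To make the arithmetic above concrete, here is a small sketch that counts worst-case operations for a nested scan versus a hash-based lookup. It only counts abstract operations; the class and method names are made up for illustration, and real compile-time cost depends on what each comparison actually does.

```java
class QuadraticCostSketch {

  // Nested scan: each of n requested columns is compared against up to n table columns.
  static long nestedScanComparisons(long n) {
    return n * n;
  }

  // Hash-based: one map insert per table column plus one lookup per requested column.
  static long hashedOperations(long n) {
    return 2 * n;
  }

  public static void main(String[] args) {
    for (long n : new long[] {1_000, 3_000, 10_000}) {
      System.out.printf("n=%d quadratic=%d hashed=%d%n",
          n, nestedScanComparisons(n), hashedOperations(n));
    }
  }
}
```

For 1000 columns this gives 10^6 versus 2 * 10^3 operations, and for 3000 columns 9 * 10^6 versus 6 * 10^3, so the gap widens quickly as the column count grows.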


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

