Aggarwal-Raghav commented on PR #6443: URL: https://github.com/apache/hive/pull/6443#issuecomment-4448600043
@tanishq-chugh / @abstractdog, I have a question.

1. In `validateSpecifiedColumnNames` we check whether the columns exist: 1 HashMap
2. In `checkForPartitionColumns` we check for partition columns: 1 HashSet
3. In `getFieldSchemasByColName` we get the types of the columns validated above: 1 HashMap

Can't we do both 1 and 2 inside 3 while maintaining a single data structure? I think it should be possible.

**The optimization + refactoring in this patch is good**. Just thinking in terms of math: ColumnStatsSemanticAnalyzer runs in the `Query Compilation` phase, so if my competitive coding concepts are correct, then:

```
For 1000 columns, O(N^2) => 1 million, i.e. 10^6 operations, which a modern computer does within about 1 sec.
```

For more than 3k columns or so, the real benefit of this optimization will kick in, I guess.
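To make the suggestion concrete, here is a rough sketch of how the three checks might share one map. It is not the patch's code: `ColumnLookupSketch`, `Column`, and `resolve` are hypothetical names, `Column` is a simplified stand-in for Hive's `FieldSchema` plus a partition flag, and the case-insensitive matching and the rejection of partition columns are assumptions about the intended behavior.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch; none of these names come from the Hive code base.
class ColumnLookupSketch {

    // Simplified stand-in for FieldSchema, with a flag marking partition columns.
    static final class Column {
        final String name;
        final String type;
        final boolean partition;

        Column(String name, String type, boolean partition) {
            this.name = name;
            this.type = type;
            this.partition = partition;
        }
    }

    // Builds one name -> Column map in O(N), then handles every specified
    // column with a single O(1) lookup: existence check, partition check,
    // and type retrieval all come from the same map entry.
    static List<Column> resolve(List<Column> tableColumns, List<String> specified) {
        Map<String, Column> byName = new HashMap<>();
        for (Column c : tableColumns) {
            // Assumption: column names are matched case-insensitively.
            byName.put(c.name.toLowerCase(Locale.ROOT), c);
        }

        List<Column> resolved = new ArrayList<>(specified.size());
        for (String name : specified) {
            Column c = byName.get(name.toLowerCase(Locale.ROOT));
            if (c == null) {
                throw new IllegalArgumentException("Unknown column: " + name);
            }
            if (c.partition) {
                // Assumption: partition columns are rejected here.
                throw new IllegalArgumentException("Partition column not allowed: " + name);
            }
            resolved.add(c);
        }
        return resolved;
    }
}
```

With N table columns and M specified columns this costs O(N + M) and allocates one map, instead of two maps and a set.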
