hemanthumashankar0511 commented on PR #6317: URL: https://github.com/apache/hive/pull/6317#issuecomment-3931787170
@abstractdog and @ayushtkn, I wanted to follow up properly on both points raised here.

First, @abstractdog, thank you for correcting me on how `HashSet` works! I genuinely didn't realize it always computes `hashCode()` first before even getting to `equals()`. I was wrong to claim the Set check was "mostly just comparing memory addresses," and I really appreciate you taking the time to explain that clearly.

Regarding the self-join safety concern, I decided to actually debug this locally. I attached a debugger to a test run, put a breakpoint inside `configureJobConf`, and inspected the `aliasToPartnInfo` map while executing a self-join query:

```sql
SELECT * FROM test t1 JOIN test t2 USING(a);
```

When I expanded `aliasToPartnInfo` in the debugger, I could see two entries: one for alias `t1` and one for alias `t2`. Both `PartitionDesc` objects had their `tableDesc` field pointing to the same `@` identity number in the debugger, confirming they are the same Java object instance in memory.

So, my original safety argument was wrong! I thought that a self-join might produce two distinct `TableDesc` instances with different column configurations, but that's not what happens. Hive reuses the same `TableDesc` instance for all aliases of the same underlying table. Because of this, `Set<TableDesc>` and `Set<String>` behave identically in this scenario: both deduplicate correctly without skipping anything.

I am more than happy to switch to using `Set<String>` via `tableDesc.getTableName()` as you suggested. It is definitely lighter, and the behavior is exactly the same. I'll update the patch right away.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
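For readers following the thread, the deduplication argument in the comment above can be illustrated with a minimal, self-contained sketch. It is not Hive code: `FakeTableDesc`, `distinctByObject`, `distinctByName`, and the table name `default.test` are hypothetical stand-ins, and the only assumption carried over from the comment is that both aliases of a self-join share one descriptor instance. The counter shows that `HashSet.add` goes through `hashCode()` on every insertion.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DedupSketch {

    // Hypothetical stand-in for a table descriptor; NOT Hive's real TableDesc.
    static final class FakeTableDesc {
        private final String tableName;
        int hashCodeCalls = 0; // counts how often a HashSet asks for our hash

        FakeTableDesc(String tableName) {
            this.tableName = tableName;
        }

        String getTableName() {
            return tableName;
        }

        @Override
        public int hashCode() {
            hashCodeCalls++;
            return tableName.hashCode();
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof FakeTableDesc
                && tableName.equals(((FakeTableDesc) o).tableName);
        }
    }

    // Deduplicate on the descriptor object itself (the Set<TableDesc> variant).
    static int distinctByObject(List<FakeTableDesc> descs) {
        Set<FakeTableDesc> seen = new HashSet<>();
        for (FakeTableDesc d : descs) {
            seen.add(d);
        }
        return seen.size();
    }

    // Deduplicate on the table name only (the Set<String> variant).
    static int distinctByName(List<FakeTableDesc> descs) {
        Set<String> seen = new HashSet<>();
        for (FakeTableDesc d : descs) {
            seen.add(d.getTableName());
        }
        return seen.size();
    }

    public static void main(String[] args) {
        // Assumption from the comment: aliases t1 and t2 share one instance.
        FakeTableDesc shared = new FakeTableDesc("default.test");
        List<FakeTableDesc> perAlias = Arrays.asList(shared, shared);

        System.out.println("distinct by object: " + distinctByObject(perAlias));
        System.out.println("distinct by name:   " + distinctByName(perAlias));
        System.out.println("hashCode() calls:   " + shared.hashCodeCalls);
    }
}
```

Under that shared-instance assumption both variants report a single distinct entry. The name-based set only hashes a `String`, so it never invokes the descriptor's own `hashCode()`, which is the sense in which it is lighter.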
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
