LakshSingla commented on issue #15242: URL: https://github.com/apache/druid/issues/15242#issuecomment-1792947538
I am not sure I am reading the graph correctly, but doesn't `String#hashCode` account for only about 5% of the `Thread.run()` time? That doesn't seem too bad to me. Also, `String#hashCode` is cached in the `String` instance after the first call.

Perhaps the cardinality of the column being hashed is quite high. That could also explain why `hashCode` takes more time than you expect, though it doesn't look like the root cause to me. From the query mentioned, the group by is on `userId`. Semantically, isn't that field unique? That would explain the poor performance you are seeing: high cardinality and little duplication across rows.
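To illustrate the cardinality point, here is a minimal Java sketch that groups rows with a plain `HashMap`. This is an assumption made for illustration only; it is not Druid's group-by engine, and the class and method names are hypothetical. With unique, `userId`-like keys, the number of groups equals the number of rows, so every row inserts a new entry and nothing gets aggregated. A low-cardinality column collapses to a handful of groups.

```java
import java.util.HashMap;
import java.util.Map;

public class GroupByCardinalitySketch {
    // Hypothetical naive hash-based group-by: counts rows per distinct key.
    // Each merge() call hashes the key via String#hashCode.
    static Map<String, Integer> groupCount(String[] keys) {
        Map<String, Integer> groups = new HashMap<>();
        for (String key : keys) {
            groups.merge(key, 1, Integer::sum);
        }
        return groups;
    }

    public static void main(String[] args) {
        int rows = 100_000;
        String[] userIds = new String[rows];
        String[] countries = new String[rows];
        for (int i = 0; i < rows; i++) {
            userIds[i] = "user-" + i;             // unique per row: high cardinality
            countries[i] = "country-" + (i % 10); // only 10 distinct values: low cardinality
        }
        // With unique keys the map grows by one entry per row.
        System.out.println("userId groups: " + groupCount(userIds).size());
        // With duplicated keys most rows just update an existing entry.
        System.out.println("country groups: " + groupCount(countries).size());
    }
}
```

Running `main` prints `userId groups: 100000` and `country groups: 10`. In this sketch every key is a distinct `String` object, so the per-instance hash cache only saves work when the same instance is hashed again. With unique keys, the cost is dominated by one new map entry per row rather than by the hashing itself.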
