ctsk commented on code in PR #16153:
URL: https://github.com/apache/datafusion/pull/16153#discussion_r2104961508

##########
datafusion/physical-plan/src/joins/join_hash_map.rs:
##########
@@ -168,6 +168,9 @@ pub trait JoinHashMapType {
     /// Returns a reference to the next.
     fn get_list(&self) -> &Self::NextType;
+    // Whether values in the hashmap are distinct (no duplicate keys)
+    fn is_distinct(&self) -> bool;

Review Comment:
   ```suggestion
       fn is_distinct(&self) -> bool {
           false
       }
   ```
   I don't know if there are other implementers of this trait, but it is technically `pub`...

##########
datafusion/physical-plan/src/joins/join_hash_map.rs:
##########
@@ -338,6 +361,11 @@ impl JoinHashMapType for JoinHashMap {
     fn get_list(&self) -> &Self::NextType {
         &self.next
     }
+
+    // /// Check if the values in the hashmap are distinct.

Review Comment:
   ```suggestion
       /// Check if the values in the hashmap are distinct.
   ```

##########
datafusion/physical-plan/src/joins/join_hash_map.rs:
##########
@@ -338,6 +361,11 @@ impl JoinHashMapType for JoinHashMap {
     fn get_list(&self) -> &Self::NextType {
         &self.next
     }
+
+    // /// Check if the values in the hashmap are distinct.
+    fn is_distinct(&self) -> bool {
+        self.map.len() == self.next.len()

Review Comment:
   Neat that it's so simple ^^

##########
datafusion/physical-plan/src/joins/join_hash_map.rs:
##########
@@ -261,13 +264,32 @@ pub trait JoinHashMapType {
         limit: usize,
         offset: JoinHashMapOffset,
     ) -> (Vec<u32>, Vec<u64>, Option<JoinHashMapOffset>) {
-        let mut input_indices = vec![];
-        let mut match_indices = vec![];
-
-        let mut remaining_output = limit;
+        let mut input_indices = Vec::with_capacity(limit);
+        let mut match_indices = Vec::with_capacity(limit);

Review Comment:
   Might be worth investigating in a future PR if these vectors can be reused between lookups.

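To illustrate the first suggestion and the `map.len() == next.len()` check, here is a minimal sketch, not the DataFusion code. `ChainedMap`, `LegacyMap`, and `ToyJoinMap` are hypothetical stand-ins, and a plain `HashMap` replaces the real `HashTable<(u64, u64)>`. It assumes the map keeps one entry per distinct key and `next` keeps one slot per build row, so the two lengths agree only when no key repeats. With a default body returning `false`, an implementer written before the method existed still compiles and simply never takes the fast path.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified stand-in for a public trait such as `JoinHashMapType`.
pub trait ChainedMap {
    /// Whether every key maps to exactly one build row.
    /// The default body is the conservative answer, so implementers that
    /// predate this method keep compiling without changes.
    fn is_distinct(&self) -> bool {
        false
    }
}

/// An implementer written before `is_distinct` existed (hypothetical).
pub struct LegacyMap;

impl ChainedMap for LegacyMap {}

/// Toy chained map: `map` has one entry per distinct key,
/// `next` has one slot per build row.
pub struct ToyJoinMap {
    map: HashMap<u64, u64>,
    next: Vec<u64>,
}

impl ToyJoinMap {
    pub fn from_keys(keys: &[u64]) -> Self {
        let mut map = HashMap::new();
        let mut next = vec![0u64; keys.len()];
        for (row, key) in keys.iter().enumerate() {
            // Store `row + 1` so that 0 can mean "end of chain".
            let previous_head = map.insert(*key, row as u64 + 1);
            next[row] = previous_head.unwrap_or(0);
        }
        Self { map, next }
    }
}

impl ChainedMap for ToyJoinMap {
    /// Equal lengths mean no key was inserted twice.
    fn is_distinct(&self) -> bool {
        self.map.len() == self.next.len()
    }
}
```
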
##########
datafusion/physical-plan/src/joins/join_hash_map.rs:
##########
@@ -261,13 +264,32 @@ pub trait JoinHashMapType {
         limit: usize,
         offset: JoinHashMapOffset,
     ) -> (Vec<u32>, Vec<u64>, Option<JoinHashMapOffset>) {
-        let mut input_indices = vec![];
-        let mut match_indices = vec![];
-
-        let mut remaining_output = limit;
+        let mut input_indices = Vec::with_capacity(limit);
+        let mut match_indices = Vec::with_capacity(limit);
         let hash_map: &HashTable<(u64, u64)> = self.get_map();
         let next_chain = self.get_list();
+        // Check if hashmap consists of unique values
+        // If so, we can skip the chain traversal
+        if self.is_distinct() {

Review Comment:
   ```suggestion
           if self.is_distinct() && deleted_offset.is_none() {
   ```
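
A rough sketch of why the distinct check lets the lookup skip chain traversal, under stated assumptions: `ToyJoinMap` and `probe` are hypothetical names, a plain `HashMap` stands in for the real hash table, and neither `limit`/`offset` handling nor `deleted_offset` is modeled. When every key is distinct, each probe row matches at most one build row, so the head entry is the only match and `next` is never read.

```rust
use std::collections::HashMap;

/// Toy chained hash map (hypothetical stand-in for `JoinHashMap`).
/// `map` stores, per key, the 1-based index of the last build row with that key.
/// `next[row]` stores the 1-based index of the previous row with the same key
/// (0 marks the end of the chain).
pub struct ToyJoinMap {
    map: HashMap<u64, u64>,
    next: Vec<u64>,
}

impl ToyJoinMap {
    pub fn from_keys(keys: &[u64]) -> Self {
        let mut map = HashMap::new();
        let mut next = vec![0u64; keys.len()];
        for (row, key) in keys.iter().enumerate() {
            let previous_head = map.insert(*key, row as u64 + 1);
            next[row] = previous_head.unwrap_or(0);
        }
        Self { map, next }
    }

    pub fn is_distinct(&self) -> bool {
        self.map.len() == self.next.len()
    }

    /// Returns parallel vectors of (probe row, matching build row).
    pub fn probe(&self, probe_keys: &[u64]) -> (Vec<u32>, Vec<u64>) {
        let mut input_indices = Vec::with_capacity(probe_keys.len());
        let mut match_indices = Vec::with_capacity(probe_keys.len());
        if self.is_distinct() {
            // Fast path: at most one match per probe row, no chain to follow.
            for (probe_row, key) in probe_keys.iter().enumerate() {
                if let Some(&head) = self.map.get(key) {
                    input_indices.push(probe_row as u32);
                    match_indices.push(head - 1);
                }
            }
        } else {
            // General path: walk the chain of rows sharing the key.
            for (probe_row, key) in probe_keys.iter().enumerate() {
                let mut idx = self.map.get(key).copied().unwrap_or(0);
                while idx != 0 {
                    input_indices.push(probe_row as u32);
                    match_indices.push(idx - 1);
                    idx = self.next[(idx - 1) as usize];
                }
            }
        }
        (input_indices, match_indices)
    }
}
```

Both branches produce the same pairs for a distinct map; the fast path only avoids the inner loop and the reads from `next`.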