viirya commented on code in PR #13469: URL: https://github.com/apache/datafusion/pull/13469#discussion_r1847131367
########## datafusion/physical-plan/src/joins/sort_merge_join.rs: ##########
@@ -68,8 +68,46 @@ use crate::{
     RecordBatchStream, SendableRecordBatchStream, Statistics,
 };
-/// join execution plan executes partitions in parallel and combines them into a set of
-/// partitions.
+/// Join execution plan that executes equi-join predicates on multiple partitions using Sort-Merge
+/// join algorithm and applies an optional filter post join. Can be used to join arbitrarily large
+/// inputs where one or both of the inputs don't fit in the available memory.
+///
+/// # Join Expressions
+///
+/// Equi-join predicate (e.g. `<col1> = <col2>`) expressions are represented by [`Self::on`].
+///
+/// Non-equality predicates, which can not be pushed down to join inputs (e.g.
+/// `<col1> != <col2>`) are known as "filter expressions" and are evaluated
+/// after the equijoin predicates. They are represented by [`Self::filter`]. These are optional
+/// expressions.
+///
+/// # Sorting
+///
+/// Assumes that both the left and right input to the join are pre-sorted. It is not the
+/// responisibility of this execution plan to sort the inputs.
+///
+/// # "Streamed" vs "Buffered"
+///
+/// Buffered input is buffered for all record batches having the same value of join key.
+/// If the memory limit increases beyond the specified value and spilling is enabled,
+/// buffered batches could be spilled to disk. If spilling is disabled, the execution

Review Comment:
Is there a config for spilling? Shall we mention it here too?
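To make the "streamed" vs "buffered" wording in the doc comment concrete, here is a minimal in-memory sketch of an inner sort-merge equi-join. It is not DataFusion's implementation: `sort_merge_join` is a hypothetical helper, rows are assumed to be `(key, value)` pairs already sorted by key, and there is no memory limit or spilling. It only shows how all buffered rows sharing a join key are collected, how each streamed row with that key is paired against them, and how an optional non-equality filter runs after the equi-join.

```rust
use std::cmp::Ordering;

/// Hypothetical helper, not DataFusion's API: in-memory inner sort-merge
/// equi-join over two inputs that are already sorted by key.
fn sort_merge_join<'a>(
    streamed: &[(i64, &'a str)],
    buffered: &[(i64, &'a str)],
    filter: Option<&dyn Fn(&str, &str) -> bool>,
) -> Vec<(i64, &'a str, &'a str)> {
    let mut out = Vec::new();
    let (mut s, mut b) = (0, 0);
    while s < streamed.len() && b < buffered.len() {
        let key = streamed[s].0;
        match key.cmp(&buffered[b].0) {
            // Streamed key is smaller: it cannot match, advance the streamed side.
            Ordering::Less => s += 1,
            // Buffered key is smaller: advance the buffered side.
            Ordering::Greater => b += 1,
            Ordering::Equal => {
                // Buffer every buffered row that shares this join key.
                let start = b;
                while b < buffered.len() && buffered[b].0 == key {
                    b += 1;
                }
                let group = &buffered[start..b];
                // Pair each streamed row with this key against the buffered group,
                // then apply the optional post-join filter to each candidate pair.
                while s < streamed.len() && streamed[s].0 == key {
                    let sv = streamed[s].1;
                    for &(_, bv) in group {
                        if filter.map_or(true, |f| f(sv, bv)) {
                            out.push((key, sv, bv));
                        }
                    }
                    s += 1;
                }
            }
        }
    }
    out
}

fn main() {
    // Both inputs are pre-sorted by key, as the operator assumes.
    let streamed: [(i64, &str); 4] = [(1, "a"), (2, "b"), (2, "c"), (4, "d")];
    let buffered: [(i64, &str); 4] = [(2, "x"), (2, "y"), (3, "z"), (4, "w")];

    // Equi-join only: key 2 yields 2 x 2 pairs, key 4 yields one pair.
    let joined = sort_merge_join(&streamed, &buffered, None);
    assert_eq!(joined.len(), 5);

    // Hypothetical non-equality predicate evaluated after the equi-join.
    let not_y = |_l: &str, r: &str| r != "y";
    let filtered =
        sort_merge_join(&streamed, &buffered, Some(&not_y as &dyn Fn(&str, &str) -> bool));
    let expected: Vec<(i64, &str, &str)> = vec![(2, "b", "x"), (2, "c", "x"), (4, "d", "w")];
    assert_eq!(filtered, expected);
    println!("{:?}", filtered);
}
```

In the real operator the buffered group is a set of record batches, which is why it is the side that can exceed the memory limit and be spilled; this sketch keeps everything in a slice for brevity.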