parthchandra commented on code in PR #1893: URL: https://github.com/apache/datafusion-comet/pull/1893#discussion_r2178204401
########## native/spark-expr/src/agg_funcs/avg_decimal.rs: ########## @@ -341,29 +337,16 @@ impl AvgDecimalGroupsAccumulator { } } - fn is_overflow(&self, index: usize) -> bool { - !self.is_empty.get_bit(index) && !self.is_not_null.get_bit(index) - } - + #[inline] fn update_single(&mut self, group_index: usize, value: i128) { - if self.is_overflow(group_index) { - // This means there's a overflow in decimal, so we will just skip the rest - // of the computation - return; - } - - self.is_empty.set_bit(group_index, false); let (new_sum, is_overflow) = self.sums[group_index].overflowing_add(value); self.counts[group_index] += 1; + self.sums[group_index] = new_sum; Review Comment: Same as above. Doesn't it make sense to update the sum only after we know it will not overflow? ########## native/spark-expr/src/agg_funcs/avg_decimal.rs: ########## @@ -341,29 +337,16 @@ impl AvgDecimalGroupsAccumulator { } } - fn is_overflow(&self, index: usize) -> bool { - !self.is_empty.get_bit(index) && !self.is_not_null.get_bit(index) - } - + #[inline] fn update_single(&mut self, group_index: usize, value: i128) { - if self.is_overflow(group_index) { Review Comment: Why do we want to remove this? Once an overflow occurs it will continue to overflow and updating the sum is meaningless, isn't it? ########## native/spark-expr/src/utils.rs: ########## @@ -223,6 +228,20 @@ pub fn is_valid_decimal_precision(value: i128, precision: u8) -> bool { && value <= MAX_DECIMAL128_FOR_EACH_PRECISION[precision as usize] } +/// Build a boolean buffer from the state and reset the state, based on the emit_to +/// strategy. +pub fn build_bool_state(state: &mut BooleanBufferBuilder, emit_to: &EmitTo) -> BooleanBuffer { + let bool_state: BooleanBuffer = state.finish(); + + match emit_to { + EmitTo::All => bool_state, + EmitTo::First(n) => { + state.append_buffer(&bool_state.slice(*n, bool_state.len() - n)); Review Comment: For my knowledge, can you explain why this is better/more correct than the previous version? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org