tustvold commented on code in PR #4736:
URL: https://github.com/apache/arrow-rs/pull/4736#discussion_r1305892558
##########
arrow-row/src/lib.rs:
##########
@@ -1445,6 +1446,70 @@ unsafe fn decode_column(
Ok(array)
}
+macro_rules! downcast_dict {
+ ($array:ident, $key:ident) => {{
+ $array
+ .as_any()
+ .downcast_ref::<DictionaryArray<$key>>()
+ .unwrap()
+ }};
+}
+
+const LOW_CARDINALITY_THRESHOLD: usize = 10;
+
+#[derive(Debug)]
+pub struct CardinalityAwareRowConverter {
+ inner: RowConverter,
+ done: bool,
+}
+
+impl CardinalityAwareRowConverter {
+ pub fn new(fields: Vec<SortField>) -> Result<Self, ArrowError> {
+ Ok(Self {
+ inner: RowConverter::new(fields)?,
+ done: false,
+ })
+ }
+
+ pub fn size(&self) -> usize {
+ self.inner.size()
+ }
+
+ pub fn convert_rows(&self, rows: &Rows) -> Result<Vec<ArrayRef>, ArrowError> {
+ self.inner.convert_rows(rows)
+ }
+
+ pub fn convert_columns(
+ &mut self,
+ columns: &[ArrayRef]) -> Result<Rows, ArrowError> {
+ if !self.done {
+ for (i, col) in columns.iter().enumerate() {
+ if let DataType::Dictionary(k, _) = col.data_type() {
+ // let cardinality = col.as_any().downcast_ref::<DictionaryArray<Int32Type>>().unwrap().values().len();
+ let cardinality = match k.as_ref() {
Review Comment:
Given that the RowConverter blindly generates a mapping for all values,
regardless of whether they appear in the keys, I think we should just use the
length of the values. Whilst an argument could be made for doing something more
sophisticated, that would only really make sense if the dictionary interner
itself followed a similar approach.
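
To make the distinction concrete, here is a minimal sketch of the two ways of measuring cardinality. It uses a hypothetical `DictColumn` stand-in rather than arrow's `DictionaryArray`, and the function names are made up for illustration; it is only meant to show why the two counts can differ, not how the PR should be implemented.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a dictionary-encoded column (NOT arrow's
/// `DictionaryArray`): each entry of `keys` is an index into `values`.
struct DictColumn {
    keys: Vec<usize>,
    values: Vec<&'static str>,
}

/// The simple measure suggested above: the length of the values,
/// whether or not every value is referenced by some key.
fn cardinality_by_values(col: &DictColumn) -> usize {
    col.values.len()
}

/// The "more sophisticated" alternative: count only the values that are
/// actually referenced by at least one key.
fn cardinality_by_referenced_keys(col: &DictColumn) -> usize {
    col.keys.iter().copied().collect::<HashSet<usize>>().len()
}
```

For a column with keys `[0, 0, 2]` and four values, the first function reports 4 and the second reports 2. If a mapping is generated for all four values anyway, the first number is the one that reflects the work actually done.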