zhuqi-lucas commented on PR #22706: URL: https://github.com/apache/datafusion/pull/22706#issuecomment-4602892722
Both make sense, agree on both. **EPIC for remaining type coverage** Opened #22715 as the broader tracking EPIC. It includes the existing #22682 (nested types, in flight via this PR sequence) plus the 6 additional primitives you called out (`FixedSizeBinary`, `Float16`, `Duration`, `Interval`, `Decimal256`, `Dictionary`), ordered easiest-first. `Dictionary` is flagged "needs discussion before implementation" because of the encoded-vs-decoded key trade-off. Each remaining-primitive item blocks on PR 1 of this PR sequence (the factory + recursive `supported_type`); after that they're independently mergeable. **dhat memory regression tests** Good call. The `size()`-based assertions I have today report what each builder self-claims, which is honest for column-native (each `Vec` / null buffer is accounted) but a weak signal for "did this PR actually move real heap." Plan: in the refactor / setup PR (PR 1 of the split), add a small dhat-based harness gated behind a `dhat-heap` feature so the default test run does not pay the profiler cost. The harness wraps `GroupValuesColumn::intern` on a representative dataset in `dhat::Profiler::builder().testing().build()` and asserts `dhat::HeapStats::get().max_bytes < N`. Threshold is set generously at first; each subsequent type-support PR lowers it to reflect the actual win from that PR landing. Same harness can be reused by the EPIC items in #22715 as they land. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
