Github user wgtmac commented on the issue:
https://github.com/apache/orc/pull/120
Yes, our current ColumnWriter design follows the approach of ColumnReader.
One base class ColumnWriter takes care of all common work, and specific
inherited ColumnWriters like IntegerColumnWriter, StructColumnWriter,
BooleanColumnWriters, etc. manage their own logic. To integrate
ColumnStatistics, we have two options (correct me if I'm wrong):
1. Put a based ColumnStatistics class in the base ColumnWriter like I said
previously, and in each TypeColumnWriter, use dynamic type of ColumnStatistics
to do its type-specific logic. In this way, we can save many duplicate code by
polymorphism in the base ColumnWriter class; (our current code is written in
this fashion)
2. There is no ColumnStatistic variable in the base class ColumnWriter.
Instead, each inherited TypeColumnWriter class uses one
TypeColumnStatisticsImpl (not TypeColumnStatistics), e.g. IntegerColumnWriter
uses IntegerColumnStatisticsImpl. This is feasible except for a little bit more
duplicate stuff compared with the 1st approach.
Your InternalColumnStatisticsImpl design is more inclined to the 2nd
approach as it is still not a base class to use in the base class ColumnWriter.
We can discuss ColumnWriter later as it is difficult without actual code. To
save both of our time and effort and avoid significant change of our current
ColumnWriter implementation, I will update this PR accordingly based on the
approach 2 after your PR is merged. @majetideepak
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---