[
https://issues.apache.org/jira/browse/ORC-185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15993986#comment-15993986
]
ASF GitHub Bot commented on ORC-185:
------------------------------------
Github user wgtmac commented on a diff in the pull request:
https://github.com/apache/orc/pull/116#discussion_r114446720
--- Diff: c++/src/Statistics.hh ---
@@ -41,49 +41,181 @@ namespace orc {
};
/**
+ * Internal Statistics Implementation
+ */
+
+ template <typename T>
--- End diff --
We may need functions such as void increase(uint64 count) to increase
valueCount. I can add them when needed.
My main concern with using templates is that, to implement the writers, we need
to compare, update, and merge ColumnStatistics and convert them to the protobuf
version, and templates will also introduce some duplicate code. We would still
need template specializations for types such as Date, Timestamp, and Decimal if
we want class ColumnStatistics to handle the update (e.g. use
ColumnStatistics<T>::update(T value) to update min/max for type T). Otherwise,
the specific ColumnWriters would have to be responsible for the update (e.g.
DecimalColumnWriter compares min/max of decimal values and then uses
setMax/setMin of ColumnStatistics<Decimal> to update the values).
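To make the two options concrete, here is a minimal sketch of both. All names (StatsSketch, ToyDecimal, lessThan, ToyDecimalWriter) are hypothetical and are not the classes in the pull request; ToyDecimal is a toy stand-in for a decimal type that has no operator<, and its comparison ignores overflow.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the templated statistics class under discussion.
template <typename T>
class StatsSketch {
 public:
  void increase(std::uint64_t count) { valueCount_ += count; }
  std::uint64_t getNumberOfValues() const { return valueCount_; }
  bool hasMinMax() const { return hasMinMax_; }
  const T& getMin() const { assert(hasMinMax_); return min_; }
  const T& getMax() const { assert(hasMinMax_); return max_; }

  // Option 2 building blocks: the caller decides when min/max change.
  void setMin(const T& v) { min_ = v; hasMinMax_ = true; }
  void setMax(const T& v) { max_ = v; hasMinMax_ = true; }

  // Option 1: the statistics object compares and updates by itself.
  // The generic version relies on operator< being defined for T.
  void update(const T& value) {
    if (!hasMinMax_) { min_ = max_ = value; hasMinMax_ = true; }
    else if (value < min_) { min_ = value; }
    else if (max_ < value) { max_ = value; }
    ++valueCount_;
  }

 private:
  T min_{};
  T max_{};
  bool hasMinMax_ = false;
  std::uint64_t valueCount_ = 0;
};

// Toy decimal: value = unscaled * 10^(-scale). No operator< on purpose.
struct ToyDecimal {
  std::int64_t unscaled;
  std::int32_t scale;
};

// Compare two toy decimals by rescaling to the larger scale (no overflow check).
inline bool lessThan(const ToyDecimal& a, const ToyDecimal& b) {
  std::int64_t x = a.unscaled;
  std::int64_t y = b.unscaled;
  for (std::int32_t s = a.scale; s < b.scale; ++s) x *= 10;
  for (std::int32_t s = b.scale; s < a.scale; ++s) y *= 10;
  return x < y;
}

// Option 1 for a type without operator<: a per-type specialization of update().
// This is the duplicated code the comment above is concerned about.
template <>
inline void StatsSketch<ToyDecimal>::update(const ToyDecimal& value) {
  if (!hasMinMax_) { min_ = max_ = value; hasMinMax_ = true; }
  else if (lessThan(value, min_)) { min_ = value; }
  else if (lessThan(max_, value)) { max_ = value; }
  ++valueCount_;
}

// Option 2: the type-specific writer owns the comparison and only uses
// setMin/setMax/increase on the statistics object.
class ToyDecimalWriter {
 public:
  void add(const ToyDecimal& v) {
    if (!stats_.hasMinMax()) {
      stats_.setMin(v);
      stats_.setMax(v);
    } else {
      if (lessThan(v, stats_.getMin())) stats_.setMin(v);
      if (lessThan(stats_.getMax(), v)) stats_.setMax(v);
    }
    stats_.increase(1);
  }
  const StatsSketch<ToyDecimal>& stats() const { return stats_; }

 private:
  StatsSketch<ToyDecimal> stats_;
};
```

In both variants the type-specific comparison has to live somewhere: in a specialization of update() for option 1, or in each writer for option 2. The template alone only removes the duplication for types that already support operator<.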
> [C++] Simplify Statistics Implementation
> ----------------------------------------
>
> Key: ORC-185
> URL: https://issues.apache.org/jira/browse/ORC-185
> Project: ORC
> Issue Type: Bug
> Reporter: Deepak Majeti
> Assignee: Deepak Majeti
>
> There is a lot of code duplication in the current ColumnStatistics
> implementation. The scope of this JIRA is to use templates to reuse code as
> much as possible.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)