[ https://issues.apache.org/jira/browse/ORC-185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15994251#comment-15994251 ]
ASF GitHub Bot commented on ORC-185:
------------------------------------
Github user wgtmac commented on a diff in the pull request:
https://github.com/apache/orc/pull/116#discussion_r114469297
--- Diff: c++/src/Statistics.hh ---
@@ -41,49 +41,181 @@ namespace orc {
};
/**
+ * Internal Statistics Implementation
+ */
+
+ template <typename T>
+ class InternalStatisticsImpl {
+ private:
+ bool hasNull_;
+ bool hasMinimum_;
+ bool hasMaximum_;
+ bool hasSum_;
+ bool hasTotalLength_;
+ uint64_t totalLength_;
+ uint64_t valueCount_;
+ T minimum_;
+ T maximum_;
+ T sum_;
+ public:
+ InternalStatisticsImpl() {
+ hasNull_ = false;
+ hasMinimum_ = false;
+ hasMaximum_ = false;
+ hasSum_ = false;
+ hasTotalLength_ = false;
+ totalLength_ = -1;
--- End diff --
If I add a function void update(std::string str) to StringColumnStatistics
to update the string stats, this default value causes a problem. For the first
string, the function has to set totalLength_ to that string's length; for
every following string it adds the length. This works, but the code is not
elegant.
Similarly, if I add a function void increase(uint64_t count), the same thing
happens with the counter it updates. I think a default value of 0 is cleaner in
these cases.
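To illustrate the point, here is a minimal sketch, assuming a simplified
stand-in class rather than the actual code in the pull request. The class name
StringStatsSketch and its getters are hypothetical; update() and increase()
follow the signatures proposed above, except that update() takes a const
reference. Because totalLength_ is a uint64_t, assigning -1 stores the largest
64-bit value, so a plain += on it would wrap around; starting from 0 lets both
functions use simple addition with no first-value branch.

```cpp
#include <cstdint>
#include <string>

// Hypothetical, simplified stand-in for the string statistics discussed
// above. Names are illustrative and not taken from the ORC sources.
class StringStatsSketch {
 public:
  // Zero defaults: the first value needs no special handling.
  StringStatsSketch()
      : hasTotalLength_(false), totalLength_(0), valueCount_(0) {}

  // Proposed update(): accumulate the length of one string value.
  void update(const std::string& str) {
    hasTotalLength_ = true;
    totalLength_ += str.size();
  }

  // Proposed increase(): add to the number of values seen so far.
  void increase(uint64_t count) { valueCount_ += count; }

  bool hasTotalLength() const { return hasTotalLength_; }
  uint64_t getTotalLength() const { return totalLength_; }
  uint64_t getNumberOfValues() const { return valueCount_; }

 private:
  bool hasTotalLength_;   // whether any string has been recorded
  uint64_t totalLength_;  // sum of the lengths of all recorded strings
  uint64_t valueCount_;   // number of values counted via increase()
};
```

In this sketch the hasTotalLength_ flag, not a sentinel value, records whether
any length has been seen, so the 0 default stays unambiguous.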
> [C++] Simplify Statistics Implementation
> -----------------------------------------
>
> Key: ORC-185
> URL: https://issues.apache.org/jira/browse/ORC-185
> Project: ORC
> Issue Type: Bug
> Reporter: Deepak Majeti
> Assignee: Deepak Majeti
>
> There is a lot of code duplication in the current ColumnStatistics
> implementation. The scope of this JIRA is to use templates to reuse code as
> much as possible.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)