[ https://issues.apache.org/jira/browse/ORC-203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16560134#comment-16560134 ]
ASF GitHub Bot commented on ORC-203:
------------------------------------
Github user moresandeep commented on a diff in the pull request:
https://github.com/apache/orc/pull/292#discussion_r205867014
--- Diff: java/core/src/java/org/apache/orc/impl/ColumnStatisticsImpl.java ---
@@ -584,16 +642,40 @@ public void merge(ColumnStatisticsImpl other) {
   if (str.minimum != null) {
     maximum = new Text(str.getMaximum());
     minimum = new Text(str.getMinimum());
-  } else {
+  }
+  /* str.minimum == null when lower bound set */
+  else if (str.getLowerBound() != null) {
+    minimum = new Text(str.getLowerBound());
+    isLowerBoundSet = true;
+
+    /* check for upper bound before setting max */
+    if (str.getUpperBound() != null) {
+      maximum = new Text(str.getUpperBound());
+      isUpperBoundSet = true;
+    } else {
+      maximum = new Text(str.getMaximum());
+    }
+  }
+  else {
     /* both are empty */
     maximum = minimum = null;
   }
 } else if (str.minimum != null) {
   if (minimum.compareTo(str.minimum) > 0) {
-    minimum = new Text(str.getMinimum());
+    if(str.getLowerBound() != null) {
+      minimum = new Text(str.getLowerBound());
+      isLowerBoundSet = true;
+    } else {
+      minimum = new Text(str.getMinimum());
--- End diff --
We could, but it really does not matter, as this will be set just once,
either to true or left at its default of false, given that this is an
instance. Let me know if you feel otherwise and I can update it.
> Modify the StringStatistics to trim minimum and maximum values
> --------------------------------------------------------------
>
> Key: ORC-203
> URL: https://issues.apache.org/jira/browse/ORC-203
> Project: ORC
> Issue Type: Bug
> Reporter: Owen O'Malley
> Assignee: Sandeep More
> Priority: Major
>
> Currently the StringStatistics will record the entire value for minimum or
> maximum. It creates large protobuf objects and serves very little value. I
> think we should trim long strings to 1024 characters and record the fact that
> they were trimmed.
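
For illustration only, the trimming proposed in the issue description could be
sketched roughly as below. This is not the code from the pull request: the
class TrimSketch, the holder Trimmed, and the method trim are hypothetical
names, and the limit is counted in Java chars, which is an assumption. Note
that a plain prefix is a valid lower bound for the original string but is not,
in general, a valid upper bound; the sketch does not address that.

```java
public class TrimSketch {
    /** Limit taken from the issue text; counting in Java chars is an assumption. */
    public static final int MAX_CHARS = 1024;

    /** Hypothetical holder: the (possibly trimmed) value plus a "was trimmed" flag. */
    public static final class Trimmed {
        public final String value;
        public final boolean wasTrimmed;

        Trimmed(String value, boolean wasTrimmed) {
            this.value = value;
            this.wasTrimmed = wasTrimmed;
        }
    }

    /** Returns s unchanged if it fits, otherwise a prefix of at most maxChars chars. */
    public static Trimmed trim(String s, int maxChars) {
        if (s == null || s.length() <= maxChars) {
            return new Trimmed(s, false);
        }
        int end = maxChars;
        // Do not cut a surrogate pair in half: drop a dangling high surrogate.
        if (end > 0 && Character.isHighSurrogate(s.charAt(end - 1))) {
            end--;
        }
        return new Trimmed(s.substring(0, end), true);
    }

    public static void main(String[] args) {
        // Build a 2000-character value that exceeds the limit.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 2000; i++) {
            sb.append('x');
        }
        Trimmed shortValue = trim("apple", MAX_CHARS);
        Trimmed longValue = trim(sb.toString(), MAX_CHARS);
        System.out.println(shortValue.wasTrimmed + " " + shortValue.value.length());
        System.out.println(longValue.wasTrimmed + " " + longValue.value.length());
    }
}
```

Keeping the flag next to the value lets a reader of the statistics tell an
exact minimum or maximum apart from a trimmed one.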
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)