[ 
https://issues.apache.org/jira/browse/ORC-1151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved ORC-1151.
--------------------------------
    Fix Version/s: 1.7.5
       Resolution: Fixed

This is resolved via https://github.com/apache/orc/pull/1088

> [C++] Incorrect statistics for Timestamp column with non UTC writer time zones
> ------------------------------------------------------------------------------
>
>                 Key: ORC-1151
>                 URL: https://issues.apache.org/jira/browse/ORC-1151
>             Project: ORC
>          Issue Type: Bug
>          Components: C++
>    Affects Versions: 1.8.0, 1.7.4
>            Reporter: noirello
>            Assignee: noirello
>            Priority: Major
>             Fix For: 1.7.5
>
>
> When the writer time zone is not UTC, the statistics for timestamp columns 
> are incorrect.
> Minimal example to reproduce:
> {code:java}
> #include "orc/OrcFile.hh"
> int main() {
>         std::unique_ptr<orc::Type> type(orc::Type::buildTypeFromString("struct<x:int,y:timestamp>"));
>         std::unique_ptr<orc::OutputStream> outStream = orc::writeLocalFile("./test.orc");
>         orc::WriterOptions options;
>         options.setTimezoneName("Asia/Shanghai");
>         std::unique_ptr<orc::Writer> writer = createWriter(*type, outStream.get(), options);
>         std::unique_ptr<orc::ColumnVectorBatch> batch = writer->createRowBatch(1);
>         orc::StructVectorBatch *root = dynamic_cast<orc::StructVectorBatch *>(batch.get());
>         orc::LongVectorBatch *x = dynamic_cast<orc::LongVectorBatch *>(root->fields[0]);
>         orc::TimestampVectorBatch *y = dynamic_cast<orc::TimestampVectorBatch *>(root->fields[1]);
>         x->data[0] = 1;
>         y->data[0] = 1650133963;  // 2022-04-16T18:32:43.3210+00:00
>         y->nanoseconds[0] = 321000000;
>         x->numElements = 1;
>         y->numElements = 1;
>         root->numElements = 1;
>         writer->add(*batch);
>         writer->close();
>         return 0;
> } {code}
> Statistics:
> {code:java}
> # bin/orc-statistics test.orc
> File test.orc has 3 columns
> *** Column 0 ***
> Column has 1 values and has null value: no
> *** Column 1 ***
> Data type: Integer
> Values: 1
> Has null: no
> Minimum: 1
> Maximum: 1
> Sum: 1
> *** Column 2 ***
> Data type: Timestamp
> Values: 1
> Has null: no
> Minimum: 2022-04-16 18:33:12.121
> LowerBound: 2022-04-16 18:33:12.121
> Maximum: 2022-04-16 18:33:12.121
> UpperBound: 2022-04-16 18:33:12.122
> File test.orc has 1 stripes
> *** Stripe 0 ***
> --- Column 0 ---
> Column has 1 values and has null value: no
> --- Column 1 ---
> Data type: Integer
> Values: 1
> Has null: no
> Minimum: 1
> Maximum: 1
> Sum: 1
> --- Column 2 ---
> Data type: Timestamp
> Values: 1
> Has null: no
> Minimum: 2022-04-16 18:33:12.121
> LowerBound: 2022-04-16 18:33:12.121
> Maximum: 2022-04-16 18:33:12.121
> UpperBound: 2022-04-16 18:33:12.122{code}
> Content:
> {code:java}
> # bin/orc-contents test.orc
> {"x": 1, "y": "2022-04-17 02:32:43.321"}{code}
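
A note on the numbers in the report, offered as an observation and not as a statement about the root cause or about what the pull request changes: the reported Minimum (18:33:12.121) is exactly 28.800 seconds after the UTC value that was written (18:32:43.321), and 28800 is the Asia/Shanghai offset from UTC in seconds. That is the result one would get if a seconds offset were added to a millisecond count. The sketch below only redoes that arithmetic with the C++ standard library. It does not use the ORC API, and every name in it (formatUtcMillis, kOffsetSeconds, and so on) is made up for illustration.

```cpp
#include <cstdint>
#include <cstdio>
#include <ctime>
#include <string>

// Values taken from the reproduction above.
constexpr int64_t kSeconds = 1650133963;       // y->data[0]
constexpr int64_t kNanos = 321000000;          // y->nanoseconds[0]
constexpr int64_t kOffsetSeconds = 8 * 3600;   // Asia/Shanghai is UTC+8 (28800 s)

// The written instant as milliseconds since the Unix epoch (UTC).
constexpr int64_t kValueMillis = kSeconds * 1000 + kNanos / 1000000;

// Hypothetical mix-up: the offset in SECONDS added to a MILLISECOND count.
constexpr int64_t kOffsetAddedAsMillis = kValueMillis + kOffsetSeconds;

// The offset converted to milliseconds first, giving Shanghai wall-clock time.
constexpr int64_t kOffsetAddedAsSeconds = kValueMillis + kOffsetSeconds * 1000;

// Format non-negative epoch milliseconds as "YYYY-MM-DD HH:MM:SS.mmm",
// interpreting the count as UTC.
std::string formatUtcMillis(int64_t millis) {
    std::time_t secs = static_cast<std::time_t>(millis / 1000);
    int ms = static_cast<int>(millis % 1000);
    std::tm tm = *std::gmtime(&secs);
    char date[32];
    std::strftime(date, sizeof(date), "%Y-%m-%d %H:%M:%S", &tm);
    char out[48];
    std::snprintf(out, sizeof(out), "%s.%03d", date, ms);
    return out;
}
```

Calling formatUtcMillis on kValueMillis, kOffsetAddedAsMillis and kOffsetAddedAsSeconds gives 2022-04-16 18:32:43.321, 2022-04-16 18:33:12.121 (the value printed by orc-statistics) and 2022-04-17 02:32:43.321 (the value printed by orc-contents), respectively.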



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
