[jira] [Commented] (ORC-208) ORC Tools dump and meta output with timezone

Yukihiro Okada (Jira) Wed, 28 Aug 2019 05:59:04 -0700


    [ https://issues.apache.org/jira/browse/ORC-208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16917753#comment-16917753 ]


Yukihiro Okada commented on ORC-208:
------------------------------------

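I think this issue is already fixed. I confirmed it with ORC version 1.5.6.

Here is the evidence. With TZ set first to US/Pacific and then to Asia/Tokyo, both orc-tools meta (the column statistics) and orc-tools data print the same timestamp values, 2019-10-28 07:34:07.0 and 2019-10-29 07:20:57.0: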
{code}
% cat some.json
{"myid":1,"time":"2019-10-28 07:34:07"}
{"myid":2,"time":"2019-10-29 07:20:57"}
% rm -f output.orc && orc-tools convert --schema "struct<myid:int,time:timestamp>" some.json
log4j:WARN No appenders could be found for logger (org.apache.hadoop.util.Shell).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Processing some.json
% orc-tools version
ORC 1.5.6
% orc-tools meta output.orc
log4j:WARN No appenders could be found for logger (org.apache.hadoop.util.Shell).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Processing data file output.orc [length: 331]
Structure for output.orc
File Version: 0.12 with ORC_517
Rows: 2
Compression: ZLIB
Compression size: 262144
Type: struct<myid:int,time:timestamp>

Stripe Statistics:
  Stripe 1:
    Column 0: count: 2 hasNull: false
    Column 1: count: 2 hasNull: false bytesOnDisk: 6 min: 1 max: 2 sum: 3
    Column 2: count: 2 hasNull: false bytesOnDisk: 19 min: 2019-10-28 07:34:07.0 max: 2019-10-29 07:20:57.0

File Statistics:
  Column 0: count: 2 hasNull: false
  Column 1: count: 2 hasNull: false bytesOnDisk: 6 min: 1 max: 2 sum: 3
  Column 2: count: 2 hasNull: false bytesOnDisk: 19 min: 2019-10-28 07:34:07.0 max: 2019-10-29 07:20:57.0

Stripes:
  Stripe: offset: 3 data: 25 rows: 2 tail: 58 index: 69
    Stream: column 0 section ROW_INDEX start: 3 length 11
    Stream: column 1 section ROW_INDEX start: 14 length 24
    Stream: column 2 section ROW_INDEX start: 38 length 34
    Stream: column 1 section DATA start: 72 length 6
    Stream: column 2 section DATA start: 78 length 13
    Stream: column 2 section SECONDARY start: 91 length 6
    Encoding column 0: DIRECT
    Encoding column 1: DIRECT_V2
    Encoding column 2: DIRECT_V2

File length: 331 bytes
Padding length: 0 bytes
Padding ratio: 0%
________________________________________________________________________________________________________________________
% orc-tools data output.orc
log4j:WARN No appenders could be found for logger (org.apache.hadoop.util.Shell).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Processing data file output.orc [length: 331]
{"myid":1,"time":"2019-10-28 07:34:07.0"}
{"myid":2,"time":"2019-10-29 07:20:57.0"}
________________________________________________________________________________________________________________________
% echo $TZ
US/Pacific
-- set another timezone
% export TZ=Asia/Tokyo
% orc-tools meta output.orc
log4j:WARN No appenders could be found for logger (org.apache.hadoop.util.Shell).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Processing data file output.orc [length: 331]
Structure for output.orc
File Version: 0.12 with ORC_517
Rows: 2
Compression: ZLIB
Compression size: 262144
Type: struct<myid:int,time:timestamp>

Stripe Statistics:
  Stripe 1:
    Column 0: count: 2 hasNull: false
    Column 1: count: 2 hasNull: false bytesOnDisk: 6 min: 1 max: 2 sum: 3
    Column 2: count: 2 hasNull: false bytesOnDisk: 19 min: 2019-10-28 07:34:07.0 max: 2019-10-29 07:20:57.0

File Statistics:
  Column 0: count: 2 hasNull: false
  Column 1: count: 2 hasNull: false bytesOnDisk: 6 min: 1 max: 2 sum: 3
  Column 2: count: 2 hasNull: false bytesOnDisk: 19 min: 2019-10-28 07:34:07.0 max: 2019-10-29 07:20:57.0

Stripes:
  Stripe: offset: 3 data: 25 rows: 2 tail: 58 index: 69
    Stream: column 0 section ROW_INDEX start: 3 length 11
    Stream: column 1 section ROW_INDEX start: 14 length 24
    Stream: column 2 section ROW_INDEX start: 38 length 34
    Stream: column 1 section DATA start: 72 length 6
    Stream: column 2 section DATA start: 78 length 13
    Stream: column 2 section SECONDARY start: 91 length 6
    Encoding column 0: DIRECT
    Encoding column 1: DIRECT_V2
    Encoding column 2: DIRECT_V2

File length: 331 bytes
Padding length: 0 bytes
Padding ratio: 0%
________________________________________________________________________________________________________________________
% orc-tools data output.orc
log4j:WARN No appenders could be found for logger (org.apache.hadoop.util.Shell).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Processing data file output.orc [length: 331]
{"myid":1,"time":"2019-10-28 07:34:07.0"}
{"myid":2,"time":"2019-10-29 07:20:57.0"}
________________________________________________________________________________________________________________________
{code}

> ORC Tools dump and meta output with timezone
> --------------------------------------------
>
>                 Key: ORC-208
>                 URL: https://issues.apache.org/jira/browse/ORC-208
>             Project: ORC
>          Issue Type: Bug
>            Reporter: Charles Pritchard
>            Priority: Major
>
> Currently the ORC dump and meta output for a file created in UTC but read in 
> America/Los_Angeles will result in two different printouts; meta shows in 
> current timezone, dump shows in original (file time zone). This may be 
> confusing (and it is!).
> Dump:
> {code}
>   "_col1": "2017-07-01 10:15:32.67",
> {code}
> Meta (statistics for file and stripe):
> {code}
>           "min": "2017-07-01 03:15:32.67",
>           "max": "2017-07-01 03:15:32.67",
> {code}
> Seems they ought to include the timezone offset to avoid the ambiguity.
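To illustrate the reporter's suggestion, here is a minimal Java sketch using java.time. It is not ORC's own code; the class name TimestampWithOffset and the output pattern are assumptions for illustration. The dump value (10:15:32.67) and the meta min/max (03:15:32.67) in the description are 7 hours apart, which matches UTC versus America/Los_Angeles (PDT, UTC-7) on 2017-07-01. Appending the offset makes it visible that both printouts describe the same instant.

{code:java}
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of orc-tools: prints a timestamp together
// with the zone offset in effect, so readers can tell which zone it is in.
public class TimestampWithOffset {
    // Wall-clock time with two fractional digits, followed by the offset as +HH:MM.
    private static final DateTimeFormatter WITH_OFFSET =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SS xxx");

    // Render a single instant in the given time zone, including its offset.
    static String render(Instant instant, String zoneId) {
        return WITH_OFFSET.format(instant.atZone(ZoneId.of(zoneId)));
    }

    public static void main(String[] args) {
        // The instant from the issue description, expressed in UTC.
        Instant instant = Instant.parse("2017-07-01T10:15:32.670Z");
        System.out.println(render(instant, "UTC"));                 // 2017-07-01 10:15:32.67 +00:00
        System.out.println(render(instant, "America/Los_Angeles")); // 2017-07-01 03:15:32.67 -07:00
    }
}
{code}

With the offset attached, "10:15:32.67 +00:00" and "03:15:32.67 -07:00" can be recognized as the same value instead of looking like a mismatch.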



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
