[ https://issues.apache.org/jira/browse/ORC-208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16917753#comment-16917753 ]
Yukihiro Okada commented on ORC-208:
------------------------------------
I think this issue has already been fixed.
I confirmed this with ORC version 1.5.6.
Here is the evidence:
{code:java}
% cat some.json
{"myid":1,"time":"2019-10-28 07:34:07"}
{"myid":2,"time":"2019-10-29 07:20:57"}
% rm -f output.orc && orc-tools convert --schema "struct<myid:int,time:timestamp>" some.json
log4j:WARN No appenders could be found for logger (org.apache.hadoop.util.Shell).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Processing some.json
% orc-tools version
ORC 1.5.6
% orc-tools meta output.orc
log4j:WARN No appenders could be found for logger (org.apache.hadoop.util.Shell).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Processing data file output.orc [length: 331]
Structure for output.orc
File Version: 0.12 with ORC_517
Rows: 2
Compression: ZLIB
Compression size: 262144
Type: struct<myid:int,time:timestamp>

Stripe Statistics:
Stripe 1:
Column 0: count: 2 hasNull: false
Column 1: count: 2 hasNull: false bytesOnDisk: 6 min: 1 max: 2 sum: 3
Column 2: count: 2 hasNull: false bytesOnDisk: 19 min: 2019-10-28 07:34:07.0 max: 2019-10-29 07:20:57.0

File Statistics:
Column 0: count: 2 hasNull: false
Column 1: count: 2 hasNull: false bytesOnDisk: 6 min: 1 max: 2 sum: 3
Column 2: count: 2 hasNull: false bytesOnDisk: 19 min: 2019-10-28 07:34:07.0 max: 2019-10-29 07:20:57.0

Stripes:
Stripe: offset: 3 data: 25 rows: 2 tail: 58 index: 69
Stream: column 0 section ROW_INDEX start: 3 length 11
Stream: column 1 section ROW_INDEX start: 14 length 24
Stream: column 2 section ROW_INDEX start: 38 length 34
Stream: column 1 section DATA start: 72 length 6
Stream: column 2 section DATA start: 78 length 13
Stream: column 2 section SECONDARY start: 91 length 6
Encoding column 0: DIRECT
Encoding column 1: DIRECT_V2
Encoding column 2: DIRECT_V2

File length: 331 bytes
Padding length: 0 bytes
Padding ratio: 0%
________________________________________________________________________________________________________________________
% orc-tools data output.orc
log4j:WARN No appenders could be found for logger (org.apache.hadoop.util.Shell).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Processing data file output.orc [length: 331]
{"myid":1,"time":"2019-10-28 07:34:07.0"}
{"myid":2,"time":"2019-10-29 07:20:57.0"}
________________________________________________________________________________________________________________________
% echo $TZ
US/Pacific
-- set another timezone
% export TZ=Asia/Tokyo
% orc-tools meta output.orc
log4j:WARN No appenders could be found for logger (org.apache.hadoop.util.Shell).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Processing data file output.orc [length: 331]
Structure for output.orc
File Version: 0.12 with ORC_517
Rows: 2
Compression: ZLIB
Compression size: 262144
Type: struct<myid:int,time:timestamp>

Stripe Statistics:
Stripe 1:
Column 0: count: 2 hasNull: false
Column 1: count: 2 hasNull: false bytesOnDisk: 6 min: 1 max: 2 sum: 3
Column 2: count: 2 hasNull: false bytesOnDisk: 19 min: 2019-10-28 07:34:07.0 max: 2019-10-29 07:20:57.0

File Statistics:
Column 0: count: 2 hasNull: false
Column 1: count: 2 hasNull: false bytesOnDisk: 6 min: 1 max: 2 sum: 3
Column 2: count: 2 hasNull: false bytesOnDisk: 19 min: 2019-10-28 07:34:07.0 max: 2019-10-29 07:20:57.0

Stripes:
Stripe: offset: 3 data: 25 rows: 2 tail: 58 index: 69
Stream: column 0 section ROW_INDEX start: 3 length 11
Stream: column 1 section ROW_INDEX start: 14 length 24
Stream: column 2 section ROW_INDEX start: 38 length 34
Stream: column 1 section DATA start: 72 length 6
Stream: column 2 section DATA start: 78 length 13
Stream: column 2 section SECONDARY start: 91 length 6
Encoding column 0: DIRECT
Encoding column 1: DIRECT_V2
Encoding column 2: DIRECT_V2

File length: 331 bytes
Padding length: 0 bytes
Padding ratio: 0%
________________________________________________________________________________________________________________________
% orc-tools data output.orc
log4j:WARN No appenders could be found for logger (org.apache.hadoop.util.Shell).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Processing data file output.orc [length: 331]
{"myid":1,"time":"2019-10-28 07:34:07.0"}
{"myid":2,"time":"2019-10-29 07:20:57.0"}
________________________________________________________________________________________________________________________
{code}
> ORC Tools dump and meta output with timezone
> --------------------------------------------
>
> Key: ORC-208
> URL: https://issues.apache.org/jira/browse/ORC-208
> Project: ORC
> Issue Type: Bug
> Reporter: Charles Pritchard
> Priority: Major
>
> Currently the ORC dump and meta output for a file created in UTC but read in
> America/Los_Angeles will result in two different printouts; meta shows in
> current timezone, dump shows in original (file time zone). This may be
> confusing (and it is!).
> Dump:
> {code}
> "_col1": "2017-07-01 10:15:32.67",
> {code}
> Meta (statistics for file and stripe):
> {code}
> "min": "2017-07-01 03:15:32.67",
> "max": "2017-07-01 03:15:32.67",
> {code}
> Seems they ought to include the timezone offset to avoid the ambiguity.
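
The ambiguity described in the quoted report can be reproduced outside of ORC. The following is only a minimal sketch in Python, not how orc-tools formats timestamps: the helper `render` and the fixed-offset zone `PDT` are hypothetical names introduced for illustration, and a fixed UTC-7 offset is assumed to stand in for America/Los_Angeles on 2017-07-01. It shows that one instant yields the two different strings from the report when the offset is omitted, and that printing the offset removes the ambiguity.

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed offsets stand in for real time zone rules.
UTC = timezone.utc
PDT = timezone(timedelta(hours=-7), "PDT")  # America/Los_Angeles in July 2017


def render(instant, tz, with_offset):
    """Format an aware datetime in the given zone, optionally with its offset."""
    local = instant.astimezone(tz)
    # %f yields six digits; drop the last four to keep hundredths of a second.
    text = local.strftime("%Y-%m-%d %H:%M:%S.%f")[:-4]
    if with_offset:
        text += " " + local.strftime("%z")
    return text


# The instant from the report, taken as written in UTC.
instant = datetime(2017, 7, 1, 10, 15, 32, 670000, tzinfo=UTC)

# Without an offset, the same instant looks like two different times.
print(render(instant, UTC, False))  # 2017-07-01 10:15:32.67
print(render(instant, PDT, False))  # 2017-07-01 03:15:32.67

# With an offset, both strings identify the same instant unambiguously.
print(render(instant, UTC, True))   # 2017-07-01 10:15:32.67 +0000
print(render(instant, PDT, True))   # 2017-07-01 03:15:32.67 -0700
```

The two offset-free strings match the "Dump" and "Meta" values quoted above, which is consistent with the reporter's suggestion that including the offset would avoid the confusion.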