[ https://issues.apache.org/jira/browse/ORC-528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yukihiro Okada updated ORC-528:
-------------------------------
    Description: 
I'm trying to understand how to deal properly with timestamps.  I've created a 
CSV file with some crucial timestamps (at least I believe they are):
{code:java}
2019-01-01 00:00:00.0000
2015-01-01 00:00:00.0001
2015-01-01 00:00:00.0000
2014-12-31 23:59:59.9999
1970-01-01 00:00:00.0001
1970-01-01 00:00:00.0000
1969-12-31 23:59:59.9999
1969-12-31 23:59:59.0001
1969-12-31 23:59:59.0000
1969-12-31 23:59:58.9999
{code}
I've created an ORC file using hive-1.1.0-cdh5.14.2.  Hive is able to read this 
file back correctly.  All timestamps seem to match.  Reading the same file 
using orc-tools shows different results:

 
{code:java}
{"_col0":"2019-01-01 00:00:00.0"}
{"_col0":"2015-01-01 00:00:00.0001"}
{"_col0":"2015-01-01 00:00:00.0"}
{"_col0":"2014-12-31 23:59:59.9999"}
{"_col0":"1970-01-01 00:00:00.0001"}
{"_col0":"1970-01-01 00:00:00.0"}
{"_col0":"1969-12-31 23:59:58.9999"}
{"_col0":"1969-12-31 23:59:59.0001"}
{"_col0":"1969-12-31 23:59:59.0"}
{"_col0":"1969-12-31 23:59:57.9999"}
{code}
 
The differences are in the last row and the 4th row from the bottom, which 
are each one second off.
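
To make the affected rows easy to spot, the CSV input can be compared with the orc-tools output row by row. The following is a minimal sketch of that check; the class and method names ({{TimestampRowDiff}}, {{parse}}, {{diffMillis}}) are hypothetical helpers for illustration and are not part of ORC or Hive. It treats the values as zone-less local date-times, which is an assumption that is sufficient for comparing two renderings of the same row.

```java
import java.time.Duration;
import java.time.LocalDateTime;

class TimestampRowDiff {
    // Parse "yyyy-MM-dd HH:mm:ss[.fraction]" as a zone-less local date-time.
    // The ISO parser accepts 0 to 9 fractional digits, so ".0" and ".0000" both work.
    static LocalDateTime parse(String text) {
        return LocalDateTime.parse(text.trim().replace(' ', 'T'));
    }

    // Signed difference (actual minus expected) in milliseconds.
    static long diffMillis(String expected, String actual) {
        return Duration.between(parse(expected), parse(actual)).toMillis();
    }

    public static void main(String[] args) {
        // Values written to the CSV file.
        String[] expected = {
            "2019-01-01 00:00:00.0000", "2015-01-01 00:00:00.0001",
            "2015-01-01 00:00:00.0000", "2014-12-31 23:59:59.9999",
            "1970-01-01 00:00:00.0001", "1970-01-01 00:00:00.0000",
            "1969-12-31 23:59:59.9999", "1969-12-31 23:59:59.0001",
            "1969-12-31 23:59:59.0000", "1969-12-31 23:59:58.9999"
        };
        // Values printed by orc-tools for the Hive-written file.
        String[] actual = {
            "2019-01-01 00:00:00.0", "2015-01-01 00:00:00.0001",
            "2015-01-01 00:00:00.0", "2014-12-31 23:59:59.9999",
            "1970-01-01 00:00:00.0001", "1970-01-01 00:00:00.0",
            "1969-12-31 23:59:58.9999", "1969-12-31 23:59:59.0001",
            "1969-12-31 23:59:59.0", "1969-12-31 23:59:57.9999"
        };
        for (int i = 0; i < expected.length; i++) {
            long diff = diffMillis(expected[i], actual[i]);
            if (diff != 0) {
                // Report only the rows that do not round-trip.
                System.out.println("row " + (i + 1) + ": " + expected[i]
                    + " -> " + actual[i] + " (" + diff + " ms)");
            }
        }
    }
}
```

With the values above, only rows 7 and 10 are reported, each with a difference of -1000 ms.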

With some modifications I managed to have orc-tools itself generate a file with 
timestamps using convert (see ORC-526).  Reading that file back in 
hive-1.1.0-cdh5.14.2 results in:
{code:java}
2019-01-01 00:00:00
2015-01-01 00:00:00.0001
2015-01-01 00:00:00
2014-12-31 23:59:59.9999
1970-01-01 00:00:00.0001
1970-01-01 00:00:00
1970-01-01 00:00:00.9999
1969-12-31 23:59:59.0001
1969-12-31 23:59:59
1969-12-31 23:59:59.9999
{code}
which is also wrong: the 4th row from the bottom and the last row are off by 
one second, but this time in the other direction.  When I read the file with 
orc-tools itself, it shows the correct output (58) for the last row, but 
incorrect output for the 4th row from the bottom.  I noticed that 
orc-tools-1.2.0 cannot read the file written by 1.6.0.  orc-tools-1.3.4 can, 
but it also produces incorrect output.

{{orc-tools-1.6.0:}}
{code:java}
{"mytime":"2019-01-01 00:00:00.0"}
{"mytime":"2015-01-01 00:00:00.0001"}
{"mytime":"2015-01-01 00:00:00.0"}
{"mytime":"2014-12-31 23:59:59.9999"}
{"mytime":"1970-01-01 00:00:00.0001"}
{"mytime":"1970-01-01 00:00:00.0"}
{"mytime":"1970-01-01 00:00:00.9999"}
{"mytime":"1969-12-31 23:59:59.0001"}
{"mytime":"1969-12-31 23:59:59.0"}
{"mytime":"1969-12-31 23:59:58.9999"}
{code}
 

{{orc-tools-1.3.4:}}
{code:java}
{"mytime":"2019-01-01 00:00:00.0"}
{"mytime":"2015-01-01 00:00:00.0001"}
{"mytime":"2015-01-01 00:00:00.0"}
{"mytime":"2014-12-31 23:59:59.9999"}
{"mytime":"1970-01-01 00:00:00.0001"}
{"mytime":"1970-01-01 00:00:00.0"}
{"mytime":"1970-01-01 00:00:00.9999"}
{"mytime":"1969-12-31 23:59:58.0001"}
{"mytime":"1969-12-31 23:59:59.0"}
{"mytime":"1969-12-31 23:59:58.9999"}
{code}
 

I'm getting a bit lost as to what's right and wrong, but I have the feeling 
something doesn't add up here.
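
All of the affected rows lie before 1970-01-01 and have a non-zero fractional part.  One plausible explanation (an assumption on my part, not something confirmed by this report) is that the writer and reader disagree on how to split a negative epoch value into whole seconds plus a positive fraction.  The sketch below only illustrates that arithmetic; {{EpochSplit}}, {{truncatedSeconds}} and {{flooredSeconds}} are hypothetical names, not ORC or Hive code, and the example uses millisecond precision (.999 instead of .9999) to keep it short.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;

class EpochSplit {
    // Plain integer division rounds toward zero, so -1001 ms becomes -1 s.
    static long truncatedSeconds(long epochMillis) {
        return epochMillis / 1000L;
    }

    // Floor division rounds toward negative infinity, so -1001 ms becomes -2 s.
    static long flooredSeconds(long epochMillis) {
        return Math.floorDiv(epochMillis, 1000L);
    }

    public static void main(String[] args) {
        // 1969-12-31 23:59:58.999 UTC is 1001 ms before the Unix epoch.
        long millis = LocalDateTime.parse("1969-12-31T23:59:58.999")
            .toInstant(ZoneOffset.UTC)
            .toEpochMilli();

        // The fraction is stored separately as a non-negative value (999 ms here).
        long fractionMillis = Math.floorMod(millis, 1000L);

        System.out.println("epoch millis      = " + millis);
        System.out.println("fraction millis   = " + fractionMillis);
        // Pairing the truncated second (-1, i.e. 23:59:59) with the .999 fraction
        // yields 23:59:59.999, one second later than the original value.
        System.out.println("truncated seconds = " + truncatedSeconds(millis));
        // The floored second (-2, i.e. 23:59:58) with .999 reproduces the original.
        System.out.println("floored seconds   = " + flooredSeconds(millis));
    }
}
```

If one side truncates and the other floors, or if a reader compensates for truncation on a file that was already written with floored seconds, a one-second shift in either direction for exactly these rows would be the expected result.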


> orc-tools timestamps off by one?
> --------------------------------
>
>                 Key: ORC-528
>                 URL: https://issues.apache.org/jira/browse/ORC-528
>             Project: ORC
>          Issue Type: Bug
>          Components: tools
>    Affects Versions: 1.5.5, 1.6.0
>            Reporter: Fabian Groffen
>            Priority: Minor



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
