Matt Jadczak created ARROW-9915:
-----------------------------------

             Summary: [Java] getObject API for temporal types is inconsistent 
and in some cases incorrect
                 Key: ARROW-9915
                 URL: https://issues.apache.org/jira/browse/ARROW-9915
             Project: Apache Arrow
          Issue Type: Bug
          Components: Java
    Affects Versions: 1.0.0, 0.17.1, 0.17.0, 0.16.0, 0.15.1, 0.15.0, 0.14.1, 
0.14.0, 0.13.0
            Reporter: Matt Jadczak


It seems that the work which has been tracked in ARROW-2015 and merged in 
[https://github.com/apache/arrow/pull/2966] to change the return types of the 
various Time and Date vector types when using the getObject API missed some of 
the vector types which are temporal and so should return a temporal type, and 
provided an incorrect implementation for others (some of this was pointed out 
in the initial PR review, but it seems that it slipped through the cracks and 
was not addressed before merging).

Here is a table of the various temporal vector types, what they currently 
return from getObject, and what they should return, in my opinion (I have 
included ones in which the implementation is correct for completeness, and 
coloured them green).

 

 
||Vector class||Current return type||Proposed return type||Comments||
|DateDayVector|Integer|LocalDate|Currently returns the raw value of days since 
epoch, should return the actual date|
|DateMilliVector|LocalDateTime|LocalDate|This type is supposed to encode a 
date, not a datetime, so even though epoch millis are used, the result should 
be a LocalDate|
|{color:#00875a}DurationVector{color}|{color:#00875a}Duration{color}|{color:#00875a}Duration{color}|{color:#00875a}Correct.{color}|
|IntervalDayVector|Duration|Period|As per 
[https://github.com/apache/arrow/blob/master/format/Schema.fbs#L251] , Interval 
should be a calendar-based datatype, not a time-based one. This is represented 
in Java by a Period type. However, I note that the Java Period class does not 
support milliseconds, unlike the Arrow type, which might be why Duration is 
being returned. Some discussion may be needed on the best way to deal with 
this.|
|{color:#00875a}IntervalYearVector{color}|{color:#00875a}Period{color}|{color:#00875a}Period{color}|{color:#00875a}Correct.{color}|
|TimeMicroVector|Long|LocalTime|Currently returns the raw number of micros, 
should return the actual time|
|TimeMilliVector|LocalDateTime|LocalTime|Currently returns a datetime on 
1970-01-01 with the correct time component, should just return the time|
|TimeNanoVector|Long|LocalTime|Currently returns the raw number of nanos, 
should return the actual time|
|TimeSecVector|Integer|LocalTime|Currently returns the raw number of seconds, 
should return the actual time|
|{color:#00875a}TimeStampMicroVector{color}|{color:#00875a}LocalDateTime{color}|{color:#00875a}LocalDateTime{color}|{color:#00875a}Correct.{color}|
|{color:#00875a}TimeStampMilliVector{color}|{color:#00875a}LocalDateTime{color}|{color:#00875a}LocalDateTime{color}|{color:#00875a}Correct.{color}|
|{color:#00875a}TimeStampNanoVector{color}|{color:#00875a}LocalDateTime{color}|{color:#00875a}LocalDateTime{color}|{color:#00875a}Correct.{color}|
|{color:#00875a}TimeStampSecVector{color}|{color:#00875a}LocalDateTime{color}|{color:#00875a}LocalDateTime{color}|{color:#00875a}Correct.{color}|
|TimeStampMicroTZVector|Long|ZonedDateTime|Currently returns the underlying 
micros, and TZ has to be obtained separately. Should return the actual datetime 
with timezone|
|TimeStampMilliTZVector|Long|ZonedDateTime|Currently returns the underlying 
millis, and TZ has to be obtained separately. Should return the actual datetime 
with timezone|
|TimeStampNanoTZVector|Long|ZonedDateTime|Currently returns the underlying 
nanos, and TZ has to be obtained separately. Should return the actual datetime 
with timezone|
|TimeStampSecTZVector|Long|ZonedDateTime|Currently returns the underlying 
seconds, and TZ has to be obtained separately. Should return the actual 
datetime with timezone|

I am happy to submit a PR to fix this if there is no other reason for this to 
persist other than these types being rarely used, but note that these would all 
be breaking API changes.

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to