Matt Jadczak created ARROW-9915:
-----------------------------------
Summary: [Java] getObject API for temporal types is inconsistent
and in some cases incorrect
Key: ARROW-9915
URL: https://issues.apache.org/jira/browse/ARROW-9915
Project: Apache Arrow
Issue Type: Bug
Components: Java
Affects Versions: 1.0.0, 0.17.1, 0.17.0, 0.16.0, 0.15.1, 0.15.0, 0.14.1,
0.14.0, 0.13.0
Reporter: Matt Jadczak
It seems that the work which has been tracked in ARROW-2015 and merged in
[https://github.com/apache/arrow/pull/2966] to change the return types of the
various Time and Date vector types when using the getObject API missed some of
the vector types which are temporal and so should return a temporal type, and
provided an incorrect implementation for others (some of this was pointed out
in the initial PR review, but it seems that it slipped through the cracks and
was not addressed before merging).
Here is a table of the various temporal vector types, what they currently
return from getObject, and what they should return, in my opinion (I have
included ones in which the implementation is correct for completeness, and
coloured them green).
||Vector class||Current return type||Proposed return type||Comments||
|DateDayVector|Integer|LocalDate|Currently returns the raw value of days since
epoch, should return the actual date|
|DateMilliVector|LocalDateTime|LocalDate|This type is supposed to encode a
date, not a datetime, so even though epoch millis are used, the result should
be a LocalDate|
|{color:#00875a}DurationVector{color}|{color:#00875a}Duration{color}|{color:#00875a}Duration{color}|{color:#00875a}Correct.{color}|
|IntervalDayVector|Duration|Period|As per
[https://github.com/apache/arrow/blob/master/format/Schema.fbs#L251] , Interval
should be a calendar-based datatype, not a time-based one. This is represented
in Java by a Period type. However, I note that the Java Period class does not
support milliseconds, unlike the Arrow type, which might be why Duration is
being returned. Some discussion may be needed on the best way to deal with
this.|
|{color:#00875a}IntervalYearVector{color}|{color:#00875a}Period{color}|{color:#00875a}Period{color}|{color:#00875a}Correct.{color}|
|TimeMicroVector|Long|LocalTime|Currently returns the raw number of micros,
should return the actual time|
|TimeMilliVector|LocalDateTime|LocalTime|Currently returns a datetime on
1970-01-01 with the correct time component, should just return the time|
|TimeNanoVector|Long|LocalTime|Currently returns the raw number of nanos,
should return the actual time|
|TimeSecVector|Integer|LocalTime|Currently returns the raw number of seconds,
should return the actual time|
|{color:#00875a}TimeStampMicroVector{color}|{color:#00875a}LocalDateTime{color}|{color:#00875a}LocalDateTime{color}|{color:#00875a}Correct.{color}|
|{color:#00875a}TimeStampMilliVector{color}|{color:#00875a}LocalDateTime{color}|{color:#00875a}LocalDateTime{color}|{color:#00875a}Correct.{color}|
|{color:#00875a}TimeStampNanoVector{color}|{color:#00875a}LocalDateTime{color}|{color:#00875a}LocalDateTime{color}|{color:#00875a}Correct.{color}|
|{color:#00875a}TimeStampSecVector{color}|{color:#00875a}LocalDateTime{color}|{color:#00875a}LocalDateTime{color}|{color:#00875a}Correct.{color}|
|TimeStampMicroTZVector|Long|ZonedDateTime|Currently returns the underlying
micros, and TZ has to be obtained separately. Should return the actual datetime
with timezone|
|TimeStampMilliTZVector|Long|ZonedDateTime|Currently returns the underlying
millis, and TZ has to be obtained separately. Should return the actual datetime
with timezone|
|TimeStampNanoTZVector|Long|ZonedDateTime|Currently returns the underlying
nanos, and TZ has to be obtained separately. Should return the actual datetime
with timezone|
|TimeStampSecTZVector|Long|ZonedDateTime|Currently returns the underlying
seconds, and TZ has to be obtained separately. Should return the actual
datetime with timezone|
I am happy to submit a PR to fix this if there is no other reason for this to
persist other than these types being rarely used, but note that these would all
be breaking API changes.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)