[
https://issues.apache.org/jira/browse/DRILL-5562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Paul Rogers updated DRILL-5562:
-------------------------------
Description:
Drill provides three interval types, described in {{ValueVectorTypes.tdd}}:
* {{IntervalYear}}: a duration in months (sic)
* {{IntervalDay}}: a duration in days and ms.
* {{Interval}}: a duration in months, days and ms.
The file defines the width of each "field" (ms, days, months) as an int: 4
bytes. But the total vector width declared for each type is 4 bytes too large:
* {{IntervalYear}}: 8 bytes (should be 4: for months)
* {{IntervalDay}}: 12 bytes (should be 8: for days and ms.)
* {{Interval}}: 16 bytes (should be 12: for months, days and ms.)
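The arithmetic behind the list above can be checked with a small sketch. The
class and method names here are hypothetical and do not come from the Drill
code base; the declared widths are the ones quoted in this description.

```java
// Sketch only: compares the payload implied by 4-byte int fields with the
// declared vector widths quoted above. Names are illustrative, not Drill's.
final class IntervalWidths {
    static final int FIELD_WIDTH = 4; // each field (months, days, ms) is an int

    // Bytes needed to store the given number of int fields.
    static int payloadWidth(int fieldCount) {
        return fieldCount * FIELD_WIDTH;
    }

    // Bytes declared for a value but not needed by its fields.
    static int unusedBytes(int declaredWidth, int fieldCount) {
        return declaredWidth - payloadWidth(fieldCount);
    }

    public static void main(String[] args) {
        String[] names = {"IntervalYear", "IntervalDay", "Interval"};
        int[] fieldCounts = {1, 2, 3};
        int[] declaredWidths = {8, 12, 16};
        for (int i = 0; i < names.length; i++) {
            System.out.printf("%s: declared %d, needed %d, unused %d%n",
                names[i], declaredWidths[i], payloadWidth(fieldCounts[i]),
                unusedBytes(declaredWidths[i], fieldCounts[i]));
        }
    }
}
```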
The extra 4 bytes may have been intended for a time zone. But time zones do
not apply to intervals: an hour is the same duration everywhere on Earth.
Since an interval does not contain a point in time, a time zone is not useful
even for daylight saving time adjustments.
The code for each type reflects the "missing" 4 bytes. For example, for the
12-byte {{IntervalDay}} vector:
{code}
public void set(int index, int days, int milliseconds) {
  final int offsetIndex = index * VALUE_WIDTH;
  data.setInt(offsetIndex, days);
  data.setInt((offsetIndex + 4), milliseconds);
}
{code}
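To make the unused bytes visible, here is a minimal standalone sketch that
mirrors the set() method above. It substitutes java.nio.ByteBuffer for Drill's
buffer class, and trailingInt() is a hypothetical helper added only for
illustration; VALUE_WIDTH is assumed to be the 12 bytes quoted above.

```java
import java.nio.ByteBuffer;

// Sketch only: ByteBuffer stands in for Drill's buffer, and VALUE_WIDTH is
// assumed to be the 12-byte width quoted in this description.
final class IntervalDaySlotDemo {
    static final int VALUE_WIDTH = 12;

    // Same layout as the set() method above: days at +0, milliseconds at +4.
    static void set(ByteBuffer data, int index, int days, int milliseconds) {
        final int offsetIndex = index * VALUE_WIDTH;
        data.putInt(offsetIndex, days);
        data.putInt(offsetIndex + 4, milliseconds);
    }

    // Hypothetical helper: reads bytes +8..+11 of a slot, which set() never writes.
    static int trailingInt(ByteBuffer data, int index) {
        return data.getInt(index * VALUE_WIDTH + 8);
    }

    public static void main(String[] args) {
        ByteBuffer data = ByteBuffer.allocate(2 * VALUE_WIDTH); // zero-filled
        set(data, 1, 3, 500);
        System.out.println("days = " + data.getInt(VALUE_WIDTH));
        System.out.println("ms = " + data.getInt(VALUE_WIDTH + 4));
        System.out.println("trailing 4 bytes = " + trailingInt(data, 1));
    }
}
```

In this sketch the last 4 bytes of every slot stay at their initial value,
because only 8 of the 12 bytes are ever written.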
Note also that Drill's {{IntervalDay}} need not be two fields wide. Except
across a leap second, a day has a fixed number of milliseconds. And the only
way to compensate for a leap second is to know a point in time, which an
interval does not have. Even if measured across a leap second, an interval of
a minute is always 60 seconds. The leap second matters only when computing:
{code}
end date/time = start date/time + interval
{code}
because only then is a point in time known.
Although the ISO format expresses intervals as a tuple of (year, month, day,
hour, minute, second), the same value can be expressed as (months, ms) (with
the proper conversions), so Drill's interval types need only be 4 and 8 bytes
wide.
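The "proper conversions" mentioned above might look like the following sketch.
All names are hypothetical. Holding the combined milliseconds in a long is an
assumption made here, since a signed 32-bit count of milliseconds covers only
about 24.8 days; the description itself does not say which integer width it
intends for the ms field.

```java
// Sketch only: collapses (days, ms) into one millisecond count and
// (years, months) into one month count. Names are illustrative.
final class IntervalCollapse {
    // Fixed length of a day when no point in time is involved.
    static final long MILLIS_PER_DAY = 24L * 60 * 60 * 1000;

    // (days, ms) -> total milliseconds, computed in long to avoid overflow.
    static long dayTimeToMillis(int days, int milliseconds) {
        return days * MILLIS_PER_DAY + milliseconds;
    }

    // (years, months) -> total months.
    static int yearMonthToMonths(int years, int months) {
        return years * 12 + months;
    }

    public static void main(String[] args) {
        // An interval of 1 year, 6 months, 2 days and 500 ms as (months, ms).
        int months = yearMonthToMonths(1, 6);
        long millis = dayTimeToMillis(2, 500);
        System.out.println("(months, ms) = (" + months + ", " + millis + ")");
    }
}
```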
> Vector types IntervalYear, IntervalDay and Interval are of the wrong width
> --------------------------------------------------------------------------
>
> Key: DRILL-5562
> URL: https://issues.apache.org/jira/browse/DRILL-5562
> Project: Apache Drill
> Issue Type: Bug
> Affects Versions: 1.8.0
> Reporter: Paul Rogers