[
https://issues.apache.org/jira/browse/ARROW-8581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Adam Szmigin updated ARROW-8581:
--------------------------------
Description:
h1. Summary Proposal
The {{Date32Array.Builder}} and {{Date64.Builder}} classes both accept values
of type {{DateTimeOffset}}, but this makes it very easy for the user to
introduce subtle bugs when they work with the {{DateTime}} type in their own
code. This class of bugs could be avoided if these builders were instead typed
on {{DateTime}} rather than {{DateTimeOffset}}.
h1. Details
The danger is introduced by the implicit widening conversion provided by the
_DateTimeOffset.Implicit(DateTime to DateTimeOffset)_ operator:
[https://docs.microsoft.com/en-us/dotnet/api/system.datetimeoffset.op_implicit?view=netcore-3.1]
The important part is this text:
{quote}The offset of the resulting DateTimeOffset object depends on the value
of the DateTime.Kind property of the dateTime parameter:
* If the value of the DateTime.Kind property is DateTimeKind.Local or
DateTimeKind.Unspecified, the date and time of the DateTimeOffset object is set
equal to dateTime, and its Offset property *is set equal to the offset of the
local system's current time zone*.{quote}
(Emphasis mine)
If the user is operating in an environment with a positive GMT offset, it is
very easy to write the wrong date to the builder:
{code:c#}
var builder = new Date32Array.Builder();
builder.Append(new DateTime(2020, 4, 24)); // Kind == DateTimeKind.Unspecified:
triggers the bug
{code}
Assume that the user is in the UK (as I am), where the GMT offset on the above
date is 1 hour ahead. This means that the conversion to {{DateTimeOffset}}
will actually result in a value of {{2020-04-23T23:00:00+01:00}} being passed
to the {{Append()}} method. Arrow then calls {{ToUnixTimeMilliseconds()}},
which [only considers the date
portion|https://referencesource.microsoft.com/#mscorlib/system/datetimeoffset.cs,8f33340c07c4787e]
of its object, not the time portion or offset. This means that the number of
days gets calculated based on 2020-04-23, not 2020-04-24 as the user thought
they were specifying.
If the user chooses to use NodaTime as a "better" date and time-handling
library, they will still likely run into the bug if they do the obvious thing:
{code:c#}
var builder = new Date32Array.Builder();
var ld = new NodaTime.LocalDate(2020, 4, 24);
builder.Append(ld.ToDateTimeUnspecified()); // Kind ==
DateTimeKind.Unspecified: also triggers the bug
{code}
h1. Suggested Improvement
* Change {{Date32Array.Builder}} and {{Date64Array.Builder}} to specify a
{{TFrom}} parameter of {{DateTime}}, not {{DateTimeOffset}} (breaking change).
* Change {{Date32Array.GetDate()}} and {{Date64Array.GetDate()}} to return a
{{DateTime}}, not {{DateTimeOffset}} (also a breaking change).
was:
h1. Summary
Writing a Date value using either a {{Date32Array.Builder}} or
{{Date64.Builder}} and then reading back the result from the built array
introduces an off-by-one error in the value. The following minimal code
illustrates:
{code:c#}
namespace Date32ArrayReadWriteBug
{
using Apache.Arrow;
using Apache.Arrow.Memory;
using System;
internal static class Program
{
public static void Main(string[] args)
{
var allocator = new NativeMemoryAllocator();
var builder = new Date32Array.Builder();
var date = new DateTime(2020, 4, 24);
Console.WriteLine($"Appending date {date:yyyy-MM-dd}");
builder.Append(date);
var array = builder.Build(allocator);
var dateAgain = array.GetDate(0);
Console.WriteLine($"Read date {dateAgain:yyyy-MM-dd}");
}
}
}{code}
Change {{new Date32Array.Builder()}} to {{new Date64Array.Builder()}} in the
above code as appropriate to demonstrate for the other type.
h2. Expected Output
{noformat}
Appending date 2020-04-24
Read date 2020-04-24 {noformat}
h2. Actual Output
{noformat}
Appending date 2020-04-24
Read date 2020-04-23 {noformat}
> [C#] Date32/64Array write & read back introduces off-by-one error
> -----------------------------------------------------------------
>
> Key: ARROW-8581
> URL: https://issues.apache.org/jira/browse/ARROW-8581
> Project: Apache Arrow
> Issue Type: Bug
> Components: C#
> Affects Versions: 0.17.0
> Environment: Windows 10 x64
> Reporter: Adam Szmigin
> Priority: Major
>
> h1. Summary Proposal
> The {{Date32Array.Builder}} and {{Date64.Builder}} classes both accept values
> of type {{DateTimeOffset}}, but this makes it very easy for the user to
> introduce subtle bugs when they work with the {{DateTime}} type in their own
> code. This class of bugs could be avoided if these builders were instead
> typed on {{DateTime}} rather than {{DateTimeOffset}}.
> h1. Details
> The danger is introduced by the implicit widening conversion provided by the
> _DateTimeOffset.Implicit(DateTime to DateTimeOffset)_ operator:
>
> [https://docs.microsoft.com/en-us/dotnet/api/system.datetimeoffset.op_implicit?view=netcore-3.1]
> The important part is this text:
> {quote}The offset of the resulting DateTimeOffset object depends on the value
> of the DateTime.Kind property of the dateTime parameter:
> * If the value of the DateTime.Kind property is DateTimeKind.Local or
> DateTimeKind.Unspecified, the date and time of the DateTimeOffset object is
> set equal to dateTime, and its Offset property *is set equal to the offset of
> the local system's current time zone*.{quote}
> (Emphasis mine)
> If the user is operating in an environment with a positive GMT offset, it is
> very easy to write the wrong date to the builder:
> {code:c#}
> var builder = new Date32Array.Builder();
> builder.Append(new DateTime(2020, 4, 24)); // Kind ==
> DateTimeKind.Unspecified: triggers the bug
> {code}
> Assume that the user is in the UK (as I am), where the GMT offset on the
> above date is 1 hour ahead. This means that the conversion to
> {{DateTimeOffset}} will actually result in a value of
> {{2020-04-23T23:00:00+01:00}} being passed to the {{Append()}} method. Arrow
> then calls {{ToUnixTimeMilliseconds()}}, which [only considers the date
> portion|https://referencesource.microsoft.com/#mscorlib/system/datetimeoffset.cs,8f33340c07c4787e]
> of its object, not the time portion or offset. This means that the number
> of days gets calculated based on 2020-04-23, not 2020-04-24 as the user
> thought they were specifying.
> If the user chooses to use NodaTime as a "better" date and time-handling
> library, they will still likely run into the bug if they do the obvious thing:
> {code:c#}
> var builder = new Date32Array.Builder();
> var ld = new NodaTime.LocalDate(2020, 4, 24);
> builder.Append(ld.ToDateTimeUnspecified()); // Kind ==
> DateTimeKind.Unspecified: also triggers the bug
> {code}
> h1. Suggested Improvement
> * Change {{Date32Array.Builder}} and {{Date64Array.Builder}} to specify a
> {{TFrom}} parameter of {{DateTime}}, not {{DateTimeOffset}} (breaking change).
> * Change {{Date32Array.GetDate()}} and {{Date64Array.GetDate()}} to return a
> {{DateTime}}, not {{DateTimeOffset}} (also a breaking change).
--
This message was sent by Atlassian Jira
(v8.3.4#803005)