[ 
https://issues.apache.org/jira/browse/ARROW-8581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Szmigin updated ARROW-8581:
--------------------------------
    Description: 
h1. Summary Proposal

The {{Date32Array.Builder}} and {{Date64.Builder}} classes both accept values 
of type {{DateTimeOffset}}, but this makes it very easy for the user to 
introduce subtle bugs when they work with the {{DateTime}} type in their own 
code.  This class of bugs could be avoided if these builders were instead typed 
on {{DateTime}} rather than {{DateTimeOffset}}.
h1. Details

The danger is introduced by the implicit widening conversion provided by the 
_DateTimeOffset.Implicit(DateTime to DateTimeOffset)_ operator:
 
[https://docs.microsoft.com/en-us/dotnet/api/system.datetimeoffset.op_implicit?view=netcore-3.1]

The important part is this text:
{quote}The offset of the resulting DateTimeOffset object depends on the value 
of the DateTime.Kind property of the dateTime parameter:
 * If the value of the DateTime.Kind property is DateTimeKind.Local or 
DateTimeKind.Unspecified, the date and time of the DateTimeOffset object is set 
equal to dateTime, and its Offset property *is set equal to the offset of the 
local system's current time zone*.{quote}
 (Emphasis mine)

 If the user is operating in an environment with a positive GMT offset, it is 
very easy to write the wrong date to the builder:
{code:c#}
var builder = new Date32Array.Builder();
builder.Append(new DateTime(2020, 4, 24)); // Kind == DateTimeKind.Unspecified: 
triggers the bug
{code}
Assume that the user is in the UK (as I am), where the GMT offset on the above 
date is 1 hour ahead.  This means that the conversion to {{DateTimeOffset}} 
will actually result in a value of {{2020-04-23T23:00:00+01:00}} being passed 
to the {{Append()}} method.  Arrow then calls {{ToUnixTimeMilliseconds()}}, 
which [only considers the date 
portion|https://referencesource.microsoft.com/#mscorlib/system/datetimeoffset.cs,8f33340c07c4787e]
 of its object, not the time portion or offset.  This means that the number of 
days gets calculated based on 2020-04-23, not 2020-04-24 as the user thought 
they were specifying.

If the user chooses to use NodaTime as a "better" date and time-handling 
library, they will still likely run into the bug if they do the obvious thing:
{code:c#}
var builder = new Date32Array.Builder();
var ld = new NodaTime.LocalDate(2020, 4, 24);
builder.Append(ld.ToDateTimeUnspecified()); // Kind == 
DateTimeKind.Unspecified: also triggers the bug
{code}
h1. Suggested Improvement
 * Change {{Date32Array.Builder}} and {{Date64Array.Builder}} to specify a 
{{TFrom}} parameter of {{DateTime}}, not {{DateTimeOffset}} (breaking change).
 * Change {{Date32Array.GetDate()}} and {{Date64Array.GetDate()}} to return a 
{{DateTime}}, not {{DateTimeOffset}} (also a breaking change).

  was:
h1. Summary

Writing a Date value using either a {{Date32Array.Builder}} or 
{{Date64.Builder}} and then reading back the result from the built array 
introduces an off-by-one error in the value.  The following minimal code 
illustrates:
{code:c#}
namespace Date32ArrayReadWriteBug
{
    using Apache.Arrow;
    using Apache.Arrow.Memory;
    using System;

    internal static class Program
    {
        public static void Main(string[] args)
        {
            var allocator = new NativeMemoryAllocator();
            var builder = new Date32Array.Builder();
            var date = new DateTime(2020, 4, 24);
            Console.WriteLine($"Appending date {date:yyyy-MM-dd}");
            builder.Append(date);
            var array = builder.Build(allocator);
            var dateAgain = array.GetDate(0);
            Console.WriteLine($"Read date {dateAgain:yyyy-MM-dd}");
        }
    }
}{code}
Change {{new Date32Array.Builder()}} to {{new Date64Array.Builder()}} in the 
above code as appropriate to demonstrate for the other type.
h2. Expected Output
{noformat}
Appending date 2020-04-24
Read date 2020-04-24 {noformat}
h2. Actual Output
{noformat}
Appending date 2020-04-24
Read date 2020-04-23 {noformat}
 


> [C#] Date32/64Array write & read back introduces off-by-one error
> -----------------------------------------------------------------
>
>                 Key: ARROW-8581
>                 URL: https://issues.apache.org/jira/browse/ARROW-8581
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C#
>    Affects Versions: 0.17.0
>         Environment: Windows 10 x64
>            Reporter: Adam Szmigin
>            Priority: Major
>
> h1. Summary Proposal
> The {{Date32Array.Builder}} and {{Date64.Builder}} classes both accept values 
> of type {{DateTimeOffset}}, but this makes it very easy for the user to 
> introduce subtle bugs when they work with the {{DateTime}} type in their own 
> code.  This class of bugs could be avoided if these builders were instead 
> typed on {{DateTime}} rather than {{DateTimeOffset}}.
> h1. Details
> The danger is introduced by the implicit widening conversion provided by the 
> _DateTimeOffset.Implicit(DateTime to DateTimeOffset)_ operator:
>  
> [https://docs.microsoft.com/en-us/dotnet/api/system.datetimeoffset.op_implicit?view=netcore-3.1]
> The important part is this text:
> {quote}The offset of the resulting DateTimeOffset object depends on the value 
> of the DateTime.Kind property of the dateTime parameter:
>  * If the value of the DateTime.Kind property is DateTimeKind.Local or 
> DateTimeKind.Unspecified, the date and time of the DateTimeOffset object is 
> set equal to dateTime, and its Offset property *is set equal to the offset of 
> the local system's current time zone*.{quote}
>  (Emphasis mine)
>  If the user is operating in an environment with a positive GMT offset, it is 
> very easy to write the wrong date to the builder:
> {code:c#}
> var builder = new Date32Array.Builder();
> builder.Append(new DateTime(2020, 4, 24)); // Kind == 
> DateTimeKind.Unspecified: triggers the bug
> {code}
> Assume that the user is in the UK (as I am), where the GMT offset on the 
> above date is 1 hour ahead.  This means that the conversion to 
> {{DateTimeOffset}} will actually result in a value of 
> {{2020-04-23T23:00:00+01:00}} being passed to the {{Append()}} method.  Arrow 
> then calls {{ToUnixTimeMilliseconds()}}, which [only considers the date 
> portion|https://referencesource.microsoft.com/#mscorlib/system/datetimeoffset.cs,8f33340c07c4787e]
>  of its object, not the time portion or offset.  This means that the number 
> of days gets calculated based on 2020-04-23, not 2020-04-24 as the user 
> thought they were specifying.
> If the user chooses to use NodaTime as a "better" date and time-handling 
> library, they will still likely run into the bug if they do the obvious thing:
> {code:c#}
> var builder = new Date32Array.Builder();
> var ld = new NodaTime.LocalDate(2020, 4, 24);
> builder.Append(ld.ToDateTimeUnspecified()); // Kind == 
> DateTimeKind.Unspecified: also triggers the bug
> {code}
> h1. Suggested Improvement
>  * Change {{Date32Array.Builder}} and {{Date64Array.Builder}} to specify a 
> {{TFrom}} parameter of {{DateTime}}, not {{DateTimeOffset}} (breaking change).
>  * Change {{Date32Array.GetDate()}} and {{Date64Array.GetDate()}} to return a 
> {{DateTime}}, not {{DateTimeOffset}} (also a breaking change).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to