mr-smidge commented on a change in pull request #7654:
URL: https://github.com/apache/arrow/pull/7654#discussion_r457739938



##########
File path: csharp/src/Apache.Arrow/Arrays/Date64Array.cs
##########
@@ -15,56 +15,94 @@
 
 using Apache.Arrow.Types;
 using System;
-using System.Collections.Generic;
-using Apache.Arrow.Memory;
 
 namespace Apache.Arrow
 {
+    /// <summary>
+    /// The <see cref="Date64Array"/> class holds an array of dates in the 
<c>Date64</c> format, where each date is
+    /// stored as the number of milliseconds since the dawn of (UNIX) time, 
excluding leap seconds.
+    /// </summary>
     public class Date64Array: PrimitiveArray<long>
     {
+        private static readonly DateTime _epoch = new DateTime(1970, 1, 1, 0, 
0, 0, DateTimeKind.Unspecified);
+
         public Date64Array(
             ArrowBuffer valueBuffer, ArrowBuffer nullBitmapBuffer,
             int length, int nullCount, int offset)
             : this(new ArrayData(Date64Type.Default, length, nullCount, offset,
                 new[] { nullBitmapBuffer, valueBuffer }))
         { }
 
-        public class Builder : PrimitiveArrayBuilder<DateTimeOffset, long, 
Date64Array, Builder>
+        /// <summary>
+        /// The <see cref="Builder"/> class can be used to fluently build <see 
cref="Date64Array"/> objects.
+        /// </summary>
+        public class Builder : DateArrayBuilder<long, Date64Array, Builder>
         {
-            internal class DateBuilder: PrimitiveArrayBuilder<long, 
Date64Array, DateBuilder>
+            private class DateBuilder: PrimitiveArrayBuilder<long, 
Date64Array, DateBuilder>
             {
                 protected override Date64Array Build(
                     ArrowBuffer valueBuffer, ArrowBuffer nullBitmapBuffer,
                     int length, int nullCount, int offset) =>
                     new Date64Array(valueBuffer, nullBitmapBuffer, length, 
nullCount, offset);
-            } 
+            }
 
+            /// <summary>
+            /// Construct a new instance of the <see cref="Builder"/> class.
+            /// </summary>
             public Builder() : base(new DateBuilder()) { }
 
-            protected override long ConvertTo(DateTimeOffset value)
+            protected override long Convert(DateTime dateTime)
+            {
+                return (long)(dateTime - _epoch).TotalMilliseconds;
+            }
+
+            protected override long Convert(DateTimeOffset dateTimeOffset)
             {
-                return value.ToUnixTimeMilliseconds();
+                return (long)(dateTimeOffset.Date - _epoch).TotalMilliseconds;

Review comment:
       > The Unix Epoch is defined as 00:00:00 UTC Thursday, 1 January 1970.
   
   Apologies, I've been too colloquial with use of the word epoch when talking 
about dates.  You are right that the UNIX epoch is a point in time, not a date.
   
   However, `Date32` and `Date64` are measures of _dates_.  As the standard C# 
library does not have a dedicated type for dates, we have to infer a date from 
either `DateTime` or `DateTimeOffset`.  Dates and instants in time aren't 
implicitly convertible in the natural sense because of timezones: there has to 
be some logic in the conversion.
   
   >  The logic needs to take into account the Offset portion of the 
DateTimeOffset.
   > I can have the same moment in time represented with 2 different 
DateTimeOffsets
   
   Absolutely right - the same point in time can lie within two different 
calendar dates depending on one's timezone.  At the time I'm writing this it's 
just gone midnight in the UK, so I think it's 21st July right now.  But if 
you're on CDT, it's 20th July for you at the very same instant.
   
   For clarity: I think the most natural way to infer a meaningful date from 
`DateTime`/`DateTimeOffset` is to call the `.Date` property, i.e. I would 
expect the time and offset to be _ignored_ when considering the date.
   
   > I can have the same moment in time represented with 2 different 
DateTimeOffsets - one with a negative Offset, and one with a positive Offset. 
If used the proposed Date32Array.Builder, I would get 2 different days. This is 
not correct.
   
   I disagree with the statement that this is not correct, but I want to 
understand your viewpoint better...
   
   From looking at your test case, do you mean to suggest that you believe a 
conversion to UTC first and then taking the date from the UTC date/time is the 
most natural way to infer a date from a `DateTimeOffset`?  This is what happens 
by doing `(int)(dto.ToUnixTimeMilliseconds() / MillisecondsPerDay)`, but it's 
an interpretation of dates from times that I've never seen in any other project!
   
   If you take the "simplest approach" and call `.Date` (or its equivalent in 
NodaTime), time and offset are always ignored:
   
   ```csharp
   // Using System types
   var westernHemisphereTime = new DateTimeOffset(2020, 7, 20, 20, 0, 0, 
TimeSpan.FromHours(-5));
   var easternHemisphereTime = new DateTimeOffset(2020, 7, 21, 10, 0, 0, 
TimeSpan.FromHours(9));
   Assert.NotEqual(westernHemisphereTime.Date == easternHemisphereTime.Date);
   Assert.Equal(westernHemisphereTime.UtcTicks == 
easternHemisphereTime.UtcTicks);
   
   // Using NodaTime types
   var westernHemisphereZtd = new LocalDateTime(2020, 7, 20, 20, 0, 
0).InZoneStrictly(DateTimeZoneProviders.Tzdb["America/Chicago"]);
   var easternHemisphereZtd = new LocalDateTime(2020, 7, 21, 10, 0, 
0).InZoneStrictly(DateTimeZoneProviders.Tzdb["Asia/Tokyo"]);
   Assert.NotEqual(westernHemisphereZtd.Date == easternHemisphereZtd.Date);
   Assert.Equal(westernHemisphereZtd.ToInstant() == 
easternHemisphereZtd.ToInstant());
   ```
   
   Separately, I've realised that `Date64Array` had an existing bug that I 
would not be fixing if I were to retain the previous code: the previous code 
would not truncate non-date parts and hence could store values that were not a 
multiple of 8640000, which is a violation of the spec.
   
   I think the best thing to do is to put this PR back into draft mode, 
implement a _full set_ of unit tests for both `Date32` and `Date64` (and their 
respective builders), and hence ensure we have full clarity on expected 
behaviour.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Reply via email to