mr-smidge commented on a change in pull request #7654: URL: https://github.com/apache/arrow/pull/7654#discussion_r457739938
########## File path: csharp/src/Apache.Arrow/Arrays/Date64Array.cs ########## @@ -15,56 +15,94 @@ using Apache.Arrow.Types; using System; -using System.Collections.Generic; -using Apache.Arrow.Memory; namespace Apache.Arrow { + /// <summary> + /// The <see cref="Date64Array"/> class holds an array of dates in the <c>Date64</c> format, where each date is + /// stored as the number of milliseconds since the dawn of (UNIX) time, excluding leap seconds. + /// </summary> public class Date64Array: PrimitiveArray<long> { + private static readonly DateTime _epoch = new DateTime(1970, 1, 1, 0, 0, 0, DateTimeKind.Unspecified); + public Date64Array( ArrowBuffer valueBuffer, ArrowBuffer nullBitmapBuffer, int length, int nullCount, int offset) : this(new ArrayData(Date64Type.Default, length, nullCount, offset, new[] { nullBitmapBuffer, valueBuffer })) { } - public class Builder : PrimitiveArrayBuilder<DateTimeOffset, long, Date64Array, Builder> + /// <summary> + /// The <see cref="Builder"/> class can be used to fluently build <see cref="Date64Array"/> objects. + /// </summary> + public class Builder : DateArrayBuilder<long, Date64Array, Builder> { - internal class DateBuilder: PrimitiveArrayBuilder<long, Date64Array, DateBuilder> + private class DateBuilder: PrimitiveArrayBuilder<long, Date64Array, DateBuilder> { protected override Date64Array Build( ArrowBuffer valueBuffer, ArrowBuffer nullBitmapBuffer, int length, int nullCount, int offset) => new Date64Array(valueBuffer, nullBitmapBuffer, length, nullCount, offset); - } + } + /// <summary> + /// Construct a new instance of the <see cref="Builder"/> class. + /// </summary> public Builder() : base(new DateBuilder()) { } - protected override long ConvertTo(DateTimeOffset value) + protected override long Convert(DateTime dateTime) + { + return (long)(dateTime - _epoch).TotalMilliseconds; + } + + protected override long Convert(DateTimeOffset dateTimeOffset) { - return value.ToUnixTimeMilliseconds(); + return (long)(dateTimeOffset.Date - _epoch).TotalMilliseconds; Review comment: > The Unix Epoch is defined as 00:00:00 UTC Thursday, 1 January 1970. Apologies, I've been too colloquial with use of the word epoch when talking about dates. You are right that the UNIX epoch is a point in time, not a date. However, `Date32` and `Date64` are measures of _dates_. As the standard C# library does not have a dedicated type for dates, we have to infer a date from either `DateTime` or `DateTimeOffset`. Dates and instants in time aren't implicitly convertible in the natural sense because of timezones: there has to be some logic in the conversion. > The logic needs to take into account the Offset portion of the DateTimeOffset. > I can have the same moment in time represented with 2 different DateTimeOffsets Absolutely right - the same point in time can lie within two different calendar dates depending on one's timezone. At the time I'm writing this it's just gone midnight in the UK, so I think it's 21st July right now. But if you're on CDT, it's 20th July for you at the very same instant. For clarity: I think the most natural way to infer a meaningful date from `DateTime`/`DateTimeOffset` is to call the `.Date` property, i.e. I would expect the time and offset to be _ignored_ when considering the date. > I can have the same moment in time represented with 2 different DateTimeOffsets - one with a negative Offset, and one with a positive Offset. If used the proposed Date32Array.Builder, I would get 2 different days. This is not correct. I disagree with the statement that this is not correct, but I want to understand your viewpoint better... From looking at your test case, do you mean to suggest that you believe a conversion to UTC first and then taking the date from the UTC date/time is the most natural way to infer a date from a `DateTimeOffset`? This is what happens by doing `(int)(dto.ToUnixTimeMilliseconds() / MillisecondsPerDay)`, but it's an interpretation of dates from times that I've never seen in any other project! If you take the "simplest approach" and call `.Date` (or its equivalent in NodaTime), time and offset are always ignored: ```csharp // Using System types var westernHemisphereTime = new DateTimeOffset(2020, 7, 20, 20, 0, 0, TimeSpan.FromHours(-5)); var easternHemisphereTime = new DateTimeOffset(2020, 7, 21, 10, 0, 0, TimeSpan.FromHours(9)); Assert.NotEqual(westernHemisphereTime.Date == easternHemisphereTime.Date); Assert.Equal(westernHemisphereTime.UtcTicks == easternHemisphereTime.UtcTicks); // Using NodaTime types var westernHemisphereZtd = new LocalDateTime(2020, 7, 20, 20, 0, 0).InZoneStrictly(DateTimeZoneProviders.Tzdb["America/Chicago"]); var easternHemisphereZtd = new LocalDateTime(2020, 7, 21, 10, 0, 0).InZoneStrictly(DateTimeZoneProviders.Tzdb["Asia/Tokyo"]); Assert.NotEqual(westernHemisphereZtd.Date == easternHemisphereZtd.Date); Assert.Equal(westernHemisphereZtd.ToInstant() == easternHemisphereZtd.ToInstant()); ``` Separately, I've realised that `Date64Array` had an existing bug that I would not be fixing if I were to retain the previous code: the previous code would not truncate non-date parts and hence could store values that were not a multiple of 8640000, which is a violation of the spec. I think the best thing to do is to put this PR back into draft mode, implement a _full set_ of unit tests for both `Date32` and `Date64` (and their respective builders), and hence ensure we have full clarity on expected behaviour. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org