If it’s a display problem, should it block the release?

> On Jul 19, 2020, at 3:57 PM, Wes McKinney <wesmck...@gmail.com> wrote:
> 
> I opened https://issues.apache.org/jira/browse/ARROW-9525 about the
> display problem. My guess is that there are other problems lurking
> here.
> 
>> On Sun, Jul 19, 2020 at 5:54 PM Wes McKinney <wesmck...@gmail.com> wrote:
>> 
>> hi Bryan,
>> 
>> This is a display bug
>> 
>> In [6]: arr = pa.array([0, 1, 2], type=pa.timestamp('ns', 'America/Los_Angeles'))
>> 
>> In [7]: arr.view('int64')
>> Out[7]:
>> <pyarrow.lib.Int64Array object at 0x7fd1b8aaef30>
>> [
>>  0,
>>  1,
>>  2
>> ]
>> 
>> In [8]: arr
>> Out[8]:
>> <pyarrow.lib.TimestampArray object at 0x7fd1b8aae6e0>
>> [
>>  1970-01-01 00:00:00.000000000,
>>  1970-01-01 00:00:00.000000001,
>>  1970-01-01 00:00:00.000000002
>> ]
>> 
>> In [9]: arr.to_pandas()
>> Out[9]:
>> 0             1969-12-31 16:00:00-08:00
>> 1   1969-12-31 16:00:00.000000001-08:00
>> 2   1969-12-31 16:00:00.000000002-08:00
>> dtype: datetime64[ns, America/Los_Angeles]
>> 
>> The repr of TimestampArray doesn't take the timezone into account.
>> 
>> In [10]: arr[0]
>> Out[10]: <pyarrow.TimestampScalar: Timestamp('1969-12-31 16:00:00-0800', tz='America/Los_Angeles')>
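A minimal standard-library sketch of why Out[8] and Out[9] disagree, assuming (as the view('int64') output suggests) that the array stores int64 nanoseconds since the Unix epoch in UTC and that the timezone is only applied when values are rendered. A fixed UTC-08:00 offset stands in for America/Los_Angeles, and `render` is a hypothetical helper, not a pyarrow API.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC-08:00 offset stands in for America/Los_Angeles
# (PST); a real tz database would also apply DST rules.
PST = timezone(timedelta(hours=-8))
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def render(ns_since_epoch: int, tz: timezone) -> str:
    """Hypothetical helper: show one stored int64 value in a given timezone.

    datetime has only microsecond precision, so sub-microsecond digits are
    truncated here; this illustrates the offset, not Arrow's formatter.
    """
    instant = EPOCH + timedelta(microseconds=ns_since_epoch // 1000)
    return instant.astimezone(tz).isoformat()


raw = [0, 1, 2]  # the values shown by arr.view('int64')
print([render(v, timezone.utc) for v in raw])  # rendered in UTC, like the repr
print([render(v, PST) for v in raw])           # rendered locally, like to_pandas()
```

Both lines describe the same instants; only the zone used for display differs, which is why the repr on its own looks eight hours off.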
>> 
>> So if it's incorrect, the problem is happening somewhere before or
>> while the StructArray is being created. If I had to guess, it's caused
>> by the tzinfo of the datetime.datetime values not being handled in the
>> way that they were before.
>> 
>>> On Sun, Jul 19, 2020 at 5:19 PM Wes McKinney <wesmck...@gmail.com> wrote:
>>> 
>>> Well, this is not good, and pretty disappointing given that we had nearly
>>> a month to sort through the implications of Micah’s patch. We should try
>>> to resolve this ASAP.
>>> 
>>> On Sun, Jul 19, 2020 at 5:10 PM Bryan Cutler <cutl...@gmail.com> wrote:
>>>> 
>>>> +0 (non-binding)
>>>> 
>>>> I ran the verification script for the binaries and then the source, as
>>>> below, and both look good:
>>>> ARROW_TMPDIR=/tmp/arrow-test TEST_DEFAULT=0 TEST_SOURCE=1 TEST_CPP=1 \
>>>>   TEST_PYTHON=1 TEST_JAVA=1 TEST_INTEGRATION_CPP=1 TEST_INTEGRATION_JAVA=1 \
>>>>   dev/release/verify-release-candidate.sh source 1.0.0 1
>>>> 
>>>> I tried to patch Spark locally to verify the recent change in nested
>>>> timestamps and was not able to get things working quite right, but I'm not
>>>> sure if the problem is in Spark, Arrow or my patch - hence my vote of +0.
>>>> 
>>>> Here is what I'm seeing
>>>> 
>>>> ```
>>>> (Input as datetime)
>>>> datetime.datetime(2018, 3, 10, 0, 0)
>>>> datetime.datetime(2018, 3, 15, 0, 0)
>>>> 
>>>> (Struct Array)
>>>> -- is_valid: all not null
>>>> -- child 0 type: timestamp[us, tz=America/Los_Angeles]
>>>>  [
>>>>    2018-03-10 00:00:00.000000,
>>>>    2018-03-10 00:00:00.000000
>>>>  ]
>>>> -- child 1 type: timestamp[us, tz=America/Los_Angeles]
>>>>  [
>>>>    2018-03-15 00:00:00.000000,
>>>>    2018-03-15 00:00:00.000000
>>>>  ]
>>>> 
>>>> (Flattened Arrays)
>>>> types [TimestampType(timestamp[us, tz=America/Los_Angeles]),
>>>> TimestampType(timestamp[us, tz=America/Los_Angeles])]
>>>> [<pyarrow.lib.TimestampArray object at 0x7ffbbd88f520>
>>>> [
>>>>  2018-03-10 00:00:00.000000,
>>>>  2018-03-10 00:00:00.000000
>>>> ], <pyarrow.lib.TimestampArray object at 0x7ffba958be50>
>>>> [
>>>>  2018-03-15 00:00:00.000000,
>>>>  2018-03-15 00:00:00.000000
>>>> ]]
>>>> 
>>>> (Pandas Conversion)
>>>> [
>>>> 0   2018-03-09 16:00:00-08:00
>>>> 1   2018-03-09 16:00:00-08:00
>>>> dtype: datetime64[ns, America/Los_Angeles],
>>>> 
>>>> 0   2018-03-14 17:00:00-07:00
>>>> 1   2018-03-14 17:00:00-07:00
>>>> dtype: datetime64[ns, America/Los_Angeles]]
>>>> ```
>>>> 
>>>> Based on the output of an existing timestamp UDF that works correctly, it
>>>> looks like the pyarrow StructArray values are wrong, and that's carried
>>>> through to the flattened arrays, causing the pandas values to have a
>>>> negative offset.
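A hedged standard-library illustration of the suspected failure mode (the guess that the tzinfo of the datetime.datetime inputs is no longer handled as before): if the naive inputs are meant as Los Angeles wall-clock times but are stored as if they were UTC, converting back to Los Angeles reproduces the pandas values above. Fixed offsets stand in for the tz database; US daylight saving time began on 2018-03-11, so March 10 uses UTC-08:00 and March 15 uses UTC-07:00, which also accounts for the two different offsets. `naive_as_utc` and `naive_as_local` are hypothetical names, not Arrow or Spark functions.

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed offsets stand in for America/Los_Angeles. US DST began
# on 2018-03-11, so March 10 is PST (UTC-08:00) and March 15 is PDT (UTC-07:00).
PST = timezone(timedelta(hours=-8))
PDT = timezone(timedelta(hours=-7))


def naive_as_utc(dt: datetime) -> datetime:
    """Hypothetical suspect path: attach UTC to a naive wall-clock value."""
    return dt.replace(tzinfo=timezone.utc)


def naive_as_local(dt: datetime, offset: timezone) -> datetime:
    """Expected path: read the naive value as local time, then store as UTC."""
    return dt.replace(tzinfo=offset).astimezone(timezone.utc)


for wall, offset in [(datetime(2018, 3, 10), PST), (datetime(2018, 3, 15), PDT)]:
    wrong = naive_as_utc(wall).astimezone(offset)
    right = naive_as_local(wall, offset).astimezone(offset)
    print(wrong.isoformat(), "vs", right.isoformat())
```

The left-hand values come out as 2018-03-09T16:00:00-08:00 and 2018-03-14T17:00:00-07:00, matching the "(Pandas Conversion)" output, while the right-hand values round-trip to midnight local time.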
>>>> 
>>>> Here is the output from a working UDF with a timestamp; the pyarrow Array
>>>> displays in UTC time, I believe.
>>>> 
>>>> ```
>>>> (Timestamp Array)
>>>> type timestamp[us, tz=America/Los_Angeles]
>>>> [
>>>>  [
>>>>    1969-01-01 09:01:01.000000
>>>>  ]
>>>> ]
>>>> 
>>>> (Pandas Conversion)
>>>> 0   1969-01-01 01:01:01-08:00
>>>> Name: _0, dtype: datetime64[ns, America/Los_Angeles]
>>>> 
>>>> (Timezone Localized)
>>>> 0   1969-01-01 01:01:01
>>>> Name: _0, dtype: datetime64[ns]
>>>> ```
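The belief that the pyarrow repr is in UTC can be checked with the same kind of arithmetic. This sketch assumes a fixed UTC-08:00 offset for Los Angeles on 1969-01-01 (PST) and models the "(Timezone Localized)" step as keeping the local wall-clock time while dropping the zone, which is an assumption about what that step does.

```python
from datetime import datetime, timedelta, timezone

# Assumption: Los Angeles was on PST (UTC-08:00) on 1969-01-01.
PST = timezone(timedelta(hours=-8))

# Value as printed by the pyarrow repr, read as UTC.
stored = datetime(1969, 1, 1, 9, 1, 1, tzinfo=timezone.utc)

# Like "(Pandas Conversion)": the same instant, shown in local time.
local = stored.astimezone(PST)

# Like "(Timezone Localized)": keep the wall-clock time, drop the zone.
naive_local = local.replace(tzinfo=None)

print(local.isoformat())        # 1969-01-01T01:01:01-08:00
print(naive_local.isoformat())  # 1969-01-01T01:01:01
```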
>>>> 
>>>> I'll have to dig in further at another time and debug where the values go
>>>> wrong.
>>>> 
>>>> On Sat, Jul 18, 2020 at 9:51 PM Micah Kornfield <emkornfi...@gmail.com> wrote:
>>>> 
>>>>> +1 (binding)
>>>>> 
>>>>> Ran the wheel and binary tests on Ubuntu 19.04.
>>>>> 
>>>>> On Fri, Jul 17, 2020 at 2:25 PM Neal Richardson <neal.p.richard...@gmail.com> wrote:
>>>>> 
>>>>>> +1 (binding)
>>>>>> 
>>>>>> In addition to the usual verification on
>>>>>> https://github.com/apache/arrow/pull/7787, I've successfully staged the R
>>>>>> binary artifacts on Windows
>>>>>> (https://github.com/r-windows/rtools-packages/pull/126), macOS
>>>>>> (https://github.com/autobrew/homebrew-core/pull/12), and Linux
>>>>>> (https://github.com/ursa-labs/arrow-r-nightly/actions/runs/172977277)
>>>>>> using the release candidate.
>>>>>> 
>>>>>> And I agree with the judgment about skipping a JS release artifact. It
>>>>>> looks like there hasn't been a code change since October, so there's no
>>>>>> point.
>>>>>> 
>>>>>> Neal
>>>>>> 
>>>>>> On Fri, Jul 17, 2020 at 10:37 AM Wes McKinney <wesmck...@gmail.com> wrote:
>>>>>> 
>>>>>>> I see the JS failures as well. I think it is a failure localized to
>>>>>>> newer Node versions, since our JavaScript CI works fine. I don't think
>>>>>>> it should block the release given the lack of development activity in
>>>>>>> JavaScript [1]; if any JS devs are concerned about publishing an
>>>>>>> artifact, then we can skip pushing it to NPM.
>>>>>>> 
>>>>>>> @Ryan, it seems it may be something environment-related on your
>>>>>>> machine; I'm on Ubuntu 18.04 and have not seen this.
>>>>>>> 
>>>>>>> On
>>>>>>> 
>>>>>>>>  * The Python 3.8 wheel's tests failed; 3.5, 3.6 and 3.7
>>>>>>>>    passed. It seems that -larrow and -larrow_python for
>>>>>>>>    Cython failed.
>>>>>>> 
>>>>>>> I suspect this is related to the following commit, but I don't think
>>>>>>> it's a blocking issue:
>>>>>>> https://github.com/apache/arrow/commit/120c21f4bf66d2901b3a353a1f67bac3c3355924#diff-0f69784b44040448d17d0e4e8a641fe8
>>>>>>> 
>>>>>>> [1]: https://github.com/apache/arrow/commits/master/js
>>>>>>> 
>>>>>>> On Fri, Jul 17, 2020 at 9:42 AM Ryan Murray <rym...@dremio.com> wrote:
>>>>>>>> 
>>>>>>>> I've tested Java and it looks good. However, the verify script keeps
>>>>>>>> bailing with protobuf-related errors:
>>>>>>>> 'cpp/build/orc_ep-prefix/src/orc_ep-build/c++/src/orc_proto.pb.cc' and
>>>>>>>> friends can't find protobuf definitions. A bit odd, as CMake can see the
>>>>>>>> protobuf headers and builds directly off master work just fine. Has
>>>>>>>> anyone else experienced this? I am on Ubuntu 18.04.
>>>>>>>> 
>>>>>>>> On Fri, Jul 17, 2020 at 10:49 AM Antoine Pitrou <anto...@python.org> wrote:
>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> +1 (binding).  I tested on Ubuntu 18.04.
>>>>>>>>> 
>>>>>>>>> * Wheels verification went fine.
>>>>>>>>> * Source verification went fine with CUDA enabled and
>>>>>>>>> TEST_INTEGRATION_JS=0 TEST_JS=0.
>>>>>>>>> 
>>>>>>>>> I didn't test the binaries.
>>>>>>>>> 
>>>>>>>>> Regards
>>>>>>>>> 
>>>>>>>>> Antoine.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On 17/07/2020 at 03:41, Krisztián Szűcs wrote:
>>>>>>>>>> Hi,
>>>>>>>>>> 
>>>>>>>>>> I would like to propose the second release candidate (RC1) of Apache
>>>>>>>>>> Arrow version 1.0.0.
>>>>>>>>>> This is a major release consisting of 826 resolved JIRA issues [1].
>>>>>>>>>> 
>>>>>>>>>> The verification of the first release candidate (RC0) has failed [0],
>>>>>>>>>> and the packaging scripts were unable to produce two wheels. Compared
>>>>>>>>>> to RC0, this release candidate includes additional patches for the
>>>>>>>>>> following bugs: ARROW-9506, ARROW-9504, ARROW-9497,
>>>>>>>>>> ARROW-9500, ARROW-9499.
>>>>>>>>>> 
>>>>>>>>>> This release candidate is based on commit:
>>>>>>>>>> bc0649541859095ee77d03a7b891ea8d6e2fd641 [2]
>>>>>>>>>> 
>>>>>>>>>> The source release rc1 is hosted at [3].
>>>>>>>>>> The binary artifacts are hosted at [4][5][6][7].
>>>>>>>>>> The changelog is located at [8].
>>>>>>>>>> 
>>>>>>>>>> Please download, verify checksums and signatures, run the unit tests,
>>>>>>>>>> and vote on the release. See [9] for how to validate a release
>>>>>>>>>> candidate.
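The checksum half of that step can be sketched with the standard library. This assumes each artifact ships with a sha512sum-style companion file whose first field is the hex digest ("<hexdigest>  <filename>"); that file format and the helper names are assumptions, signature checking with GPG is not covered, and dev/release/verify-release-candidate.sh remains the route the voters in this thread actually used.

```python
import hashlib
from pathlib import Path


def sha512_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-512 so large tarballs need not fit in memory."""
    digest = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksum(artifact, checksum_file):
    """Hypothetical helper: compare the artifact's SHA-512 with the first
    whitespace-separated field of an (assumed) sha512sum-style file."""
    expected = Path(checksum_file).read_text().split()[0].lower()
    return sha512_of(artifact) == expected
```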
>>>>>>>>>> 
>>>>>>>>>> The vote will be open for at least 72 hours.
>>>>>>>>>> 
>>>>>>>>>> [ ] +1 Release this as Apache Arrow 1.0.0
>>>>>>>>>> [ ] +0
>>>>>>>>>> [ ] -1 Do not release this as Apache Arrow 1.0.0 because...
>>>>>>>>>> 
>>>>>>>>>> [0]: https://github.com/apache/arrow/pull/7778#issuecomment-659065370
>>>>>>>>>> [1]: https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20status%20in%20%28Resolved%2C%20Closed%29%20AND%20fixVersion%20%3D%201.0.0
>>>>>>>>>> [2]: https://github.com/apache/arrow/tree/bc0649541859095ee77d03a7b891ea8d6e2fd641
>>>>>>>>>> [3]: https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-1.0.0-rc1
>>>>>>>>>> [4]: https://bintray.com/apache/arrow/centos-rc/1.0.0-rc1
>>>>>>>>>> [5]: https://bintray.com/apache/arrow/debian-rc/1.0.0-rc1
>>>>>>>>>> [6]: https://bintray.com/apache/arrow/python-rc/1.0.0-rc1
>>>>>>>>>> [7]: https://bintray.com/apache/arrow/ubuntu-rc/1.0.0-rc1
>>>>>>>>>> [8]: https://github.com/apache/arrow/blob/bc0649541859095ee77d03a7b891ea8d6e2fd641/CHANGELOG.md
>>>>>>>>>> [9]: https://cwiki.apache.org/confluence/display/ARROW/How+to+Verify+Release+Candidates
