If it’s a display problem, should it block the release?
> On Jul 19, 2020, at 3:57 PM, Wes McKinney <wesmck...@gmail.com> wrote:
>
> I opened https://issues.apache.org/jira/browse/ARROW-9525 about the
> display problem. My guess is that there are other problems lurking
> here
>
>> On Sun, Jul 19, 2020 at 5:54 PM Wes McKinney <wesmck...@gmail.com> wrote:
>>
>> hi Bryan,
>>
>> This is a display bug
>>
>> In [6]: arr = pa.array([0, 1, 2], type=pa.timestamp('ns',
>> 'America/Los_Angeles'))
>>
>> In [7]: arr.view('int64')
>> Out[7]:
>> <pyarrow.lib.Int64Array object at 0x7fd1b8aaef30>
>> [
>>   0,
>>   1,
>>   2
>> ]
>>
>> In [8]: arr
>> Out[8]:
>> <pyarrow.lib.TimestampArray object at 0x7fd1b8aae6e0>
>> [
>>   1970-01-01 00:00:00.000000000,
>>   1970-01-01 00:00:00.000000001,
>>   1970-01-01 00:00:00.000000002
>> ]
>>
>> In [9]: arr.to_pandas()
>> Out[9]:
>> 0              1969-12-31 16:00:00-08:00
>> 1    1969-12-31 16:00:00.000000001-08:00
>> 2    1969-12-31 16:00:00.000000002-08:00
>> dtype: datetime64[ns, America/Los_Angeles]
>>
>> the repr of TimestampArray doesn't take into account the timezone
>>
>> In [10]: arr[0]
>> Out[10]: <pyarrow.TimestampScalar: Timestamp('1969-12-31
>> 16:00:00-0800', tz='America/Los_Angeles')>
>>
>> So if it's incorrect, the problem is happening somewhere before or
>> while the StructArray is being created. If I had to guess it's caused
>> by the tzinfo of the datetime.datetime values not being handled in the
>> way that they were before
>>
>>> On Sun, Jul 19, 2020 at 5:19 PM Wes McKinney <wesmck...@gmail.com> wrote:
>>>
>>> Well this is not good and pretty disappointing given that we had nearly a
>>> month to sort through the implications of Micah’s patch.
>>> We should try to resolve this ASAP
>>>
>>> On Sun, Jul 19, 2020 at 5:10 PM Bryan Cutler <cutl...@gmail.com> wrote:
>>>>
>>>> +0 (non-binding)
>>>>
>>>> I ran verification script for binaries and then source, as below, and both
>>>> look good
>>>> ARROW_TMPDIR=/tmp/arrow-test TEST_DEFAULT=0 TEST_SOURCE=1 TEST_CPP=1
>>>> TEST_PYTHON=1 TEST_JAVA=1 TEST_INTEGRATION_CPP=1 TEST_INTEGRATION_JAVA=1
>>>> dev/release/verify-release-candidate.sh source 1.0.0 1
>>>>
>>>> I tried to patch Spark locally to verify the recent change in nested
>>>> timestamps and was not able to get things working quite right, but I'm not
>>>> sure if the problem is in Spark, Arrow or my patch - hence my vote of +0.
>>>>
>>>> Here is what I'm seeing
>>>>
>>>> ```
>>>> (Input as datetime)
>>>> datetime.datetime(2018, 3, 10, 0, 0)
>>>> datetime.datetime(2018, 3, 15, 0, 0)
>>>>
>>>> (Struct Array)
>>>> -- is_valid: all not null
>>>> -- child 0 type: timestamp[us, tz=America/Los_Angeles]
>>>>   [
>>>>     2018-03-10 00:00:00.000000,
>>>>     2018-03-10 00:00:00.000000
>>>>   ]
>>>> -- child 1 type: timestamp[us, tz=America/Los_Angeles]
>>>>   [
>>>>     2018-03-15 00:00:00.000000,
>>>>     2018-03-15 00:00:00.000000
>>>>   ]
>>>>
>>>> (Flattened Arrays)
>>>> types [TimestampType(timestamp[us, tz=America/Los_Angeles]),
>>>> TimestampType(timestamp[us, tz=America/Los_Angeles])]
>>>> [<pyarrow.lib.TimestampArray object at 0x7ffbbd88f520>
>>>> [
>>>>   2018-03-10 00:00:00.000000,
>>>>   2018-03-10 00:00:00.000000
>>>> ], <pyarrow.lib.TimestampArray object at 0x7ffba958be50>
>>>> [
>>>>   2018-03-15 00:00:00.000000,
>>>>   2018-03-15 00:00:00.000000
>>>> ]]
>>>>
>>>> (Pandas Conversion)
>>>> [
>>>> 0   2018-03-09 16:00:00-08:00
>>>> 1   2018-03-09 16:00:00-08:00
>>>> dtype: datetime64[ns, America/Los_Angeles],
>>>>
>>>> 0   2018-03-14 17:00:00-07:00
>>>> 1   2018-03-14 17:00:00-07:00
>>>> dtype: datetime64[ns, America/Los_Angeles]]
>>>> ```
>>>>
>>>> Based on the output of an existing, correct timestamp udf, it looks like the
>>>> pyarrow Struct
>>>> Array values are wrong and that's carried through the
>>>> flattened arrays, causing the Pandas values to have a negative offset.
>>>>
>>>> Here is output from a working udf with timestamp, the pyarrow Array
>>>> displays in UTC time, I believe.
>>>>
>>>> ```
>>>> (Timestamp Array)
>>>> type timestamp[us, tz=America/Los_Angeles]
>>>> [
>>>>   [
>>>>     1969-01-01 09:01:01.000000
>>>>   ]
>>>> ]
>>>>
>>>> (Pandas Conversion)
>>>> 0   1969-01-01 01:01:01-08:00
>>>> Name: _0, dtype: datetime64[ns, America/Los_Angeles]
>>>>
>>>> (Timezone Localized)
>>>> 0   1969-01-01 01:01:01
>>>> Name: _0, dtype: datetime64[ns]
>>>> ```
>>>>
>>>> I'll have to dig in further at another time and debug where the values go
>>>> wrong.
>>>>
>>>> On Sat, Jul 18, 2020 at 9:51 PM Micah Kornfield <emkornfi...@gmail.com> wrote:
>>>>
>>>>> +1 (binding)
>>>>>
>>>>> Ran wheel and binary tests on ubuntu 19.04
>>>>>
>>>>> On Fri, Jul 17, 2020 at 2:25 PM Neal Richardson <neal.p.richard...@gmail.com> wrote:
>>>>>
>>>>>> +1 (binding)
>>>>>>
>>>>>> In addition to the usual verification on
>>>>>> https://github.com/apache/arrow/pull/7787, I've successfully staged the R
>>>>>> binary artifacts on Windows (
>>>>>> https://github.com/r-windows/rtools-packages/pull/126), macOS (
>>>>>> https://github.com/autobrew/homebrew-core/pull/12), and Linux (
>>>>>> https://github.com/ursa-labs/arrow-r-nightly/actions/runs/172977277) using
>>>>>> the release candidate.
>>>>>>
>>>>>> And I agree with the judgment about skipping a JS release artifact. Looks
>>>>>> like there hasn't been a code change since October so there's no point.
>>>>>>
>>>>>> Neal
>>>>>>
>>>>>> On Fri, Jul 17, 2020 at 10:37 AM Wes McKinney <wesmck...@gmail.com> wrote:
>>>>>>
>>>>>>> I see the JS failures as well. I think it is a failure localized to
>>>>>>> newer Node versions since our JavaScript CI works fine.
>>>>>>> I don't think
>>>>>>> it should block the release given the lack of development activity in
>>>>>>> JavaScript [1] -- if any JS devs are concerned about publishing an
>>>>>>> artifact then we can skip pushing it to NPM
>>>>>>>
>>>>>>> @Ryan it seems it may be something environment related on your
>>>>>>> machine, I'm on Ubuntu 18.04 and have not seen this.
>>>>>>>
>>>>>>> On
>>>>>>>
>>>>>>>> * Python 3.8 wheel's tests are failed. 3.5, 3.6 and 3.7
>>>>>>>> are passed. It seems that -larrow and -larrow_python for
>>>>>>>> Cython are failed.
>>>>>>>
>>>>>>> I suspect this is related to
>>>>>>>
>>>>>>> https://github.com/apache/arrow/commit/120c21f4bf66d2901b3a353a1f67bac3c3355924#diff-0f69784b44040448d17d0e4e8a641fe8,
>>>>>>> but I don't think it's a blocking issue
>>>>>>>
>>>>>>> [1]: https://github.com/apache/arrow/commits/master/js
>>>>>>>
>>>>>>> On Fri, Jul 17, 2020 at 9:42 AM Ryan Murray <rym...@dremio.com> wrote:
>>>>>>>>
>>>>>>>> I've tested Java and it looks good. However the verify script keeps on
>>>>>>>> bailing with protobuf related errors:
>>>>>>>> 'cpp/build/orc_ep-prefix/src/orc_ep-build/c++/src/orc_proto.pb.cc' and
>>>>>>>> friends can't find protobuf definitions. A bit odd as cmake can see protobuf
>>>>>>>> headers and builds directly off master work just fine. Has anyone else
>>>>>>>> experienced this? I am on Ubuntu 18.04
>>>>>>>>
>>>>>>>> On Fri, Jul 17, 2020 at 10:49 AM Antoine Pitrou <anto...@python.org> wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>> +1 (binding). I tested on Ubuntu 18.04.
>>>>>>>>>
>>>>>>>>> * Wheels verification went fine.
>>>>>>>>> * Source verification went fine with CUDA enabled and
>>>>>>>>> TEST_INTEGRATION_JS=0 TEST_JS=0.
>>>>>>>>>
>>>>>>>>> I didn't test the binaries.
>>>>>>>>>
>>>>>>>>> Regards
>>>>>>>>>
>>>>>>>>> Antoine.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 17/07/2020 at 03:41, Krisztián Szűcs wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I would like to propose the second release candidate (RC1) of Apache
>>>>>>>>>> Arrow version 1.0.0.
>>>>>>>>>> This is a major release consisting of 826 resolved JIRA issues[1].
>>>>>>>>>>
>>>>>>>>>> The verification of the first release candidate (RC0) has failed [0], and
>>>>>>>>>> the packaging scripts were unable to produce two wheels. Compared
>>>>>>>>>> to RC0 this release candidate includes additional patches for the
>>>>>>>>>> following bugs: ARROW-9506, ARROW-9504, ARROW-9497,
>>>>>>>>>> ARROW-9500, ARROW-9499.
>>>>>>>>>>
>>>>>>>>>> This release candidate is based on commit:
>>>>>>>>>> bc0649541859095ee77d03a7b891ea8d6e2fd641 [2]
>>>>>>>>>>
>>>>>>>>>> The source release rc1 is hosted at [3].
>>>>>>>>>> The binary artifacts are hosted at [4][5][6][7].
>>>>>>>>>> The changelog is located at [8].
>>>>>>>>>>
>>>>>>>>>> Please download, verify checksums and signatures, run the unit tests,
>>>>>>>>>> and vote on the release. See [9] for how to validate a release candidate.
>>>>>>>>>>
>>>>>>>>>> The vote will be open for at least 72 hours.
>>>>>>>>>>
>>>>>>>>>> [ ] +1 Release this as Apache Arrow 1.0.0
>>>>>>>>>> [ ] +0
>>>>>>>>>> [ ] -1 Do not release this as Apache Arrow 1.0.0 because...
>>>>>>>>>>
>>>>>>>>>> [0]: https://github.com/apache/arrow/pull/7778#issuecomment-659065370
>>>>>>>>>> [1]: https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20status%20in%20%28Resolved%2C%20Closed%29%20AND%20fixVersion%20%3D%201.0.0
>>>>>>>>>> [2]: https://github.com/apache/arrow/tree/bc0649541859095ee77d03a7b891ea8d6e2fd641
>>>>>>>>>> [3]: https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-1.0.0-rc1
>>>>>>>>>> [4]: https://bintray.com/apache/arrow/centos-rc/1.0.0-rc1
>>>>>>>>>> [5]: https://bintray.com/apache/arrow/debian-rc/1.0.0-rc1
>>>>>>>>>> [6]: https://bintray.com/apache/arrow/python-rc/1.0.0-rc1
>>>>>>>>>> [7]: https://bintray.com/apache/arrow/ubuntu-rc/1.0.0-rc1
>>>>>>>>>> [8]: https://github.com/apache/arrow/blob/bc0649541859095ee77d03a7b891ea8d6e2fd641/CHANGELOG.md
>>>>>>>>>> [9]: https://cwiki.apache.org/confluence/display/ARROW/How+to+Verify+Release+Candidates
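
For anyone following the display question in this thread (ARROW-9525), here is a rough standard-library sketch of why the same stored value can print two different ways. It does not use pyarrow and is not how pyarrow formats anything internally. The `render` helper and the fixed UTC-08:00 offset standing in for America/Los_Angeles are assumptions made for illustration only.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC-08:00 offset stands in for America/Los_Angeles.
# That matches the offset shown for the epoch values quoted in the thread,
# but it ignores daylight saving time.
PST = timezone(timedelta(hours=-8), "PST")


def render(epoch_ns: int, tz: timezone) -> str:
    """Hypothetical helper: format a nanoseconds-since-epoch value in tz."""
    # datetime only has microsecond resolution, so sub-microsecond
    # digits are dropped here.
    instant = datetime(1970, 1, 1, tzinfo=timezone.utc) + timedelta(
        microseconds=epoch_ns // 1000
    )
    return instant.astimezone(tz).isoformat(sep=" ")


stored = 0  # the first underlying int64 value from the quoted session

utc_text = render(stored, timezone.utc)  # wall-clock text in UTC
local_text = render(stored, PST)         # same instant, shifted for display

print(utc_text)
print(local_text)
```

Both strings describe the same instant. The stored integer is unchanged, and only the timezone applied at formatting time differs, which is consistent with the array repr and the pandas conversion disagreeing on the printed wall-clock time in the quoted output.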