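There seems to be more broken StructArray behavior: converting a struct with a tz-aware timestamp child to pandas yields bare integers instead of timestamps.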

In [14]: arr = pa.array([0, 1, 2], type=pa.timestamp('ns', 'America/Los_Angeles'))

In [15]: struct_arr = pa.StructArray.from_arrays([arr], names=['f0'])

In [16]: struct_arr
Out[16]:
<pyarrow.lib.StructArray object at 0x7f089370f590>
-- is_valid: all not null
-- child 0 type: timestamp[ns, tz=America/Los_Angeles]
  [
    1970-01-01 00:00:00.000000000,
    1970-01-01 00:00:00.000000001,
    1970-01-01 00:00:00.000000002
  ]

In [17]: struct_arr.to_pandas()
Out[17]:
0    {'f0': 0}
1    {'f0': 1}
2    {'f0': 2}
dtype: object

All in all, it appears that this part of the project needs some TLC.
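For comparison, one would expect a tz-preserving conversion to produce timezone-aware values per row rather than bare integers. The following is a stdlib-only sketch, not pyarrow code. `struct_rows_with_tz` is a hypothetical helper. The values are treated as microseconds because Python's `datetime` cannot hold nanoseconds, and a fixed UTC-08:00 offset stands in for America/Los_Angeles, which is correct for January 1970.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC-08:00 offset stands in for America/Los_Angeles,
# which is correct for instants in January 1970 (PST, no DST).
LA_WINTER = timezone(timedelta(hours=-8), "PST")
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def struct_rows_with_tz(field_name, epoch_us, tz):
    """Hypothetical helper: turn one timestamp child of a struct into
    per-row dicts of tz-aware datetimes instead of bare integers."""
    rows = []
    for value in epoch_us:
        # The stored integer is an offset from the UTC epoch...
        instant = EPOCH + timedelta(microseconds=value)
        # ...and the column's time zone only changes how it is presented.
        rows.append({field_name: instant.astimezone(tz)})
    return rows


rows = struct_rows_with_tz("f0", [0, 1, 2], LA_WINTER)
for row in rows:
    print(row["f0"].isoformat())
# 1969-12-31T16:00:00-08:00
# 1969-12-31T16:00:00.000001-08:00
# 1969-12-31T16:00:00.000002-08:00
```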

On Sun, Jul 19, 2020 at 6:16 PM Wes McKinney <wesmck...@gmail.com> wrote:
>
> Well, the problem is that time zones are really finicky when comparing
> Spark (which uses a localtime interpretation of timestamps without a
> time zone) and Arrow (which has naive timestamps, a concept similar
> to but different from the SQL concept TIMESTAMP WITHOUT TIME ZONE, and
> tz-aware timestamps). So somewhere a time zone is being stripped
> or applied/localized, which may result in the data transferred to/from
> Spark being shifted by the time zone offset. I think it's important
> that we determine what the problem is. If it has to be fixed in Arrow
> (and it's not clear to me that it does), it's worth spending some time
> to understand what's going on to avoid the possibility of a patch
> release on account of this.
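To make the "shifted by the time zone offset" failure concrete, here is a minimal stdlib-only sketch (not Arrow or Spark code). It stores the same naive wall-clock value under two interpretations, local Los Angeles time versus UTC, and shows that the resulting epoch values differ by exactly the offset. The fixed UTC-08:00 offset is an assumption that holds for 2018-03-10, before that year's DST change.

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed PST offset; 2018-03-10 falls before the 2018 DST
# change (March 11), so UTC-08:00 is the correct Los Angeles offset here.
PST = timezone(timedelta(hours=-8), "PST")
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def epoch_us(aware):
    """Microseconds since the UTC epoch for a tz-aware datetime."""
    return (aware - EPOCH) // timedelta(microseconds=1)


wall_clock = datetime(2018, 3, 10, 0, 0)  # naive wall-clock value

# Localtime reading: the naive value is Los Angeles local time.
as_local = epoch_us(wall_clock.replace(tzinfo=PST))
# Alternative reading: the naive value is already UTC.
as_utc = epoch_us(wall_clock.replace(tzinfo=timezone.utc))

shift_hours = (as_local - as_utc) / 3_600_000_000
print(shift_hours)  # 8.0
```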
>
> On Sun, Jul 19, 2020 at 6:12 PM Neal Richardson <neal.p.richard...@gmail.com> wrote:
> >
> > If it’s a display problem, should it block the release?
> >
> > > On Jul 19, 2020, at 3:57 PM, Wes McKinney <wesmck...@gmail.com> wrote:
> > >
> > > I opened https://issues.apache.org/jira/browse/ARROW-9525 about the
> > > display problem. My guess is that there are other problems lurking
> > > here.
> > >
> > >> On Sun, Jul 19, 2020 at 5:54 PM Wes McKinney <wesmck...@gmail.com> wrote:
> > >>
> > >> Hi Bryan,
> > >>
> > >> This is a display bug.
> > >>
> > >> In [6]: arr = pa.array([0, 1, 2], type=pa.timestamp('ns', 'America/Los_Angeles'))
> > >>
> > >> In [7]: arr.view('int64')
> > >> Out[7]:
> > >> <pyarrow.lib.Int64Array object at 0x7fd1b8aaef30>
> > >> [
> > >>  0,
> > >>  1,
> > >>  2
> > >> ]
> > >>
> > >> In [8]: arr
> > >> Out[8]:
> > >> <pyarrow.lib.TimestampArray object at 0x7fd1b8aae6e0>
> > >> [
> > >>  1970-01-01 00:00:00.000000000,
> > >>  1970-01-01 00:00:00.000000001,
> > >>  1970-01-01 00:00:00.000000002
> > >> ]
> > >>
> > >> In [9]: arr.to_pandas()
> > >> Out[9]:
> > >> 0             1969-12-31 16:00:00-08:00
> > >> 1   1969-12-31 16:00:00.000000001-08:00
> > >> 2   1969-12-31 16:00:00.000000002-08:00
> > >> dtype: datetime64[ns, America/Los_Angeles]
> > >>
> > >> The repr of TimestampArray doesn't take the time zone into account; it
> > >> shows the UTC wall-clock values.
> > >>
> > >> In [10]: arr[0]
> > >> Out[10]: <pyarrow.TimestampScalar: Timestamp('1969-12-31 16:00:00-0800', tz='America/Los_Angeles')>
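The display discrepancy can be reproduced without pyarrow. This stdlib-only sketch renders the stored value 0 twice: once as a bare UTC wall clock, matching the TimestampArray repr in the session above, and once converted to the column's time zone, matching the scalar and pandas output. The fixed UTC-08:00 offset is an assumption valid for January 1970.

```python
from datetime import datetime, timedelta, timezone

PST = timezone(timedelta(hours=-8))  # assumption: LA offset in Jan 1970
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

stored = 0  # the int64 storage value seen via arr.view('int64')
instant = EPOCH + timedelta(microseconds=stored)

# Like the TimestampArray repr: UTC wall clock, no offset, no conversion.
utc_text = instant.strftime("%Y-%m-%d %H:%M:%S")
# Like the scalar / pandas output: converted to the column's time zone.
local_text = instant.astimezone(PST).strftime("%Y-%m-%d %H:%M:%S%z")

print(utc_text)    # 1970-01-01 00:00:00
print(local_text)  # 1969-12-31 16:00:00-0800
```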
> > >>
> > >> So if the data is incorrect, the problem is happening somewhere before or
> > >> while the StructArray is being created. If I had to guess, it's caused
> > >> by the tzinfo of the datetime.datetime values not being handled the
> > >> way it was before.
> > >>
> > >>> On Sun, Jul 19, 2020 at 5:19 PM Wes McKinney <wesmck...@gmail.com> wrote:
> > >>>
> > >>> Well, this is not good, and pretty disappointing given that we had nearly
> > >>> a month to sort through the implications of Micah’s patch. We should
> > >>> try to resolve this ASAP.
> > >>>
> > >>> On Sun, Jul 19, 2020 at 5:10 PM Bryan Cutler <cutl...@gmail.com> wrote:
> > >>>>
> > >>>> +0 (non-binding)
> > >>>>
> > >>>> I ran the verification script for binaries and then source, as below, and
> > >>>> both look good:
> > >>>> ARROW_TMPDIR=/tmp/arrow-test TEST_DEFAULT=0 TEST_SOURCE=1 TEST_CPP=1 \
> > >>>>   TEST_PYTHON=1 TEST_JAVA=1 TEST_INTEGRATION_CPP=1 TEST_INTEGRATION_JAVA=1 \
> > >>>>   dev/release/verify-release-candidate.sh source 1.0.0 1
> > >>>>
> > >>>> I tried to patch Spark locally to verify the recent change in nested
> > >>>> timestamps and was not able to get things working quite right, but I'm
> > >>>> not sure if the problem is in Spark, Arrow or my patch, hence my vote of +0.
> > >>>>
> > >>>> Here is what I'm seeing
> > >>>>
> > >>>> ```
> > >>>> (Input as datetime)
> > >>>> datetime.datetime(2018, 3, 10, 0, 0)
> > >>>> datetime.datetime(2018, 3, 15, 0, 0)
> > >>>>
> > >>>> (Struct Array)
> > >>>> -- is_valid: all not null
> > >>>> -- child 0 type: timestamp[us, tz=America/Los_Angeles]
> > >>>>  [
> > >>>>    2018-03-10 00:00:00.000000,
> > >>>>    2018-03-10 00:00:00.000000
> > >>>>  ]
> > >>>> -- child 1 type: timestamp[us, tz=America/Los_Angeles]
> > >>>>  [
> > >>>>    2018-03-15 00:00:00.000000,
> > >>>>    2018-03-15 00:00:00.000000
> > >>>>  ]
> > >>>>
> > >>>> (Flattened Arrays)
> > >>>> types [TimestampType(timestamp[us, tz=America/Los_Angeles]),
> > >>>> TimestampType(timestamp[us, tz=America/Los_Angeles])]
> > >>>> [<pyarrow.lib.TimestampArray object at 0x7ffbbd88f520>
> > >>>> [
> > >>>>  2018-03-10 00:00:00.000000,
> > >>>>  2018-03-10 00:00:00.000000
> > >>>> ], <pyarrow.lib.TimestampArray object at 0x7ffba958be50>
> > >>>> [
> > >>>>  2018-03-15 00:00:00.000000,
> > >>>>  2018-03-15 00:00:00.000000
> > >>>> ]]
> > >>>>
> > >>>> (Pandas Conversion)
> > >>>> [
> > >>>> 0   2018-03-09 16:00:00-08:00
> > >>>> 1   2018-03-09 16:00:00-08:00
> > >>>> dtype: datetime64[ns, America/Los_Angeles],
> > >>>>
> > >>>> 0   2018-03-14 17:00:00-07:00
> > >>>> 1   2018-03-14 17:00:00-07:00
> > >>>> dtype: datetime64[ns, America/Los_Angeles]]
> > >>>> ```
> > >>>>
> > >>>> Based on the output of an existing, correct timestamp UDF, it looks like the
> > >>>> pyarrow StructArray values are wrong and that's carried through to the
> > >>>> flattened arrays, causing the pandas values to have a negative offset.
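The arithmetic behind the pandas output above can be checked by hand. Assuming the repr shows UTC, the stored instants are 2018-03-10 00:00 and 2018-03-15 00:00 UTC. Converting them to Los Angeles time reproduces the (Pandas Conversion) values, including the switch from -08:00 to -07:00, because DST began on 2018-03-11. A stdlib-only sketch, with hard-coded offsets as an assumption in place of a real tz database:

```python
from datetime import datetime, timedelta, timezone

# Assumption: hard-coded Los Angeles offsets for 2018 instead of a tz
# database. DST began on 2018-03-11, so the first instant falls in PST
# (UTC-08:00) and the second in PDT (UTC-07:00).
PST = timezone(timedelta(hours=-8))
PDT = timezone(timedelta(hours=-7))

# Instants as shown by the struct children's repr, read as UTC.
stored = [
    (datetime(2018, 3, 10, tzinfo=timezone.utc), PST),
    (datetime(2018, 3, 15, tzinfo=timezone.utc), PDT),
]

local = [instant.astimezone(offset).isoformat(sep=" ") for instant, offset in stored]
for text in local:
    print(text)
# 2018-03-09 16:00:00-08:00
# 2018-03-14 17:00:00-07:00
```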
> > >>>>
> > >>>> Here is the output from a working UDF with a timestamp. The pyarrow Array
> > >>>> displays in UTC time, I believe.
> > >>>>
> > >>>> ```
> > >>>> (Timestamp Array)
> > >>>> type timestamp[us, tz=America/Los_Angeles]
> > >>>> [
> > >>>>  [
> > >>>>    1969-01-01 09:01:01.000000
> > >>>>  ]
> > >>>> ]
> > >>>>
> > >>>> (Pandas Conversion)
> > >>>> 0   1969-01-01 01:01:01-08:00
> > >>>> Name: _0, dtype: datetime64[ns, America/Los_Angeles]
> > >>>>
> > >>>> (Timezone Localized)
> > >>>> 0   1969-01-01 01:01:01
> > >>>> Name: _0, dtype: datetime64[ns]
> > >>>> ```
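The working UDF output follows a consistent chain. The repr shows UTC, the pandas conversion moves the instant into America/Los_Angeles, and the localized step drops the zone to leave naive local wall-clock time. Below is a stdlib-only sketch of those two steps. It assumes a fixed UTC-08:00 offset (PST applied on 1969-01-01) and only mirrors the output above; it is not Spark's or pandas' implementation.

```python
from datetime import datetime, timedelta, timezone

PST = timezone(timedelta(hours=-8))  # assumption: LA offset on 1969-01-01

# Value displayed by the TimestampArray repr, read as UTC.
instant = datetime(1969, 1, 1, 9, 1, 1, tzinfo=timezone.utc)

# Step 1: convert to the column's time zone (the pandas conversion).
local = instant.astimezone(PST)
# Step 2: drop the zone to get naive local wall-clock time, analogous to
# the "(Timezone Localized)" step in the output above.
naive_local = local.replace(tzinfo=None)

print(local.isoformat(sep=" "))        # 1969-01-01 01:01:01-08:00
print(naive_local.isoformat(sep=" "))  # 1969-01-01 01:01:01
```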
> > >>>>
> > >>>> I'll have to dig in further at another time and debug where the values
> > >>>> go wrong.
> > >>>>
> > >>>> On Sat, Jul 18, 2020 at 9:51 PM Micah Kornfield <emkornfi...@gmail.com> wrote:
> > >>>>
> > >>>>> +1 (binding)
> > >>>>>
> > >>>>> Ran wheel and binary tests on Ubuntu 19.04.
> > >>>>>
> > >>>>> On Fri, Jul 17, 2020 at 2:25 PM Neal Richardson <neal.p.richard...@gmail.com> wrote:
> > >>>>>
> > >>>>>> +1 (binding)
> > >>>>>>
> > >>>>>> In addition to the usual verification on
> > >>>>>> https://github.com/apache/arrow/pull/7787, I've successfully staged the
> > >>>>>> R binary artifacts on Windows
> > >>>>>> (https://github.com/r-windows/rtools-packages/pull/126), macOS
> > >>>>>> (https://github.com/autobrew/homebrew-core/pull/12), and Linux
> > >>>>>> (https://github.com/ursa-labs/arrow-r-nightly/actions/runs/172977277)
> > >>>>>> using the release candidate.
> > >>>>>>
> > >>>>>> And I agree with the judgment about skipping a JS release artifact. It
> > >>>>>> looks like there hasn't been a code change since October, so there's no
> > >>>>>> point.
> > >>>>>>
> > >>>>>> Neal
> > >>>>>>
> > >>>>>> On Fri, Jul 17, 2020 at 10:37 AM Wes McKinney <wesmck...@gmail.com> wrote:
> > >>>>>>
> > >>>>>>> I see the JS failures as well. I think the failure is localized to
> > >>>>>>> newer Node versions, since our JavaScript CI works fine. I don't think
> > >>>>>>> it should block the release given the lack of development activity in
> > >>>>>>> JavaScript [1]. If any JS devs are concerned about publishing an
> > >>>>>>> artifact, then we can skip pushing it to NPM.
> > >>>>>>>
> > >>>>>>> @Ryan, it seems it may be something environment-related on your
> > >>>>>>> machine. I'm on Ubuntu 18.04 and have not seen this.
> > >>>>>>>
> > >>>>>>> On
> > >>>>>>>
> > >>>>>>>>  * The Python 3.8 wheel's tests failed; 3.5, 3.6 and 3.7
> > >>>>>>>>    passed. It seems that -larrow and -larrow_python for
> > >>>>>>>>    Cython failed.
> > >>>>>>>
> > >>>>>>> I suspect this is related to
> > >>>>>>> https://github.com/apache/arrow/commit/120c21f4bf66d2901b3a353a1f67bac3c3355924#diff-0f69784b44040448d17d0e4e8a641fe8,
> > >>>>>>> but I don't think it's a blocking issue.
> > >>>>>>>
> > >>>>>>> [1]: https://github.com/apache/arrow/commits/master/js
> > >>>>>>>
> > >>>>>>> On Fri, Jul 17, 2020 at 9:42 AM Ryan Murray <rym...@dremio.com> wrote:
> > >>>>>>>>
> > >>>>>>>> I've tested Java and it looks good. However, the verify script keeps
> > >>>>>>>> bailing with protobuf-related errors:
> > >>>>>>>> 'cpp/build/orc_ep-prefix/src/orc_ep-build/c++/src/orc_proto.pb.cc' and
> > >>>>>>>> friends can't find protobuf definitions. This is a bit odd, as CMake can
> > >>>>>>>> see the protobuf headers and builds directly off master work just fine.
> > >>>>>>>> Has anyone else experienced this? I am on Ubuntu 18.04.
> > >>>>>>>>
> > >>>>>>>> On Fri, Jul 17, 2020 at 10:49 AM Antoine Pitrou <anto...@python.org> wrote:
> > >>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> +1 (binding).  I tested on Ubuntu 18.04.
> > >>>>>>>>>
> > >>>>>>>>> * Wheels verification went fine.
> > >>>>>>>>> * Source verification went fine with CUDA enabled and
> > >>>>>>>>> TEST_INTEGRATION_JS=0 TEST_JS=0.
> > >>>>>>>>>
> > >>>>>>>>> I didn't test the binaries.
> > >>>>>>>>>
> > >>>>>>>>> Regards
> > >>>>>>>>>
> > >>>>>>>>> Antoine.
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> On 17/07/2020 at 03:41, Krisztián Szűcs wrote:
> > >>>>>>>>>> Hi,
> > >>>>>>>>>>
> > >>>>>>>>>> I would like to propose the second release candidate (RC1) of
> > >>>>>>>>>> Apache Arrow version 1.0.0.
> > >>>>>>>>>> This is a major release consisting of 826 resolved JIRA issues [1].
> > >>>>>>>>>>
> > >>>>>>>>>> The verification of the first release candidate (RC0) failed [0],
> > >>>>>>>>>> and the packaging scripts were unable to produce two wheels. Compared
> > >>>>>>>>>> to RC0, this release candidate includes additional patches for the
> > >>>>>>>>>> following bugs: ARROW-9506, ARROW-9504, ARROW-9497,
> > >>>>>>>>>> ARROW-9500, ARROW-9499.
> > >>>>>>>>>>
> > >>>>>>>>>> This release candidate is based on commit:
> > >>>>>>>>>> bc0649541859095ee77d03a7b891ea8d6e2fd641 [2]
> > >>>>>>>>>>
> > >>>>>>>>>> The source release rc1 is hosted at [3].
> > >>>>>>>>>> The binary artifacts are hosted at [4][5][6][7].
> > >>>>>>>>>> The changelog is located at [8].
> > >>>>>>>>>>
> > >>>>>>>>>> Please download, verify checksums and signatures, run the unit
> > >>>>>>>>>> tests, and vote on the release. See [9] for how to validate a
> > >>>>>>>>>> release candidate.
> > >>>>>>>>>>
> > >>>>>>>>>> The vote will be open for at least 72 hours.
> > >>>>>>>>>>
> > >>>>>>>>>> [ ] +1 Release this as Apache Arrow 1.0.0
> > >>>>>>>>>> [ ] +0
> > >>>>>>>>>> [ ] -1 Do not release this as Apache Arrow 1.0.0 because...
> > >>>>>>>>>>
> > >>>>>>>>>> [0]: https://github.com/apache/arrow/pull/7778#issuecomment-659065370
> > >>>>>>>>>> [1]: https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20status%20in%20%28Resolved%2C%20Closed%29%20AND%20fixVersion%20%3D%201.0.0
> > >>>>>>>>>> [2]: https://github.com/apache/arrow/tree/bc0649541859095ee77d03a7b891ea8d6e2fd641
> > >>>>>>>>>> [3]: https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-1.0.0-rc1
> > >>>>>>>>>> [4]: https://bintray.com/apache/arrow/centos-rc/1.0.0-rc1
> > >>>>>>>>>> [5]: https://bintray.com/apache/arrow/debian-rc/1.0.0-rc1
> > >>>>>>>>>> [6]: https://bintray.com/apache/arrow/python-rc/1.0.0-rc1
> > >>>>>>>>>> [7]: https://bintray.com/apache/arrow/ubuntu-rc/1.0.0-rc1
> > >>>>>>>>>> [8]: https://github.com/apache/arrow/blob/bc0649541859095ee77d03a7b891ea8d6e2fd641/CHANGELOG.md
> > >>>>>>>>>> [9]: https://cwiki.apache.org/confluence/display/ARROW/How+to+Verify+Release+Candidates
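The checksum part of the release verification request above can be sketched with Python's `hashlib`. This is a minimal illustration, not the project's procedure. It assumes the published `.sha512` file starts with the hex digest followed by the file name, and it does not check GPG signatures. The verification guide [9] and `dev/release/verify-release-candidate.sh` remain the reference.

```python
import hashlib
from pathlib import Path


def sha512_of(path, chunk_size=1 << 20):
    """Hex SHA-512 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha512()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(artifact, checksum_file):
    """Compare against the first token of a '<hex digest>  <file name>' line.

    Assumption: the published .sha512 file uses that common layout.
    """
    expected = Path(checksum_file).read_text().split()[0].lower()
    return sha512_of(artifact) == expected
```

Usage, with hypothetical local file names: `checksum_matches("apache-arrow-1.0.0.tar.gz", "apache-arrow-1.0.0.tar.gz.sha512")`.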