[
https://issues.apache.org/jira/browse/BEAM-11731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17276705#comment-17276705
]
Brian Hulette commented on BEAM-11731:
--------------------------------------
To summarize, we seem to have two issues with numpy 1.20.0:
- Actually a pyarrow issue (ARROW-11450): this is the cause of the
test_read_write_10_parquet failure.
- numpy 1.20 deprecates some aliases for standard types
(https://numpy.org/doc/1.20/release/1.20.0-notes.html?highlight=release%20notes#using-the-aliases-of-builtin-types-like-np-int-is-deprecated).
This is the cause of the PreCommit Lint failure. We rely on these aliases in a
couple of places, most notably in beam schemas. The release notes say they are
deprecated, and not removed. Maybe this is just an issue with mypy warning us
about using a deprecated object? The fact that this isn't accompanied by any
test failures would indicate this one is not a serious problem for our users.
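For context, the deprecation looks like the sketch below; the alias names come from the numpy 1.20 release notes, and the replacements are the standard sized numpy scalar types (this is illustrative, not code from Beam):

```python
import numpy as np

# In numpy 1.20, np.int, np.float, np.bool, np.object, and np.str are
# deprecated aliases for the Python builtins. The sized numpy scalar
# types (np.int64, np.float64, np.bool_, ...) are unaffected.
arr = np.array([1, 2, 3], dtype=np.int64)  # instead of dtype=np.int
flag = np.bool_(True)                      # instead of np.bool(True)

print(arr.dtype, type(flag))
```

Code that only uses the sized types emits no DeprecationWarning on 1.20, which matches the observation that only the lint/mypy check fails.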
Actions I think we should take:
- Stop using the np.bool, np.int, etc. aliases (easy, not urgent).
- From now on, use the next _minor_ version of numpy as our upper bound (i.e.
<1.21.0) instead of the next major version. Releases aren't that frequent, and
we get a signal about these updates from the dependency check report.
- (If possible) restrict to numpy <1.20.0 when pyarrow <3.0 is used. pyarrow
3.0 works with numpy 1.20.0, but pyarrow <3.0 does not, even though its numpy
requirement allows it. If it's possible, we should make our setup.py work
around this.
- Cherry-pick <1.20.0 requirement to 2.28.0 branch.
- Update 2.27.0 blog post with a known issue with numpy 1.20.0 and parquet.
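The pinning actions above might look roughly like this in a setup.py-style install_requires list; the lower bounds here are placeholders for illustration, not Beam's actual requirements:

```python
# Hypothetical sketch of the proposed dependency pins. Exact lower
# bounds are illustrative only.
REQUIRED_PACKAGES = [
    # Cap at the next *minor* numpy release, not the next major one.
    'numpy>=1.14.3,<1.21.0',
    # pyarrow <3.0 breaks with numpy 1.20.0 (ARROW-11450) even though
    # its own numpy requirement allows it, so cap pyarrow as well.
    'pyarrow>=0.15.1,<3.0.0',
]
```

Note that pip has no way to select a numpy bound conditionally on which pyarrow version ends up installed, so capping numpy itself (as proposed for the 2.28.0 branch) is the simpler workaround.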
> numpy 1.20.0 breaks dataframe io_test.test_read_write_10_parquet and mypy
> -------------------------------------------------------------------------
>
> Key: BEAM-11731
> URL: https://issues.apache.org/jira/browse/BEAM-11731
> Project: Beam
> Issue Type: Bug
> Components: test-failures
> Reporter: Kyle Weaver
> Assignee: Brian Hulette
> Priority: P0
> Fix For: 2.28.0
>
> Time Spent: 50m
> Remaining Estimate: 0h
>
> pyarrow.lib.ArrowTypeError: ("Did not pass numpy.dtype object [while running
> '_WriteToPandas/WriteToFiles/ParDo(_WriteUnshardedRecordsFn)/ParDo(_WriteUnshardedRecordsFn)']",
> 'Conversion failed for column rank with type int64')
> https://ci-beam.apache.org/job/beam_PreCommit_Python_Commit/17083/testReport/junit/apache_beam.dataframe.io_test/IOTest/test_read_write_10_parquet/
--
This message was sent by Atlassian Jira
(v8.3.4#803005)