[ 
https://issues.apache.org/jira/browse/BEAM-7060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16816337#comment-16816337
 ] 

Robert Bradshaw commented on BEAM-7060:
---------------------------------------

The static typing situation in Python has improved significantly since this 
was originally added, both for Python 2.7 (e.g. backporting of the typing 
module) and 3.x (much better tooling, and standardization on the typing module 
itself). We should definitely try to deprecate and remove our in-house 
typehints. 

There are three separable issues that need to be tackled here. 

(1) Users should be using the standard typing types for declaring their type 
hints (including, ideally, understanding function argument and return type 
annotations for Python 3). We already have code in place to understand and 
translate these to our internal types, which I think are now a subset of what is 
expressible in the typing module. That being the case, we should encourage all 
users to use these and deprecate our own from the public API.
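
As a minimal sketch of what (1) relies on: standard annotations are available 
at runtime, so a framework can read them and map them onto whatever internal 
representation it uses. The function `split_words` below is a hypothetical 
example, not Beam code, and the sketch uses only the standard library; how Beam 
itself translates these types is not shown here.

```python
import typing
from typing import Iterable, Tuple


def split_words(line: str) -> Iterable[Tuple[str, int]]:
    """Hypothetical user function: emit a (word, 1) pair for every word."""
    for word in line.split():
        yield word, 1


# Read the declared argument and return types back at runtime.
hints = typing.get_type_hints(split_words)
input_type = hints["line"]      # the annotation on the `line` argument
output_type = hints["return"]   # the annotation on the return value
```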

(2) These types are only useful insofar as we do type compatibility checks. For 
example, we need to know that GroupByKey taking a Tuple[K, V] is applicable to 
a PCollection[str, int]. This is where the existing typing module is not 
enough, and we may need to take a dependency on a third-party library. If the 
third-party library uses the typing module for its type definitions, the 
coupling here should be very loose and we can play around with different 
libraries (or even let it be pluggable). 
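
To illustrate the kind of check meant in (2), here is a rough sketch of a 
compatibility test written directly against typing types. The helper 
`is_compatible` is hypothetical and deliberately incomplete: it handles only 
Any, type variables, plain classes, and fixed-arity generics, ignores variance, 
and does not check that a type variable is used consistently. A real checker 
(in-house or third-party) would need to cover much more.

```python
from typing import Any, Tuple, TypeVar, get_args, get_origin

K = TypeVar("K")
V = TypeVar("V")


def is_compatible(actual, expected) -> bool:
    """Return True if `actual` may be passed where `expected` is declared."""
    # Any and (unconstrained) type variables accept everything in this sketch.
    if expected is Any or actual is Any or isinstance(expected, TypeVar):
        return True
    actual_origin, expected_origin = get_origin(actual), get_origin(expected)
    if actual_origin is None and expected_origin is None:
        # Both are plain classes such as str or int.
        return (isinstance(actual, type) and isinstance(expected, type)
                and issubclass(actual, expected))
    if actual_origin is not expected_origin:
        return False
    # Same generic container: compare the type arguments pairwise.
    actual_args, expected_args = get_args(actual), get_args(expected)
    return len(actual_args) == len(expected_args) and all(
        is_compatible(a, e) for a, e in zip(actual_args, expected_args))
```

With this, `is_compatible(Tuple[str, int], Tuple[K, V])` accepts the GroupByKey 
input from the example above, while `is_compatible(str, Tuple[K, V])` rejects a 
non-pair element type.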

(3) Optional, but IMHO still desirable, is the ability to do type inference. We 
do type inference at two levels. The first (a) is transform-to-transform: e.g. 
when applying a PTransform that maps PCollection[T] to PCollection[T, int] to a 
PCollection[string], we should know that the output is a PCollection[string, 
int]. This is pretty important; otherwise one will get many spurious type 
errors or very little actual checking. (E.g. consider the common case of 
GroupByKey: if I follow that with a DoFn expecting a PCollection[string, 
Iterable[int]], I don't want a type error when applying this DoFn to 
`pair_of_string_ints | GroupByKey()` but do want a type error when applying 
this to `pair_of_string_strings | GroupByKey()`.) 
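
The GroupByKey case can be sketched as binding type variables against the 
actual input element type and then substituting them into the declared output 
type. The helpers `bind`, `substitute`, and `infer_output` are hypothetical 
names for illustration, and the sketch works on element types rather than on 
PCollections; it is not how Beam's in-house inference is implemented.

```python
from typing import Iterable, Tuple, TypeVar, get_args, get_origin

K = TypeVar("K")
V = TypeVar("V")


def bind(pattern, actual, bindings):
    """Match `actual` against `pattern`, recording what each TypeVar stands for."""
    if isinstance(pattern, TypeVar):
        if bindings.setdefault(pattern, actual) != actual:
            raise TypeError(f"{pattern} bound to both {bindings[pattern]} and {actual}")
        return
    if get_origin(pattern) is None:
        # Plain class: this sketch requires an exact match.
        if pattern != actual:
            raise TypeError(f"expected {pattern}, got {actual}")
        return
    if (get_origin(actual) is not get_origin(pattern)
            or len(get_args(actual)) != len(get_args(pattern))):
        raise TypeError(f"expected {pattern}, got {actual}")
    for p, a in zip(get_args(pattern), get_args(actual)):
        bind(p, a, bindings)


def substitute(template, bindings):
    """Replace the type variables in `template` with their bound types."""
    if isinstance(template, TypeVar):
        return bindings[template]
    params = getattr(template, "__parameters__", ())
    if not params:
        return template
    # typing aliases support substitution by subscripting with concrete types.
    return template[tuple(bindings[p] for p in params)]


def infer_output(input_pattern, output_template, actual_input):
    bindings = {}
    bind(input_pattern, actual_input, bindings)
    return substitute(output_template, bindings)


# GroupByKey-like signature: Tuple[K, V] -> Tuple[K, Iterable[V]]
ints_grouped = infer_output(Tuple[K, V], Tuple[K, Iterable[V]], Tuple[str, int])
strs_grouped = infer_output(Tuple[K, V], Tuple[K, Iterable[V]], Tuple[str, str])
```

A downstream consumer declared for `Tuple[str, Iterable[int]]` would then match 
`ints_grouped` but not `strs_grouped`, which is the distinction described above.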

(3b) The second type of inference we do, and the type that's (as I understand 
it) breaking with Python 3, is type inference of the bodies of process/map/... 
functions themselves. This is handy for tracing types through trivial functions 
like beam.Map(lambda x: (x, len(x))) and such, though it obviously has its 
limits. A third-party library would be needed here if we don't update our 
in-house system, and preferably one that works on runtime objects (i.e. 
bytecode) rather than source (numba probably has such libraries). 
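
For a sense of what working on runtime objects means in (3b), the standard 
`dis` module can list the bytecode of a lambda without access to its source. 
This is only an inspection sketch, not an inference engine, and the exact 
opcode names and sequences differ between CPython versions, which is one reason 
inference built on them is sensitive to interpreter upgrades.

```python
import dis

fn = lambda x: (x, len(x))

# The code object is attached to the function, so no source text is needed.
opnames = [ins.opname for ins in dis.get_instructions(fn)]

# Global names the body refers to (here, the builtin `len`).
referenced_names = fn.__code__.co_names

# An inference pass would walk `opnames` symbolically, tracking the type on
# the stack at each step, e.g. BUILD_TUPLE producing a Tuple of its operands.
```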

> Design Py3-compatible typehints annotation support in Beam 3.
> -------------------------------------------------------------
>
>                 Key: BEAM-7060
>                 URL: https://issues.apache.org/jira/browse/BEAM-7060
>             Project: Beam
>          Issue Type: Sub-task
>          Components: sdk-py-core
>            Reporter: Valentyn Tymofieiev
>            Assignee: Valentyn Tymofieiev
>            Priority: Major
>
> Existing [Typehints implementation in 
> Beam|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/typehints/] 
> heavily relies on internal details of the CPython implementation, and some of 
> the assumptions of this implementation broke as of Python 3.6, see for 
> example: https://issues.apache.org/jira/browse/BEAM-6877, which makes  
> typehints support unusable on Python 3.6 as of now. [Python 3 Kanban 
> Board|https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=245&view=detail]
>  lists several specific typehints-related breakages, prefixed with "TypeHints 
> Py3 Error".
> We need to decide whether to:
> - Deprecate in-house typehints implementation.
> - Continue to support the in-house implementation, which at this point is stale 
> code and has other known issues.
> - Attempt to use some off-the-shelf libraries for supporting 
> type-annotations, like  Pytype, Mypy, PyAnnotate.
> WRT this decision we also need to plan on immediate next steps to unblock 
> adoption of Beam for  Python 3.6+ users. One potential option may be to have 
> Beam SDK ignore any typehint annotations on Py 3.6+.
> cc: [~udim], [~altay], [~robertwb].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
