[ 
https://issues.apache.org/jira/browse/BEAM-10777?focusedWorklogId=473586&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-473586
 ]

ASF GitHub Bot logged work on BEAM-10777:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 22/Aug/20 19:33
            Start Date: 22/Aug/20 19:33
    Worklog Time Spent: 10m 
      Work Description: saavannanavati commented on a change in pull request #12657:
URL: https://github.com/apache/beam/pull/12657#discussion_r475123198



##########
File path: website/www/site/content/en/blog/python-performance-runtime-type-checking.md
##########
@@ -0,0 +1,154 @@
+---
+layout: post
+title:  "Performance-Driven Runtime Type Checking for the Python SDK"
+date:   2020-08-21 00:00:01 -0800
+categories:
+  - blog 
+  - python 
+  - typing
+authors:
+  - saavan
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+In this blog post, we're announcing the upcoming release of a new, opt-in
+runtime type checking system for Beam's Python SDK that's optimized for performance
+in both development and production environments.
+
+But let's take a step back - why do we even care about runtime type checking 
+in the first place? Let's look at an example.
+
+```
+import apache_beam as beam
+
+class MultiplyNumberByTwo(beam.DoFn):
+    def process(self, element: int):
+        # process() must return an iterable, so yield the result.
+        yield element * 2
+
+p = beam.Pipeline()
+p | beam.Create(['1', '2']) | beam.ParDo(MultiplyNumberByTwo())
+```
+
+In this code, we passed a list of strings to a DoFn that's clearly intended for use with
+integers. Luckily, this code will throw an error during pipeline construction because
+the inferred output type of `beam.Create(['1', '2'])` is `str`, which is incompatible with
+the declared input type hint of `MultiplyNumberByTwo.process`, which is `int`.
+
+However, what if we turned the pipeline type check off using the `no_pipeline_type_check`
+flag? Or more realistically, what if the input PCollection to `MultiplyNumberByTwo` came
+from a database, preventing inference of the output data type?
+
+In either case, no error would be thrown during pipeline construction.
+And even at runtime, this code runs without error. Each string would be multiplied by 2
+(that is, repeated twice), yielding a result of `['11', '22']`, but that's certainly not
+the outcome we want.
+
+So how do you debug this breed of "hidden" errors? More broadly speaking, how do you
+debug any error message in Beam that's complex or confusing (e.g. serialization errors)?

Review comment:
       True true
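
The "hidden" error in the quoted hunk can be reproduced without Beam. The sketch below is plain Python and only an illustration: `multiply_by_two` and `checked_multiply_by_two` are hypothetical helper names, and the `isinstance` check is a hand-written stand-in for a runtime type check, not the SDK's actual mechanism.

```python
from typing import Iterable, Iterator


def multiply_by_two(elements: Iterable[int]) -> Iterator[int]:
    """Mimics the DoFn body with no runtime check on the elements."""
    for element in elements:
        yield element * 2


def checked_multiply_by_two(elements: Iterable[int]) -> Iterator[int]:
    """Same logic, but each element is checked against the hinted type."""
    for element in elements:
        if not isinstance(element, int):
            raise TypeError(
                f"expected int, got {type(element).__name__}: {element!r}"
            )
        yield element * 2


# Strings slip through silently: '1' * 2 is string repetition.
silent = list(multiply_by_two(['1', '2']))

# The checked variant fails loudly on the first bad element.
try:
    list(checked_multiply_by_two(['1', '2']))
    error = None
except TypeError as exc:
    error = str(exc)

print(silent)
print(error)
```

The unchecked version returns repeated strings with no complaint, while the checked version stops at the first element whose type does not match the hint, which is the kind of failure a runtime type check is meant to surface.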




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure.


Issue Time Tracking
-------------------

    Worklog Id:     (was: 473586)
    Time Spent: 1h  (was: 50m)

> Add blogs posts announcing updates to the type hints module of the Python SDK
> -----------------------------------------------------------------------------
>
>                 Key: BEAM-10777
>                 URL: https://issues.apache.org/jira/browse/BEAM-10777
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-py-core
>            Reporter: Saavan Nanavati
>            Assignee: Saavan Nanavati
>            Priority: P2
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> There will be 2 new blog posts.
> 1. Announcing typed PCollections and support for type hint annotations on PTransforms
> 2. Announcing the upcoming release of performance_runtime_type_check, a new runtime type checking system.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
