To be clear, the plan is to drop them in Spark 3.1 onwards, yes?

On Wed, Jul 1, 2020 at 7:11 PM Hyukjin Kwon <gurwls...@gmail.com> wrote:
> Hi all,
>
> I would like to discuss dropping the deprecated Python versions 2, 3.4, and 3.5
> at https://github.com/apache/spark/pull/28957. I assume people support it
> in general, but I am writing this to make sure everybody is happy.
>
> Fokko made a very good investigation of it; see
> https://github.com/apache/spark/pull/28957#issuecomment-652022449.
> Judging from the statistics, I think we're pretty safe to drop them.
> Also note that dropping Python 2 was actually declared at
> https://python3statement.org/
>
> Roughly speaking, there are several main advantages to dropping them:
> 1. It removes a bunch of hacks we added, around 700 lines in PySpark.
> 2. PyPy2 has a critical bug that causes a flaky test,
> https://issues.apache.org/jira/browse/SPARK-28358, given my testing and
> investigation.
> 3. Users can use Python type hints with Pandas UDFs without thinking
> about the Python version.
> 4. Users can leverage the latest cloudpickle,
> https://github.com/apache/spark/pull/28950. With Python 3.8+, it can also
> leverage the C pickle implementation.
> 5. ...
>
> So it benefits both users and devs. WDYT, guys?

--
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
YouTube Live Streams: https://www.youtube.com/user/holdenkarau
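For context on point 3 above: in Spark 3.0+, the kind of Pandas UDF can be inferred from Python type hints instead of an explicit `PandasUDFType`, which only works cleanly once Python 2 is out of the picture. Below is a minimal sketch of that style; the Spark-side decorator usage is shown in comments so the snippet runs with pandas alone, and `plus_one` is a hypothetical example function, not something from the thread.

```python
import pandas as pd

# Old style (kept partly for Python 2 compatibility): the UDF kind was
# passed explicitly, e.g.
#   pandas_udf(plus_one, "long", PandasUDFType.SCALAR)
#
# New style (Python 3.6+ type hints, Spark 3.0+): the UDF kind is
# inferred from the annotations, e.g.
#   from pyspark.sql.functions import pandas_udf
#
#   @pandas_udf("long")
#   def plus_one(s: pd.Series) -> pd.Series:
#       return s + 1
#
#   spark.range(3).select(plus_one("id")).show()

def plus_one(s: pd.Series) -> pd.Series:
    # Vectorized over the whole pandas Series, as a scalar Pandas UDF is.
    return s + 1

print(plus_one(pd.Series([1, 2, 3])).tolist())  # → [2, 3, 4]
```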