[ https://issues.apache.org/jira/browse/SPARK-10915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15592831#comment-15592831 ]
Jason White commented on SPARK-10915:
-------------------------------------

At the moment, we use .repartitionAndSortWithinPartitions to give us a strictly ordered iterable that we can process one record at a time. We don't have a Python list sitting in memory; instead, we rely on ExternalSort to order the data in a memory-safe way.

I don't yet have enough experience with DataFrames to know whether we will have the same or similar problems there. It's possible that collect_list will perform better. I'll give that a try when we get there and report back on this ticket if it's a suitable approach for our use case.

> Add support for UDAFs in Python
> -------------------------------
>
>                 Key: SPARK-10915
>                 URL: https://issues.apache.org/jira/browse/SPARK-10915
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark, SQL
>            Reporter: Justin Uang
>
> This should support python defined lambdas.
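
For readers unfamiliar with the pattern mentioned in the comment, here is a minimal sketch of streaming over sorted partitions. It is not the commenter's actual code. It assumes an RDD of ((group, order), value) pairs, and the names first_and_last_per_group and ordered_aggregate are hypothetical. The per-partition function consumes records one at a time, so no group is ever held as a Python list.

```python
from itertools import groupby


def first_and_last_per_group(records):
    """Consume ((group, order), value) records that are already sorted by key.

    Yields (group, first_value, last_value), reading one record at a time.
    """
    for group, rows in groupby(records, key=lambda record: record[0][0]):
        # groupby never yields an empty group, so next() is safe here.
        first = last = next(rows)[1]
        for _key, value in rows:
            last = value
        yield group, first, last


def ordered_aggregate(rdd, num_partitions):
    """Hypothetical wrapper; `rdd` is assumed to hold ((group, order), value) pairs."""
    # Imported lazily so the helper above stays usable without Spark installed.
    from pyspark.rdd import portable_hash

    return rdd.repartitionAndSortWithinPartitions(
        numPartitions=num_partitions,
        # Partition on the group only, so each group lands in a single partition.
        # The default keyfunc then sorts on the full (group, order) key.
        partitionFunc=lambda key: portable_hash(key[0]),
    ).mapPartitions(first_and_last_per_group)
```

The result depends on the order of records within each group, which is why a plain unordered aggregation would not be enough here. Because the sort happens inside Spark before the records reach the per-partition function, the Python side only ever sees an iterator.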