See the JIRA - this is too open-ended, and it's not obvious the slowdown isn't just due to choices in data representation, what you're trying to do, etc. It's correctly closed IMHO. However, identifying the issue more narrowly - something that looks ripe for optimization - would be useful.
On Thu, Oct 10, 2019 at 12:30 PM antonkulaga <antonkul...@gmail.com> wrote:
>
> I think for sure SPARK-28547
> <https://issues.apache.org/jira/projects/SPARK/issues/SPARK-28547>
> At the moment there are some flaws in Spark's architecture, and it performs
> miserably or even freezes whenever the column count exceeds 10-15K
> (even a simple describe call takes ages, while the same operation with
> pandas and no Spark takes seconds). In many fields (like bioinformatics),
> wide datasets with large numbers of both rows and columns are very common
> (gene expression data is a good example), and Spark is totally useless there.
>
> --
> Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
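For context, the pandas baseline the report compares against is easy to reproduce. A minimal sketch (the column count is reduced to 5,000 here for brevity; the original report cites 10-15K):

```python
import time
import numpy as np
import pandas as pd

# Build a wide DataFrame: 1,000 rows x 5,000 numeric columns
# (the report cites 10-15K columns; reduced here for brevity).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((1000, 5000)))

start = time.perf_counter()
stats = df.describe()  # count/mean/std/min/quartiles/max per column
elapsed = time.perf_counter() - start

print(f"describe() over {df.shape[1]} columns took {elapsed:.2f}s")

# The rough PySpark equivalent the report measures against would be
# something like (not run here; requires a SparkSession):
#   spark.createDataFrame(df).describe().show()
```

pandas computes these statistics in-process over contiguous NumPy arrays, which is why it finishes in seconds; the gap to Spark on the same wide input is what SPARK-28547 flags.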