See the JIRA - this is too open-ended, and it's not obvious the slowdown isn't just due to choices in data representation, what you're trying to do, etc. It's correctly closed IMHO. However, identifying the issue more narrowly - something that looks ripe for optimization - would be useful.
On Thu, Oct 10, 2019 at 12:30 PM antonkulaga <antonkul...@gmail.com> wrote:
>
> I think for sure SPARK-28547
> <https://issues.apache.org/jira/projects/SPARK/issues/SPARK-28547>
> At the moment there are some flaws in Spark's architecture, and it performs
> miserably or even freezes whenever the column count exceeds 10-15K
> (even a simple describe call takes ages, while the same operation with
> pandas and no Spark takes seconds). In many fields (like bioinformatics),
> wide datasets with large numbers of both rows and columns are very common
> (gene expression data is a good example), and Spark is totally useless there.
>
> --
> Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
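For context, the pandas baseline the report compares against is easy to reproduce. A minimal sketch (the column count is reduced to 5,000 here for brevity; the original report cites 10-15K):

```python
import time
import numpy as np
import pandas as pd

# Build a wide DataFrame: 1,000 rows x 5,000 numeric columns
# (the report cites 10-15K columns; reduced here for brevity).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((1000, 5000)))

start = time.perf_counter()
stats = df.describe()  # count/mean/std/min/quartiles/max per column
elapsed = time.perf_counter() - start

print(f"describe() over {df.shape[1]} columns took {elapsed:.2f}s")

# The rough PySpark equivalent the report measures against would be
# something like (not run here; requires a SparkSession):
#   spark.createDataFrame(df).describe().show()
```

pandas computes these statistics in-process over contiguous NumPy arrays, which is why it finishes in seconds; the gap to Spark on the same wide input is what SPARK-28547 flags.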