Re: SparkSQL performance

Renato Marroquín Mogrovejo Mon, 20 Apr 2015 15:05:33 -0700

Does anybody have an idea? A clue? A hint?
Thanks!


Renato M.

2015-04-20 9:31 GMT+02:00 Renato Marroquín Mogrovejo <
renatoj.marroq...@gmail.com>:

> Hi all,
>
> I have a simple query, "Select * from tableX where attribute1 between 0 and
> 5", that I run over a Kryo file with four partitions, which amounts to
> around 3.5 million rows in our case.
> If I run this query by doing a simple map().filter(), it takes around 9.6
> seconds, but when I apply a schema, register the table in a SQLContext, and
> then run the query, it takes around 16 seconds. This is using Spark 1.2.1
> with Scala 2.10.0.
> I am wondering why there is such a big gap in performance if it is just a
> filter. Internally, the relation files are mapped to a JavaBean. Could this
> difference in data representation (JavaBeans vs. the SparkSQL internal
> representation) lead to such a difference? Is there anything I could do to
> bring the performance closer to the "hard-coded" option?
> Thanks in advance for any suggestions or ideas.
>
>
> Renato M.
>
