Does anybody have an idea? a clue? a hint? Thanks!
Renato M.

2015-04-20 9:31 GMT+02:00 Renato Marroquín Mogrovejo <renatoj.marroq...@gmail.com>:

> Hi all,
>
> I have a simple query, "Select * from tableX where attribute1 between 0 and
> 5", that I run over a Kryo file with four partitions, which comes to around
> 3.5 million rows in our case.
>
> If I run this query with a simple map().filter(), it takes about 9.6
> seconds, but when I apply a schema, register the table in a SQLContext, and
> then run the query, it takes about 16 seconds. This is using Spark 1.2.1
> with Scala 2.10.0.
>
> I am wondering why there is such a big performance gap if it is just a
> filter. Internally, the relation files are mapped to a JavaBean. Could this
> difference in data representation (JavaBeans vs. the Spark SQL internal
> representation) lead to such a difference? Is there anything I could do to
> bring the performance closer to the "hard-coded" option?
>
> Thanks in advance for any suggestions or ideas.
>
> Renato M.