Vectorized execution will probably make its way into the query engine eventually, one way or another. Note that in general there is a lot of lower-hanging fruit to pick before you have to do vectorization.
As far as I know, Hive doesn't really have vectorization in the SIMD sense. What Hive calls vectorization is simply processing everything in small batches in order to avoid virtual function call overhead, and hoping the JVM can unroll some of the loops. There is no SIMD involved.

Something that is pretty useful, which isn't exactly vectorization but comes from similar lines of research, is being able to push predicates down into the columnar compression encoding. For example, with dictionary encoding one can turn string comparisons into integer comparisons. This will probably give much larger performance improvements in common queries.

On Mon, Jan 19, 2015 at 6:27 PM, Xuelin Cao <xuelincao2...@gmail.com> wrote:

> Hi,
>
> Correct me if I'm wrong. It looks like the current version of
> Spark-SQL uses a *tuple-at-a-time* model. Basically, each time, the physical
> operator produces a tuple by recursively calling child->execute.
>
> There are papers that illustrate the benefits of a vectorized query
> engine, and Hive-Stinger also embraces this style.
>
> So, the question is, will Spark-SQL support vectorized query
> execution someday?
>
> Thanks
>
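P.S. To make the point about pushing predicates into the columnar encoding more concrete, here is a minimal sketch in plain Python. It assumes a dictionary-encoded string column; the helper names `dictionary_encode` and `filter_equals` are made up for illustration and are not Spark SQL or Hive APIs. The idea is that the string literal is looked up once, and the per-row work is then an integer comparison.

```python
from typing import Dict, List, Tuple


def dictionary_encode(values: List[str]) -> Tuple[List[str], List[int]]:
    """Return (dictionary, codes): each distinct string gets a small integer code."""
    dictionary: List[str] = []
    index: Dict[str, int] = {}
    codes: List[int] = []
    for v in values:
        code = index.get(v)
        if code is None:
            # First time we see this string: assign the next free code.
            code = len(dictionary)
            index[v] = code
            dictionary.append(v)
        codes.append(code)
    return dictionary, codes


def filter_equals(dictionary: List[str], codes: List[int], literal: str) -> List[int]:
    """Row positions where the column equals `literal`, comparing integers per row."""
    try:
        # The only string comparisons happen here, once per column chunk.
        target = dictionary.index(literal)
    except ValueError:
        # The literal is not in the dictionary, so no row can match.
        return []
    return [row for row, code in enumerate(codes) if code == target]


# Hypothetical column data, for illustration only.
country = ["US", "CN", "US", "DE", "CN", "US"]
dictionary, codes = dictionary_encode(country)
matches = filter_equals(dictionary, codes, "US")
```

A real engine would do this on compressed column chunks and could skip a whole chunk when the literal is missing from its dictionary, which is the early-return case in the sketch.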