Using df.write.partitionBy gives you something like a coarse-grained clustered index in a traditional database. It doesn't apply to temporary tables, but it lets you efficiently select a small slice of a much larger table, because Spark only reads the partition directories that match your filter.
On Sat, Aug 13, 2016 at 11:13 PM, Jörn Franke <jornfra...@gmail.com> wrote:
> Use a format that has built-in indexes, such as Parquet or ORC. Do not
> forget to sort the data on the columns that you filter on.
>
> On 14 Aug 2016, at 05:03, Taotao.Li <charles.up...@gmail.com> wrote:
>
> hi, guys, does Spark SQL support indexes? If so, how can I create an
> index on my temp table? If not, how can I handle some specific queries on a
> very large table? It would iterate over the whole table even though all I
> want is just a small piece of it.
>
> great thanks,
>
> ___________________
> Quant | Engineer | Boy
> ___________________
> blog: http://litaotao.github.io
> github: www.github.com/litaotao