Using df.write.partitionBy is similar to a coarse-grained, clustered index
in a traditional database. You can't use it on temporary tables, but it
lets you efficiently select small slices of a much larger table.
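
For example (a minimal sketch; assuming a SparkSession called spark and a
DataFrame df with an event_date column — the names and path here are
illustrative):

    # write one sub-directory per distinct event_date value
    df.write \
        .partitionBy("event_date") \
        .parquet("/data/events")

    # a filter on the partition column only reads the matching directories
    spark.read.parquet("/data/events") \
        .where("event_date = '2016-08-01'")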

On Sat, Aug 13, 2016 at 11:13 PM, Jörn Franke <jornfra...@gmail.com> wrote:

> Use a format that has built-in indexes, such as Parquet or ORC. Do not
> forget to sort the data on the columns that you filter on.
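>
> For example (a rough sketch; assuming queries filter on a user_id
> column, which is illustrative):
>
>     # sorting clusters equal values together, so the per-block min/max
>     # statistics in Parquet/ORC become selective and a pushed-down
>     # filter such as user_id = 42 can skip whole blocks
>     df.sortWithinPartitions("user_id") \
>         .write \
>         .parquet("/data/table")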
>
> On 14 Aug 2016, at 05:03, Taotao.Li <charles.up...@gmail.com> wrote:
>
>
> Hi guys, does Spark SQL support indexes? If so, how can I create an
> index on my temp table? If not, how can I handle specific queries on a
> very large table? Right now a query scans the entire table even though all
> I want is just a small piece of it.
>
> great thanks,
>
>
> *___________________*
> Quant | Engineer | Boy
> *___________________*
> *blog*:    http://litaotao.github.io
> *github*: www.github.com/litaotao
