[ https://issues.apache.org/jira/browse/SPARK-14070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Armbrust updated SPARK-14070:
-------------------------------------
    Shepherd: Michael Armbrust

> Use ORC data source for SQL queries on ORC tables
> -------------------------------------------------
>
>                 Key: SPARK-14070
>                 URL: https://issues.apache.org/jira/browse/SPARK-14070
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 1.6.1
>            Reporter: Tejas Patil
>            Priority: Minor
>
> Currently, when querying ORC tables in Hive, the plan generated by 
> Spark shows that it uses the `HiveTableScan` operator, which is generic to 
> all file formats. We could instead use the ORC data source for such queries so that 
> we get ORC-specific optimizations like predicate pushdown.
> Current behaviour:
> ```
> scala>  hqlContext.sql("SELECT * FROM orc_table").explain(true)
> == Parsed Logical Plan ==
> 'Project [unresolvedalias(*, None)]
> +- 'UnresolvedRelation `orc_table`, None
> == Analyzed Logical Plan ==
> key: string, value: string
> Project [key#171,value#172]
> +- MetastoreRelation default, orc_table, None
> == Optimized Logical Plan ==
> MetastoreRelation default, orc_table, None
> == Physical Plan ==
> HiveTableScan [key#171,value#172], MetastoreRelation default, orc_table, None
> ```
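>
> For comparison, a minimal sketch of reading the same data through the ORC
> data source directly, which is the code path this issue proposes to reuse.
> It assumes `hqlContext` is a `HiveContext` as in the session above; the
> warehouse path and the filter value are hypothetical, and the resulting
> plan is not shown because it was not captured here.
> ```
> // Ask the ORC data source to push filters down to the ORC reader.
> hqlContext.setConf("spark.sql.orc.filterPushdown", "true")
>
> // Load the table's files via the ORC data source instead of the metastore.
> // The path below is an assumed location, not taken from the report.
> val orcDf = hqlContext.read.format("orc").load("/user/hive/warehouse/orc_table")
>
> // Print all plan stages for a filtered query to compare with the
> // HiveTableScan plan above.
> orcDf.filter("key = 'some_key'").explain(true)
> ```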



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
