[ https://issues.apache.org/jira/browse/SPARK-12998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dongjoon Hyun closed SPARK-12998.
---------------------------------
Resolution: Duplicate
Hi, [~rajesh.balamohan].
I'll close this issue since the PR is closed and the issue seems to be resolved
by another issue, SPARK-14070.
> Enable OrcRelation when connecting via spark thrift server
> ----------------------------------------------------------
>
> Key: SPARK-12998
> URL: https://issues.apache.org/jira/browse/SPARK-12998
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Reporter: Rajesh Balamohan
>
> When a user connects via the Spark Thrift server to execute SQL, predicate
> pushdown (PPD) is not enabled for ORC tables. The query ends up creating a
> MetastoreRelation, which does not support ORC PPD. The purpose of this JIRA
> is to convert MetastoreRelation to OrcRelation in HiveMetastoreCatalog, so
> that users can benefit from PPD even when connecting through the Spark
> Thrift server.
> {noformat}
> For example, for "explain select count(1) from tpch_flat_orc_1000.lineitem where
> l_shipdate = '1990-04-18'", the current plan is:
> == Physical Plan ==
> TungstenAggregate(key=[], functions=[(count(1),mode=Final,isDistinct=false)], output=[_c0#17L])
> +- Exchange SinglePartition, None
>    +- WholeStageCodegen
>       :  +- TungstenAggregate(key=[], functions=[(count(1),mode=Partial,isDistinct=false)], output=[count#20L])
>       :     +- Project
>       :        +- Filter (l_shipdate#11 = 1990-04-18)
>       :           +- INPUT
>       +- HiveTableScan [l_shipdate#11], MetastoreRelation tpch_1000, lineitem, None
> It would be better to scan through OrcRelation so that ORC PPD applies,
> which reduces the runtime by a large margin. The plan then becomes:
>
> == Physical Plan ==
> TungstenAggregate(key=[], functions=[(count(1),mode=Final,isDistinct=false)], output=[_c0#70L])
> +- Exchange SinglePartition, None
>    +- WholeStageCodegen
>       :  +- TungstenAggregate(key=[], functions=[(count(1),mode=Partial,isDistinct=false)], output=[count#106L])
>       :     +- Project
>       :        +- Filter (_col10#64 = 1990-04-18)
>       :           +- INPUT
>       +- Scan OrcRelation[_col10#64] InputPaths: hdfs://nn:8020/apps/hive/warehouse/tpch_1000.db/lineitem, PushedFilters: [EqualTo(_col10,1990-04-18)]
> {noformat}
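
To illustrate why the PushedFilters entry in the second plan matters, here is a
toy sketch in plain Python of statistics-based stripe skipping. It is not
Spark's or ORC's actual code: the Stripe class, both count functions, and the
sample dates are hypothetical, and the real ORC reader keeps its statistics in
file metadata rather than in objects like these. The only idea it is meant to
show is that a reader that receives the predicate can skip whole chunks of data
whose min/max range excludes the value, while a scan followed by a separate
Filter has to read every row.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Stripe:
    """Hypothetical stand-in for an ORC stripe: rows plus stored min/max stats."""
    rows: List[str]
    min_value: str
    max_value: str

    @classmethod
    def from_rows(cls, rows: List[str]) -> "Stripe":
        # Statistics are computed once at "write time", as a file format would.
        return cls(rows=rows, min_value=min(rows), max_value=max(rows))


def count_without_pushdown(stripes: List[Stripe], target: str) -> Tuple[int, int]:
    """Read every row and filter afterwards. Returns (matches, rows_read)."""
    matches = rows_read = 0
    for stripe in stripes:
        for value in stripe.rows:
            rows_read += 1
            if value == target:
                matches += 1
    return matches, rows_read


def count_with_pushdown(stripes: List[Stripe], target: str) -> Tuple[int, int]:
    """Skip stripes whose [min, max] range cannot contain the target."""
    matches = rows_read = 0
    for stripe in stripes:
        if target < stripe.min_value or target > stripe.max_value:
            continue  # statistics rule this stripe out, so no rows are read
        for value in stripe.rows:
            rows_read += 1
            if value == target:
                matches += 1
    return matches, rows_read


# Made-up sample data; ISO dates compare correctly as plain strings.
stripes = [
    Stripe.from_rows(["1990-01-02", "1990-02-11", "1990-03-30"]),
    Stripe.from_rows(["1990-04-01", "1990-04-18", "1990-04-18", "1990-05-09"]),
    Stripe.from_rows(["1990-06-15", "1990-07-04"]),
]

plain = count_without_pushdown(stripes, "1990-04-18")
pushed = count_with_pushdown(stripes, "1990-04-18")
print(plain)   # (2, 9): two matches, all nine rows read
print(pushed)  # (2, 4): same two matches, only the middle stripe read
```

Both functions return the same count, but the pushdown variant reads only the
one stripe whose date range covers 1990-04-18. How much is saved in practice
depends on how the data is laid out, since values scattered across every stripe
leave nothing to skip.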