[
https://issues.apache.org/jira/browse/SPARK-15691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15311878#comment-15311878
]
Xiao Li edited comment on SPARK-15691 at 6/2/16 7:30 AM:
---------------------------------------------------------
IMO, {{HiveMetastoreCatalog}} is the first component we need to refactor, but
it is also a very interesting one. Many concepts are mixed in the same class:
{{SparkSession}}, {{SessionState}}, {{DataSource}}, {{parser}}, Hive-specific
{{analyzer rules}}, {{cache}}, {{MetastoreRelation}},
{{MetaStorePartitionedTableFileCatalog}} ... I am still trying to split it in
a clean way.
> Refactor and improve Hive support
> ---------------------------------
>
> Key: SPARK-15691
> URL: https://issues.apache.org/jira/browse/SPARK-15691
> Project: Spark
> Issue Type: New Feature
> Components: SQL
> Reporter: Reynold Xin
>
> Hive support is important to Spark SQL, as many Spark users use it to read
> from Hive. The current architecture is very difficult to maintain, and this
> ticket tracks progress towards getting us to a sane state.
> A number of things we want to accomplish are:
> - Move the Hive-specific catalog logic into HiveExternalCatalog.
> -- Remove HiveSessionCatalog. All Hive-related stuff should go into
> HiveExternalCatalog. This would require moving caching either into
> HiveExternalCatalog, or just into SessionCatalog.
> -- Move the logic that uses properties to store data source options into
> HiveExternalCatalog.
> -- Potentially more.
> - Remove the Hive-specific ScriptTransform implementation and make it more
> general so we can put it in sql/core.
> - Implement HiveTableScan (and write path) as a data source, so we don't need
> a special planner rule for HiveTableScan.
> - Remove HiveSharedState and HiveSessionState.
> One thing that is still unclear to me is how to work with Hive UDF support.
> We might still need a special planner rule there.
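
To make the "properties to store data source options" item above concrete,
here is a minimal sketch of one way options could be flattened into string
properties and read back. It is an illustration only, not Spark's actual
implementation: the object name {{DataSourcePropsSketch}}, both key names, and
the {{encode}}/{{decode}} helpers are hypothetical.

```scala
// Hypothetical sketch: none of these names come from Spark itself.
object DataSourcePropsSketch {
  // Assumed key names, chosen only for illustration.
  private val ProviderKey = "example.datasource.provider"
  private val OptionPrefix = "example.datasource.option."

  /** Flatten a provider name and its options into plain string properties. */
  def encode(provider: String, options: Map[String, String]): Map[String, String] =
    options.map { case (k, v) => (OptionPrefix + k) -> v } + (ProviderKey -> provider)

  /** Recover provider and options; None means no data source metadata is present. */
  def decode(props: Map[String, String]): Option[(String, Map[String, String])] =
    props.get(ProviderKey).map { provider =>
      val options = props.collect {
        case (k, v) if k.startsWith(OptionPrefix) => k.stripPrefix(OptionPrefix) -> v
      }
      (provider, options)
    }
}
```

Keeping both directions of the mapping in one place is the point of the
sketch: callers outside the catalog would only see a provider and an options
map, never the property keys.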