[jira] [Commented] (SPARK-2106) Unify the HiveContext

Cheng Hao (JIRA) Wed, 11 Jun 2014 18:36:28 -0700

    [ 
https://issues.apache.org/jira/browse/SPARK-2106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14028699#comment-14028699
 ]


Cheng Hao commented on SPARK-2106:
----------------------------------

Oh, I see your point.
What I was suggesting is that we make the HiveContext more generic, so that 
other applications (e.g. SharkCLI) can use the HiveContext directly without 
creating a subclass of HiveContext to override things (SessionState creation 
etc.). 

To achieve that, we may need to:
# Provide a constructor with arguments (for example SessionState, which can 
also have a default value) 
# Make the QueryExecution visible (currently most of the methods in 
HiveContext.QueryExecution are protected, and they are also tightly coupled)
# Refactor the API interfaces 
{code:title=HiveContext.QueryExecution|borderStyle=solid}
    runHive(cmd: String, ..): Seq[String] = {
...
    case _ =>
    sessionState.out.println(tokens(0) + " " + cmd_1)
    // would (errCode, Seq[String]) be a more reasonable return type?
    Seq(proc.run(cmd_1).getResponseCode.toString)
   ...
 }
{code}
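
A minimal sketch of points 1 and 3, using stand-in types rather than the real HiveContext and SessionState. CommandResult, FakeSessionState and SketchContext are hypothetical names made up for illustration, not existing Spark or Hive classes, and the body of runHive only mimics the shape of the result:

```scala
// Hypothetical result type: carries the error code next to the output lines.
final case class CommandResult(errCode: Int, output: Seq[String])

// Hypothetical stand-in for Hive's SessionState, reduced to a name.
final class FakeSessionState(val name: String)

// Hypothetical stand-in for HiveContext: the session state is a constructor
// argument with a default, so a caller can replace it without subclassing.
class SketchContext(
    val sessionState: FakeSessionState = new FakeSessionState("default")) {

  // Returns (errCode, output) as one value instead of folding the response
  // code into a Seq[String]. The logic here is a placeholder only.
  def runHive(cmd: String): CommandResult = {
    val trimmed = cmd.trim
    if (trimmed.isEmpty) CommandResult(1, Seq("empty command"))
    else CommandResult(0, Seq(s"[${sessionState.name}] $trimmed"))
  }
}

object Demo {
  def main(args: Array[String]): Unit = {
    // A CLI wrapper passes in the session state it created itself.
    val ctx = new SketchContext(new FakeSessionState("cli"))
    ctx.runHive("SHOW TABLES") match {
      case CommandResult(0, lines) =>
        lines.foreach(println)
      case CommandResult(code, lines) =>
        Console.err.println(s"failed ($code): ${lines.mkString("; ")}")
    }
  }
}
```

With a shape like this, the CLI can branch on the error code directly rather than parsing it back out of the string output.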

Of course, this is lower priority; the [Add Shark CLI 
Support|https://github.com/amplab/shark/pull/337] pull request is already 
done by creating a subclass of HiveContext.

> Unify the HiveContext
> ---------------------
>
>                 Key: SPARK-2106
>                 URL: https://issues.apache.org/jira/browse/SPARK-2106
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>            Reporter: Cheng Hao
>
> I've been working on a CLI for Catalyst, and from the CLI point of view, 
> HiveContext may require some changes:
> - SessionState management
> `SessionState` instance creation & initialization should be done within the 
> wrappers, e.g. in `SharkCliDriver` or `CLIService` etc., because they know 
> better how to load the user configuration, redirect the logger etc. 
> HiveContext can then retrieve the SessionState by calling `SessionState.get()`.
> - HiveContext API may not be enough for CLI
> 1) Retrieving the schema from the output of `SELECT`; the internal class 
> `QueryExecution` is hidden from the CLI.
> 2) Retrieving the HQL result in the CLI: besides the string-based result, 
> the CLI also needs to know the error code, as well as the call stack if an 
> exception is thrown.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
