[ https://issues.apache.org/jira/browse/HIVE-30?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12649171#action_12649171 ]

Joydeep Sen Sarma commented on HIVE-30:
---------------------------------------

i have a broader concern about how many servers we will end up having and what 
the server represents. with the jdbc/HIVE-73 effort, it seems there's at least 
one more hive server. if the server manages state, then it doesn't make sense 
to have more than one. by the hadoop analogy, there should be one server (like 
the namenode) that exposes a jsp interface (in addition to other interfaces 
like jdbc/odbc).

we should also have a single server side to manage common abstractions like 
userids. for example, we would find this patch unusable inside facebook since 
it does not set userids for hive queries. that breaks the way we manage hadoop 
compute resources (we have fair sharing and compute quotas set up per userid) 
and hive tables (all tables would be created with the same userid).

at a very fundamental level, it's not clear to me what a 'SHOW PROCESSLIST' 
equivalent even means for Hive. with the namenode, for example, we associate a 
set of datanodes; with the jobtracker, a set of compute resources. Hive does 
not control any resources as clearly. A Hive query brings together a (Hive) 
metadata server, a map-reduce instance, one or more dfs instances 
(tables/databases can span hdfs instances) and the client-side compute 
resources required to run the query. A collection of hive queries (unlike a 
collection of mysql queries to the same mysql server) may not have much in 
common, so the show processlist abstraction is not that meaningful (at least 
to me).

that aside, comments on the patch itself: i am ok with the way configuration 
is being used (looks like we are using HiveConf for the most part, just not 
for the hwi stuff), but:
- we seem to be initializing a HiveConf for each show table/database request, 
but one HiveConf per session, reused across requests, should be enough
- how are the logs going to be managed? logs for all sessions go to the same 
server-side log file. we should at least find a way to prepend the session id 
to each log entry (for debugging)
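Both review points can be sketched together, again in Python under stated assumptions: a hypothetical `WebSession` builds its configuration once when the session starts and reuses it for every request, and routes its log lines through a `logging.LoggerAdapter` subclass that prepends the session id while all sessions keep writing to one shared server-side logger. The dict returned by `_build_conf` is only a stand-in for constructing a HiveConf.

```python
import logging


class SessionLogAdapter(logging.LoggerAdapter):
    """Prepends the session id to every message logged through it."""

    def process(self, msg, kwargs):
        return f"[session {self.extra['session_id']}] {msg}", kwargs


class WebSession:
    """Hypothetical per-session state for the web interface."""

    conf_builds = 0  # how many configurations have been built, for illustration

    def __init__(self, session_id, server_logger):
        self.session_id = session_id
        # Build the configuration once, when the session starts ...
        self.conf = self._build_conf()
        # ... and route this session's log lines through a prefixing adapter,
        # even though all sessions still share one server-side logger.
        self.log = SessionLogAdapter(server_logger, {"session_id": session_id})

    @classmethod
    def _build_conf(cls):
        # Stand-in for an expensive HiveConf construction (assumption).
        cls.conf_builds += 1
        return {"built": cls.conf_builds}

    def handle(self, request):
        # Every "show tables" / "show databases" request reuses self.conf.
        self.log.info("handling %s", request)
        return self.conf
```

In Hive's own Java code the analogous mechanism would probably be log4j's NDC or MDC, which can add per-thread context such as a session id to every line through the layout pattern; that assumes each request is handled on a thread that has set the context first.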

> Hive web interface
> ------------------
>
>                 Key: HIVE-30
>                 URL: https://issues.apache.org/jira/browse/HIVE-30
>             Project: Hadoop Hive
>          Issue Type: Bug
>            Reporter: Jeff Hammerbacher
>            Assignee: Edward Capriolo
>            Priority: Minor
>         Attachments: HIVE-30.patch
>
>
> Hive needs a web interface. The initial checkin should have:
> * simple schema browsing
> * query submission
> * query history (similar to MySQL's SHOW PROCESSLIST)
> A suggested feature: the ability to have a query notify the user when it's 
> completed.
> Edward Capriolo has expressed some interest in driving this process.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
