How does HWI go about caching query results for others to view? Are the
results durable given a bounce of HWI or are they held in memory?

We have a process where we build daily summaries from Hive queries that get
emailed. Instead I'd like a way to persist/cache the query results on a
server and build a custom AJAXy web UI to expose them. Just wondering if HWI
could help with this...
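
To make the idea concrete, here is a minimal sketch of the kind of durable
cache I have in mind. It is not anything HWI provides: the function names,
the one-JSON-file-per-day layout, and the cache directory are all
assumptions for illustration. It only shows results being written to disk
so they survive a restart and can be read back by a web UI.

```python
import json
from pathlib import Path


def save_summary(cache_dir, day, columns, rows):
    """Write one day's query result to <cache_dir>/<day>.json; return the path."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / f"{day}.json"
    tmp = cache_dir / f"{day}.json.tmp"
    # Write to a temp file, then rename, so a reader never sees a partial file.
    tmp.write_text(json.dumps({"day": day, "columns": columns, "rows": rows}))
    tmp.replace(path)
    return path


def load_summary(cache_dir, day):
    """Return the cached result for a day, or None if nothing was saved."""
    path = Path(cache_dir) / f"{day}.json"
    if not path.exists():
        return None
    return json.loads(path.read_text())
```

The daily job would call save_summary() instead of sending the email, and
the web UI would call load_summary() to serve a given day.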


On Wed, Aug 26, 2009 at 5:46 PM, Edward Capriolo <[email protected]>wrote:

> On Wed, Aug 26, 2009 at 7:31 PM, Bill Graham<[email protected]> wrote:
> > The JDBC driver is now able to integrate with some SQL desktop tools
> for
> > basic querying FYI. That still requires the user to know SQL, but at
> least
> > it doesn't require working on the command line. The SQuirrel SQL client
> has
> > been tested with the current JDBC release:
> >
> >
> http://wiki.apache.org/hadoop/Hive/HiveJDBCInterface#head-98f2bc43312161b56e267773267546c080f4fb27
> >
> > There's also ODBC driver work being done that has been tested with
> > MicroStrategy, but it only supports Linux currently:
> >
> > http://wiki.apache.org/hadoop/Hive/HiveODBC
> >
> > On Wed, Aug 26, 2009 at 3:40 PM, Vijay <[email protected]> wrote:
> >>
> >> Having played around with hive cli/hiveserver/hwi for a few weeks I
> think
> >> I understand the various pieces better now. Can people provide some real
> >> world scenarios where they use the different "modes?"
> >>
> >> As far as UI and more importantly making hive accessible to users that
> are
> >> not super familiar with SQL goes, it seems to me like hive JDBC might be
> the
> >> best option since that way hive can be relatively seamlessly integrated
> with
> >> many sophisticated reporting tools. I haven't explored that option much
> yet.
> >>
> >> Hive cli was good enough for me to play around with the framework and I
> >> can keep using it for real work. However, having a simple GUI like hwi
> is
> >> better for many reasons but I don't think it can ever be a replacement
> for
> >> all the available reporting tools.
> >>
> >> So, I guess I'm kind of conflicted at this point :) My ultimate goal is
> to
> >> put the power of hadoop and hive into the hands of non-technical
> (business)
> >> users. At first I thought I could probably build a simple UI (which is a
> >> bunch of php files really) using the php thrift API but that API did not
> >> seem suited for short lived web applications without some sort of
> >> sophisticated session management.
> >>
> >> Any thoughts/ideas are greatly appreciated.
> >>
> >> On Wed, Aug 26, 2009 at 2:50 PM, Bill Graham <[email protected]>
> wrote:
> >>>
> >>> +1 for the HWI -> HiveServer approach.
> >>>
> >>> Building out rich APIs in the HiveServer (thrift currently, and
> possibly
> >>> REST at some point), would allow the HiveServer to focus on the
> functional
> >>> API. The HWI (and others) could then focus on rich UI functionality.
> The two
> >>> would have a clean decoupling, which would reduce complexity of the
> >>> codebases and help abide by the KISS principle.
> >>>
> >>>
> >>>
> >>> On Wed, Aug 26, 2009 at 2:42 PM, Edward Capriolo <
> [email protected]>
> >>> wrote:
> >>>>
> >>>> On Wed, Aug 26, 2009 at 3:25 PM, Raghu Murthy<[email protected]>
> >>>> wrote:
> >>>> > Even if we decided to have multiple HiveServers, wouldn't it be
> >>>> > possible for
> >>>> > HWI to randomly pick a HiveServer to connect to per query/client?
> >>>> >
> >>>> > On 8/26/09 12:16 PM, "Ashish Thusoo" <[email protected]> wrote:
> >>>> >
> >>>> >> +1 for ajaxing this baby.
> >>>> >>
> >>>> >> On the broader question of whether we should combine HWI and
> >>>> >> HiveServer - I
> >>>> >> think there are definite deployment and code reuse advantages in
> >>>> >> doing so,
> >>>> >> however keeping them separate also has the advantage that we can
> >>>> >> cluster
> >>>> >> HiveServers independently from HWI. Since the HiveServer sits in
> the
> >>>> >> data
> >>>> >> path, the independent scaling may have advantages. I am not sure
> how
> >>>> >> strong of
> >>>> >> an argument that is to not put them together. Simplicity obviously
> >>>> >> indicates
> >>>> >> that we should have them together.
> >>>> >>
> >>>> >> Thoughts?
> >>>> >>
> >>>> >> Ashish
> >>>> >>
> >>>> >> -----Original Message-----
> >>>> >> From: Edward Capriolo [mailto:[email protected]]
> >>>> >> Sent: Wednesday, August 26, 2009 9:45 AM
> >>>> >> To: [email protected]
> >>>> >> Subject: Re: Adding jar files when running hive in hwi mode or
> >>>> >> hiveserver mode
> >>>> >>
> >>>> >> On Tue, Aug 25, 2009 at 8:13 PM, Vijay<[email protected]> wrote:
> >>>> >>> Yep, I got it and now it works perfectly! I like hwi btw! It
> >>>> >>> definitely makes things easier for a wider audience to try out
> hive.
> >>>> >>> Your new session result bucket idea is very nice as well. I will
> >>>> >>> keep
> >>>> >>> trying more things and see if anything else comes up but so far it
> >>>> >>> looks
> >>>> >>> great!
> >>>> >>> Thanks Edward!
> >>>> >>>
> >>>> >>> On Tue, Aug 25, 2009 at 7:25 AM, Edward Capriolo
> >>>> >>> <[email protected]>
> >>>> >>> wrote:
> >>>> >>>>
> >>>> >>>> On Tue, Aug 25, 2009 at 10:18 AM, Edward
> >>>> >>>> Capriolo<[email protected]>
> >>>> >>>> wrote:
> >>>> >>>>> On Mon, Aug 24, 2009 at 10:13 PM, Vijay<[email protected]>
> wrote:
> >>>> >>>>>> Probably spoke too soon :) I added this comment to the JIRA
> >>>> >>>>>> ticket
> >>>> >>>>>> above.
> >>>> >>>>>>
> >>>> >>>>>> Hi, I tried the latest patch on trunk and there seems to be a
> >>>> >>>>>> problem.
> >>>> >>>>>>
> >>>> >>>>>> I was interested in using the "add jar " command to add jar
> files
> >>>> >>>>>> to the path. However, by the time the command flows through the
> >>>> >>>>>> SessionState to the AddResourceProcessor (in
> >>>> >>>>>> ./ql/src/java/org/apache/hadoop/hive/ql/processors/AddResourceProcessor.java),
> >>>> >>>>>> the command word "add" is not being stripped so
> the
> >>>> >>>>>> resource processor is trying to find a ResourceType of "ADD."
> >>>> >>>>>>
> >>>> >>>>>> I'm not sure if this was an existing bug or was a result of the
> >>>> >>>>>> current set of changes.
> >>>> >>>>>>
> >>>> >>>>>> (Posted as a JIRA comment by Vijay, 24/Aug/09 07:12 PM.)
> >>>> >>>>>>
> >>>> >>>>>> On Mon, Aug 24, 2009 at 5:30 PM, Vijay <[email protected]>
> wrote:
> >>>> >>>>>>>
> >>>> >>>>>>> That's awesome and looks like exactly what I needed. Local
> file
> >>>> >>>>>>> system requirement is perfectly ok for now. I will check it
> out
> >>>> >>>>>>> right
> >>>> >>>>>>> away!
> >>>> >>>>>>> Hopefully it will be checked in soon.
> >>>> >>>>>>>
> >>>> >>>>>>> Thanks Edward!
> >>>> >>>>>>>
> >>>> >>>>>>> On Mon, Aug 24, 2009 at 5:14 PM, Edward Capriolo
> >>>> >>>>>>> <[email protected]>
> >>>> >>>>>>> wrote:
> >>>> >>>>>>>>
> >>>> >>>>>>>> On Mon, Aug 24, 2009 at 8:09 PM, Prasad
> >>>> >>>>>>>> Chakka<[email protected]>
> >>>> >>>>>>>> wrote:
> >>>> >>>>>>>>> Vijay, there is no solution for it yet. There may be a jira
> >>>> >>>>>>>>> open but AFAIK, no one is working on it. You are welcome to
> >>>> >>>>>>>>> contribute this feature.
> >>>> >>>>>>>>>
> >>>> >>>>>>>>> Prasad
> >>>> >>>>>>>>>
> >>>> >>>>>>>>>
> >>>> >>>>>>>>> ________________________________
> >>>> >>>>>>>>> From: Vijay <[email protected]>
> >>>> >>>>>>>>> Reply-To: <[email protected]>
> >>>> >>>>>>>>> Date: Mon, 24 Aug 2009 16:59:28 -0700
> >>>> >>>>>>>>> To: <[email protected]>
> >>>> >>>>>>>>> Subject: Re: Adding jar files when running hive in hwi mode
> or
> >>>> >>>>>>>>> hiveserver mode
> >>>> >>>>>>>>>
> >>>> >>>>>>>>> Hi, is there any solution for this? How does everybody
> include
> >>>> >>>>>>>>> custom jar files running hive in a non-cli mode?
> >>>> >>>>>>>>>
> >>>> >>>>>>>>> Thanks in advance,
> >>>> >>>>>>>>> Vijay
> >>>> >>>>>>>>>
> >>>> >>>>>>>>> On Sat, Aug 22, 2009 at 6:19 PM, Vijay <[email protected]>
> >>>> >>>>>>>>> wrote:
> >>>> >>>>>>>>>
> >>>> >>>>>>>>> When I run hive in cli mode, I add the hive_contrib.jar file
> >>>> >>>>>>>>> using this
> >>>> >>>>>>>>> command:
> >>>> >>>>>>>>>
> >>>> >>>>>>>>> hive> add jar lib/hive_contrib.jar
> >>>> >>>>>>>>>
> >>>> >>>>>>>>> Is there a way to do this automatically when running hive in
> >>>> >>>>>>>>> hwi or hiveserver modes? Or do I have to add the jar file
> >>>> >>>>>>>>> explicitly to any of the startup scripts?
> >>>> >>>>>>>>>
> >>>> >>>>>>>>>
> >>>> >>>>>>>>>
> >>>> >>>>>>>>
> >>>> >>>>>>>> Vijay,
> >>>> >>>>>>>>
> >>>> >>>>>>>> Currently HWI does not support this. The changes in
> >>>> >>>>>>>> https://issues.apache.org/jira/browse/HIVE-716 will make
> this
> >>>> >>>>>>>> possible (although I did not test but it should work as the
> cli
> >>>> >>>>>>>> does). The file will have to be in the server's local file
> >>>> >>>>>>>> system. We could probably include 'commons upload' to the web
> >>>> >>>>>>>> interface if there was a need for it.
> >>>> >>>>>>>>
> >>>> >>>>>>>> HIVE-716 should be in trunk soon. It does apply cleanly if
> it's
> >>>> >>>>>>>> something you need today, Edward
> >>>> >>>>>>>
> >>>> >>>>>>
> >>>> >>>>>>
> >>>> >>>>>
> >>>> >>>>> I just committed a new version of the patch. You were correct,
> the
> >>>> >>>>> clidriver trims the first token off set and add queries; hwi was
> >>>> >>>>> not
> >>>> >>>>> doing that. Also let me know your impressions of HWI.
> >>>> >>>>>
> >>>> >>>>> The new features are the 'ResultBucket' a buffer of the last x
> >>>> >>>>> results viewable from the web interface, and the ability to
> supply
> >>>> >>>>> more than one query at a time.
> >>>> >>>>>
> >>>> >>>>> These two features should add much usability now as you can do
> >>>> >>>>> things like explain, show tables, etc and not have to dump the
> >>>> >>>>> results to a file.
> >>>> >>>>>
> >>>> >>>>> Edward
> >>>> >>>>>
> >>>> >>>>
> >>>> >>>> False statement:
> >>>> >>>>>> I just committed a new version of the patch
> >>>> >>>>
> >>>> >>>> In actuality, I updated the Jira with a new patch.
> >>>> >>>>
> >>>> >>>> It is still early AM. all the gears are not turning yet.
> >>>> >>>>
> >>>> >>>> Edward
> >>>> >>>
> >>>> >>>
> >>>> >>
> >>>> >> Vijay,
> >>>> >>
> >>>> >>>> It definitely makes things easier for a wider audience to try out
> >>>> >>>> hive
> >>>> >>
> >>>> >> That was always the goal. I often wonder which direction we should
> >>>> >> take HWI
> >>>> >> in.
> >>>> >> Should HWI have some REST-ful stubs to turn it into a remote job
> >>>> >> submission
> >>>> >> system?
> >>>> >> HiveServer uses thrift and I believe thrift has an HTTP-Transport
> so
> >>>> >> you might
> >>>> >> not need HWI to provide this.
> >>>> >>
> >>>> >> Should we ajax things like the result bucket or the entire
> interface
> >>>> >> so it has
> >>>> >> that ooo aaahhh effect?
> >>>> >>
> >>>> >> Really the larger question: HWI has its own multi-session
> management,
> >>>> >> HiveServer has this as well (now; way back when it did not). Should
> >>>> >> HWI just
> >>>> >> front end HiveServer?
> >>>> >>
> >>>> >> Does anyone have any thoughts?
> >>>> >> Edward
> >>>> >
> >>>> >
> >>>>
> >>>> I think Raghu is correct. HiveClient->HiveServer happens on a
> >>>> permanent TCP connection (I think?). If you had a back end cluster of
> >>>> HiveServers, and you had a load balancer or proxy with
> >>>> sticky-session/session-tracking/source-ip policy, HWI would be
> >>>> configured with the virtual IP address of the load balancer and would
> >>>> connect and stay connected to a random HiveServer in the farm.
> >>>>
> >>>> I am naturally partial to the way it is now because I came up with it
> :)
> >>>>
> >>>> I like the idea of having a REST-ful/XML-RPC or some web service style
> >>>> interface for job submit.
> >>>>
> >>>> My thinking behind HWI has always been KISS. Keep It Simple Stupid.
> >>>> Anyone should be able to hack a few web pages onto it. Adding thrift,
> >>>> ajax, XML-RPC layers definitely ups the complexity.
> >>>>
> >>>> I think it makes sense to do HWI->HiveServer. I will have to take a
> >>>> deeper look at what HiveServer and thrift offers to be sure.
> >>>>
> >>>> Edward
> >>>
> >>
> >
> >
>
> HWI had a couple of goals. If you are working with Hive and you need
> multiple CLI windows, that can be annoying. Also, a perk is
> that you don't have to use screen or some other program, your session
> is always detached and living on the server. Also the sessions are
> "shareable", in that user bob can start a session and then someone
> else can log in as him and see the results.
>
> Also, to open a CLI client you need lots of network access: hadoop,
> metastore, etc. With HWI you only need web access.
>
> Since you really can't interact with it programmatically (see my mention
> of REST-ful above) it would be hard to use with your data flow. I guess
> in its current state you could do some web http post/get scripting but
> that would be wonky.
>
> So in my mind it is used by developers to test queries, or if you had
> a user that can't wait for a canned report, let them have at HWI. In
> operations I can run a query against our web logs and then link
> someone to the results if I find something interesting.
>
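
The 'ResultBucket' is described earlier in this thread as a buffer of the
last x results viewable from the web interface. A minimal sketch of that
idea follows. It is not HWI's actual implementation: the class and method
names are hypothetical, and it assumes results are plain lists of rows.
A buffer like this lives in process memory only, so its contents would not
survive a restart, which is the durability concern raised at the top of
this message.

```python
from collections import deque


class ResultBucket:
    """Hypothetical bounded buffer that keeps only the most recent results."""

    def __init__(self, max_results=3):
        # A deque with maxlen drops its oldest entry once it is full.
        self._results = deque(maxlen=max_results)

    def add(self, query, rows):
        """Record the rows returned by one query."""
        self._results.append((query, rows))

    def latest(self):
        """Return the buffered (query, rows) pairs, oldest first."""
        return list(self._results)
```

Pairing a buffer like this with an on-disk copy of each result would be
one way to get both quick viewing and durability.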
