How does HWI go about caching query results for others to view? Are the results durable across a bounce (restart) of HWI, or are they held only in memory?
We have a process where we build daily summaries from Hive queries that get emailed. Instead, I'd like a way to persist/cache the query results on a server and build a custom AJAXy web UI to expose them. Just wondering if HWI could help with this...

On Wed, Aug 26, 2009 at 5:46 PM, Edward Capriolo <[email protected]> wrote:

> On Wed, Aug 26, 2009 at 7:31 PM, Bill Graham <[email protected]> wrote:
> > The JDBC driver is now able to integrate with some SQL desktop tools for
> > basic querying, FYI. That still requires the user to know SQL, but at least
> > it doesn't require working on the command line. The SQuirrel SQL client has
> > been tested with the current JDBC release:
> >
> > http://wiki.apache.org/hadoop/Hive/HiveJDBCInterface#head-98f2bc43312161b56e267773267546c080f4fb27
> >
> > There's also ODBC driver work being done that has been tested with
> > MicroStrategy, but it only supports Linux currently:
> >
> > http://wiki.apache.org/hadoop/Hive/HiveODBC
> >
> > On Wed, Aug 26, 2009 at 3:40 PM, Vijay <[email protected]> wrote:
> >>
> >> Having played around with hive cli/hiveserver/hwi for a few weeks I think
> >> I understand the various pieces better now. Can people provide some real
> >> world scenarios where they use the different "modes?"
> >>
> >> As far as UI and more importantly making hive accessible to users that are
> >> not super familiar with SQL goes, it seems to me like hive JDBC might be the
> >> best option since that way hive can be relatively seamlessly integrated with
> >> many sophisticated reporting tools. I haven't explored that option much yet.
> >>
> >> Hive cli was good enough for me to play around with the framework and I
> >> can keep using it for real work. However, having a simple GUI like hwi is
> >> better for many reasons but I don't think it can ever be a replacement for
> >> all the available reporting tools.
> >>
> >> So, I guess I'm kind of conflicted at this point :) My ultimate goal is to
> >> put the power of hadoop and hive into the hands of non-technical (business)
> >> users. At first I thought I could probably build a simple UI (which is a
> >> bunch of php files really) using the php thrift API but that API did not
> >> seem suited for short lived web applications without some sort of
> >> sophisticated session management.
> >>
> >> Any thoughts/ideas are greatly appreciated.
> >>
> >> On Wed, Aug 26, 2009 at 2:50 PM, Bill Graham <[email protected]> wrote:
> >>>
> >>> +1 for the HWI -> HiveServer approach.
> >>>
> >>> Building out rich APIs in the HiveServer (thrift currently, and possibly
> >>> REST at some point), would allow the HiveServer to focus on the functional
> >>> API. The HWI (and others) could then focus on rich UI functionality. The two
> >>> would have a clean decoupling, which would reduce complexity of the
> >>> codebases and help abide by the KISS principle.
> >>>
> >>> On Wed, Aug 26, 2009 at 2:42 PM, Edward Capriolo <[email protected]> wrote:
> >>>>
> >>>> On Wed, Aug 26, 2009 at 3:25 PM, Raghu Murthy <[email protected]> wrote:
> >>>> > Even if we decided to have multiple HiveServers, wouldn't it be
> >>>> > possible for HWI to randomly pick a HiveServer to connect to per
> >>>> > query/client?
> >>>> >
> >>>> > On 8/26/09 12:16 PM, "Ashish Thusoo" <[email protected]> wrote:
> >>>> >
> >>>> >> +1 for ajaxing this baby.
> >>>> >>
> >>>> >> On the broader question of whether we should combine HWI and
> >>>> >> HiveServer - I think there are definite deployment and code reuse
> >>>> >> advantages in doing so, however keeping them separate also has the
> >>>> >> advantage that we can cluster HiveServers independently from HWI.
> >>>> >> Since the HiveServer sits in the data path, the independent scaling
> >>>> >> may have advantages.
> >>>> >> I am not sure how strong of an argument that is to not put them
> >>>> >> together. Simplicity obviously indicates that we should have them
> >>>> >> together.
> >>>> >>
> >>>> >> Thoughts?
> >>>> >>
> >>>> >> Ashish
> >>>> >>
> >>>> >> -----Original Message-----
> >>>> >> From: Edward Capriolo [mailto:[email protected]]
> >>>> >> Sent: Wednesday, August 26, 2009 9:45 AM
> >>>> >> To: [email protected]
> >>>> >> Subject: Re: Adding jar files when running hive in hwi mode or hiveserver mode
> >>>> >>
> >>>> >> On Tue, Aug 25, 2009 at 8:13 PM, Vijay <[email protected]> wrote:
> >>>> >>> Yep, I got it and now it works perfectly! I like hwi btw! It
> >>>> >>> definitely makes things easier for a wider audience to try out hive.
> >>>> >>> Your new session result bucket idea is very nice as well. I will keep
> >>>> >>> trying more things and see if anything else comes up but so far it
> >>>> >>> looks great!
> >>>> >>> Thanks Edward!
> >>>> >>>
> >>>> >>> On Tue, Aug 25, 2009 at 7:25 AM, Edward Capriolo <[email protected]> wrote:
> >>>> >>>>
> >>>> >>>> On Tue, Aug 25, 2009 at 10:18 AM, Edward Capriolo <[email protected]> wrote:
> >>>> >>>>> On Mon, Aug 24, 2009 at 10:13 PM, Vijay <[email protected]> wrote:
> >>>> >>>>>> Probably spoke too soon :) I added this comment to the JIRA
> >>>> >>>>>> ticket above.
> >>>> >>>>>>
> >>>> >>>>>> Hi, I tried the latest patch on trunk and there seems to be a
> >>>> >>>>>> problem.
> >>>> >>>>>>
> >>>> >>>>>> I was interested in using the "add jar" command to add jar files
> >>>> >>>>>> to the path.
> >>>> >>>>>> However, by the time the command flows through the
> >>>> >>>>>> SessionState to the AddResourceProcessor (in
> >>>> >>>>>> ./ql/src/java/org/apache/hadoop/hive/ql/processors/AddResourceProcessor.java),
> >>>> >>>>>> the command word "add" is not being stripped so the
> >>>> >>>>>> resource processor is trying to find a ResourceType of "ADD."
> >>>> >>>>>>
> >>>> >>>>>> I'm not sure if this was an existing bug or was a result of the
> >>>> >>>>>> current set of changes.
> >>>> >>>>>>
> >>>> >>>>>> (Vijay added a comment - 24/Aug/09 07:12 PM)
> >>>> >>>>>>
> >>>> >>>>>> On Mon, Aug 24, 2009 at 5:30 PM, Vijay <[email protected]> wrote:
> >>>> >>>>>>>
> >>>> >>>>>>> That's awesome and looks like exactly what I needed. Local file
> >>>> >>>>>>> system requirement is perfectly ok for now. I will check it out
> >>>> >>>>>>> right away!
> >>>> >>>>>>> Hopefully it will be checked in soon.
> >>>> >>>>>>>
> >>>> >>>>>>> Thanks Edward!
> >>>> >>>>>>>
> >>>> >>>>>>> On Mon, Aug 24, 2009 at 5:14 PM, Edward Capriolo <[email protected]> wrote:
> >>>> >>>>>>>>
> >>>> >>>>>>>> On Mon, Aug 24, 2009 at 8:09 PM, Prasad Chakka <[email protected]> wrote:
> >>>> >>>>>>>>> Vijay, there is no solution for it yet. There may be a jira
> >>>> >>>>>>>>> open but AFAIK, no one is working on it. You are welcome to
> >>>> >>>>>>>>> contribute this feature.
> >>>> >>>>>>>>>
> >>>> >>>>>>>>> Prasad
> >>>> >>>>>>>>>
> >>>> >>>>>>>>> ________________________________
> >>>> >>>>>>>>> From: Vijay <[email protected]>
> >>>> >>>>>>>>> Reply-To: <[email protected]>
> >>>> >>>>>>>>> Date: Mon, 24 Aug 2009 16:59:28 -0700
> >>>> >>>>>>>>> To: <[email protected]>
> >>>> >>>>>>>>> Subject: Re: Adding jar files when running hive in hwi mode or hiveserver mode
> >>>> >>>>>>>>>
> >>>> >>>>>>>>> Hi, is there any solution for this? How does everybody include
> >>>> >>>>>>>>> custom jar files running hive in a non-cli mode?
> >>>> >>>>>>>>>
> >>>> >>>>>>>>> Thanks in advance,
> >>>> >>>>>>>>> Vijay
> >>>> >>>>>>>>>
> >>>> >>>>>>>>> On Sat, Aug 22, 2009 at 6:19 PM, Vijay <[email protected]> wrote:
> >>>> >>>>>>>>>
> >>>> >>>>>>>>> When I run hive in cli mode, I add the hive_contrib.jar file
> >>>> >>>>>>>>> using this command:
> >>>> >>>>>>>>>
> >>>> >>>>>>>>> hive> add jar lib/hive_contrib.jar
> >>>> >>>>>>>>>
> >>>> >>>>>>>>> Is there a way to do this automatically when running hive in
> >>>> >>>>>>>>> hwi or hiveserver modes? Or do I have to add the jar file
> >>>> >>>>>>>>> explicitly to any of the startup scripts?
> >>>> >>>>>>>>>
> >>>> >>>>>>>>
> >>>> >>>>>>>> Vijay,
> >>>> >>>>>>>>
> >>>> >>>>>>>> Currently HWI does not support this.
> >>>> >>>>>>>> The changes in
> >>>> >>>>>>>> https://issues.apache.org/jira/browse/HIVE-716 will make this
> >>>> >>>>>>>> possible (I did not test it, but it should work as the cli
> >>>> >>>>>>>> does). The file will have to be in the server's local file
> >>>> >>>>>>>> system. We could probably include 'commons upload' to the web
> >>>> >>>>>>>> interface if there was a need for it.
> >>>> >>>>>>>>
> >>>> >>>>>>>> HIVE-716 should be in trunk soon. It does apply cleanly if it's
> >>>> >>>>>>>> something you need today, Edward
> >>>> >>>>>>>
> >>>> >>>>>>
> >>>> >>>>>
> >>>> >>>>> I just committed a new version of the patch. You were correct, the
> >>>> >>>>> clidriver trims the first token off set and add queries; hwi was
> >>>> >>>>> not doing that. Also let me know your impressions of HWI.
> >>>> >>>>>
> >>>> >>>>> The new features are the 'ResultBucket', a buffer of the last x
> >>>> >>>>> results viewable from the web interface, and the ability to supply
> >>>> >>>>> more than one query at a time.
> >>>> >>>>>
> >>>> >>>>> These two features should add much usability now as you can do
> >>>> >>>>> things like explain, show tables, etc and not have to dump the
> >>>> >>>>> results to a file.
> >>>> >>>>>
> >>>> >>>>> Edward
> >>>> >>>>>
> >>>> >>>>
> >>>> >>>> False statement:
> >>>> >>>>>> I just committed a new version of the patch
> >>>> >>>>
> >>>> >>>> In actuality, I updated the Jira with a new patch.
> >>>> >>>>
> >>>> >>>> It is still early AM. All the gears are not turning yet.
> >>>> >>>>
> >>>> >>>> Edward
> >>>> >>>
> >>>> >>
> >>>> >> Vijay,
> >>>> >>
> >>>> >>>> It definitely makes things easier for a wider audience to try out
> >>>> >>>> hive
> >>>> >>
> >>>> >> That was always the goal. I often wonder which direction we should
> >>>> >> take HWI in.
> >>>> >> Should HWI have some REST-ful stubs to turn it into a remote job
> >>>> >> submission system?
> >>>> >> HiveServer uses thrift and I believe thrift has an HTTP-Transport so
> >>>> >> you might not need HWI to provide this.
> >>>> >>
> >>>> >> Should we ajax things like the result bucket or the entire interface
> >>>> >> so it has that ooo aaahhh effect?
> >>>> >>
> >>>> >> Really, the larger question: HWI has its own multi-session management,
> >>>> >> and HiveServer has this as well (now; way back when it did not). Should
> >>>> >> HWI just front end HiveServer?
> >>>> >>
> >>>> >> Does anyone have any thoughts?
> >>>> >> Edward
> >>>> >
> >>>>
> >>>> I think Raghu is correct. HiveClient->HiveServer happens on a
> >>>> permanent TCP connection (I think?). If you had a back end cluster of
> >>>> HiveServers, and you had a load balancer or proxy with
> >>>> sticky-session/session-tracking/source-ip policy, HWI would be
> >>>> configured with the virtual IP address of the load balancer and would
> >>>> connect and stay connected to a random HiveServer in the farm.
> >>>>
> >>>> I am naturally partial to the way it is now because I came up with it :)
> >>>>
> >>>> I like the idea of having a REST-ful/XML-RPC or some web service style
> >>>> interface for job submit.
> >>>>
> >>>> My thinking behind HWI has always been KISS. Keep It Simple Stupid.
> >>>> Anyone should be able to hack a few web pages onto it. Adding thrift,
> >>>> ajax, XML-RPC layers definitely ups the complexity.
> >>>>
> >>>> I think it makes sense to do HWI->HiveServer. I will have to take a
> >>>> deeper look at what HiveServer and thrift offer to be sure.
> >>>>
> >>>> Edward
> >>>
> >>
> >
>
> HWI had a couple of goals. If you are working with Hive and you need
> multiple CLI windows, that can be annoying. Also, a perk is
> that you don't have to use screen or some other program; your session
> is always detached and living on the server.
> Also the sessions are "shareable", in that user bob can start a session
> and then someone else can log in as him and see the results.
>
> Also, to open a CLI client you need lots of network access: hadoop,
> metastore, etc. With HWI you only need web access.
>
> Since you really can't interact with it programmatically (see my mention
> of RESTful above) it would be hard to use with your data flow. I guess
> in its current state you could do some web http post/get scripting but
> that would be wonky.
>
> So in my mind it is used by developers to test queries, or if you had
> a user that can't wait for a canned report, let them have at HWI. In
> operations I can run a query against our web logs and then link
> someone to the results if I find something interesting.
>
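
For reference while thinking about the caching question: the thread describes the ResultBucket only as "a buffer of the last x results viewable from the web interface". Below is a rough Python sketch of that idea, not HWI's actual code. The class name is borrowed from the thread, but every method (`add`, `latest`, `save`, `load`) and the JSON file format are my own assumptions for illustration. It shows why a purely in-memory buffer would not survive a restart unless something writes it out.

```python
import json
import os
import tempfile
from collections import deque


class ResultBucket:
    """Hypothetical bounded buffer holding the last N query results."""

    def __init__(self, max_results=3):
        # A deque with maxlen drops the oldest entry once it is full.
        self._results = deque(maxlen=max_results)

    def add(self, query, rows):
        self._results.append({"query": query, "rows": [list(r) for r in rows]})

    def latest(self):
        # Newest first, as a results page might list them.
        return list(reversed(self._results))

    def save(self, path):
        # Assumed persistence step: dump the buffer to a JSON file.
        with open(path, "w") as f:
            json.dump(list(self._results), f)

    @classmethod
    def load(cls, path, max_results=3):
        bucket = cls(max_results)
        with open(path) as f:
            for entry in json.load(f):
                bucket.add(entry["query"], entry["rows"])
        return bucket


bucket = ResultBucket(max_results=2)
bucket.add("show tables", [["weblogs"], ["users"]])
bucket.add("explain select 1", [["plan"]])
bucket.add("select count(1) from weblogs", [[42]])
recent = bucket.latest()  # only the two newest entries remain

# Simulate a restart: write the buffer out, then rebuild it from disk.
path = os.path.join(tempfile.mkdtemp(), "results.json")
bucket.save(path)
restored = ResultBucket.load(path, max_results=2)
```

A custom AJAX UI could read the same JSON file, which is roughly the persist/cache setup asked about at the top of this message.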

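
On the "add jar" bug from earlier in the thread: the report was that the command word "add" reached the resource processor unstripped, so it looked for a resource type of "ADD". A small Python sketch of the first-token trimming that the CLI is described as doing follows. All function names here are hypothetical, and the set of accepted resource types is an assumption for illustration, not Hive's code.

```python
def split_first_token(command):
    """Return (keyword, remainder) for a line such as 'add jar x.jar'."""
    parts = command.strip().split(None, 1)
    keyword = parts[0].lower() if parts else ""
    remainder = parts[1] if len(parts) > 1 else ""
    return keyword, remainder


def add_resource(args):
    """Hypothetical processor: expects '<type> <path>' with no leading 'add'."""
    resource_type, _, path = args.partition(" ")
    if resource_type.upper() not in ("JAR", "FILE"):
        raise ValueError("unknown resource type: " + resource_type.upper())
    return resource_type.upper(), path.strip()


def dispatch(command):
    keyword, remainder = split_first_token(command)
    if keyword == "add":
        # Hand over only the remainder, so the processor sees 'jar <path>'.
        return add_resource(remainder)
    raise ValueError("unsupported command: " + keyword)


result = dispatch("add jar lib/hive_contrib.jar")
```

Calling `add_resource("add jar lib/hive_contrib.jar")` directly, without the trim, fails with an unknown resource type of "ADD", which matches the symptom in the JIRA comment.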