Oh cool, I will try that out.

Thanks,
Ashutosh
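The workaround discussed in the thread below — pulling the command string from the session and parsing it to decide whether the query is an insert — could be sketched roughly like this. This is a hypothetical helper, not part of Hive, and the regexes are illustrative rather than anything close to a real HiveQL parser (which is part of why the approach feels hacky):

```java
import java.util.regex.Pattern;

// Hypothetical helper: decide from the raw command string whether a
// query is an INSERT or a CTAS, so a PostExecute hook body can bail
// out early for everything else.
public class InsertQueryDetector {

    private static final Pattern INSERT = Pattern.compile(
            "^\\s*INSERT\\s+(OVERWRITE|INTO)\\b.*",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    private static final Pattern CTAS = Pattern.compile(
            "^\\s*CREATE\\s+TABLE\\b.*\\bAS\\s+SELECT\\b.*",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Returns true if the command looks like an INSERT or a CTAS. */
    public static boolean isInsertLike(String cmd) {
        return INSERT.matcher(cmd).matches() || CTAS.matcher(cmd).matches();
    }
}
```

Anything beyond these simple shapes (multi-table inserts, leading comments, and so on) would slip through, which is why waiting for HIVE-1225 is the cleaner path.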
On Wed, May 26, 2010 at 18:46, Ashish Thusoo <[email protected]> wrote:
> Actually, if you want to do that, then I believe you can check in the post
> execute hook that you have a valid write entity of the type table or
> partition. You should have that only in the case of an INSERT or a CTAS.
>
> Ashish
>
> -----Original Message-----
> From: Ashutosh Chauhan [mailto:[email protected]]
> Sent: Wednesday, May 26, 2010 6:36 PM
> To: [email protected]
> Subject: Re: job level output committer in storage handler
>
> Thanks, everyone, for the replies. I think HIVE-1225 is really what I want.
> At this point I can implement PostExecute, since I need to call the hook
> only at the end of the query and not at the end of each job or task of the
> query. If I register it through hive-site.xml, then I guess it will get
> executed for every query, which is where the complication starts: I want to
> execute this hook only for insert queries, not for all queries. One
> workaround is to get the command string from the session and then parse it
> to find out whether it actually is an insert query, and only then execute
> the remainder of the code. But that looks hacky, so I look forward to
> HIVE-1225.
>
> Thanks,
> Ashutosh
>
> On Wed, May 26, 2010 at 10:35, John Sichi <[email protected]> wrote:
>> I think we'll need to extend the StorageHandler interface so that it can
>> participate in the commit semantics (separate from the handler-independent
>> hooks Ashish mentioned). That was the intention of this followup JIRA issue
>> I logged as part of the HBase integration work:
>>
>> https://issues.apache.org/jira/browse/HIVE-1225
>>
>> To add this one, we need to determine what information needs to be passed
>> along to the storage handler now (and how to make it easy to pass along
>> more information as needed without having to change the interface in the
>> future).
>>
>> JVS
>>
>> ________________________________________
>> From: Ning Zhang [[email protected]]
>> Sent: Wednesday, May 26, 2010 10:22 AM
>> To: [email protected]
>> Subject: Re: job level output committer in storage handler
>>
>> Hi Ashutosh,
>>
>> Hive doesn't use OutputCommitter explicitly because it handles commit and
>> abort by itself.
>>
>> If you are looking for a task-level committer, where you want to do
>> something after a task finishes successfully, you can take a look at
>> FileSinkOperator.closeOp(). It renames the temp file to the final file
>> name, which implements the commit semantics.
>>
>> If you are looking for a job-level committer, where you want to do
>> something after the job (including all tasks) finishes successfully, you
>> can take a look at the MoveTask implementation. A MoveTask is generated as
>> a follow-up task after an MR job for each INSERT OVERWRITE statement. It
>> moves the directory that contains the results from all finished tasks to
>> its destination path (e.g., a directory specified in the insert statement
>> or inferred from the table's storage location property). The MoveTask
>> implements the commit semantics of the whole job.
>>
>> Ning
>>
>> On May 26, 2010, at 9:16 AM, Ashutosh Chauhan wrote:
>>
>>> Hi Kortni,
>>>
>>> Thanks for your suggestion, but we can't use it in our setup. We are
>>> not spinning up Hive jobs in a separate process that we can monitor;
>>> rather, I want to get a handle on when the job finishes in my storage
>>> handler / SerDe.
>>>
>>> Ashutosh
>>>
>>> On Tue, May 25, 2010 at 12:25, Kortni Smith <[email protected]> wrote:
>>>> Hi Ashutosh,
>>>>
>>>> I'm not sure how to accomplish that on the Hive side of things, but in
>>>> case it helps, I am writing because it sounds like you want to know
>>>> when your job is done so you can update something externally, and my
>>>> company will also be implementing this in the near future.
>>>> Our plan is to have the process that kicks off our Hive jobs in the
>>>> cloud monitor each job's status periodically using Amazon's EMR Java
>>>> library, and when their state changes to complete, update our external
>>>> systems accordingly.
>>>>
>>>>
>>>> Kortni Smith | Software Developer
>>>> AbeBooks.com  Passion for books.
>>>>
>>>> [email protected]
>>>> phone: 250.412.3272 | fax: 250.475.6014
>>>>
>>>> Suite 500 - 655 Tyee Rd. Victoria, BC, Canada V9A 6X5
>>>>
>>>> www.abebooks.com | www.abebooks.co.uk | www.abebooks.de
>>>> www.abebooks.fr | www.abebooks.it | www.iberlibro.com
>>>>
>>>> -----Original Message-----
>>>> From: Ashutosh Chauhan [mailto:[email protected]]
>>>> Sent: Tuesday, May 25, 2010 12:13 PM
>>>> To: [email protected]
>>>> Subject: job level output committer in storage handler
>>>>
>>>> Hi,
>>>>
>>>> I am implementing my own SerDe and storage handler. Is there any
>>>> method in one of these interfaces (or any other) which gives me a
>>>> handle to do some operation after all the records have been written
>>>> by all reducers? Something very similar to a job-level output
>>>> committer. I want to update some state in an external system once I
>>>> know the job has completed successfully. Ideally, I would do this kind
>>>> of thing in a job-level output committer, but since Hive is on the old
>>>> MR API, I don't have access to that. There is Hive's
>>>> RecordWriter#close(); I tried that, but it looks like it's a task-level
>>>> handle, so every reducer will try to update the state of my external
>>>> system, which is not what I want.
>>>> Any pointers on how to achieve this will be much appreciated. If it's
>>>> unclear what I am asking for, let me know and I will provide more
>>>> details.
>>>>
>>>> Thanks,
>>>> Ashutosh
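The every-reducer problem from the original question — RecordWriter#close() firing once per task — is sometimes worked around with a run-once guard on shared storage: each task's close() tries to atomically create a marker file, and only the winner performs the external-state update. This is a generic pattern, not a Hive API; a minimal local-filesystem sketch (on HDFS you would use the Hadoop FileSystem API instead of java.nio, and the marker name here is made up):

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustrative run-once guard: exactly one caller "claims" the
// job-level side effect by atomically creating a marker file.
public class RunOnceGuard {

    /**
     * Returns true for exactly one caller per marker path. Files.createFile
     * is an atomic create-if-absent, so concurrent tasks cannot both win.
     */
    public static boolean tryClaim(Path marker) throws IOException {
        try {
            Files.createFile(marker);   // atomic create-if-absent
            return true;                // this caller does the external update
        } catch (FileAlreadyExistsException e) {
            return false;               // another task already claimed it
        }
    }
}
```

Task retries make this fragile (a task can win the claim and then fail before finishing the update), so the external update would need to be idempotent regardless — which is why a real job-level hook, as discussed above, is the better answer.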

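Ning's description of the rename-based commit in FileSinkOperator.closeOp() — write to a temp name, rename to the final name only on success — can be illustrated outside Hive with java.nio.file. A minimal sketch; the paths and data are invented for the example:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Illustration of rename-as-commit: until the move happens, readers
// never see the final path, so a failed writer leaves at most a temp
// file to clean up.
public class RenameCommit {

    /**
     * Writes data under a temporary name, then "commits" by renaming it
     * to the final path. The move is the commit point.
     */
    public static void writeCommitted(Path finalPath, byte[] data) throws IOException {
        Path tmp = finalPath.resolveSibling(finalPath.getFileName() + ".tmp");
        Files.write(tmp, data);                       // task output goes to the temp name
        Files.move(tmp, finalPath,                    // commit: atomic where the FS supports it
                   StandardCopyOption.ATOMIC_MOVE);
    }
}
```

MoveTask applies the same idea at directory granularity: the per-job results directory is moved to the destination path only after all tasks have finished, which is what gives the whole job its commit semantics.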