Hi Ashutosh,

Hive doesn't use OutputCommitter explicitly because it handles commit and abort by itself.
If you are looking for a task-level committer, i.e. you want to do something after a task finishes successfully, you can take a look at FileSinkOperator.closeOp(). It renames the temp file to its final file name, which implements the commit semantics. If you are looking for a job-level committer, i.e. you want to do something after the job (including all tasks) finishes successfully, you can take a look at the MoveTask implementation. MoveTask is generated as a follow-up task after the MR job for each insert overwrite statement. It moves the directory that contains the results from all finished tasks to its destination path (e.g. a directory specified in the insert statement or inferred from the table's storage location property). MoveTask implements the commit semantics of the whole job. (A minimal sketch of this rename-based pattern appears after the quoted thread below.)

Ning

On May 26, 2010, at 9:16 AM, Ashutosh Chauhan wrote:

> Hi Kortni,
>
> Thanks for your suggestion, but we can't use it in our setup. We are
> not spinning off Hive jobs in a separate process which we can monitor;
> rather, I want to get a handle on when the job finishes in my storage
> handler / serde.
>
> Ashutosh
>
> On Tue, May 25, 2010 at 12:25, Kortni Smith <[email protected]> wrote:
>> Hi Ashutosh,
>>
>> I'm not sure how to accomplish that on the Hive side of things, but in
>> case it helps: it sounds like you want to know when your job is done so
>> you can update something externally, and my company will also be
>> implementing this in the near future. Our plan is to have the process
>> that kicks off our Hive jobs in the cloud monitor each job's status
>> periodically using Amazon's EMR Java library, and when the state changes
>> to complete, update our external systems accordingly.
>>
>> Kortni Smith | Software Developer
>> AbeBooks.com Passion for books.
>>
>> [email protected]
>> phone: 250.412.3272 | fax: 250.475.6014
>>
>> Suite 500 - 655 Tyee Rd. Victoria, BC. Canada V9A 6X5
>>
>> www.abebooks.com | www.abebooks.co.uk | www.abebooks.de
>> www.abebooks.fr | www.abebooks.it | www.iberlibro.com
>>
>> -----Original Message-----
>> From: Ashutosh Chauhan [mailto:[email protected]]
>> Sent: Tuesday, May 25, 2010 12:13 PM
>> To: [email protected]
>> Subject: job level output committer in storage handler
>>
>> Hi,
>>
>> I am implementing my own serde and storage handler. Is there any method
>> in one of these interfaces (or any other) which gives me a hook to do
>> some operation after all the records have been written by all reducers?
>> Something very similar to a job-level output committer. I want to update
>> some state in an external system once I know the job has completed
>> successfully. Ideally, I would do this kind of thing in a job-level
>> output committer, but since Hive is on the old MR API, I don't have
>> access to that. There is Hive's RecordWriter#close(); I tried that, but
>> it looks like it's a task-level handle, so every reducer will try to
>> update the state of my external system, which is not what I want. Any
>> pointers on how to achieve this would be much appreciated. If it's
>> unclear what I am asking for, let me know and I will provide more
>> details.
>>
>> Thanks,
>> Ashutosh
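For readers following along, here is a minimal sketch of the rename-based commit pattern Ning describes. It is not Hive's actual code: the class and method names (RenameCommitSketch, commitTask, commitJob) are invented for illustration, and only the Hadoop FileSystem calls are real API. commitTask mirrors what FileSinkOperator.closeOp() does for a single task, and commitJob mirrors the spirit of MoveTask for the whole job.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical illustration, not Hive source: rename-based commit semantics
// built only on the Hadoop FileSystem API.
public class RenameCommitSketch {

  // Task-level commit: rename the task's temp output to its final name,
  // analogous to what FileSinkOperator.closeOp() does when a task finishes.
  public static void commitTask(Configuration conf, Path tmpFile, Path finalFile)
      throws IOException {
    FileSystem fs = FileSystem.get(conf);
    if (!fs.rename(tmpFile, finalFile)) {
      throw new IOException("Failed to commit " + tmpFile + " to " + finalFile);
    }
  }

  // Job-level commit: move the staging directory holding all task outputs to
  // the destination directory, analogous in spirit to what MoveTask does
  // after the MR job of an insert overwrite statement.
  public static void commitJob(Configuration conf, Path stagingDir, Path destDir)
      throws IOException {
    FileSystem fs = FileSystem.get(conf);
    if (fs.exists(destDir)) {
      fs.delete(destDir, true); // insert overwrite semantics: replace old data
    }
    Path parent = destDir.getParent();
    if (parent != null) {
      fs.mkdirs(parent);
    }
    if (!fs.rename(stagingDir, destDir)) {
      throw new IOException("Failed to move " + stagingDir + " to " + destDir);
    }
  }
}

Because both steps are plain renames, the output only becomes visible at its final path once the task or job has actually succeeded, which is what gives the commit/abort behavior without an explicit OutputCommitter.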
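Kortni's polling suggestion can be sketched in a similar hedged way. The snippet below does not use the Amazon EMR Java library she mentions; instead it shows the same idea against the old Hadoop mapred JobClient API, with invented names (JobPollSketch, waitAndNotify, updateExternalSystem) standing in for whatever the launching process would really do.

import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.JobID;
import org.apache.hadoop.mapred.RunningJob;

// Hypothetical illustration of the polling approach: an external process
// watches a submitted MR job and updates another system once it succeeds.
public class JobPollSketch {

  public static void waitAndNotify(JobConf conf, String jobIdStr) throws Exception {
    JobClient client = new JobClient(conf);
    RunningJob job = client.getJob(JobID.forName(jobIdStr));
    if (job == null) {
      throw new IllegalArgumentException("Unknown job: " + jobIdStr);
    }
    while (!job.isComplete()) {
      Thread.sleep(30000L); // poll every 30 seconds
    }
    if (job.isSuccessful()) {
      updateExternalSystem(jobIdStr); // invented hook for the external update
    }
  }

  private static void updateExternalSystem(String jobId) {
    // Placeholder: whatever state change the external system needs.
    System.out.println("Job " + jobId + " completed successfully");
  }
}

Note that this only works if a separate process monitors the job, which is exactly the setup Ashutosh says he cannot use; it is included here only to make Kortni's suggestion concrete.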
