Hi Ashutosh,

I'm not sure how to accomplish that on the Hive side of things, but in case
it helps: it sounds like you want to know when your job is done so you can
update something externally, and my company will be implementing something
similar in the near future. Our plan is to have the process that kicks off
our Hive jobs in the cloud monitor each job's status periodically using
Amazon's EMR Java library, and, when a job's state changes to complete,
update our external systems accordingly.


Kortni Smith | Software Developer
AbeBooks.com  Passion for books.

[email protected]
phone: 250.412.3272  |  fax: 250.475.6014

Suite 500 - 655 Tyee Rd. Victoria, BC. Canada V9A 6X5

www.abebooks.com  |  www.abebooks.co.uk  |  www.abebooks.de
www.abebooks.fr  |  www.abebooks.it  |  www.iberlibro.com

-----Original Message-----
From: Ashutosh Chauhan [mailto:[email protected]] 
Sent: Tuesday, May 25, 2010 12:13 PM
To: [email protected]
Subject: job level output committer in storage handler

Hi,

I am implementing my own SerDe and storage handler. Is there any
method in one of these interfaces (or any other) which gives me a
handle to do some operation after all the records have been written by
all reducers, something very similar to a job-level output committer?
I want to update some state in an external system once I know the job
has completed successfully. Ideally, I would do this kind of thing in a
job-level output committer, but since Hive is on the old MR API, I
don't have access to that. There is Hive's RecordWriter#close(); I
tried that, but it looks like it's a task-level handle, so every
reducer will try to update the state of my external system, which is
not what I want. Any pointers on how to achieve this will be much
appreciated. If it's unclear what I am asking for, let me know and I
will provide more details.
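To make the task-level vs. job-level distinction concrete, here is a stand-alone sketch (no real Hive or Hadoop classes; the method names are invented for illustration) counting how many times an external-system update would fire from each hook level:

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Illustrative sketch only: compares how often an external-system
 *  update fires from a task-level hook versus a job-level hook. */
public class HookLevels {
    /** Stand-in for Hive's RecordWriter#close(): invoked once per reducer,
     *  so the external update fires N times for N reducers. */
    public static int updatesFromTaskLevelHook(int reducers) {
        AtomicInteger updates = new AtomicInteger();
        for (int task = 0; task < reducers; task++) {
            updates.incrementAndGet(); // every reducer touches the external system
        }
        return updates.get();
    }

    /** Stand-in for a job-level output committer: invoked once per job,
     *  after all reducers have finished successfully. */
    public static int updatesFromJobLevelHook(int reducers) {
        return 1; // fires exactly once, regardless of reducer count
    }
}
```

With four reducers, the task-level hook would update the external system four times, while a job-level hook would update it once, which is the behaviour being asked for.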

Thanks,
Ashutosh
