I don't use the HiveCliHook so I'm not sure how it works, but is the only
place you can retrieve these counts the logfiles? If you have them at time
of query in your python callable, you could push them anywhere you like
inline at the conclusion of the task. Or, you may prefer to have your
PythonOperator `return` some data structure with those counts, which will
be stored by default in the airflow metadata database per the XCom system
<https://airflow.incubator.apache.org/concepts.html#xcoms>. Then, depending
on what you want to do with that, you could query those counts out of the
metadata database with the ad-hoc querying or charting UIs right within
Airflow, or pull them into a later task altogether.
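For illustration, a minimal sketch of that XCom approach (the function and
table names here are made up, and the counts are hard-coded where a real
DAG would collect them from your HiveCliHook queries):

    # Hypothetical sketch: a PythonOperator callable whose return value is
    # stored as an XCom, and a downstream callable that pulls it back.

    def capture_counts(**context):
        # Run your Hive queries here and collect the counts you care about.
        counts = {"staging_rows": 1200, "final_rows": 1187}
        # Returning a value from a PythonOperator callable stores it as an
        # XCom (under the key 'return_value') in the metadata database.
        return counts

    def validate_counts(**context):
        # A later task can pull the stored counts back out by task id.
        ti = context["ti"]
        counts = ti.xcom_pull(task_ids="capture_counts")
        if counts["final_rows"] > counts["staging_rows"]:
            raise ValueError("final table has more rows than staging")
        return counts

You'd wire each of those up as the python_callable of its own
PythonOperator; the validation task never has to touch the log files.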

On Fri, Feb 10, 2017 at 8:58 AM, Boris Tyukin <[email protected]> wrote:

> please...?
>
> On Thu, Feb 9, 2017 at 8:35 AM, Boris Tyukin <[email protected]>
> wrote:
>
> > Hello,
> >
> > I am using HiveCliHook called from PythonOperator to run a series of
> > queries and want to capture record counts for auditing and validation
> > purposes.
> >
> > I am thinking of using on_success_callback to invoke a python function
> > that will read the log file produced by airflow and then parse it out
> > using regex.
> >
> > I am going to use this method from models to get to the log file:
> >
> >     def log_filepath(self):
> >         iso = self.execution_date.isoformat()
> >         log = os.path.expanduser(configuration.get('core', 'BASE_LOG_FOLDER'))
> >         return ("{log}/{self.dag_id}/{self.task_id}/{iso}.log".format(**locals()))
> >
> > Is this a good strategy or is there an easier way? I wonder if someone
> > has done something similar.
> >
> > Another challenge is that the same log file contains multiple attempts
> > and reruns of the same task, so I guess I need to parse the file
> > backwards.
> >
> > thanks,
> > Boris
> >
>
