Hi Boris,

looks like bash_operator has the same bug that ssh_execute_operator has:
it does not capture multi-line output.

I have put up a fix for bash_operator as well:
https://github.com/apache/incubator-airflow/pull/2026

please take a look.
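For anyone following along, the gist of the problem: if the operator keeps
only the last line of the subprocess output, everything before it is lost.
Below is a minimal, standalone sketch of the multi-line capture idea (this
is not the actual PR code; `run_and_capture` is a made-up helper for
illustration):

```python
import subprocess

def run_and_capture(bash_command):
    """Run a shell command and return its full multi-line stdout.

    Sketch of the fix's idea: accumulate every line of output
    instead of keeping only the last one.
    """
    proc = subprocess.Popen(
        bash_command, shell=True,
        stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    lines = []
    for raw in iter(proc.stdout.readline, b''):
        # Collect each line as it arrives, stripping the trailing newline
        lines.append(raw.decode('utf-8').rstrip())
    proc.wait()
    return '\n'.join(lines)
```

With the full output available, downstream tasks can xcom_pull the whole
log and parse whatever they need out of it.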

Thanks
Jayesh

On Wed, Jan 25, 2017 at 1:25 PM, Boris Tyukin <[email protected]> wrote:

> I figured that luckily for me, the number of rows loaded by sqoop is
> reported to stdout as the very last line. So I just used BashOperator and
> set xcom_push=True. Then I did something like that:
>
>     # Log row_count ingested
>     try:
>         row_count = int(re.search('Retrieved (\d+) records',
>                                   kwargs['ti'].xcom_pull(task_ids='t_sqoop_from_cerner')).group(1))
>         write_job_audit(get_job_audit_id_from_context(kwargs),
>                         "rows_ingested_sqoop", row_count)
>     except ValueError:
>         write_job_audit(get_job_audit_id_from_context(kwargs),
>                         "rows_ingested_sqoop", -1)
>
> The alternative I was considering is to get mapreduce jobid and then use
> mapred command to get the needed counter - here is an example:
>
> mapred job -counter job_1484574566480_0002 org.apache.hadoop.mapreduce.TaskCounter MAP_OUTPUT_RECORDS
>
> But I could not figure out an easy way to get job_id from BashOperator /
> sqoop output. I guess I could create my own operator that would capture all
> stdout lines not only the last one.
>
> On Tue, Jan 24, 2017 at 9:07 AM, Boris Tyukin <[email protected]>
> wrote:
>
> > Hello all,
> >
> > is there a way to capture sqoop counters either using bash or sqoop
> > operator? Specifically I need to pull a total number of rows loaded.
> >
> > By looking at bash operator, I think there is an option to push the last
> > line of output to xcom but sqoop and mapreduce output is a bit more
> > complicated.
> >
> > Thanks!
> >
>
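On the job_id question from the quoted message above: once the operator
captures the full multi-line output, one option is to grep the MapReduce
job id out of the captured sqoop log. A rough sketch; the
"Running job: job_..." log line is an assumption based on typical Hadoop
client output, and `extract_job_id` is a made-up helper:

```python
import re

def extract_job_id(sqoop_log):
    """Return the last MapReduce job id (job_<cluster-ts>_<seq>)
    found in captured sqoop/Hadoop client output, or None.

    Assumption: the client logs a line containing the id, e.g.
    'INFO mapreduce.Job: Running job: job_1484574566480_0002'.
    """
    matches = re.findall(r'\b(job_\d+_\d+)\b', sqoop_log)
    return matches[-1] if matches else None
```

The extracted id could then be passed to the mapred command shown earlier
in the thread to read MAP_OUTPUT_RECORDS or any other counter.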
