Hi Boris, it looks like bash_operator has the same bug that ssh_execute_operator has, which is that it does not capture multi-line output.
I have put up the fix for bash_operator as well: https://github.com/apache/incubator-airflow/pull/2026 please take a look.

Thanks
Jayesh

On Wed, Jan 25, 2017 at 1:25 PM, Boris Tyukin <[email protected]> wrote:

> I figured that, luckily for me, the number of rows loaded by sqoop is
> reported to stdout as the very last line. So I just used BashOperator and
> set xcom_push=True. Then I did something like that:
>
> # Log row_count ingested
> try:
>     row_count = int(re.search('Retrieved (\d+) records',
>         kwargs['ti'].xcom_pull(task_ids='t_sqoop_from_cerner')).group(1))
>     write_job_audit(get_job_audit_id_from_context(kwargs),
>         "rows_ingested_sqoop", row_count)
> except (ValueError, AttributeError):
>     # AttributeError covers the case where re.search returns None
>     write_job_audit(get_job_audit_id_from_context(kwargs),
>         "rows_ingested_sqoop", -1)
>
> The alternative I was considering is to get the mapreduce job id and then
> use the mapred command to get the needed counter. Here is an example:
>
> mapred job -counter job_1484574566480_0002 \
>     org.apache.hadoop.mapreduce.TaskCounter MAP_OUTPUT_RECORDS
>
> But I could not figure out an easy way to get the job_id from BashOperator /
> sqoop output. I guess I could create my own operator that would capture all
> stdout lines, not only the last one.
>
> On Tue, Jan 24, 2017 at 9:07 AM, Boris Tyukin <[email protected]>
> wrote:
>
> > Hello all,
> >
> > is there a way to capture sqoop counters using either the bash or sqoop
> > operator? Specifically I need to pull the total number of rows loaded.
> >
> > By looking at the bash operator, I think there is an option to push the
> > last line of output to xcom, but sqoop and mapreduce output is a bit more
> > complicated.
> >
> > Thanks!
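[Editor's note: a minimal sketch of the "capture all stdout lines" idea discussed above. It assumes the full sqoop/MapReduce stdout has already been captured as a string; the sample log lines, the helper name parse_sqoop_output, and the job-id pattern are illustrative assumptions, not Airflow or Sqoop APIs.]

```python
import re

# Hypothetical sample of sqoop/MapReduce stdout; real output will differ.
SAMPLE_OUTPUT = """\
INFO mapreduce.Job: Running job: job_1484574566480_0002
INFO mapreduce.Job: Job job_1484574566480_0002 completed successfully
INFO mapreduce.ImportJobBase: Retrieved 42 records.
"""

def parse_sqoop_output(text):
    """Return (row_count, job_id) parsed from captured sqoop stdout.

    Either value may be None if the corresponding line is absent,
    so callers can fall back to an audit value such as -1.
    """
    row_count = None
    match = re.search(r'Retrieved (\d+) records', text)
    if match:
        row_count = int(match.group(1))

    job_id = None
    match = re.search(r'\b(job_\d+_\d+)\b', text)
    if match:
        job_id = match.group(1)

    return row_count, job_id
```

With the job id in hand, one could then shell out to `mapred job -counter ...` as in the example above to read any counter, not just the last-line record count.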
