I figured that, luckily for me, the number of rows loaded by Sqoop is
reported to stdout as the very last line. So I just used BashOperator with
xcom_push=True. Then I did something like this:
    # Log row_count ingested (assumes `import re` at the top of the DAG file)
    try:
        match = re.search(r'Retrieved (\d+) records',
                          kwargs['ti'].xcom_pull(task_ids='t_sqoop_from_cerner'))
        row_count = int(match.group(1))
        write_job_audit(get_job_audit_id_from_context(kwargs),
                        "rows_ingested_sqoop", row_count)
    except (AttributeError, ValueError):
        # re.search returns None when the pattern is not found,
        # which raises AttributeError on .group(1)
        write_job_audit(get_job_audit_id_from_context(kwargs),
                        "rows_ingested_sqoop", -1)
The alternative I was considering is to get the MapReduce job id and then
use the mapred command to get the needed counter. Here is an example:

    mapred job -counter job_1484574566480_0002 org.apache.hadoop.mapreduce.TaskCounter MAP_OUTPUT_RECORDS
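If you go that route from Python, something like this could wrap the CLI
call. Just a rough sketch, assuming the mapred command prints the counter
value on the last line of stdout; get_counter and parse_counter_output are
my own helper names, not part of any library:

```python
import subprocess


def parse_counter_output(out):
    # Take the last non-empty line of stdout and read it as an integer
    lines = [l for l in out.strip().splitlines() if l.strip()]
    return int(lines[-1])


def get_counter(job_id, group, counter):
    # Shell out to the mapred CLI; requires `mapred` on PATH and a
    # finished job id, e.g. get_counter('job_1484574566480_0002',
    # 'org.apache.hadoop.mapreduce.TaskCounter', 'MAP_OUTPUT_RECORDS')
    out = subprocess.check_output(
        ['mapred', 'job', '-counter', job_id, group, counter])
    return parse_counter_output(out.decode())
```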
But I could not figure out an easy way to get the job id from the
BashOperator / sqoop output. I guess I could create my own operator that
captures all stdout lines, not only the last one.
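For what it's worth, if such an operator captured the full output, a single
parser could pull out both the job id and the record count. A rough sketch;
parse_sqoop_output is my own helper, and the job_... pattern assumes the
usual MapReduce client log format:

```python
import re


def parse_sqoop_output(output):
    """Extract (job_id, row_count) from captured sqoop stdout/stderr.

    Either value is None when its pattern is not found.
    """
    job_id = None
    row_count = None
    # MapReduce client logs usually mention the id as job_<cluster>_<seq>
    m = re.search(r'(job_\d+_\d+)', output)
    if m:
        job_id = m.group(1)
    # Sqoop reports the import total as "Retrieved N records"
    m = re.search(r'Retrieved (\d+) records', output)
    if m:
        row_count = int(m.group(1))
    return job_id, row_count
```

With job_id in hand you could then fall back to the mapred counter lookup
when the "Retrieved N records" line is missing.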
On Tue, Jan 24, 2017 at 9:07 AM, Boris Tyukin <[email protected]> wrote:
> Hello all,
>
> is there a way to capture sqoop counters either using bash or sqoop
> operator? Specifically I need to pull a total number of rows loaded.
>
> By looking at bash operator, I think there is an option to push the last
> line of output to xcom but sqoop and mapreduce output is a bit more
> complicated.
>
> Thanks!
>