kirantodekar opened a new issue, #28958:
URL: https://github.com/apache/airflow/issues/28958
### Apache Airflow version
Other Airflow 2 version (please specify below)
### What happened
I am working on a DataHub project where I have created an Airflow DAG to ingest metadata from a Hive source into DataHub. All of the metadata is ingested successfully, but the DAG is marked as failed. I am providing the logs from the failed task below.
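For reference, here is a minimal sketch of the DAG as it can be reconstructed from the logs below (the real file is `DAGS_FOLDER/dt_datahub/pipelines/hive_metadata_dag.py`; the schedule, start date, and retry settings are assumptions inferred from the `scheduled__...T06:00:00` run id and the "Starting attempt 1 of 2" lines):

```python
# Minimal sketch of the failing DAG, reconstructed from the task logs.
# schedule_interval, start_date, and retries are assumptions inferred
# from the run id and the "Starting attempt 1 of 2" log lines.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="datahub_hive_ingest",
    schedule_interval="0 6 * * *",    # assumed from the 06:00 run id
    start_date=datetime(2023, 1, 1),  # placeholder
    catchup=False,
    default_args={"owner": "data-engineering", "retries": 1},
) as dag:
    hive_ingest = BashOperator(
        task_id="hive_ingest",
        bash_command=(
            "python3 -m datahub ingest -c "
            "/usr/local/airflow/dags/dt_datahub/recipes/prod/Hive/hive.yaml"
        ),
    )
```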
**_Please check the logs below:_**
ip-10-231-6-9.ec2.internal
*** Reading remote log from Cloudwatch log_group:
airflow-dt-airflow-prod-Task log_stream:
datahub_hive_ingest/hive_ingest/2023-01-14T06_00_00+00_00/1.log.
[2023-01-15 06:00:19,778] {{taskinstance.py:1035}} INFO - Dependencies all
met for <TaskInstance: datahub_hive_ingest.hive_ingest
scheduled__2023-01-14T06:00:00+00:00 [queued]>
[2023-01-15 06:00:19,794] {{taskinstance.py:1035}} INFO - Dependencies all
met for <TaskInstance: datahub_hive_ingest.hive_ingest
scheduled__2023-01-14T06:00:00+00:00 [queued]>
[2023-01-15 06:00:19,794] {{taskinstance.py:1241}} INFO -
--------------------------------------------------------------------------------
[2023-01-15 06:00:19,795] {{taskinstance.py:1242}} INFO - Starting attempt 1
of 2
[2023-01-15 06:00:19,795] {{taskinstance.py:1243}} INFO -
--------------------------------------------------------------------------------
[2023-01-15 06:00:19,860] {{taskinstance.py:1262}} INFO - Executing
<Task(BashOperator): hive_ingest> on 2023-01-14 06:00:00+00:00
[2023-01-15 06:00:19,864] {{standard_task_runner.py:52}} INFO - Started
process 327 to run task
[2023-01-15 06:00:19,877] {{standard_task_runner.py:76}} INFO - Running:
['airflow', 'tasks', 'run', 'datahub_hive_ingest', 'hive_ingest',
'scheduled__2023-01-14T06:00:00+00:00', '--job-id', '101045', '--raw',
'--subdir', 'DAGS_FOLDER/dt_datahub/pipelines/hive_metadata_dag.py',
'--cfg-path', '/tmp/tmpxz3djq70', '--error-file', '/tmp/tmp2oj9w7ye']
[2023-01-15 06:00:19,882] {{standard_task_runner.py:77}} INFO - Job 101045:
Subtask hive_ingest
[2023-01-15 06:00:20,059] {{logging_mixin.py:109}} INFO - Running
<TaskInstance: datahub_hive_ingest.hive_ingest
scheduled__2023-01-14T06:00:00+00:00 [running]> on host
ip-10-231-6-9.ec2.internal
[2023-01-15 06:00:20,146] {{taskinstance.py:1429}} INFO - Exporting the
following env vars:
AIRFLOW_CTX_DAG_EMAIL=[email protected]
AIRFLOW_CTX_DAG_OWNER=data-engineering
AIRFLOW_CTX_DAG_ID=datahub_hive_ingest
AIRFLOW_CTX_TASK_ID=hive_ingest
AIRFLOW_CTX_EXECUTION_DATE=2023-01-14T06:00:00+00:00
AIRFLOW_CTX_DAG_RUN_ID=scheduled__2023-01-14T06:00:00+00:00
[2023-01-15 06:00:20,147] {{subprocess.py:62}} INFO - Tmp dir root location:
/tmp
[2023-01-15 06:00:20,148] {{subprocess.py:74}} INFO - Running command:
['bash', '-c', 'python3 -m datahub ingest -c
/usr/local/airflow/dags/dt_datahub/recipes/prod/Hive/hive.yaml']
[2023-01-15 06:00:20,160] {{subprocess.py:85}} INFO - Output:
[2023-01-15 06:00:21,550] {{subprocess.py:89}} INFO - [2023-01-15
06:00:21,549] INFO {datahub.cli.ingest_cli:179} - DataHub CLI version:
0.8.44
[2023-01-15 06:00:22,245] {{subprocess.py:89}} INFO - [2023-01-15
06:00:22,245] INFO {datahub.ingestion.run.pipeline:165} - Sink configured
successfully. DataHubRestEmitter: configured to talk to
https://datahub-gms.digitalturbine.com:8080
[2023-01-15 06:00:27,153] {{subprocess.py:89}} INFO - [2023-01-15
06:00:27,153] INFO {datahub.ingestion.source.sql.sql_common:284} - Applying
table_pattern {'deny': ['default.ap_south_1_events',
'default.eu_central_1_events', 'default.silver_2021_07_26',
'default.us_west_1_events', 'fyber_bi.revenue_over_time',
'fyber_bi_utils.demandstartdate_poc', 'lakehouse.bronze_pie',
'lakehouse_views.pie_matched_clicks', 'lakehouse_views.pie_matched_installs',
'lakehouse_views.pie_unmatched', 'default.get_ads_events',
'default.sa_east_1_events', 'lakehouse_reporting.source_event_fact_report',
'lakehouse_reporting.notification_revenue_report']} to view_pattern.
[2023-01-15 06:00:27,328] {{subprocess.py:89}} INFO - [2023-01-15
06:00:27,328] INFO {datahub.ingestion.run.pipeline:190} - Source configured
successfully.
[2023-01-15 06:00:27,329] {{subprocess.py:89}} INFO - [2023-01-15
06:00:27,329] INFO {datahub.cli.ingest_cli:126} - Starting metadata
ingestion
[2023-01-15 06:15:58,011] {{subprocess.py:89}} INFO - [2023-01-15
06:15:58,010] WARNING {py.warnings:110} -
/usr/local/airflow/.local/lib/python3.7/site-packages/avro/schema.py:1046:
IgnoredLogicalType: Logical type timestamp-millis requires literal type long,
not int.
[2023-01-15 06:15:58,011] {{subprocess.py:89}} INFO - logical_type,
"/".join(expected_types), type_)))
[2023-01-15 06:15:58,013] {{logging_mixin.py:109}} WARNING -
/usr/local/airflow/.local/lib/python3.7/site-packages/watchtower/__init__.py:349
WatchtowerWarning: Received empty message. Empty messages cannot be sent to
CloudWatch Logs
[2023-01-15 06:15:58,014] {{logging_mixin.py:109}} WARNING - Traceback (most
recent call last):
[2023-01-15 06:15:58,014] {{logging_mixin.py:109}} WARNING - File
"/usr/local/airflow/config/cloudwatch_logging.py", line 161, in emit
self.sniff_errors(record)
[2023-01-15 06:15:58,014] {{logging_mixin.py:109}} WARNING - File
"/usr/local/airflow/config/cloudwatch_logging.py", line 211, in sniff_errors
if pattern.search(record.message):
[2023-01-15 06:15:58,014] {{logging_mixin.py:109}} WARNING - AttributeError:
'LogRecord' object has no attribute 'message'
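A note on the traceback above: `/usr/local/airflow/config/cloudwatch_logging.py` is the custom CloudWatch logging config on the MWAA worker, not Airflow core. The `AttributeError` occurs because the stdlib only sets `LogRecord.message` when `Formatter.format()` calls `record.getMessage()`, so a handler that inspects a record before it has been formatted cannot rely on that attribute. A defensive version of the failing check, sketched only from the traceback with a hypothetical pattern list, might look like:

```python
import logging
import re

# Hypothetical pattern list; only the record.message access is taken
# from the traceback above.
ERROR_PATTERNS = [re.compile(r"Traceback \(most recent call last\)")]

def sniff_errors(record: logging.LogRecord) -> bool:
    # LogRecord.message only exists after Formatter.format() has run
    # record.getMessage(), so call getMessage() directly instead.
    message = record.getMessage()
    return any(pattern.search(message) for pattern in ERROR_PATTERNS)
```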
ip-10-231-6-9.ec2.internal
*** Reading remote log from Cloudwatch log_group:
airflow-dt-airflow-prod-Task log_stream:
datahub_hive_ingest/hive_ingest/2023-01-14T06_00_00+00_00/2.log.
[2023-01-15 08:15:23,735] {{taskinstance.py:1035}} INFO - Dependencies all
met for <TaskInstance: datahub_hive_ingest.hive_ingest
scheduled__2023-01-14T06:00:00+00:00 [queued]>
[2023-01-15 08:15:23,757] {{taskinstance.py:1035}} INFO - Dependencies all
met for <TaskInstance: datahub_hive_ingest.hive_ingest
scheduled__2023-01-14T06:00:00+00:00 [queued]>
[2023-01-15 08:15:23,757] {{taskinstance.py:1241}} INFO -
--------------------------------------------------------------------------------
[2023-01-15 08:15:23,757] {{taskinstance.py:1242}} INFO - Starting attempt 2
of 2
[2023-01-15 08:15:23,757] {{taskinstance.py:1243}} INFO -
--------------------------------------------------------------------------------
[2023-01-15 08:15:23,785] {{taskinstance.py:1262}} INFO - Executing
<Task(BashOperator): hive_ingest> on 2023-01-14 06:00:00+00:00
[2023-01-15 08:15:23,790] {{standard_task_runner.py:52}} INFO - Started
process 758 to run task
[2023-01-15 08:15:23,793] {{standard_task_runner.py:76}} INFO - Running:
['airflow', 'tasks', 'run', 'datahub_hive_ingest', 'hive_ingest',
'scheduled__2023-01-14T06:00:00+00:00', '--job-id', '101203', '--raw',
'--subdir', 'DAGS_FOLDER/dt_datahub/pipelines/hive_metadata_dag.py',
'--cfg-path', '/tmp/tmpcgt_enwk', '--error-file', '/tmp/tmp_ht93ip8']
[2023-01-15 08:15:23,794] {{standard_task_runner.py:77}} INFO - Job 101203:
Subtask hive_ingest
[2023-01-15 08:15:23,914] {{logging_mixin.py:109}} INFO - Running
<TaskInstance: datahub_hive_ingest.hive_ingest
scheduled__2023-01-14T06:00:00+00:00 [running]> on host
ip-10-231-6-9.ec2.internal
[2023-01-15 08:15:24,166] {{taskinstance.py:1429}} INFO - Exporting the
following env vars:
AIRFLOW_CTX_DAG_EMAIL=[email protected]
AIRFLOW_CTX_DAG_OWNER=data-engineering
AIRFLOW_CTX_DAG_ID=datahub_hive_ingest
AIRFLOW_CTX_TASK_ID=hive_ingest
AIRFLOW_CTX_EXECUTION_DATE=2023-01-14T06:00:00+00:00
AIRFLOW_CTX_DAG_RUN_ID=scheduled__2023-01-14T06:00:00+00:00
[2023-01-15 08:15:24,167] {{subprocess.py:62}} INFO - Tmp dir root location:
/tmp
[2023-01-15 08:15:24,167] {{subprocess.py:74}} INFO - Running command:
['bash', '-c', 'python3 -m datahub ingest -c
/usr/local/airflow/dags/dt_datahub/recipes/prod/Hive/hive.yaml']
[2023-01-15 08:15:24,177] {{subprocess.py:85}} INFO - Output:
[2023-01-15 08:15:25,566] {{subprocess.py:89}} INFO - [2023-01-15
08:15:25,565] INFO {datahub.cli.ingest_cli:179} - DataHub CLI version:
0.8.44
[2023-01-15 08:15:26,418] {{subprocess.py:89}} INFO - [2023-01-15
08:15:26,418] INFO {datahub.ingestion.run.pipeline:165} - Sink configured
successfully. DataHubRestEmitter: configured to talk to
https://datahub-gms.digitalturbine.com:8080
[2023-01-15 08:15:28,663] {{subprocess.py:89}} INFO - [2023-01-15
08:15:28,663] INFO {datahub.ingestion.source.sql.sql_common:284} - Applying
table_pattern {'deny': ['default.ap_south_1_events',
'default.eu_central_1_events', 'default.silver_2021_07_26',
'default.us_west_1_events', 'fyber_bi.revenue_over_time',
'fyber_bi_utils.demandstartdate_poc', 'lakehouse.bronze_pie',
'lakehouse_views.pie_matched_clicks', 'lakehouse_views.pie_matched_installs',
'lakehouse_views.pie_unmatched', 'default.get_ads_events',
'default.sa_east_1_events', 'lakehouse_reporting.source_event_fact_report',
'lakehouse_reporting.notification_revenue_report']} to view_pattern.
[2023-01-15 08:15:28,843] {{subprocess.py:89}} INFO - [2023-01-15
08:15:28,843] INFO {datahub.ingestion.run.pipeline:190} - Source configured
successfully.
[2023-01-15 08:15:28,844] {{subprocess.py:89}} INFO - [2023-01-15
08:15:28,844] INFO {datahub.cli.ingest_cli:126} - Starting metadata
ingestion
[2023-01-15 08:31:08,087] {{subprocess.py:89}} INFO - [2023-01-15
08:31:08,087] WARNING {py.warnings:110} -
/usr/local/airflow/.local/lib/python3.7/site-packages/avro/schema.py:1046:
IgnoredLogicalType: Logical type timestamp-millis requires literal type long,
not int.
[2023-01-15 08:31:08,088] {{subprocess.py:89}} INFO - logical_type,
"/".join(expected_types), type_)))
[2023-01-15 08:31:08,090] {{logging_mixin.py:109}} WARNING -
/usr/local/airflow/.local/lib/python3.7/site-packages/watchtower/__init__.py:349
WatchtowerWarning: Received empty message. Empty messages cannot be sent to
CloudWatch Logs
[2023-01-15 08:31:08,091] {{logging_mixin.py:109}} WARNING - Traceback (most
recent call last):
[2023-01-15 08:31:08,091] {{logging_mixin.py:109}} WARNING - File
"/usr/local/airflow/config/cloudwatch_logging.py", line 161, in emit
self.sniff_errors(record)
[2023-01-15 08:31:08,091] {{logging_mixin.py:109}} WARNING - File
"/usr/local/airflow/config/cloudwatch_logging.py", line 211, in sniff_errors
if pattern.search(record.message):
[2023-01-15 08:31:08,091] {{logging_mixin.py:109}} WARNING - AttributeError:
'LogRecord' object has no attribute 'message'
[2023-01-15 08:38:56,537] {{logging_mixin.py:109}} WARNING - Traceback (most
recent call last):
[2023-01-15 08:38:56,537] {{logging_mixin.py:109}} WARNING - File
"/usr/local/airflow/config/cloudwatch_logging.py", line 161, in emit
self.sniff_errors(record)
[2023-01-15 08:38:56,537] {{logging_mixin.py:109}} WARNING - File
"/usr/local/airflow/config/cloudwatch_logging.py", line 211, in sniff_errors
if pattern.search(record.message):
[2023-01-15 08:38:56,537] {{logging_mixin.py:109}} WARNING - AttributeError:
'LogRecord' object has no attribute 'message'
[2023-01-15 08:38:56,537] {{subprocess.py:89}} INFO - Cli report:
[2023-01-15 08:38:56,539] {{subprocess.py:89}} INFO - {'cli_entry_location':
'/usr/local/airflow/.local/lib/python3.7/site-packages/datahub/__init__.py',
[2023-01-15 08:38:56,539] {{subprocess.py:89}} INFO - 'cli_version':
'0.8.44',
[2023-01-15 08:38:56,539] {{subprocess.py:89}} INFO - 'os_details':
'Linux-4.14.296-222.539.amzn2.x86_64-x86_64-with-glibc2.2.5',
[2023-01-15 08:38:56,539] {{subprocess.py:89}} INFO - 'py_exec_path':
'/usr/bin/python3',
[2023-01-15 08:38:56,539] {{subprocess.py:89}} INFO - 'py_version': '3.7.15
(default, Oct 31 2022, 22:44:31) \n[GCC 7.3.1 20180712 (Red Hat 7.3.1-15)]'}
[2023-01-15 08:38:56,539] {{subprocess.py:89}} INFO - Source (hive) report:
[2023-01-15 08:38:56,540] {{subprocess.py:89}} INFO - {'entities_profiled':
'0',
[2023-01-15 08:38:56,540] {{subprocess.py:89}} INFO - 'event_ids':
['agp_aggregations.agp_reporting_mapping_demand',
[2023-01-15 08:38:56,540] {{subprocess.py:89}} INFO -
'development_sandbox.campaign_lookup_aggregation-subtypes',
[2023-01-15 08:38:56,540] {{subprocess.py:89}} INFO -
'development_sandbox.push_success_dimension',
[2023-01-15 08:38:56,540] {{subprocess.py:89}} INFO -
'container-subtypes-fyber_billing-urn:li:container:52f47c8aaea77f29c3bfcbba3cf37a1b',
[2023-01-15 08:38:56,540] {{subprocess.py:89}} INFO -
'lakehouse_dimensions.dataai_product',
[2023-01-15 08:38:56,540] {{subprocess.py:89}} INFO -
'container-urn:li:container:53515006d2dc79b714b61b8cccde7485-to-urn:li:dataset:(urn:li:dataPlatform:hive,lakehouse_dimensions.discovery_notifications_dimension,PROD)',
[2023-01-15 08:38:56,540] {{subprocess.py:89}} INFO -
'lakehouse_dimensions.partner_information_dimension-subtypes',
[2023-01-15 08:38:56,540] {{subprocess.py:89}} INFO -
'lakehouse_dimensions.subsite_dimension-subtypes',
[2023-01-15 08:38:56,540] {{subprocess.py:89}} INFO -
'container-urn:li:container:228e29a11fe0b8191d9a69543ba4defd-to-urn:li:dataset:(urn:li:dataPlatform:hive,reporting_sandbox.activebase,PROD)',
[2023-01-15 08:38:56,540] {{subprocess.py:89}} INFO -
'reporting_sandbox.att_84_nullkind_below_2020',
[2023-01-15 08:38:56,540] {{subprocess.py:89}} INFO - '...
sampled of 1001 total elements'],
[2023-01-15 08:38:56,541] {{subprocess.py:89}} INFO - 'events_produced':
'1001',
[2023-01-15 08:38:56,541] {{subprocess.py:89}} INFO -
'events_produced_per_sec': '0',
[2023-01-15 08:38:56,541] {{subprocess.py:89}} INFO - 'failures': {},
[2023-01-15 08:38:56,541] {{subprocess.py:89}} INFO - 'filtered':
['dataengineering_sandbox.*',
[2023-01-15 08:38:56,541] {{subprocess.py:89}} INFO -
'default.ap_south_1_events',
[2023-01-15 08:38:56,541] {{subprocess.py:89}} INFO -
'default.eu_central_1_events',
[2023-01-15 08:38:56,541] {{subprocess.py:89}} INFO -
'default.get_ads_events',
[2023-01-15 08:38:56,541] {{subprocess.py:89}} INFO -
'default.sa_east_1_events',
[2023-01-15 08:38:56,541] {{subprocess.py:89}} INFO -
'default.silver_2021_07_26',
[2023-01-15 08:38:56,541] {{subprocess.py:89}} INFO -
'default.silver_2021_07_26_1',
[2023-01-15 08:38:56,541] {{subprocess.py:89}} INFO -
'default.silver_2021_07_26_2',
[2023-01-15 08:38:56,541] {{subprocess.py:89}} INFO -
'default.silver_2021_07_26_3',
[2023-01-15 08:38:56,541] {{subprocess.py:89}} INFO -
'default.us_west_1_events',
[2023-01-15 08:38:56,541] {{subprocess.py:89}} INFO -
'fyber_bi.revenue_over_time',
[2023-01-15 08:38:56,541] {{subprocess.py:89}} INFO -
'fyber_bi_utils.demandstartdate_poc',
[2023-01-15 08:38:56,541] {{subprocess.py:89}} INFO -
'lakehouse.bronze_pie',
[2023-01-15 08:38:56,542] {{subprocess.py:89}} INFO -
'lakehouse_reporting.notification_revenue_report',
[2023-01-15 08:38:56,542] {{subprocess.py:89}} INFO -
'lakehouse_reporting.source_event_fact_report',
[2023-01-15 08:38:56,542] {{subprocess.py:89}} INFO -
'lakehouse_views.pie_matched_clicks',
[2023-01-15 08:38:56,542] {{subprocess.py:89}} INFO -
'lakehouse_views.pie_matched_installs',
[2023-01-15 08:38:56,542] {{subprocess.py:89}} INFO -
'lakehouse_views.pie_unmatched'],
[2023-01-15 08:38:56,542] {{subprocess.py:89}} INFO -
'running_time_in_seconds': '1407',
[2023-01-15 08:38:56,542] {{subprocess.py:89}} INFO -
'soft_deleted_stale_entities': [],
[2023-01-15 08:38:56,542] {{subprocess.py:89}} INFO - 'start_time':
'2023-01-15 08:15:28.663948',
[2023-01-15 08:38:56,542] {{subprocess.py:89}} INFO - 'tables_scanned':
'330',
[2023-01-15 08:38:56,542] {{subprocess.py:89}} INFO - 'views_scanned': '0',
[2023-01-15 08:38:56,542] {{subprocess.py:89}} INFO - 'warnings': {}}
[2023-01-15 08:38:56,542] {{subprocess.py:89}} INFO - Sink (datahub-rest)
report:
[2023-01-15 08:38:56,542] {{subprocess.py:89}} INFO - {'current_time':
'2023-01-15 08:38:56.540099',
[2023-01-15 08:38:56,542] {{subprocess.py:89}} INFO - 'failures': [],
[2023-01-15 08:38:56,542] {{subprocess.py:89}} INFO - 'gms_version':
'v0.8.45',
[2023-01-15 08:38:56,542] {{subprocess.py:89}} INFO - 'pending_requests':
'1',
[2023-01-15 08:38:56,543] {{subprocess.py:89}} INFO -
'records_written_per_second': '0',
[2023-01-15 08:38:56,543] {{subprocess.py:89}} INFO - 'start_time':
'2023-01-15 08:15:24.588129',
[2023-01-15 08:38:56,543] {{subprocess.py:89}} INFO -
'total_duration_in_seconds': '1411.95',
[2023-01-15 08:38:56,543] {{subprocess.py:89}} INFO -
'total_records_written': '999',
[2023-01-15 08:38:56,543] {{subprocess.py:89}} INFO - 'warnings': []}
[2023-01-15 08:38:56,543] {{subprocess.py:89}} INFO - ⏳ Pipeline running
successfully so far; produced 1001 events in 1407 seconds.
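Two asides on the log above. First, the `IgnoredLogicalType` warning looks harmless: per the Avro spec, the `timestamp-millis` logical type must annotate a `long`, so a schema that puts it on an `int` has the logical type ignored with exactly this warning. A minimal illustration with a hypothetical field (not taken from our Hive metadata):

```python
# Hypothetical Avro field definitions illustrating the warning above.
# timestamp-millis must annotate "long"; on "int" the Python avro
# library drops the logical type and emits IgnoredLogicalType.
bad_field = {"name": "event_ts",
             "type": {"type": "int", "logicalType": "timestamp-millis"}}

good_field = {"name": "event_ts",
              "type": {"type": "long", "logicalType": "timestamp-millis"}}
```

Second, the log simply stops after the "Pipeline running successfully so far" line: there is no `Command exited with return code ...` line from `subprocess.py` and no final task state, which is why I suspect the failure happens outside the ingestion itself.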
Let me know if you need any more info.
### What you think should happen instead
_No response_
### How to reproduce
_No response_
### Operating System
AWS cloud (MWAA); Ubuntu locally
### Versions of Apache Airflow Providers
_No response_
### Deployment
MWAA
### Deployment details
_No response_
### Anything else
_No response_
### Are you willing to submit PR?
- [ ] Yes I am willing to submit a PR!
### Code of Conduct
- [X] I agree to follow this project's [Code of
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)