kirantodekar opened a new issue, #28958:
URL: https://github.com/apache/airflow/issues/28958

   ### Apache Airflow version
   
   Other Airflow 2 version (please specify below)
   
   ### What happened
   
   I am working on a DataHub project where I have created an Airflow DAG to ingest 
metadata from a Hive source into DataHub. All of the metadata is migrated 
successfully, but the DAG is marked as failed. I am providing the logs that I 
get from the DAG after it fails.
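   
   For reference, a minimal sketch of the DAG in question, reconstructed from the logs below (the DAG ID, task ID, and ingest command come from the log output; the schedule, start date, and retry settings are assumptions):
   
   ```python
   from datetime import datetime, timedelta
   
   from airflow import DAG
   from airflow.operators.bash import BashOperator
   
   with DAG(
       dag_id="datahub_hive_ingest",
       start_date=datetime(2023, 1, 1),         # assumption
       schedule_interval="0 6 * * *",           # assumption (logs show a 06:00 UTC run)
       catchup=False,                           # assumption
       # logs show "attempt 1 of 2" and a second attempt ~2h later
       default_args={"retries": 1, "retry_delay": timedelta(hours=2)},
   ) as dag:
       hive_ingest = BashOperator(
           task_id="hive_ingest",
           bash_command=(
               "python3 -m datahub ingest -c "
               "/usr/local/airflow/dags/dt_datahub/recipes/prod/Hive/hive.yaml"
           ),
       )
   ```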
   
   
   **_Please check the logs below:_** 
   
   ip-10-231-6-9.ec2.internal
   *** Reading remote log from Cloudwatch log_group: 
airflow-dt-airflow-prod-Task log_stream: 
datahub_hive_ingest/hive_ingest/2023-01-14T06_00_00+00_00/1.log.
   [2023-01-15 06:00:19,778] {{taskinstance.py:1035}} INFO - Dependencies all 
met for <TaskInstance: datahub_hive_ingest.hive_ingest 
scheduled__2023-01-14T06:00:00+00:00 [queued]>
   [2023-01-15 06:00:19,794] {{taskinstance.py:1035}} INFO - Dependencies all 
met for <TaskInstance: datahub_hive_ingest.hive_ingest 
scheduled__2023-01-14T06:00:00+00:00 [queued]>
   [2023-01-15 06:00:19,794] {{taskinstance.py:1241}} INFO - 
   
--------------------------------------------------------------------------------
   [2023-01-15 06:00:19,795] {{taskinstance.py:1242}} INFO - Starting attempt 1 
of 2
   [2023-01-15 06:00:19,795] {{taskinstance.py:1243}} INFO - 
   
--------------------------------------------------------------------------------
   [2023-01-15 06:00:19,860] {{taskinstance.py:1262}} INFO - Executing 
<Task(BashOperator): hive_ingest> on 2023-01-14 06:00:00+00:00
   [2023-01-15 06:00:19,864] {{standard_task_runner.py:52}} INFO - Started 
process 327 to run task
   [2023-01-15 06:00:19,877] {{standard_task_runner.py:76}} INFO - Running: 
['airflow', 'tasks', 'run', 'datahub_hive_ingest', 'hive_ingest', 
'scheduled__2023-01-14T06:00:00+00:00', '--job-id', '101045', '--raw', 
'--subdir', 'DAGS_FOLDER/dt_datahub/pipelines/hive_metadata_dag.py', 
'--cfg-path', '/tmp/tmpxz3djq70', '--error-file', '/tmp/tmp2oj9w7ye']
   [2023-01-15 06:00:19,882] {{standard_task_runner.py:77}} INFO - Job 101045: 
Subtask hive_ingest
   [2023-01-15 06:00:20,059] {{logging_mixin.py:109}} INFO - Running 
<TaskInstance: datahub_hive_ingest.hive_ingest 
scheduled__2023-01-14T06:00:00+00:00 [running]> on host 
ip-10-231-6-9.ec2.internal
   [2023-01-15 06:00:20,146] {{taskinstance.py:1429}} INFO - Exporting the 
following env vars:
   [email protected]
   AIRFLOW_CTX_DAG_OWNER=data-engineering
   AIRFLOW_CTX_DAG_ID=datahub_hive_ingest
   AIRFLOW_CTX_TASK_ID=hive_ingest
   AIRFLOW_CTX_EXECUTION_DATE=2023-01-14T06:00:00+00:00
   AIRFLOW_CTX_DAG_RUN_ID=scheduled__2023-01-14T06:00:00+00:00
   [2023-01-15 06:00:20,147] {{subprocess.py:62}} INFO - Tmp dir root location: 
    /tmp
   [2023-01-15 06:00:20,148] {{subprocess.py:74}} INFO - Running command: 
['bash', '-c', 'python3 -m datahub ingest -c 
/usr/local/airflow/dags/dt_datahub/recipes/prod/Hive/hive.yaml']
   [2023-01-15 06:00:20,160] {{subprocess.py:85}} INFO - Output:
   [2023-01-15 06:00:21,550] {{subprocess.py:89}} INFO - [2023-01-15 
06:00:21,549] INFO     {datahub.cli.ingest_cli:179} - DataHub CLI version: 
0.8.44
   [2023-01-15 06:00:22,245] {{subprocess.py:89}} INFO - [2023-01-15 
06:00:22,245] INFO     {datahub.ingestion.run.pipeline:165} - Sink configured 
successfully. DataHubRestEmitter: configured to talk to 
https://datahub-gms.digitalturbine.com:8080
   [2023-01-15 06:00:27,153] {{subprocess.py:89}} INFO - [2023-01-15 
06:00:27,153] INFO     {datahub.ingestion.source.sql.sql_common:284} - Applying 
table_pattern {'deny': ['default.ap_south_1_events', 
'default.eu_central_1_events', 'default.silver_2021_07_26', 
'default.us_west_1_events', 'fyber_bi.revenue_over_time', 
'fyber_bi_utils.demandstartdate_poc', 'lakehouse.bronze_pie', 
'lakehouse_views.pie_matched_clicks', 'lakehouse_views.pie_matched_installs', 
'lakehouse_views.pie_unmatched', 'default.get_ads_events', 
'default.sa_east_1_events', 'lakehouse_reporting.source_event_fact_report', 
'lakehouse_reporting.notification_revenue_report']} to view_pattern.
   [2023-01-15 06:00:27,328] {{subprocess.py:89}} INFO - [2023-01-15 
06:00:27,328] INFO     {datahub.ingestion.run.pipeline:190} - Source configured 
successfully.
   [2023-01-15 06:00:27,329] {{subprocess.py:89}} INFO - [2023-01-15 
06:00:27,329] INFO     {datahub.cli.ingest_cli:126} - Starting metadata 
ingestion
   [2023-01-15 06:15:58,011] {{subprocess.py:89}} INFO - [2023-01-15 
06:15:58,010] WARNING  {py.warnings:110} - 
/usr/local/airflow/.local/lib/python3.7/site-packages/avro/schema.py:1046: 
IgnoredLogicalType: Logical type timestamp-millis requires literal type long, 
not int.
   [2023-01-15 06:15:58,011] {{subprocess.py:89}} INFO -   logical_type, 
"/".join(expected_types), type_)))
   [2023-01-15 06:15:58,013] {{logging_mixin.py:109}} WARNING - 
/usr/local/airflow/.local/lib/python3.7/site-packages/watchtower/__init__.py:349
 WatchtowerWarning: Received empty message. Empty messages cannot be sent to 
CloudWatch Logs
   [2023-01-15 06:15:58,014] {{logging_mixin.py:109}} WARNING - Traceback (most 
recent call last):
   [2023-01-15 06:15:58,014] {{logging_mixin.py:109}} WARNING -   File 
"/usr/local/airflow/config/cloudwatch_logging.py", line 161, in emit
       self.sniff_errors(record)
   [2023-01-15 06:15:58,014] {{logging_mixin.py:109}} WARNING -   File 
"/usr/local/airflow/config/cloudwatch_logging.py", line 211, in sniff_errors
       if pattern.search(record.message):
   [2023-01-15 06:15:58,014] {{logging_mixin.py:109}} WARNING - AttributeError: 
'LogRecord' object has no attribute 'message'
   
   ip-10-231-6-9.ec2.internal
   *** Reading remote log from Cloudwatch log_group: 
airflow-dt-airflow-prod-Task log_stream: 
datahub_hive_ingest/hive_ingest/2023-01-14T06_00_00+00_00/2.log.
   [2023-01-15 08:15:23,735] {{taskinstance.py:1035}} INFO - Dependencies all 
met for <TaskInstance: datahub_hive_ingest.hive_ingest 
scheduled__2023-01-14T06:00:00+00:00 [queued]>
   [2023-01-15 08:15:23,757] {{taskinstance.py:1035}} INFO - Dependencies all 
met for <TaskInstance: datahub_hive_ingest.hive_ingest 
scheduled__2023-01-14T06:00:00+00:00 [queued]>
   [2023-01-15 08:15:23,757] {{taskinstance.py:1241}} INFO - 
   
--------------------------------------------------------------------------------
   [2023-01-15 08:15:23,757] {{taskinstance.py:1242}} INFO - Starting attempt 2 
of 2
   [2023-01-15 08:15:23,757] {{taskinstance.py:1243}} INFO - 
   
--------------------------------------------------------------------------------
   [2023-01-15 08:15:23,785] {{taskinstance.py:1262}} INFO - Executing 
<Task(BashOperator): hive_ingest> on 2023-01-14 06:00:00+00:00
   [2023-01-15 08:15:23,790] {{standard_task_runner.py:52}} INFO - Started 
process 758 to run task
   [2023-01-15 08:15:23,793] {{standard_task_runner.py:76}} INFO - Running: 
['airflow', 'tasks', 'run', 'datahub_hive_ingest', 'hive_ingest', 
'scheduled__2023-01-14T06:00:00+00:00', '--job-id', '101203', '--raw', 
'--subdir', 'DAGS_FOLDER/dt_datahub/pipelines/hive_metadata_dag.py', 
'--cfg-path', '/tmp/tmpcgt_enwk', '--error-file', '/tmp/tmp_ht93ip8']
   [2023-01-15 08:15:23,794] {{standard_task_runner.py:77}} INFO - Job 101203: 
Subtask hive_ingest
   [2023-01-15 08:15:23,914] {{logging_mixin.py:109}} INFO - Running 
<TaskInstance: datahub_hive_ingest.hive_ingest 
scheduled__2023-01-14T06:00:00+00:00 [running]> on host 
ip-10-231-6-9.ec2.internal
   [2023-01-15 08:15:24,166] {{taskinstance.py:1429}} INFO - Exporting the 
following env vars:
   [email protected]
   AIRFLOW_CTX_DAG_OWNER=data-engineering
   AIRFLOW_CTX_DAG_ID=datahub_hive_ingest
   AIRFLOW_CTX_TASK_ID=hive_ingest
   AIRFLOW_CTX_EXECUTION_DATE=2023-01-14T06:00:00+00:00
   AIRFLOW_CTX_DAG_RUN_ID=scheduled__2023-01-14T06:00:00+00:00
   [2023-01-15 08:15:24,167] {{subprocess.py:62}} INFO - Tmp dir root location: 
    /tmp
   [2023-01-15 08:15:24,167] {{subprocess.py:74}} INFO - Running command: 
['bash', '-c', 'python3 -m datahub ingest -c 
/usr/local/airflow/dags/dt_datahub/recipes/prod/Hive/hive.yaml']
   [2023-01-15 08:15:24,177] {{subprocess.py:85}} INFO - Output:
   [2023-01-15 08:15:25,566] {{subprocess.py:89}} INFO - [2023-01-15 
08:15:25,565] INFO     {datahub.cli.ingest_cli:179} - DataHub CLI version: 
0.8.44
   [2023-01-15 08:15:26,418] {{subprocess.py:89}} INFO - [2023-01-15 
08:15:26,418] INFO     {datahub.ingestion.run.pipeline:165} - Sink configured 
successfully. DataHubRestEmitter: configured to talk to 
https://datahub-gms.digitalturbine.com:8080
   [2023-01-15 08:15:28,663] {{subprocess.py:89}} INFO - [2023-01-15 
08:15:28,663] INFO     {datahub.ingestion.source.sql.sql_common:284} - Applying 
table_pattern {'deny': ['default.ap_south_1_events', 
'default.eu_central_1_events', 'default.silver_2021_07_26', 
'default.us_west_1_events', 'fyber_bi.revenue_over_time', 
'fyber_bi_utils.demandstartdate_poc', 'lakehouse.bronze_pie', 
'lakehouse_views.pie_matched_clicks', 'lakehouse_views.pie_matched_installs', 
'lakehouse_views.pie_unmatched', 'default.get_ads_events', 
'default.sa_east_1_events', 'lakehouse_reporting.source_event_fact_report', 
'lakehouse_reporting.notification_revenue_report']} to view_pattern.
   [2023-01-15 08:15:28,843] {{subprocess.py:89}} INFO - [2023-01-15 
08:15:28,843] INFO     {datahub.ingestion.run.pipeline:190} - Source configured 
successfully.
   [2023-01-15 08:15:28,844] {{subprocess.py:89}} INFO - [2023-01-15 
08:15:28,844] INFO     {datahub.cli.ingest_cli:126} - Starting metadata 
ingestion
   [2023-01-15 08:31:08,087] {{subprocess.py:89}} INFO - [2023-01-15 
08:31:08,087] WARNING  {py.warnings:110} - 
/usr/local/airflow/.local/lib/python3.7/site-packages/avro/schema.py:1046: 
IgnoredLogicalType: Logical type timestamp-millis requires literal type long, 
not int.
   [2023-01-15 08:31:08,088] {{subprocess.py:89}} INFO -   logical_type, 
"/".join(expected_types), type_)))
   [2023-01-15 08:31:08,090] {{logging_mixin.py:109}} WARNING - 
/usr/local/airflow/.local/lib/python3.7/site-packages/watchtower/__init__.py:349
 WatchtowerWarning: Received empty message. Empty messages cannot be sent to 
CloudWatch Logs
   [2023-01-15 08:31:08,091] {{logging_mixin.py:109}} WARNING - Traceback (most 
recent call last):
   [2023-01-15 08:31:08,091] {{logging_mixin.py:109}} WARNING -   File 
"/usr/local/airflow/config/cloudwatch_logging.py", line 161, in emit
       self.sniff_errors(record)
   [2023-01-15 08:31:08,091] {{logging_mixin.py:109}} WARNING -   File 
"/usr/local/airflow/config/cloudwatch_logging.py", line 211, in sniff_errors
       if pattern.search(record.message):
   [2023-01-15 08:31:08,091] {{logging_mixin.py:109}} WARNING - AttributeError: 
'LogRecord' object has no attribute 'message'
   [2023-01-15 08:38:56,537] {{logging_mixin.py:109}} WARNING - Traceback (most 
recent call last):
   [2023-01-15 08:38:56,537] {{logging_mixin.py:109}} WARNING -   File 
"/usr/local/airflow/config/cloudwatch_logging.py", line 161, in emit
       self.sniff_errors(record)
   [2023-01-15 08:38:56,537] {{logging_mixin.py:109}} WARNING -   File 
"/usr/local/airflow/config/cloudwatch_logging.py", line 211, in sniff_errors
       if pattern.search(record.message):
   [2023-01-15 08:38:56,537] {{logging_mixin.py:109}} WARNING - AttributeError: 
'LogRecord' object has no attribute 'message'
   [2023-01-15 08:38:56,537] {{subprocess.py:89}} INFO - Cli report:
   [2023-01-15 08:38:56,539] {{subprocess.py:89}} INFO - {'cli_entry_location': 
'/usr/local/airflow/.local/lib/python3.7/site-packages/datahub/__init__.py',
   [2023-01-15 08:38:56,539] {{subprocess.py:89}} INFO -  'cli_version': 
'0.8.44',
   [2023-01-15 08:38:56,539] {{subprocess.py:89}} INFO -  'os_details': 
'Linux-4.14.296-222.539.amzn2.x86_64-x86_64-with-glibc2.2.5',
   [2023-01-15 08:38:56,539] {{subprocess.py:89}} INFO -  'py_exec_path': 
'/usr/bin/python3',
   [2023-01-15 08:38:56,539] {{subprocess.py:89}} INFO -  'py_version': '3.7.15 
(default, Oct 31 2022, 22:44:31) \n[GCC 7.3.1 20180712 (Red Hat 7.3.1-15)]'}
   [2023-01-15 08:38:56,539] {{subprocess.py:89}} INFO - Source (hive) report:
   [2023-01-15 08:38:56,540] {{subprocess.py:89}} INFO - {'entities_profiled': 
'0',
   [2023-01-15 08:38:56,540] {{subprocess.py:89}} INFO -  'event_ids': 
['agp_aggregations.agp_reporting_mapping_demand',
   [2023-01-15 08:38:56,540] {{subprocess.py:89}} INFO -                
'development_sandbox.campaign_lookup_aggregation-subtypes',
   [2023-01-15 08:38:56,540] {{subprocess.py:89}} INFO -                
'development_sandbox.push_success_dimension',
   [2023-01-15 08:38:56,540] {{subprocess.py:89}} INFO -                
'container-subtypes-fyber_billing-urn:li:container:52f47c8aaea77f29c3bfcbba3cf37a1b',
   [2023-01-15 08:38:56,540] {{subprocess.py:89}} INFO -                
'lakehouse_dimensions.dataai_product',
   [2023-01-15 08:38:56,540] {{subprocess.py:89}} INFO -                
'container-urn:li:container:53515006d2dc79b714b61b8cccde7485-to-urn:li:dataset:(urn:li:dataPlatform:hive,lakehouse_dimensions.discovery_notifications_dimension,PROD)',
   [2023-01-15 08:38:56,540] {{subprocess.py:89}} INFO -                
'lakehouse_dimensions.partner_information_dimension-subtypes',
   [2023-01-15 08:38:56,540] {{subprocess.py:89}} INFO -                
'lakehouse_dimensions.subsite_dimension-subtypes',
   [2023-01-15 08:38:56,540] {{subprocess.py:89}} INFO -                
'container-urn:li:container:228e29a11fe0b8191d9a69543ba4defd-to-urn:li:dataset:(urn:li:dataPlatform:hive,reporting_sandbox.activebase,PROD)',
   [2023-01-15 08:38:56,540] {{subprocess.py:89}} INFO -                
'reporting_sandbox.att_84_nullkind_below_2020',
   [2023-01-15 08:38:56,540] {{subprocess.py:89}} INFO -                '... 
sampled of 1001 total elements'],
   [2023-01-15 08:38:56,541] {{subprocess.py:89}} INFO -  'events_produced': 
'1001',
   [2023-01-15 08:38:56,541] {{subprocess.py:89}} INFO -  
'events_produced_per_sec': '0',
   [2023-01-15 08:38:56,541] {{subprocess.py:89}} INFO -  'failures': {},
   [2023-01-15 08:38:56,541] {{subprocess.py:89}} INFO -  'filtered': 
['dataengineering_sandbox.*',
   [2023-01-15 08:38:56,541] {{subprocess.py:89}} INFO -               
'default.ap_south_1_events',
   [2023-01-15 08:38:56,541] {{subprocess.py:89}} INFO -               
'default.eu_central_1_events',
   [2023-01-15 08:38:56,541] {{subprocess.py:89}} INFO -               
'default.get_ads_events',
   [2023-01-15 08:38:56,541] {{subprocess.py:89}} INFO -               
'default.sa_east_1_events',
   [2023-01-15 08:38:56,541] {{subprocess.py:89}} INFO -               
'default.silver_2021_07_26',
   [2023-01-15 08:38:56,541] {{subprocess.py:89}} INFO -               
'default.silver_2021_07_26_1',
   [2023-01-15 08:38:56,541] {{subprocess.py:89}} INFO -               
'default.silver_2021_07_26_2',
   [2023-01-15 08:38:56,541] {{subprocess.py:89}} INFO -               
'default.silver_2021_07_26_3',
   [2023-01-15 08:38:56,541] {{subprocess.py:89}} INFO -               
'default.us_west_1_events',
   [2023-01-15 08:38:56,541] {{subprocess.py:89}} INFO -               
'fyber_bi.revenue_over_time',
   [2023-01-15 08:38:56,541] {{subprocess.py:89}} INFO -               
'fyber_bi_utils.demandstartdate_poc',
   [2023-01-15 08:38:56,541] {{subprocess.py:89}} INFO -               
'lakehouse.bronze_pie',
   [2023-01-15 08:38:56,542] {{subprocess.py:89}} INFO -               
'lakehouse_reporting.notification_revenue_report',
   [2023-01-15 08:38:56,542] {{subprocess.py:89}} INFO -               
'lakehouse_reporting.source_event_fact_report',
   [2023-01-15 08:38:56,542] {{subprocess.py:89}} INFO -               
'lakehouse_views.pie_matched_clicks',
   [2023-01-15 08:38:56,542] {{subprocess.py:89}} INFO -               
'lakehouse_views.pie_matched_installs',
   [2023-01-15 08:38:56,542] {{subprocess.py:89}} INFO -               
'lakehouse_views.pie_unmatched'],
   [2023-01-15 08:38:56,542] {{subprocess.py:89}} INFO -  
'running_time_in_seconds': '1407',
   [2023-01-15 08:38:56,542] {{subprocess.py:89}} INFO -  
'soft_deleted_stale_entities': [],
   [2023-01-15 08:38:56,542] {{subprocess.py:89}} INFO -  'start_time': 
'2023-01-15 08:15:28.663948',
   [2023-01-15 08:38:56,542] {{subprocess.py:89}} INFO -  'tables_scanned': 
'330',
   [2023-01-15 08:38:56,542] {{subprocess.py:89}} INFO -  'views_scanned': '0',
   [2023-01-15 08:38:56,542] {{subprocess.py:89}} INFO -  'warnings': {}}
   [2023-01-15 08:38:56,542] {{subprocess.py:89}} INFO - Sink (datahub-rest) 
report:
   [2023-01-15 08:38:56,542] {{subprocess.py:89}} INFO - {'current_time': 
'2023-01-15 08:38:56.540099',
   [2023-01-15 08:38:56,542] {{subprocess.py:89}} INFO -  'failures': [],
   [2023-01-15 08:38:56,542] {{subprocess.py:89}} INFO -  'gms_version': 
'v0.8.45',
   [2023-01-15 08:38:56,542] {{subprocess.py:89}} INFO -  'pending_requests': 
'1',
   [2023-01-15 08:38:56,543] {{subprocess.py:89}} INFO -  
'records_written_per_second': '0',
   [2023-01-15 08:38:56,543] {{subprocess.py:89}} INFO -  'start_time': 
'2023-01-15 08:15:24.588129',
   [2023-01-15 08:38:56,543] {{subprocess.py:89}} INFO -  
'total_duration_in_seconds': '1411.95',
   [2023-01-15 08:38:56,543] {{subprocess.py:89}} INFO -  
'total_records_written': '999',
   [2023-01-15 08:38:56,543] {{subprocess.py:89}} INFO -  'warnings': []}
   [2023-01-15 08:38:56,543] {{subprocess.py:89}} INFO - ⏳ Pipeline running 
successfully so far; produced 1001 events in 1407 seconds.
   
   
   Let me know if you need any more info.
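   
   One note on the repeated traceback: it comes from our custom 
`/usr/local/airflow/config/cloudwatch_logging.py`, which reads `record.message`. 
In the standard `logging` library that attribute only exists after a formatter 
has called `record.getMessage()` on the record, so a handler that inspects a raw 
record can hit exactly this `AttributeError`. A minimal illustration of the 
distinction (the handler and pattern list here are placeholders, not the real 
config):
   
   ```python
   import logging
   import re
   
   ERROR_PATTERNS = [re.compile(r"Traceback")]  # placeholder, not the real pattern list
   
   class SniffingHandler(logging.Handler):
       """Toy handler showing why record.message may not exist yet."""
   
       def emit(self, record: logging.LogRecord) -> None:
           # record.message is only set by Formatter.format(); on a raw record,
           # accessing it raises AttributeError, matching the traceback above.
           text = record.getMessage()  # safe: renders msg % args directly
           for pattern in ERROR_PATTERNS:
               if pattern.search(text):
                   print("matched error pattern:", text)
   ```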
   
   
   ### What you think should happen instead
   
   _No response_
   
   ### How to reproduce
   
   _No response_
   
   ### Operating System
   
   AWS cloud (MWAA); local Ubuntu
   
   ### Versions of Apache Airflow Providers
   
   _No response_
   
   ### Deployment
   
   MWAA
   
   ### Deployment details
   
   _No response_
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   

