nsivabalan opened a new issue #4814:
URL: https://github.com/apache/hudi/issues/4814
**Describe the problem you faced**

My job fails with a NullPointerException (NPE) during archival of the data table when the metadata table is enabled.

Hi all, I am facing an issue regarding the table type after bumping to version 0.10.1. I have specified the COW table type, but in the logs I can also see the MERGE_ON_READ table type. Is that the metadata table? The jobs are failing intermittently with a NullPointerException. Can someone please help me with this error? The stack trace is attached below, and these are the write options used:
```scala
HoodieWriteConfig.TBL_NAME.key -> options.tableName,
DataSourceWriteOptions.TABLE_TYPE.key -> DataSourceWriteOptions.TABLE_TYPE.defaultValue(),
DataSourceWriteOptions.RECORDKEY_FIELD.key -> options.primaryKey,
DataSourceWriteOptions.PRECOMBINE_FIELD.key -> RECKO_UPDATED_TIME,
DataSourceWriteOptions.HIVE_PARTITION_EXTRACTOR_CLASS.key -> classOf[MultiPartKeysValueExtractor].getName,
DataSourceWriteOptions.HIVE_STYLE_PARTITIONING.key -> "true",
DataSourceWriteOptions.PARTITIONPATH_FIELD.key -> options.partitionCols.mkString(",")
```
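For context, here is a minimal sketch of how a key/value map like the one above is typically handed to the Spark datasource writer (the stack trace below fails inside `DataFrameWriter.save`, reached from `HudiIO.writeDataFrame`). The names `df`, `writeOptions`, and `basePath` are hypothetical placeholders, not taken from the reporter's code:

```scala
import org.apache.spark.sql.{DataFrame, SaveMode}

// Hypothetical sketch: upsert a batch into an existing Hudi COW table.
// `writeOptions` would hold the key -> value pairs listed above.
def writeToHudi(df: DataFrame, writeOptions: Map[String, String], basePath: String): Unit = {
  df.write
    .format("hudi")        // resolves to org.apache.hudi.DefaultSource, as seen in the stack trace
    .options(writeOptions) // table name, record key, precombine field, partitioning, ...
    .mode(SaveMode.Append) // append mode performs upserts into an existing table
    .save(basePath)        // e.g. the s3:// table base path seen in the logs below
}
```

Note that in Hudi 0.10.x the metadata table is enabled by default (`hoodie.metadata.enable` defaults to true), so it can be active even though none of the write options above mention it.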
**To Reproduce**

Steps to reproduce the behavior: the failures are intermittent, so there are no deterministic reproduction steps yet. The job repeatedly writes to a COW table on S3 (Hudi 0.10.1, metadata table enabled), and the NPE surfaces during archival after a commit.
**Expected behavior**

The write should commit and archival should complete without a NullPointerException.
**Environment Description**

* Hudi version : 0.10.1
* Spark version :
* Hive version : 3.1.2 (3.1.2-amzn-5, per the logs)
* Hadoop version :
* Storage (HDFS/S3/GCS..) : S3
* Running on Docker? (yes/no) : no (EMR on YARN, launched via Oozie)
**Additional context**

The job runs on EMR via Oozie and uses ZooKeeper-based locking (`ZookeeperBasedLockProvider`), i.e. multi-writer concurrency control is enabled.
**Stacktrace**
```
2022-02-13 13:44:57,475 [main] INFO org.apache.hudi.common.table.HoodieTableMetaClient - Loading HoodieTableMetaClient from s3://recko-ego/datalake_streamline_concurrency_1/structured_zone/validated_data/__recko__org_id=ego/__recko__tenant_id=ego/__recko__source=EXPECTED/__recko__policy_id=691508e5-9af8-4254-8211-3effed593c20/
2022-02-13 13:44:57,487 [main] INFO org.apache.hudi.common.table.HoodieTableConfig - Loading table properties from s3://recko-ego/datalake_streamline_concurrency_1/structured_zone/validated_data/__recko__org_id=ego/__recko__tenant_id=ego/__recko__source=EXPECTED/__recko__policy_id=691508e5-9af8-4254-8211-3effed593c20/.hoodie/hoodie.properties
2022-02-13 13:44:57,487 [main] INFO com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem - Opening 's3://recko-ego/datalake_streamline_concurrency_1/structured_zone/validated_data/__recko__org_id=ego/__recko__tenant_id=ego/__recko__source=EXPECTED/__recko__policy_id=691508e5-9af8-4254-8211-3effed593c20/.hoodie/hoodie.properties' for reading
2022-02-13 13:44:57,505 [main] INFO org.apache.hudi.common.table.HoodieTableMetaClient - Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from s3://recko-ego/datalake_streamline_concurrency_1/structured_zone/validated_data/__recko__org_id=ego/__recko__tenant_id=ego/__recko__source=EXPECTED/__recko__policy_id=691508e5-9af8-4254-8211-3effed593c20/
2022-02-13 13:44:57,505 [main] INFO org.apache.hudi.common.table.HoodieTableMetaClient - Loading Active commit timeline for s3://recko-ego/datalake_streamline_concurrency_1/structured_zone/validated_data/__recko__org_id=ego/__recko__tenant_id=ego/__recko__source=EXPECTED/__recko__policy_id=691508e5-9af8-4254-8211-3effed593c20/
2022-02-13 13:44:57,562 [main] INFO org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Loaded instants upto : Option{val=[==>20220213134444484__commit__REQUESTED]}
2022-02-13 13:44:57,562 [main] INFO org.apache.hudi.common.table.HoodieTableMetaClient - Loading HoodieTableMetaClient from s3://recko-ego/datalake_streamline_concurrency_1/structured_zone/validated_data/__recko__org_id=ego/__recko__tenant_id=ego/__recko__source=EXPECTED/__recko__policy_id=691508e5-9af8-4254-8211-3effed593c20/
2022-02-13 13:44:57,574 [main] INFO org.apache.hudi.common.table.HoodieTableConfig - Loading table properties from s3://recko-ego/datalake_streamline_concurrency_1/structured_zone/validated_data/__recko__org_id=ego/__recko__tenant_id=ego/__recko__source=EXPECTED/__recko__policy_id=691508e5-9af8-4254-8211-3effed593c20/.hoodie/hoodie.properties
2022-02-13 13:44:57,574 [main] INFO com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem - Opening 's3://recko-ego/datalake_streamline_concurrency_1/structured_zone/validated_data/__recko__org_id=ego/__recko__tenant_id=ego/__recko__source=EXPECTED/__recko__policy_id=691508e5-9af8-4254-8211-3effed593c20/.hoodie/hoodie.properties' for reading
2022-02-13 13:44:57,592 [main] INFO org.apache.hudi.common.table.HoodieTableMetaClient - Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from s3://recko-ego/datalake_streamline_concurrency_1/structured_zone/validated_data/__recko__org_id=ego/__recko__tenant_id=ego/__recko__source=EXPECTED/__recko__policy_id=691508e5-9af8-4254-8211-3effed593c20/
2022-02-13 13:44:57,592 [main] INFO org.apache.hudi.common.table.HoodieTableMetaClient - Loading HoodieTableMetaClient from s3://recko-ego/datalake_streamline_concurrency_1/structured_zone/validated_data/__recko__org_id=ego/__recko__tenant_id=ego/__recko__source=EXPECTED/__recko__policy_id=691508e5-9af8-4254-8211-3effed593c20//.hoodie/metadata
2022-02-13 13:44:57,615 [main] INFO org.apache.hudi.common.table.HoodieTableConfig - Loading table properties from s3://recko-ego/datalake_streamline_concurrency_1/structured_zone/validated_data/__recko__org_id=ego/__recko__tenant_id=ego/__recko__source=EXPECTED/__recko__policy_id=691508e5-9af8-4254-8211-3effed593c20/.hoodie/metadata/.hoodie/hoodie.properties
2022-02-13 13:44:57,615 [main] INFO com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem - Opening 's3://recko-ego/datalake_streamline_concurrency_1/structured_zone/validated_data/__recko__org_id=ego/__recko__tenant_id=ego/__recko__source=EXPECTED/__recko__policy_id=691508e5-9af8-4254-8211-3effed593c20/.hoodie/metadata/.hoodie/hoodie.properties' for reading
2022-02-13 13:44:57,650 [main] INFO org.apache.hudi.common.table.HoodieTableMetaClient - Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=HFILE) from s3://recko-ego/datalake_streamline_concurrency_1/structured_zone/validated_data/__recko__org_id=ego/__recko__tenant_id=ego/__recko__source=EXPECTED/__recko__policy_id=691508e5-9af8-4254-8211-3effed593c20//.hoodie/metadata
2022-02-13 13:44:57,650 [main] INFO org.apache.hudi.common.table.view.FileSystemViewManager - Creating View Manager with storage type :REMOTE_FIRST
2022-02-13 13:44:57,650 [main] INFO org.apache.hudi.common.table.view.FileSystemViewManager - Creating remote first table view
2022-02-13 13:44:57,657 [dispatcher-BlockManagerMaster] INFO org.apache.spark.storage.BlockManagerInfo - Removed broadcast_55_piece0 on ip-10-9-157-234.ap-south-1.compute.internal:33735 in memory (size: 351.9 KiB, free: 693.4 MiB)
2022-02-13 13:44:57,658 [dispatcher-BlockManagerMaster] INFO org.apache.spark.storage.BlockManagerInfo - Removed broadcast_55_piece0 on ip-10-9-130-208.ap-south-1.compute.internal:39427 in memory (size: 351.9 KiB, free: 2004.5 MiB)
2022-02-13 13:44:57,658 [dispatcher-BlockManagerMaster] INFO org.apache.spark.storage.BlockManagerInfo - Removed broadcast_55_piece0 on ip-10-9-130-208.ap-south-1.compute.internal:37557 in memory (size: 351.9 KiB, free: 2004.5 MiB)
2022-02-13 13:44:57,714 [main] INFO org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Loaded instants upto : Option{val=[==>20220213134444484__commit__REQUESTED]}
2022-02-13 13:44:57,715 [main] INFO org.apache.hudi.common.table.HoodieTableMetaClient - Loading HoodieTableMetaClient from s3://recko-ego/datalake_streamline_concurrency_1/structured_zone/validated_data/__recko__org_id=ego/__recko__tenant_id=ego/__recko__source=EXPECTED/__recko__policy_id=691508e5-9af8-4254-8211-3effed593c20/
2022-02-13 13:44:57,728 [main] INFO org.apache.hudi.common.table.HoodieTableConfig - Loading table properties from s3://recko-ego/datalake_streamline_concurrency_1/structured_zone/validated_data/__recko__org_id=ego/__recko__tenant_id=ego/__recko__source=EXPECTED/__recko__policy_id=691508e5-9af8-4254-8211-3effed593c20/.hoodie/hoodie.properties
2022-02-13 13:44:57,728 [main] INFO com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem - Opening 's3://recko-ego/datalake_streamline_concurrency_1/structured_zone/validated_data/__recko__org_id=ego/__recko__tenant_id=ego/__recko__source=EXPECTED/__recko__policy_id=691508e5-9af8-4254-8211-3effed593c20/.hoodie/hoodie.properties' for reading
2022-02-13 13:44:57,745 [main] INFO org.apache.hudi.common.table.HoodieTableMetaClient - Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from s3://recko-ego/datalake_streamline_concurrency_1/structured_zone/validated_data/__recko__org_id=ego/__recko__tenant_id=ego/__recko__source=EXPECTED/__recko__policy_id=691508e5-9af8-4254-8211-3effed593c20/
2022-02-13 13:44:57,745 [main] INFO org.apache.hudi.common.table.HoodieTableMetaClient - Loading HoodieTableMetaClient from s3://recko-ego/datalake_streamline_concurrency_1/structured_zone/validated_data/__recko__org_id=ego/__recko__tenant_id=ego/__recko__source=EXPECTED/__recko__policy_id=691508e5-9af8-4254-8211-3effed593c20//.hoodie/metadata
2022-02-13 13:44:57,766 [main] INFO org.apache.hudi.common.table.HoodieTableConfig - Loading table properties from s3://recko-ego/datalake_streamline_concurrency_1/structured_zone/validated_data/__recko__org_id=ego/__recko__tenant_id=ego/__recko__source=EXPECTED/__recko__policy_id=691508e5-9af8-4254-8211-3effed593c20/.hoodie/metadata/.hoodie/hoodie.properties
2022-02-13 13:44:57,766 [main] INFO com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem - Opening 's3://recko-ego/datalake_streamline_concurrency_1/structured_zone/validated_data/__recko__org_id=ego/__recko__tenant_id=ego/__recko__source=EXPECTED/__recko__policy_id=691508e5-9af8-4254-8211-3effed593c20/.hoodie/metadata/.hoodie/hoodie.properties' for reading
2022-02-13 13:44:57,785 [main] INFO org.apache.hudi.common.table.HoodieTableMetaClient - Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=HFILE) from s3://recko-ego/datalake_streamline_concurrency_1/structured_zone/validated_data/__recko__org_id=ego/__recko__tenant_id=ego/__recko__source=EXPECTED/__recko__policy_id=691508e5-9af8-4254-8211-3effed593c20//.hoodie/metadata
2022-02-13 13:44:57,841 [main] INFO org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Loaded instants upto : Option{val=[20220213134350974001__commit__COMPLETED]}
2022-02-13 13:44:57,841 [main] INFO org.apache.hudi.table.HoodieTimelineArchiveLog - Limiting archiving of instants to latest compaction on metadata table at 20220213134350974001
2022-02-13 13:44:57,842 [main] INFO org.apache.hudi.client.heartbeat.HoodieHeartbeatClient - Stopping heartbeat for instant 20220213134342038
2022-02-13 13:44:57,842 [main] INFO org.apache.hudi.client.heartbeat.HoodieHeartbeatClient - Stopped heartbeat for instant 20220213134342038
2022-02-13 13:44:57,907 [main] INFO org.apache.hudi.client.heartbeat.HeartbeatUtils - Deleted the heartbeat for instant 20220213134342038
2022-02-13 13:44:57,907 [main] INFO org.apache.hudi.client.heartbeat.HoodieHeartbeatClient - Deleted heartbeat file for instant 20220213134342038
2022-02-13 13:44:57,907 [main] INFO org.apache.hudi.client.transaction.TransactionManager - Transaction ending with transaction owner Option{val=[==>20220213134342038__commit__INFLIGHT]}
2022-02-13 13:44:57,907 [main] INFO org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider - RELEASING lock atZkBasePath = /hudi, lock key = 691508e5-9af8-4254-8211-3effed593c20
2022-02-13 13:44:57,908 [main] INFO org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider - RELEASED lock atZkBasePath = /hudi, lock key = 691508e5-9af8-4254-8211-3effed593c20
2022-02-13 13:44:57,908 [main] INFO org.apache.hudi.client.transaction.TransactionManager - Transaction ended with transaction owner Option{val=[==>20220213134342038__commit__INFLIGHT]}
2022-02-13 13:44:57,910 [main] INFO com.recko.spark.jobs.v1.ingestion.DataValidationJob - Current State :: ================ FAILED
2022-02-13 13:44:57,912 [main] INFO com.recko.spark.jobs.v1.ingestion.listeners.TaskStatusLogger - TimeStamp : 2022-02-13 13:44:57 =========== TaskId : 420a79fd-b810-47f7-ad28-dfcd7ea03c10 ===== State : FAILED
2022-02-13 13:44:57,936 [main] INFO com.recko.spark.utils.db.RDSUtil$ - Executing procedure :: INSERT INTO scepter.task_metrics(id, task_id, task_type, stage_num, batch_id, policy_id, file_size, start_time, end_time, time_taken, estimated_eta, other_attributes, org_id, tenant_id) VALUES ( '4cf1da7d-2cf9-4d72-8be2-a679797c5a88', '420a79fd-b810-47f7-ad28-dfcd7ea03c10', 'DATA_VALIDATION', '1', '1b5fbfd4-e793-4b7a-b896-f6b73e033000', '691508e5-9af8-4254-8211-3effed593c20', 1, '2022-02-13 13:43:24.041', '2022-02-13 13:44:57.913', 93872, 129663, '', 'ego', 'ego');
2022-02-13 13:44:57,938 [main] INFO com.recko.spark.utils.db.RDSUtil$ - Execution complete, TimeTaken :: 25
2022-02-13 13:44:57,938 [main] INFO org.apache.kafka.clients.producer.ProducerConfig - ProducerConfig values:
    acks = 1
    batch.size = 16384
    bootstrap.servers = [ego-kafka.reckostaging.com:9094]
    buffer.memory = 33554432
    client.dns.lookup = use_all_dns_ips
    client.id = producer-5
    compression.type = none
    connections.max.idle.ms = 540000
    delivery.timeout.ms = 120000
    enable.idempotence = false
    interceptor.classes = []
    internal.auto.downgrade.txn.commit = false
    key.serializer = class org.apache.kafka.common.serialization.StringSerializer
    linger.ms = 0
    max.block.ms = 60000
    max.in.flight.requests.per.connection = 5
    max.request.size = 1048576
    metadata.max.age.ms = 300000
    metadata.max.idle.ms = 300000
    metric.reporters = []
    metrics.num.samples = 2
    metrics.recording.level = INFO
    metrics.sample.window.ms = 30000
    partitioner.class = class org.apache.kafka.clients.producer.internals.DefaultPartitioner
    receive.buffer.bytes = 32768
    reconnect.backoff.max.ms = 1000
    reconnect.backoff.ms = 50
    request.timeout.ms = 30000
    retries = 2147483647
    retry.backoff.ms = 100
    sasl.client.callback.handler.class = null
    sasl.jaas.config = [hidden]
    sasl.kerberos.kinit.cmd = /usr/bin/kinit
    sasl.kerberos.min.time.before.relogin = 60000
    sasl.kerberos.service.name = null
    sasl.kerberos.ticket.renew.jitter = 0.05
    sasl.kerberos.ticket.renew.window.factor = 0.8
    sasl.login.callback.handler.class = null
    sasl.login.class = null
    sasl.login.refresh.buffer.seconds = 300
    sasl.login.refresh.min.period.seconds = 60
    sasl.login.refresh.window.factor = 0.8
    sasl.login.refresh.window.jitter = 0.05
    sasl.mechanism = PLAIN
    security.protocol = SASL_PLAINTEXT
    security.providers = null
    send.buffer.bytes = 131072
    ssl.cipher.suites = null
    ssl.enabled.protocols = [TLSv1.2]
    ssl.endpoint.identification.algorithm = https
    ssl.engine.factory.class = null
    ssl.key.password = null
    ssl.keymanager.algorithm = SunX509
    ssl.keystore.location = null
    ssl.keystore.password = null
    ssl.keystore.type = JKS
    ssl.protocol = TLSv1.2
    ssl.provider = null
    ssl.secure.random.implementation = null
    ssl.trustmanager.algorithm = PKIX
    ssl.truststore.location = null
    ssl.truststore.password = null
    ssl.truststore.type = JKS
    transaction.timeout.ms = 60000
    transactional.id = null
    value.serializer = class org.apache.kafka.common.serialization.StringSerializer
2022-02-13 13:44:57,941 [main] INFO org.apache.kafka.common.security.authenticator.AbstractLogin - Successfully logged in.
2022-02-13 13:44:57,942 [main] INFO org.apache.kafka.common.utils.AppInfoParser - Kafka version: 2.6.0
2022-02-13 13:44:57,942 [main] INFO org.apache.kafka.common.utils.AppInfoParser - Kafka commitId: 62abe01bee039651
2022-02-13 13:44:57,942 [main] INFO org.apache.kafka.common.utils.AppInfoParser - Kafka startTimeMs: 1644759897942
2022-02-13 13:44:57,953 [kafka-producer-network-thread | producer-5] INFO org.apache.kafka.clients.Metadata - [Producer clientId=producer-5] Cluster ID: 8XeF52xMSRmjjqZb4A0Ceg
2022-02-13 13:44:57,963 [main] INFO com.recko.spark.utils.tasktracker.KafkaTaskTrackerService - recordMeta offset = 6114
2022-02-13 13:44:57,963 [main] INFO org.apache.kafka.clients.producer.KafkaProducer - [Producer clientId=producer-5] Closing the Kafka producer with timeoutMillis = 9223372036854775807 ms.
2022-02-13 13:44:57,985 [main] INFO com.recko.spark.utils.S3Util$ - cleaning up : s3://recko-ego/datalake_streamline_concurrency_1/journal_duplicates/420a79fd-b810-47f7-ad28-dfcd7ea03c10/
2022-02-13 13:44:58,047 [main] INFO com.recko.spark.services.ingester.RollbackService$ - Query :: ALTER TABLE ego_ext.raw_journal DROP IF EXISTS PARTITION (org_id = 'ego', tenant_id = 'ego', data_source = 'EXPECTED', batch_id = '1b5fbfd4-e793-4b7a-b896-f6b73e033000', task_id = '420a79fd-b810-47f7-ad28-dfcd7ea03c10' );
2022-02-13 13:44:58,102 [main] INFO org.apache.spark.sql.hive.HiveUtils - Initializing HiveMetastoreConnection version 3.1.2-amzn-5 using Spark classes.
2022-02-13 13:44:58,202 [main] INFO org.apache.hadoop.hive.conf.HiveConf - Found configuration file file:/mnt/yarn/usercache/hadoop/appcache/application_1642451881812_7589/container_1642451881812_7589_01_000001/hive-site.xml
2022-02-13 13:44:58,397 [main] INFO SessionState - Hive Session ID = 24800b67-fd35-4f59-b8c4-43956e2fdce1
2022-02-13 13:44:58,495 [main] INFO org.apache.hadoop.hive.ql.session.SessionState - Created HDFS directory: /tmp/hive/hadoop/24800b67-fd35-4f59-b8c4-43956e2fdce1
2022-02-13 13:44:58,502 [main] INFO org.apache.hadoop.hive.ql.session.SessionState - Created local directory: /tmp/yarn/24800b67-fd35-4f59-b8c4-43956e2fdce1
2022-02-13 13:44:58,503 [main] INFO org.apache.hadoop.hive.ql.session.SessionState - Created HDFS directory: /tmp/hive/hadoop/24800b67-fd35-4f59-b8c4-43956e2fdce1/_tmp_space.db
2022-02-13 13:44:58,517 [main] INFO org.apache.spark.sql.hive.client.HiveClientImpl - Warehouse location for Hive client (version 3.1.2) is hdfs:///user/spark/warehouse
2022-02-13 13:44:58,663 [dispatcher-CoarseGrainedScheduler] INFO org.apache.spark.scheduler.cluster.YarnSchedulerBackend$YarnDriverEndpoint - Registered executor NettyRpcEndpointRef(spark-client://Executor) (10.9.157.234:45834) with ID 3, ResourceProfileId 0
2022-02-13 13:44:58,664 [spark-listener-group-executorManagement] INFO org.apache.spark.scheduler.dynalloc.ExecutorMonitor - New executor 3 has registered (new total is 4)
2022-02-13 13:44:58,767 [dispatcher-BlockManagerMaster] INFO org.apache.spark.storage.BlockManagerMasterEndpoint - Registering block manager ip-10-9-157-234.ap-south-1.compute.internal:38329 with 2004.6 MiB RAM, BlockManagerId(3, ip-10-9-157-234.ap-south-1.compute.internal, 38329, None)
2022-02-13 13:45:00,299 [block-manager-storage-async-thread-pool-165] INFO org.apache.spark.storage.BlockManager - Removing RDD 94
2022-02-13 13:45:00,315 [block-manager-storage-async-thread-pool-172] INFO org.apache.spark.storage.BlockManager - Removing RDD 114
2022-02-13 13:45:00,319 [block-manager-storage-async-thread-pool-5] INFO org.apache.spark.storage.BlockManager - Removing RDD 137
2022-02-13 13:45:00,322 [block-manager-storage-async-thread-pool-17] INFO org.apache.spark.storage.BlockManager - Removing RDD 122
2022-02-13 13:45:02,507 [main] INFO org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdHiveAccessController - Created SQLStdHiveAccessController for session context : HiveAuthzSessionContext [sessionString=24800b67-fd35-4f59-b8c4-43956e2fdce1, clientType=HIVECLI]
2022-02-13 13:45:02,509 [main] WARN org.apache.hadoop.hive.ql.session.SessionState - METASTORE_FILTER_HOOK will be ignored, since hive.security.authorization.manager is set to instance of HiveAuthorizerFactory.
2022-02-13 13:45:03,707 [main] INFO org.sparkproject.jetty.server.AbstractConnector - Stopped Spark@60f662bd{HTTP/1.1, (http/1.1)}{0.0.0.0:4040}
2022-02-13 13:45:03,710 [main] INFO org.apache.spark.ui.SparkUI - Stopped Spark web UI at http://ip-10-9-157-234.ap-south-1.compute.internal:4040
2022-02-13 13:45:03,714 [YARN application state monitor] INFO org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend - Interrupting monitor thread
2022-02-13 13:45:03,716 [main] INFO org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend - Shutting down all executors
2022-02-13 13:45:03,716 [dispatcher-CoarseGrainedScheduler] INFO org.apache.spark.scheduler.cluster.YarnSchedulerBackend$YarnDriverEndpoint - Asking each executor to shut down
2022-02-13 13:45:03,721 [main] INFO org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend - YARN client scheduler backend Stopped
2022-02-13 13:45:03,750 [dispatcher-event-loop-2] INFO org.apache.spark.MapOutputTrackerMasterEndpoint - MapOutputTrackerMasterEndpoint stopped!
2022-02-13 13:45:03,764 [main] INFO org.apache.spark.storage.memory.MemoryStore - MemoryStore cleared
2022-02-13 13:45:03,764 [main] INFO org.apache.spark.storage.BlockManager - BlockManager stopped
2022-02-13 13:45:03,772 [main] INFO org.apache.spark.storage.BlockManagerMaster - BlockManagerMaster stopped
2022-02-13 13:45:03,778 [dispatcher-event-loop-3] INFO org.apache.spark.scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint - OutputCommitCoordinator stopped!
2022-02-13 13:45:03,784 [main] INFO org.apache.spark.SparkContext - Successfully stopped SparkContext
<<< Invocation of Spark command completed <<<
Hadoop Job IDs executed by Spark: job_1642451881812_7593
java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.oozie.action.hadoop.LauncherAM.runActionMain(LauncherAM.java:412)
    at org.apache.oozie.action.hadoop.LauncherAM.access$400(LauncherAM.java:54)
    at org.apache.oozie.action.hadoop.LauncherAM$2.run(LauncherAM.java:225)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
    at org.apache.oozie.action.hadoop.LauncherAM.run(LauncherAM.java:219)
    at org.apache.oozie.action.hadoop.LauncherAM$1.run(LauncherAM.java:155)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
    at org.apache.oozie.action.hadoop.LauncherAM.main(LauncherAM.java:143)
Caused by: java.lang.NullPointerException
    at org.apache.hudi.table.HoodieTimelineArchiveLog.lambda$getInstantsToArchive$9(HoodieTimelineArchiveLog.java:230)
    at java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:269)
    at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
    at java.util.stream.SliceOps$1$1.accept(SliceOps.java:204)
    at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
    at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
    at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
    at java.util.ArrayList$ArrayListSpliterator.tryAdvance(ArrayList.java:1361)
    at java.util.stream.ReferencePipeline.forEachWithCancel(ReferencePipeline.java:126)
    at java.util.stream.AbstractPipeline.copyIntoWithCancel(AbstractPipeline.java:499)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:486)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
    at java.util.stream.StreamSpliterators$WrappingSpliterator.forEachRemaining(StreamSpliterators.java:313)
    at java.util.stream.Streams$ConcatSpliterator.forEachRemaining(Streams.java:743)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
    at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:566)
    at org.apache.hudi.table.HoodieTimelineArchiveLog.archiveIfRequired(HoodieTimelineArchiveLog.java:124)
    at org.apache.hudi.client.AbstractHoodieWriteClient.archive(AbstractHoodieWriteClient.java:766)
    at org.apache.hudi.client.AbstractHoodieWriteClient.postCommit(AbstractHoodieWriteClient.java:459)
    at org.apache.hudi.client.AbstractHoodieWriteClient.commitStats(AbstractHoodieWriteClient.java:198)
    at org.apache.hudi.client.SparkRDDWriteClient.commit(SparkRDDWriteClient.java:125)
    at org.apache.hudi.HoodieSparkSqlWriter$.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:635)
    at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:286)
    at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:164)
    at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:90)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:194)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:232)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:229)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:190)
    at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:134)
    at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:133)
    at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:989)
    at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
    at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:232)
    at org.apache.spark.sql.execution.SQLExecution$.executeQuery$1(SQLExecution.scala:110)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:135)
    at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
    at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:232)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:135)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:253)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:134)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:68)
    at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:989)
    at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:438)
    at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:415)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:293)
    at com.recko.spark.io.lake.HudiIO.writeDataFrame(HudiIO.scala:66)
    at com.recko.spark.utils.datalake.StructuredZoneUtils$.putValidationData(StructuredZoneUtils.scala:135)
    at com.recko.spark.utils.datalake.LakeUtils$.writeIntoLake(LakeUtils.scala:10)
    at com.recko.spark.jobs.v1.ingestion.IngestionWrapper.postExecute(IngestionWrapper.scala:184)
    at com.recko.spark.jobs.v1.ingestion.DataValidationJob.execute(DataValidationJob.scala:63)
    at com.recko.spark.jobs.v1.triggers.IngestionJob$.$anonfun$main$1(IngestionJob.scala:30)
    at com.recko.spark.executors.Executor.jobWrapper(Executor.scala:71)
    at com.recko.spark.jobs.v1.triggers.IngestionJob$.main(IngestionJob.scala:23)
    at com.recko.spark.jobs.v1.triggers.IngestionJob.main(IngestionJob.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:959)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1047)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1056)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
    at org.apache.oozie.action.hadoop.SparkMain.runSpark(SparkMain.java:186)
    at org.apache.oozie.action.hadoop.SparkMain.run(SparkMain.java:93)
    at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:107)
    at org.apache.oozie.action.hadoop.SparkMain.main(SparkMain.java:60)
    ... 16 more
Failing Oozie Launcher, null
java.lang.NullPointerException
    at org.apache.hudi.table.HoodieTimelineArchiveLog.lambda$getInstantsToArchive$9(HoodieTimelineArchiveLog.java:230)
    ... (same NullPointerException stack trace as above, repeated verbatim by the Oozie launcher)
Oozie Launcher, uploading action data to HDFS sequence file: hdfs://qa-emr.reckostaging.com:8020/user/hadoop/oozie-oozi/0005119-220117203901194-oozie-oozi-W/IngestionJob--spark/action-data.seq
2022-02-13 13:45:03,825 [main] INFO org.apache.hadoop.io.compress.CodecPool - Got brand-new compressor [.deflate]
Stopping AM
2022-02-13 13:45:03,837 [main] INFO org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl - Waiting for application to be successfully unregistered.
Callback notification attempts left 0
Callback notification trying http://ip-10-9-153-3.ap-south-1.compute.internal:11000/oozie/callback?id=0005119-220117203901194-oozie-oozi-W@IngestionJob&status=FAILED
Callback notification to http://ip-10-9-153-3.ap-south-1.compute.internal:11000/oozie/callback?id=0005119-220117203901194-oozie-oozi-W@IngestionJob&status=FAILED succeeded
Callback notification succeeded
```