maheshguptags commented on issue #7589:
URL: https://github.com/apache/hudi/issues/7589#issuecomment-1379849852
@yihua thank you for the above suggestion. I tried running the `CALL` procedure via PySpark (`spark.sql`), but it is throwing an error.
The problem I am facing: I can list the table from the `spark catalog`, even from the `default` database, but when I run the `CALL` procedure for `show savepoints`, it cannot find that table.
You can check the code and the error log below; I am attaching the code snippet and the error for the same.

Code:
```
from pyspark.sql import SparkSession, SQLContext

print("import module........")

# Build the Spark session with the Hudi bundle, the Hudi catalog,
# and the Hudi SQL extensions (needed for CALL procedures).
spark = SparkSession.builder.appName("clustering on COR") \
    .config("spark.jars", "hudi-spark3.3-bundle_2.12-0.11.1-amzn-0.jar") \
    .config("spark.jars.packages",
            "org.apache.spark:spark-hadoop-cloud_2.12:3.3.0,"
            "org.apache.hadoop:hadoop-aws:3.3.0,net.java.dev.jets3t:jets3t:0.9.4,"
            "com.amazonaws:aws-java-sdk:1.12.303") \
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer") \
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.hudi.catalog.HoodieCatalog") \
    .config("spark.sql.extensions",
            "org.apache.spark.sql.hudi.HoodieSparkSessionExtension") \
    .config("spark.sql.parquet.enableVectorizedReader", "false") \
    .enableHiveSupport() \
    .getOrCreate()
# Other bundles tried:
# org.apache.hudi:hudi-spark3-bundle_2.12:0.11.1, org.apache.hudi:hudi-spark3.3-bundle_2.12:0.12.0

sc = spark.sparkContext

# S3 credentials (redacted).
sc._jsc.hadoopConfiguration().set('fs.s3a.access.key', "xxxxxxxxxxxxxxx")
sc._jsc.hadoopConfiguration().set('fs.s3a.secret.key', "xxxxxxxxxxxxxxxx")
print('access key and secret key is done')

# Note: SQLContext is deprecated in Spark 3.x; spark.sql() is equivalent.
hsc = SQLContext(sc)

print(f'spark is {spark} and spark-context is {sc}')
print("============================================")
print("import of spark session is done !!!!")
print("============================================")

hsc.sql("use default")

# Load the Hudi table directly from S3 and register it as a temp view.
df = spark.read.format('org.apache.hudi').load("s3://test-spark-hudi/clustering_mor/")
df.createOrReplaceTempView("clustering_mor")

print('2', spark.sql("show tables from default").show())
print('===========================================================================')
print('=========================', spark.catalog.listTables())
print('===========================================================================')

df1 = spark.sql("select * from clustering_mor")
print(df1.show())

# This is the call that fails: the procedure cannot find 'clustering_mor'.
print(spark.sql("""call show_savepoints(table => 'clustering_mor')""").show())
```
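
At this point the table only exists as a session-scoped temp view; nothing is registered in the metastore. To make that visible, here is a small, purely illustrative check (the commented output matches the `listTables()` result in the log below):

```python
# The Hudi table exists only as a session-scoped temp view (database=None);
# it is not a table registered in the 'default' database.
for t in spark.catalog.listTables("default"):
    print(t.name, t.database, t.tableType, t.isTemporary)
# clustering_mor None TEMPORARY True
```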
Error:
```
:: loading settings :: url =
jar:file:/usr/lib/spark/jars/ivy-2.5.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
Ivy Default Cache set to: /home/hadoop/.ivy2/cache
The jars for the packages stored in: /home/hadoop/.ivy2/jars
org.apache.spark#spark-streaming-kafka-0-10_2.12 added as a dependency
:: resolving dependencies ::
org.apache.spark#spark-submit-parent-5063ef7d-38d8-442b-bc98-ea87161dc6ae;1.0
confs: [default]
found org.apache.spark#spark-streaming-kafka-0-10_2.12;3.3.0 in central
found org.apache.spark#spark-token-provider-kafka-0-10_2.12;3.3.0 in
central
found org.apache.kafka#kafka-clients;2.8.1 in central
found org.lz4#lz4-java;1.8.0 in central
found org.xerial.snappy#snappy-java;1.1.8.4 in central
found org.slf4j#slf4j-api;1.7.32 in central
found org.apache.hadoop#hadoop-client-runtime;3.3.2 in central
found org.spark-project.spark#unused;1.0.0 in central
found org.apache.hadoop#hadoop-client-api;3.3.2 in central
found commons-logging#commons-logging;1.1.3 in central
found com.google.code.findbugs#jsr305;3.0.0 in central
:: resolution report :: resolve 466ms :: artifacts dl 15ms
:: modules in use:
com.google.code.findbugs#jsr305;3.0.0 from central in [default]
commons-logging#commons-logging;1.1.3 from central in [default]
org.apache.hadoop#hadoop-client-api;3.3.2 from central in [default]
org.apache.hadoop#hadoop-client-runtime;3.3.2 from central in [default]
org.apache.kafka#kafka-clients;2.8.1 from central in [default]
org.apache.spark#spark-streaming-kafka-0-10_2.12;3.3.0 from central in
[default]
org.apache.spark#spark-token-provider-kafka-0-10_2.12;3.3.0 from
central in [default]
org.lz4#lz4-java;1.8.0 from central in [default]
org.slf4j#slf4j-api;1.7.32 from central in [default]
org.spark-project.spark#unused;1.0.0 from central in [default]
org.xerial.snappy#snappy-java;1.1.8.4 from central in [default]
---------------------------------------------------------------------
| | modules || artifacts |
| conf | number| search|dwnlded|evicted|| number|dwnlded|
---------------------------------------------------------------------
| default | 11 | 0 | 0 | 0 || 11 | 0 |
---------------------------------------------------------------------
:: retrieving ::
org.apache.spark#spark-submit-parent-5063ef7d-38d8-442b-bc98-ea87161dc6ae
confs: [default]
0 artifacts copied, 11 already retrieved (0kB/9ms)
import module........
23/01/12 05:31:15 INFO SparkContext: Running Spark version 3.3.0-amzn-0
23/01/12 05:31:15 INFO ResourceUtils:
==============================================================
23/01/12 05:31:15 INFO ResourceUtils: No custom resources configured for
spark.driver.
23/01/12 05:31:15 INFO ResourceUtils:
==============================================================
23/01/12 05:31:15 INFO SparkContext: Submitted application: clustering on COR
23/01/12 05:31:15 INFO ResourceProfile: Default ResourceProfile created,
executor resources: Map(cores -> name: cores,
23/01/12 05:31:15 INFO SecurityManager: SecurityManager: authentication
disabled; ui acls disabled; users with view permissions: Set(hadoop); groups
with view permissions: Set(); users with modify permissions: Set(hadoop);
groups with modify permissions: Set()
23/01/12 05:31:16 INFO Utils: Successfully started service 'sparkDriver' on
port 46779.
23/01/12 05:31:16 INFO SparkEnv: Registering MapOutputTracker
23/01/12 05:31:16 INFO SparkEnv: Registering BlockManagerMaster
23/01/12 05:31:16 INFO BlockManagerMasterEndpoint: Using
org.apache.spark.storage.DefaultTopologyMapper for getting topology information
23/01/12 05:31:16 INFO BlockManagerMasterEndpoint:
BlockManagerMasterEndpoint up
23/01/12 05:31:16 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
23/01/12 05:31:16 INFO DiskBlockManager: Created local directory at
/mnt/tmp/blockmgr-f4403214-599d-49b0-bba0-fd2136128568
23/01/12 05:31.compute.internal/10.224.51.200:8032
23/01/12 05:31:17 INFO Configuration: resource-types.xml not found
23/01/12 05:31:17 INFO ResourceUtils: Unable to find 'resource-types.xml'.
23/01/12 05:31:17 INFO Client: Verifying our application has not requested
more than the maximum memory capability of the cluster (12288 MB per container)
23/01/12 05:31:17 INFO Client: Will allocate AM container, with 896 MB
memory including 384 MB overhead
23/01/12 05:31:17 INFO Client: Setting up container launch context for our AM
23/01/12 05:31:17 INFO Client: Setting up the launch environment for our AM
container
23/01/12 05:31:17 INFO Client: Preparing resources for our AM container
23/01/12 05:31:17 WARN Client: Neither spark.yarn.jars nor
spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
23/01/12 05:31:20 INFO Client: Uploading resource
file:/mnt/tmp/spark-76ce116e-57b6-4ab0-9b19-40220d2d67c3/__spark_libs__4646235866736797652.zip
->
hdfs://ip-10-224-51-200.ap-south-1.compute.internal:8020/user/hadoop/.sparkStaging/application_1671433588099_0213/__spark_libs__4646235866736797652.zip
23/01/12 05:31:21 INFO Client: Uploading resource
file:/usr/lib/hudi/hudi-spark3.3-bundle_2.12-0.11.1-amzn-0.jar ->
hdfs://ip-10-224-51-200.ap-south-1.compute.internal:8020/user/hadoop/.sparkStaging/application_1671433588099_0213/hudi-spark3.3-bundle_2.12-0.11.1-amzn-0.jar
23/01/12 05:31:21 INFO Client: Uploading resource
file:/home/hadoop/.ivy2/jars/org.apache.spark_spark-streaming-kafka-0-10_2.12-3.3.0.jar
->
hdfs://ip-10-224-51-200.ap-south-1.compute.internal:8020/user/hadoop/.sparkStaging/application_1671433588099_0213/org.apache.spark_spark-streaming-kafka-0-10_2.12-3.3.0.jar
23/01/12 05:31:21 INFO Client: Uploading resource
file:/home/hadoop/.ivy2/jars/org.apache.hadoop_hadoop-client-runtime-3.3.2.jar
->
hdfs://ip-10-224-51-200.ap-south-1.compute.internal:8020/user/hadoop/.sparkStaging/application_1671433588099_0213/org.apache.hadoop_hadoop-client-runtime-3.3.2.jar
23/01/12 05:31:21 INFO Client: Uploading resource
file:/home/hadoop/.ivy2/jars/org.lz4_lz4-java-1.8.0.jar ->
hdfs://ip-10-224-51-200.ap-south-1.compute.internal:8020/user/hadoop/.sparkStaging/application_1671433588099_0213/org.lz4_lz4-java-1.8.0.jar
23/01/12 05:31:21 INFO Client: Uploading resource
file:/home/hadoop/.ivy2/jars/org.xerial.snappy_snappy-java-1.1.8.4.jar ->
hdfs://ip-10-224-51-200.ap-south-
23/01/12 05:31:21 INFO Client: Uploading resource
file:/mnt/tmp/spark-76ce116e-57b6-4ab0-9b19-40220d2d67c3/__spark_conf__2936504612168698213.zip
->
hdfs://ip-10-224-51-200.ap-south-1.compute.internal:8020/user/hadoop/.sparkStaging/application_1671433588099_0213/__spark_conf__.zip
23/01/12 05:31:21 INFO SecurityManager: Changing view acls to: hadoop
23/01/12 05:31:21 INFO SecurityManager: Changing modify acls to: hadoop
23/01/12 05:31:21 INFO SecurityManager: Changing view acls groups to:
23/01/12 05:31:21 INFO SecurityManager: Changing modify acls groups to:
23/01/12 05:31:21 INFO SecurityManager: SecurityManager: authentication
disabled; ui acls disabled; users with view permissions: Set(hadoop); groups
with view permissions: Set(); users with modify permissions: Set(hadoop);
groups with modify permissions: Set()
23/01/12 05:31:21 INFO Client: Submitting application
application_1671433588099_0213 to ResourceManager
23/01/12 05:31:21 INFO YarnClientImpl: Submitted application
application_1671433588099_0213
23/01/12 05:31:22 INFO Client: Application report for
application_1671433588099_0213 (state: ACCEPTED)
23/01/12 05:31:22 INFO Client:
client token: N/A
diagnostics: AM container is launched, waiting for AM container to
Register with RM
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1673501481664
final status: UNDEFINED
tracking URL:
http://ip-10-224-51-200.ap-south-1.compute.internal:20888/proxy/application_1671433588099_0213/
user: hadoop
23/01/12 05:31:23 INFO Client: Application report for
application_1671433588099_0213 (state: ACCEPTED)
23/01/12 05:31:26 INFO SingleEventLogFileWriter: Logging events to
hdfs:/var/log/spark/apps/application_1671433588099_0213.inprogress
23/01/12 05:31:27 INFO Utils: Using 50 preallocated executors (minExecutors:
0). Set spark.dynamicAllocation.preallocateExecutors to `false` disable
executor preallocation.
23/01/12 05:31:27 WARN YarnSchedulerBackend$YarnSchedulerEndpoint: Attempted
to request executors before the AM has registered!
23/01/12 05:31:27 INFO ServerInfo: Adding filter to /jobs:
org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
23/01/12 05:31:27 INFO ServerInfo: Adding filter to /jobs/json:
org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
23/01/12 05:31:27 INFO ServerInfo: Adding filter to /jobs/job:
org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
23/01/12 05:31:27 INFO ServerInfo: Adding filter to /jobs/job/json:
org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
23/01/12 05:31:27 INFO ServerInfo: Adding filter to /stages:
org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
23/01/12 05:31:27 INFO ServerInfo: Adding filter to /stages/json:
org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
23/01/12 05:31:27 INFO ServerInfo: Adding filter to /stages/stage:
org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
23/01/12 05:31:27 INFO ServerInfo: Adding filter to /stages/stage/json:
org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
23/01/12 05:31:27 INFO ServerInfo: Adding filter to /stages/pool:
org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
23/01/12 05:31:27 INFO ServerInfo: Adding filter to /stages/pool/json:
org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
23/01/12 05:31:27 INFO ServerInfo: Adding filter to /storage:
org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
23/01/12 05:31:27 INFO ServerInfo: Adding filter to /storage/json:
org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
23/01/12 05:31:27 INFO ServerInfo: Adding filter to /storage/rdd:
org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
23/01/12 05:31:27 INFO ServerInfo: Adding filter to /storage/rdd/json:
org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
23/01/12 05:31:27 INFO ServerInfo: Adding filter to /environment:
org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
23/01/12 05:31:27 INFO ServerInfo: Adding filter to /environment/json:
org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
23/01/12 05:31:27 INFO ServerInfo: Adding filter to /executors:
org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
23/01/12 05:31:27 INFO ServerInfo: Adding filter to /executors/json:
org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
23/01/12 05:31:27 INFO ServerInfo: Adding filter to /executors/threadDump:
org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
23/01/12 05:31:27 INFO ServerInfo: Adding filter to /executors/thr
23/01/12 05:31:27 INFO ServerInfo: Adding filter to /metrics/json:
org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
23/01/12 05:31:27 INFO YarnClientSchedulerBackend: SchedulerBackend is ready
for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
23/01/12 05:31:27 INFO YarnSchedulerBackend$YarnSchedulerEndpoint:
ApplicationMaster registered as NettyRpcEndpointRef(spark-client://YarnAM)
access key and secret key is done
/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/context.py:114:
FutureWarning: Deprecated in 3.0.0. Use SparkSession.builder.getOrCreate()
instead.
spark is <pyspark.sql.session.SparkSession object at 0x7f3c169f0750> and
spark-context is <SparkContext master=yarn appName=clustering on COR>
============================================
import of spark session is done !!!!
============================================
23/01/12 05:31:27 INFO SharedState: Setting hive.metastore.warehouse.dir
('null') to the value of spark.sql.warehouse.dir.
23/01/12 05:31:27 INFO SharedState: Warehouse path is 'hdfs://ip- Found
configuration file file:/etc/spark/conf.dist/hive-site.xml
23/01/12 05:31:31 WARN HiveConf: HiveConf of name hive.server2.thrift.url
does not exist
23/01/12 05:31:31 INFO HiveClientImpl: Warehouse location for Hive client
(version 2.3.9) is
hdfs://ip-10-224-51-200.ap-south-1.compute.internal:8020/user/spark/warehouse
23/01/12 05:31:31 INFO metastore: Trying to connect to metastore with URI
thrift://ip-10-224-51-200.ap-south-1.compute.internal:9083
23/01/12 05:31:31 INFO metastore: Opened a connection to metastore, current
connections: 1
23/01/12 05:31:31 INFO metastore: Connected to metastore.
23/01/12 05:31:32 INFO ClientConfigurationFactory: Set initial getObject
socket timeout to 2000 ms.
23/01/12 05:31:33 INFO YarnSchedulerBackend$YarnDriverEndpoint: Registered
executor NettyRpcEndpointRef(spark-client://Executor) (10.224.51.220:51532)
with ID 1, ResourceProfileId 0
23/01/12 05:31:33 INFO ExecutorMonitor: New executor 1 has registered (new
total is 2)
23/01/12 05:31:33 INFO BlockManagerMasterEndpoint: Registering block manager
ip-10-224-51-220.ap-south-1.compute.internal:38397 with 4.8 GiB RAM,
BlockManagerId(1, ip-10-224-51-220.ap-south-1.compute.internal, 38397, None)
23/01/12 05:31:33 INFO S3NativeFileSystem: Opening
's3://test-spark-hudi/clustering_mor/.hoodie/hoodie.properties' for reading
23/01/12 05:31:34 INFO S3NativeFileSystem: Opening
's3://test-spark-hudi/clustering_mor/.hoodie/20230109092536279.deltacommit' for
reading
23/01/12 05:31:34 INFO S3NativeFileSystem: Opening
's3://test-spark-hudi/clustering_mor/.hoodie/20230109092536279.deltacommit' for
reading
23/01/12 05:31:34 INFO S3NativeFileSystem: Opening
's3://test-spark-hudi/clustering_mor/campaign_id=350/.fe1ae9e1-f3b1-463d-920d-c58d12231cec-0_20230109092436558.log.1_0-24-28'
for reading
23/01/12 05:31:35 INFO CodeGenerator: Code generated in 226.844628 ms
23/01/12 05:31:35 INFO CodeGenerator: Code generated in 8.402992 ms
23/01/12 05:31:35 INFO CodeGenerator: Code generated in 16.767879 ms
23/01/12 05:31:35 INFO SparkContext: Starting job: showString at
NativeMethodAc
23/01/12 05:31:35 INFO MemoryStore: Block broadcast_0 stored as values in
memory (estimated size 7.8 KiB, free 912.3 MiB)
23/01/12 05:31:35 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes
in memory (estimated size 3.9 KiB, free 912.3 MiB)
23/01/12 05:31:35 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory
on ip-10-224-51-200.ap-south-1.compute.internal:44241 (size: 3.9 KiB, free:
912.3 MiB)
23/01/12 05:31:35 INFO SparkContext: Created broadcast 0 from broadcast at
DAGScheduler.scala:1570
23/01/12 05:31:35 INFO DAGScheduler: Submitting 1 missing tasks from
ResultStage 0 (MapPartitionsRDD[2] at showString at
NativeMethodAccessorImpl.java:0) (first 15 tasks are
23/01/12 05:31:37 INFO DAGScheduler: ResultStage 0 (showString at
NativeMethodAccessorImpl.java:0) finished in 2.106 s
23/01/12 05:31:37 INFO DAGScheduler: Job 0 is finished. Cancelling potential
speculative or zombie tasks for this job
23/01/12 05:31:37 INFO YarnScheduler: Killing all running tasks in stage 0:
Stage finished
23/01/12 05:31:37 INFO DAGScheduler: Job 0 finished: showString at
NativeMethodAccessorImpl.java:0, took 2.171787 s
23/01/12 05:31:37 INFO CodeGenerator: Code generated in 11.952856 ms
########################
# Default Database details #
########################
+---------+--------------+-----------+
|namespace| tableName|isTemporary|
+---------+--------------+-----------+
| |clustering_mor| false|
+---------+--------------+-----------+
2 None
=========================
23/01/12 05:31:38 INFO CodeGenerator: Code generated in 29.306587 ms
23/01/12 05:31:38 INFO CodeGenerator: Code generated in 11.81622 ms
23/01/12 05:31:38 INFO SparkContext: Starting job: hasNext at
NativeMethodAccessorImpl.java:0
23/01/12 05:31:38 INFO DAGScheduler: Got job 1 (hasNext at
NativeMethodAccessorImpl.java:0) with 1 output partitions
23/01/12 05:31:38 INFO DAGScheduler: Final stage: ResultStage 1 (hasNext at
NativeMethodAccessorImpl.java:0)
23/01/12 05:31:38 INFO DAGScheduler: Parents of final stage: List()
23/01/12 05:31:38 INFO DAGScheduler: Missing parents: List()
23/01/12 05:31:38 INFO DAGScheduler: Submitting ResultStage 1
(MapPartitionsRDD[6] at toLocalIterator at NativeMethodAccessorImpl.java:0),
which has no missing parents
23/01/12 05:31:38 INFO MemoryStore: Block broadcast_1 stored as values in
memory (estimated size 5.5 KiB, free 912.3 MiB)
23/01/12 05:31:38 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes
in memory (estimated size 2.9 KiB, free 912.3 MiB)
23/01/12 05:31:38 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory
on ip-10-224-51-200.ap-south-1.compute.internal:44241 (size: 2.9 KiB, free:
912.3 MiB)
23/01/12 05:31:38 INFO SparkContext: Created broadcast 1 from broadcast at
DAGScheduler.scala:1570
23/01/12 05:31:38 INFO DAGScheduler: Submitting 1 missing tasks from
ResultStage 1 (MapPartitionsRDD[6] at toLocalIterator at
NativeMethodAccessorImpl.java:0) (first 15 tasks are for partitions Vector(0))
23/01/12 05:31:38 INFO YarnScheduler: Adding task set 1.0 with 1 tasks
resource profile 0
23/01/12 05:31:38 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID
1) (ip-10-224-50-165.ap-south-1.compute.internal, executor 2, partition 0,
PROCESS_LOCAL, 4519 bytes) taskResourceAssignments Map()
23/01/12 05:31:38 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory
on ip-10-224-50-165.ap-south-1.compute.internal:38209 (size: 2.9 KiB, free: 4.8
GiB)
23/01/12 05:31:38 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID
1) in 39 ms on ip-10-224-50-165.ap-south-1.compute.internal (executor 2) (1/1)
23/01/12 05:31:38 INFO YarnScheduler: Removed TaskSet 1.0, whose tasks have
all completed, from pool
23/01/12 05:31:38 INFO DAGScheduler: ResultStage 1 (hasNext at
NativeMethodAccessorImpl.java:0) finished in 0.052 s
23/01/12 05:31:38 INFO DAGScheduler: Job 1 is finished. Cancelling potential
speculative or zombie tasks for this job
23/01/12 05:31:38 INFO YarnScheduler: Killing all running tasks in stage 1:
Stage finished
23/01/12 05:31:38 INFO DAGScheduler: Job 1 finished: hasNext at
NativeMethodAccessorImpl.java:0, took 0.060118 s
23/01/12 05:31:38 INFO CodeGenerator: Code generated in 16.381346 ms
==================================================================================================
[Table(name='clustering_mor', database=None, description=None,
tableType='TEMPORARY', isTemporary=True)]
==================================================================================================
23/01/12 05:31:38 INFO SparkContext: Starting job: collect at
HoodieSparkEngineContext.java:103
23/01/12 05:31:38 INFO DAGScheduler: Got job 2 (collect at
HoodieSparkEngineContext.java:103) with 1 output partitions
23/01/12 05:31:38 INFO DAGScheduler: Final stage: ResultStage 2 (collect at
HoodieSparkEngineContext.java:103)
23/01/12 05:31:38 INFO DAGScheduler: Parents of final stage: List()
23/01/12 05:31:38 INFO DAGScheduler: Missing parents: List()
23/01/12 05:31:38 INFO DAGScheduler: Submitting ResultStage 2
(MapPartitionsRDD[8] at map at HoodieSparkEngineContext.java:103), which has no
missing parents
23/01/12 05:31:38 INFO MemoryStore: Block broadcast_2 stored as values in
memory (estimated size 96.9 KiB, free 912.2 MiB)
23/01/12 05:31:38 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes
in memory (estimated size 35.8 KiB, free 912.2 MiB)
23/01/12 05:31:38 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory
on ip-10-224-51-200.ap-south-1.compute.internal:44241 (size: 35.8 KiB, free:
912.3 MiB)
23/01/12 05:31:38 INFO SparkContext: Created broadcast 2 from broadcast at
DAGScheduler.scala:1570
23/01/12 05:31:38 INFO DAGScheduler: Submitting 1 missing tasks from
ResultStage 2 (MapPartitionsRDD[8] at map at HoodieSparkEngineContext.java:103)
(first 15 tasks are for partitions Vector(0))
23/01/12 05:31:38 INFO YarnScheduler: Adding task set 2.0 with 1 tasks
resource profile 0
23/01/12 05:31:38 INFO TaskSetManager: Starting task 0.0 in stage 2.0 (TID
2) (ip-10-224-51-220.ap-south-1.compute.internal, executor 1, partition 0,
PROCESS_LOCAL, 4405 bytes) taskResourceAssignments Map()
23/01/12 05:31:39 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory
on ip-10-224-51-220.ap-south-1.compute.internal:38397 (size: 35.8 KiB, free:
4.8 GiB)
23/01/12 05:31:40 INFO TaskSetManager: Finished task 0.0 in stage 2.0 (TID
2) in 1904 ms on ip-10-224-51-220.ap-south-1.compute.internal (executor 1) (1/1)
23/01/12 05:31:40 INFO YarnScheduler: Removed TaskSet 2.0, whose tasks have
all completed, from pool
23/01/12 05:31:40 INFO DAGScheduler: ResultStage 2 (collect at
HoodieSparkEngineContext.java:103) finished in 1.922 s
23/01/12 05:31:40 INFO DAGScheduler: Job 2 is finished. Cancelling potential
speculative or zombie tasks for this job
23/01/12 05:31:40 INFO YarnScheduler: Killing all running tasks in stage 2:
Stage finished
23/01/12 05:31:40 INFO DAGScheduler: Job 2 finished: collect at
HoodieSparkEngineContext.java:103, took 1.927174 s
23/01/12 05:31:40 INFO SparkContext: Starting job: collect at
HoodieSparkEngineContext.java:103
23/01/12 05:31:40 INFO DAGScheduler: Got job 3 (collect at
HoodieSparkEngineContext.java:103) with 1 output partitions
23/01/12 05:31:40 INFO DAGScheduler: Final stage: ResultStage 3 (collect at
HoodieSparkEngineContext.java:103)
23/01/12 05:31:40 INFO DAGScheduler: Parents of final stage: List()
23/01/12 05:31:40 INFO DAGScheduler: Missing parents: List()
23/01/12 05:31:40 INFO DAGScheduler: Submitting ResultStage 3
(MapPartitionsRDD[10] at map at HoodieSparkEngineContext.java:103), which has
no missing parents
23/01/12 05:31:40 INFO MemoryStore: Block broadcast_3 stored as values in
memory (estimated size 96.9 KiB, free 912.1 MiB)
23/01/12 05:31:40 INFO MemoryStore: Block broadcast_3_piece0 stored as bytes
in memory (estimated size 35.8 KiB, free 912.0 MiB)
23/01/12 05:31:40 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory
on ip-10-224-51-200.ap-south-1.compute.internal:44241 (size: 35.8 KiB, free:
912.2 MiB)
23/01/12 05:31:40 INFO SparkContext: Created broadcast 3 from broadcast at
DAGScheduler.scala:1570
23/01/12 05:31:40 INFO DAGScheduler: Submitting 1 missing tasks from
ResultStage 3 (MapPartitionsRDD[10] at map at
HoodieSparkEngineContext.java:103) (first 15 tasks are for partitions Vector(0))
23/01/12 05:31:40 INFO YarnScheduler: Adding task set 3.0 with 1 tasks
resource profile 0
23/01/12 05:31:40 INFO TaskSetManager: Starting task 0.0 in stage 3.0 (TID
3) (ip-10-224-50-165.ap-south-1.compute.internal, executor 2, partition 0,
PROCESS_LOCAL, 4421 bytes) taskResourceAssignments Map()
23/01/12 05:31:40 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory
on ip-10-224-50-165.ap-south-1.compute.internal:38209 (size: 35.8 KiB, free:
4.8 GiB)
23/01/12 05:31:41 INFO TaskSetManager: Finished task 0.0 in stage 3.0 (TID
3) in 952 ms on ip-10-224-50-165.ap-south-1.compute.internal (executor 2) (1/1)
23/01/12 05:31:41 INFO DAGScheduler: ResultStage 3 (collect at
HoodieSparkEngineContext.java:103) finished in 0.970 s
23/01/12 05:31:41 INFO DAGScheduler: Job 3 is finished. Cancelling potential
speculative or zombie tasks for this job
23/01/12 05:31:41 INFO YarnScheduler: Removed TaskSet 3.0, whose tasks have
all completed, from pool
23/01/12 05:31:41 INFO YarnScheduler: Killing all running tasks in stage 3:
Stage finished
23/01/12 05:31:41 INFO DAGScheduler: Job 3 finished: collect at
HoodieSparkEngineContext.java:103, took 0.977347 s
23/01/12 05:31:41 INFO SparkContext: Starting job: collect at
HoodieSparkEngineContext.java:103
23/01/12 05:31:41 INFO DAGScheduler: Got job 4 (collect at
HoodieSparkEngineContext.java:103) with 1 output partitions
23/01/12 05:31:41 INFO DAGScheduler: Final stage: ResultStage 4 (collect at
HoodieSparkEngineContext.java:103)
23/01/12 05:31:41 INFO DAGScheduler: Parents of final stage: List()
23/01/12 05:31:41 INFO DAGScheduler: Missing parents: List()
23/01/12 05:31:41 INFO DAGScheduler: Submitting ResultStage 4
(MapPartitionsRDD[12] at map at HoodieSparkEngineContext.java:103), which has
no missing parents
23/01/12 05:31:41 INFO MemoryStore: Block broadcast_4 stored as values in
memory (estimated size 96.9 KiB, free 911.9 MiB)
23/01/12 05:31:41 INFO MemoryStore: Block broadcast_4_piece0 stored as bytes
in memory (estimated size 35.8 KiB, free 911.9 MiB)
23/01/12 05:31:41 INFO BlockManagerInfo: Added broadcast_4_piece0 in memory
on ip-10-224-51-200.ap-south-1.compute.internal:44241 (size: 35.8 KiB, free:
912.2 MiB)
23/01/12 05:31:41 INFO SparkContext: Created broadcast 4 from broadcast at
DAGScheduler.scala:1570
23/01/12 05:31:41 INFO DAGScheduler: Submitting 1 missing tasks from
ResultStage 4 (MapPartitionsRDD[12] at map at
HoodieSparkEngineContext.java:103) (first 15 tasks are for partitions Vector(0))
23/01/12 05:31:41 INFO YarnScheduler: Adding task set 4.0 with 1 tasks
resource profile 0
23/01/12 05:31:41 INFO TaskSetManager: Starting task 0.0 in stage 4.0 (TID
4) (ip-10-224-50-165.ap-south-1.compute.internal, executor 2, partition 0,
PROCESS_LOCAL, 4394 bytes) taskResourceAssignments Map()
23/01/12 05:31:41 INFO BlockManagerInfo: Added broadcast_4_piece0 in memory
on ip-10-224-50-165.ap-south-1.compute.internal:38209 (size: 35.8 KiB, free:
4.8 GiB)
23/01/12 05:31:41 INFO TaskSetManager: Finished task 0.0 in stage 4.0 (TID
4) in 101 ms on ip-10-224-50-165.ap-south-1.compute.internal (executor 2) (1/1)
23/01/12 05:31:41 INFO YarnScheduler: Removed TaskSet 4.0, whose tasks have
all completed, from pool
23/01/12 05:31:41 INFO DAGScheduler: ResultStage 4 (collect at
HoodieSparkEngineContext.java:103) finished in 0.118 s
23/01/12 05:31:41 INFO DAGScheduler: Job 4 is finished. Cancelling potential
speculative or zombie tasks for this job
23/01/12 05:31:41 INFO YarnScheduler: Killing all running tasks in stage 4:
Stage finished
23/01/12 05:31:41 INFO DAGScheduler: Job 4 finished: collect at
HoodieSparkEngineContext.java:103, took 0.124767 s
23/01/12 05:31:41 INFO S3NativeFileSystem: Opening
's3://test-spark-hudi/clustering_mor/.hoodie/20230109084319407.replacecommit'
for reading
23/01/12 05:31:41 INFO S3NativeFileSystem: Opening
's3://test-spark-hudi/clustering_mor/.hoodie/20230109090044937.replacecommit'
for reading
23/01/12 05:31:41 INFO S3NativeFileSystem: Opening
's3://test-spark-hudi/clustering_mor/.hoodie/20230109092436558.replacecommit'
for reading
23/01/12 05:31:42 INFO MemoryStore: Block broadcast_5 stored as values in
memory (estimated size 347.4 KiB, free 911.6 MiB)
23/01/12 05:31:42 INFO MemoryStore: Block broadcast_5_piece0 stored as bytes
in memory (estimated size 33.0 KiB, free 911.5 MiB)
23/01/12 05:31:42 INFO BlockManagerInfo: Added broadcast_5_piece0 in memory
on ip-10-224-51-200.ap-south-1.compute.internal:44241 (size: 33.0 KiB, free:
912.2 MiB)
23/01/12 05:31:42 INFO SparkContext: Created broadcast 5 from broadcast at
HoodieBaseRelation.scala:539
23/01/12 05:31:42 INFO MemoryStore: Block broadcast_6 stored as values in
memory (estimated size 355.1 KiB, free 911.2 MiB)
23/01/12 05:31:42 INFO MemoryStore: Block broadcast_6_piece0 stored as bytes
in memory (estimated size 33.9 KiB, free 911.1 MiB)
23/01/12 05:31:42 INFO BlockManagerInfo: Added broadcast_6_piece0 in memory
on ip-10-224-51-200.ap-south-1.compute.internal:44241 (size: 33.9 KiB, free:
912.1 MiB)
23/01/12 05:31:42 INFO SparkContext: Created broadcast 6 from
buildReaderWithPartitionValues at HoodieDataSourceHelper.scala:61
23/01/12 05:31:42 INFO MemoryStore: Block broadcast_7 stored as values in
memory (estimated size 347.4 KiB, free 910.8 MiB)
23/01/12 05:31:42 INFO MemoryStore: Block broadcast_7_piece0 stored as bytes
in memory (estimated size 33.0 KiB, free 910.8 MiB)
23/01/12 05:31:42 INFO BlockManagerInfo: Added broadcast_7_piece0 in memory
on ip-10-224-51-200.ap-south-1.compute.internal:44241 (size: 33.0 KiB, free:
912.1 MiB)
23/01/12 05:31:42 INFO SparkContext: Created broadcast 7 from broadcast at
HoodieBaseRelation.scala:539
23/01/12 05:31:42 INFO MemoryStore: Block broadcast_8 stored as values in
memory (estimated size 355.1 KiB, free 910.4 MiB)
23/01/12 05:31:42 INFO MemoryStore: Block broadcast_8_piece0 stored as bytes
in memory (estimated size 33.9 KiB, free 910.4 MiB)
23/01/12 05:31:42 INFO BlockManagerInfo: Added broadcast_8_piece0 in memory
on ip-10-224-51-200.ap-south-1.compute.internal:44241 (size: 33.9 KiB, free:
912.1 MiB)
23/01/12 05:31:42 INFO SparkContext: Created broadcast 8 from
buildReaderWithPartitionValues at HoodieDataSourceHelper.scala:61
23/01/12 05:31:42 INFO MemoryStore: Block broadcast_9 stored as values in
memory (estimated size 347.5 KiB, free 910.0 MiB)
23/01/12 05:31:42 INFO MemoryStore: Block broadcast_9_piece0 stored as bytes
in memory (estimated size 33.0 KiB, free 910.0 MiB)
23/01/12 05:31:42 INFO BlockManagerInfo: Added broadcast_9_piece0 in memory
on ip-10-224-51-200.ap-south-1.compute.internal:44241 (size: 33.0 KiB, free:
912.0 MiB)
23/01/12 05:31:42 INFO SparkContext: Created broadcast 9 from broadcast at
HoodieMergeOnReadRDD.scala:71
23/01/12 05:31:42 INFO CodeGenerator: Code generated in 17.905044 ms
23/01/12 05:31:42 INFO SparkContext: Starting job: showString at
NativeMethodAccessorImpl.java:0
23/01/12 05:31:42 INFO DAGScheduler: Got job 5 (showString at
NativeMethodAccessorImpl.java:0) with 1 output partitions
23/01/12 05:31:42 INFO DAGScheduler: Final stage: ResultStage 5 (showString
at NativeMethodAccessorImpl.java:0)
23/01/12 05:31:42 INFO DAGScheduler: Parents of final stage: List()
23/01/12 05:31:42 INFO DAGScheduler: Missing parents: List()
23/01/12 05:31:42 INFO DAGScheduler: Submitting ResultStage 5
(MapPartitionsRDD[15] at showString at NativeMethodAccessorImpl.java:0), which
has no missing parents
23/01/12 05:31:42 INFO MemoryStore: Block broadcast_10 stored as values in
memory (estimated size 20.7 KiB, free 910.0 MiB)
23/01/12 05:31:42 INFO MemoryStore: Block broadcast_10_piece0 stored as
bytes in memory (estimated size 8.4 KiB, free 910.0 MiB)
23/01/12 05:31:42 INFO BlockManagerInfo: Added broadcast_10_piece0 in memory
on ip-10-224-51-200.ap-south-1.compute.internal:44241 (size: 8.4 KiB, free:
912.0 MiB)
23/01/12 05:31:42 INFO SparkContext: Created broadcast 10 from broadcast at
DAGScheduler.scala:1570
23/01/12 05:31:42 INFO DAGScheduler: Submitting 1 missing tasks from
ResultStage 5 (MapPartitionsRDD[15] at showString at
NativeMethodAccessorImpl.java:0) (first 15 tasks are for partitions Vector(0))
23/01/12 05:31:42 INFO YarnScheduler: Adding task set 5.0 with 1 tasks
resource profile 0
23/01/12 05:31:42 INFO TaskSetManager: Starting task 0.0 in stage 5.0 (TID
5) (ip-10-224-51-220.ap-south-1.compute.internal, executor 1, partition 0,
PROCESS_LOCAL, 5479 bytes) taskResourceAssignments Map()
23/01/12 05:31:42 INFO BlockManagerInfo: Added broadcast_10_piece0 in memory
on ip-10-224-51-220.ap-south-1.compute.internal:38397 (size: 8.4 KiB, free: 4.8
GiB)
23/01/12 05:31:42 INFO BlockManagerInfo: Added broadcast_8_piece0 in memory
on ip-10-224-51-220.ap-south-1.compute.internal:38397 (size: 33.9 KiB, free:
4.8 GiB)
23/01/12 05:31:43 INFO BlockManagerInfo: Added broadcast_9_piece0 in memory
on ip-10-224-51-220.ap-south-1.compute.internal:38397 (size: 33.0 KiB, free:
4.8 GiB)
23/01/12 05:31:44 INFO TaskSetManager: Finished task 0.0 in stage 5.0 (TID
5) in 2598 ms on ip-10-224-51-220.ap-south-1.compute.internal (executor 1) (1/1)
23/01/12 05:31:44 INFO YarnScheduler: Removed TaskSet 5.0, whose tasks have
all completed, from pool
23/01/12 05:31:44 INFO DAGScheduler: ResultStage 5 (showString at
NativeMethodAccessorImpl.java:0) finished in 2.659 s
23/01/12 05:31:44 INFO DAGScheduler: Job 5 is finished. Cancelling potential
speculative or zombie tasks for this job
23/01/12 05:31:44 INFO YarnScheduler: Killing all running tasks in stage 5:
Stage finished
23/01/12 05:31:44 INFO DAGScheduler: Job 5 finished: showString at
NativeMethodAccessorImpl.java:0, took 2.666745 s
23/01/12 05:31:44 INFO CodeGenerator: Code generated in 22.199602 ms
+-------------------+--------------------+--------------------+----------------------+--------------------+-----------+----------+--------------------+--------------------+-----------+--------+--------------------+-------------------+
|_hoodie_commit_time|_hoodie_commit_seqno|  _hoodie_record_key|_hoodie_partition_path|   _hoodie_file_name|campaign_id| client_id|          created_by|        created_date|event_count|event_id|          updated_by|       updated_date|
+-------------------+--------------------+--------------------+----------------------+--------------------+-----------+----------+--------------------+--------------------+-----------+--------+--------------------+-------------------+
|  20230109092536279|20230109092536279...|campaign_id:350,e...|       campaign_id=350|fe1ae9e1-f3b1-463...|        350|cl-WJxiIuA|Campaign_Event_Su...|2022-09-12T13:54:...|         79|       2|Campaign_Event_Su...|2023-01-09T09:25:24|
+-------------------+--------------------+--------------------+----------------------+--------------------+-----------+----------+--------------------+--------------------+-----------+--------+--------------------+-------------------+
None
==================================================================================================
Traceback (most recent call last):
  File "/home/hadoop/mg/spark_savepoint.py", line 49, in <module>
    print(spark.sql("""call show_savepoints(table =>'clustering_mor')""").show())
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/session.py", line 1034, in sql
  File "/usr/lib/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py", line 1322, in __call__
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 196, in deco
pyspark.sql.utils.AnalysisException: Table or view 'clustering_mor' not found in database 'default'
==================================================================================================
23/01/12 05:31:45 INFO SparkContext: Invoking stop() from shutdown hook
23/01/12 05:31:45 INFO SparkUI: Stopped Spark web UI at
http://ip-10-224-51-200.ap-south-1.compute.internal:4040
23/01/12 05:31:45 INFO YarnClientSchedulerBackend: Interrupting monitor
thread
23/01/12 05:31:45 INFO YarnClientSchedulerBackend: Shutting down all
executors
23/01/12 05:31:45 INFO YarnSchedulerBackend$YarnDriverEndpoint: Asking each
executor to shut down
23/01/12 05:31:45 INFO YarnClientSchedulerBackend: YARN client scheduler
backend Stopped
23/01/12 05:31:45 INFO MapOutputTrackerMasterEndpoint:
MapOutputTrackerMasterEndpoint stopped!
23/01/12 05:31:45 INFO MemoryStore: MemoryStore cleared
23/01/12 05:31:45 INFO BlockManager: BlockManager stopped
23/01/12 05:31:45 INFO BlockManagerMaster: BlockManagerMaster stopped
23/01/12 05:31:45 INFO
OutputCommitCoordinator$OutputCommitCoordinatorEndpoint:
OutputCommitCoordinator stopped!
23/01/12 05:31:45 INFO SparkContext: Successfully stopped SparkContext
23/01/12 05:31:45 INFO ShutdownHookManager: Shutdown hook called
23/01/12 05:31:45 INFO ShutdownHookManager: Deleting directory
/mnt/tmp/spark-76ce116e-57b6-4ab0-9b19-40220d2d67c3/pyspark-eb511084-1773-46e9-ac13-379513b577b7
23/01/12 05:31:45 INFO ShutdownHookManager: Deleting directory
/mnt/tmp/spark-017294b5-55c6-47d6-a065-c8d3066d7ddb
23/01/12 05:31:45 INFO ShutdownHookManager: Deleting directory
/mnt/tmp/spark-76ce116e-57b6-4ab0-9b19-40220d2d67c3
```
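
Based on the `AnalysisException`, my guess is that `show_savepoints` resolves its `table` argument through the session catalog, so a temp view created with `createOrReplaceTempView` is not visible to it. Here is a rough, untested sketch of the workaround I plan to try next, using the table name and S3 path from my setup (`CREATE TABLE ... USING hudi LOCATION` is the standard way to register an existing Hudi table):

```python
# Register the existing Hudi table as an external table in the metastore so
# that catalog-based lookups (including CALL procedures) can resolve it by name.
spark.sql("""
    CREATE TABLE IF NOT EXISTS default.clustering_mor
    USING hudi
    LOCATION 's3://test-spark-hudi/clustering_mor/'
""")

# With the table registered in 'default', the procedure should now find it.
spark.sql("CALL show_savepoints(table => 'default.clustering_mor')").show()
```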