rkkalluri commented on issue #4635:
URL: https://github.com/apache/hudi/issues/4635#issuecomment-1073440851
I am able to reproduce this locally on 0.11.0-SNAPSHOT. Driver log from the reproduction run:
22/03/20 21:59:22 INFO HoodieActiveTimeline: Loaded instants upto :
Option{val=[==>20220320215909174__commit__INFLIGHT]}
22/03/20 21:59:22 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient
from file:///tmp/hudi_4635
22/03/20 21:59:22 INFO HoodieTableConfig: Loading table properties from
file:/tmp/hudi_4635/.hoodie/hoodie.properties
22/03/20 21:59:22 INFO HoodieTableMetaClient: Finished Loading Table of type
COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from file:///tmp/hudi_4635
22/03/20 21:59:22 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient
from file:///tmp/hudi_4635/.hoodie/metadata
22/03/20 21:59:22 INFO HoodieTableConfig: Loading table properties from
file:/tmp/hudi_4635/.hoodie/metadata/.hoodie/hoodie.properties
22/03/20 21:59:22 INFO HoodieTableMetaClient: Finished Loading Table of type
MERGE_ON_READ(version=1, baseFileFormat=HFILE) from
file:///tmp/hudi_4635/.hoodie/metadata
22/03/20 21:59:22 INFO FileSystemViewManager: Creating View Manager with
storage type :REMOTE_FIRST
22/03/20 21:59:22 INFO FileSystemViewManager: Creating remote first table
view
22/03/20 21:59:22 INFO TransactionUtils: Successfully resolved conflicts, if
any
22/03/20 21:59:22 INFO BaseHoodieWriteClient: Committing 20220320215909174
action commit
22/03/20 21:59:22 INFO SparkContext: Starting job: collect at
HoodieSparkEngineContext.java:134
22/03/20 21:59:22 INFO DAGScheduler: Got job 680 (collect at
HoodieSparkEngineContext.java:134) with 1 output partitions
22/03/20 21:59:22 INFO DAGScheduler: Final stage: ResultStage 984 (collect
at HoodieSparkEngineContext.java:134)
22/03/20 21:59:22 INFO DAGScheduler: Parents of final stage: List()
22/03/20 21:59:22 INFO DAGScheduler: Missing parents: List()
22/03/20 21:59:22 INFO DAGScheduler: Submitting ResultStage 984
(MapPartitionsRDD[2117] at flatMap at HoodieSparkEngineContext.java:134), which
has no missing parents
22/03/20 21:59:22 INFO MemoryStore: Block broadcast_848 stored as values in
memory (estimated size 99.5 KiB, free 357.5 MiB)
22/03/20 21:59:22 INFO MemoryStore: Block broadcast_848_piece0 stored as
bytes in memory (estimated size 35.1 KiB, free 357.5 MiB)
22/03/20 21:59:22 INFO BlockManagerInfo: Added broadcast_848_piece0 in
memory on rkalluri.attlocal.net:63252 (size: 35.1 KiB, free: 364.0 MiB)
22/03/20 21:59:22 INFO SparkContext: Created broadcast 848 from broadcast at
DAGScheduler.scala:1478
22/03/20 21:59:22 INFO DAGScheduler: Submitting 1 missing tasks from
ResultStage 984 (MapPartitionsRDD[2117] at flatMap at
HoodieSparkEngineContext.java:134) (first 15 tasks are for partitions Vector(0))
22/03/20 21:59:22 INFO TaskSchedulerImpl: Adding task set 984.0 with 1 tasks
resource profile 0
22/03/20 21:59:22 INFO TaskSetManager: Starting task 0.0 in stage 984.0 (TID
2266) (rkalluri.attlocal.net, executor driver, partition 0, PROCESS_LOCAL, 4387
bytes) taskResourceAssignments Map()
22/03/20 21:59:22 INFO Executor: Running task 0.0 in stage 984.0 (TID 2266)
22/03/20 21:59:22 INFO Executor: Finished task 0.0 in stage 984.0 (TID
2266). 888 bytes result sent to driver
22/03/20 21:59:22 INFO TaskSetManager: Finished task 0.0 in stage 984.0 (TID
2266) in 22 ms on rkalluri.attlocal.net (executor driver) (1/1)
22/03/20 21:59:22 INFO TaskSchedulerImpl: Removed TaskSet 984.0, whose tasks
have all completed, from pool
22/03/20 21:59:22 INFO DAGScheduler: ResultStage 984 (collect at
HoodieSparkEngineContext.java:134) finished in 0.041 s
22/03/20 21:59:22 INFO DAGScheduler: Job 680 is finished. Cancelling
potential speculative or zombie tasks for this job
22/03/20 21:59:22 INFO TaskSchedulerImpl: Killing all running tasks in stage
984: Stage finished
22/03/20 21:59:22 INFO DAGScheduler: Job 680 finished: collect at
HoodieSparkEngineContext.java:134, took 0.042314 s
22/03/20 21:59:22 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient
from file:///tmp/hudi_4635
22/03/20 21:59:22 INFO HoodieTableConfig: Loading table properties from
file:/tmp/hudi_4635/.hoodie/hoodie.properties
22/03/20 21:59:22 INFO HoodieTableMetaClient: Finished Loading Table of type
COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from file:///tmp/hudi_4635
22/03/20 21:59:22 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient
from file:///tmp/hudi_4635/.hoodie/metadata
22/03/20 21:59:22 INFO HoodieTableConfig: Loading table properties from
file:/tmp/hudi_4635/.hoodie/metadata/.hoodie/hoodie.properties
22/03/20 21:59:22 INFO HoodieTableMetaClient: Finished Loading Table of type
MERGE_ON_READ(version=1, baseFileFormat=HFILE) from
file:///tmp/hudi_4635/.hoodie/metadata
22/03/20 21:59:22 INFO HoodieActiveTimeline: Loaded instants upto :
Option{val=[20220320215908164__deltacommit__COMPLETED]}
22/03/20 21:59:22 INFO AbstractTableFileSystemView: Took 0 ms to read 0
instants, 0 replaced file groups
22/03/20 21:59:22 INFO ClusteringUtils: Found 0 files in pending clustering
operations
22/03/20 21:59:22 INFO HoodieTableMetadataUtil: Loading latest file slices
for metadata table partition files
22/03/20 21:59:22 INFO AbstractTableFileSystemView: Took 0 ms to read 0
instants, 0 replaced file groups
22/03/20 21:59:22 INFO ClusteringUtils: Found 0 files in pending clustering
operations
22/03/20 21:59:22 INFO AbstractTableFileSystemView: Building file system
view for partition (files)
22/03/20 21:59:22 INFO AbstractTableFileSystemView: addFilesToView:
NumFiles=14, NumFileGroups=1, FileGroupsCreationTime=1, StoreTimeTaken=0
22/03/20 21:59:22 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient
from file:///tmp/hudi_4635/.hoodie/metadata
22/03/20 21:59:22 INFO HoodieTableConfig: Loading table properties from
file:/tmp/hudi_4635/.hoodie/metadata/.hoodie/hoodie.properties
22/03/20 21:59:22 INFO HoodieTableMetaClient: Finished Loading Table of type
MERGE_ON_READ(version=1, baseFileFormat=HFILE) from
file:///tmp/hudi_4635/.hoodie/metadata
22/03/20 21:59:22 INFO HoodieActiveTimeline: Loaded instants upto :
Option{val=[20220320215908164__deltacommit__COMPLETED]}
22/03/20 21:59:22 INFO HoodieActiveTimeline: Loaded instants upto :
Option{val=[==>20220320215909174__commit__INFLIGHT]}
22/03/20 21:59:22 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient
from file:///tmp/hudi_4635
22/03/20 21:59:22 INFO HoodieTableConfig: Loading table properties from
file:/tmp/hudi_4635/.hoodie/hoodie.properties
22/03/20 21:59:22 INFO HoodieTableMetaClient: Finished Loading Table of type
COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from file:///tmp/hudi_4635
22/03/20 21:59:22 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient
from file:///tmp/hudi_4635/.hoodie/metadata
22/03/20 21:59:22 INFO HoodieTableConfig: Loading table properties from
file:/tmp/hudi_4635/.hoodie/metadata/.hoodie/hoodie.properties
22/03/20 21:59:22 INFO HoodieTableMetaClient: Finished Loading Table of type
MERGE_ON_READ(version=1, baseFileFormat=HFILE) from
file:///tmp/hudi_4635/.hoodie/metadata
22/03/20 21:59:22 INFO HoodieTableMetadataUtil: Updating at
20220320215909174 from Commit/BULK_INSERT. #partitions_updated=2
22/03/20 21:59:22 INFO HoodieActiveTimeline: Loaded instants upto :
Option{val=[20220320215908164__deltacommit__COMPLETED]}
22/03/20 21:59:22 INFO AbstractTableFileSystemView: Took 0 ms to read 0
instants, 0 replaced file groups
22/03/20 21:59:22 INFO ClusteringUtils: Found 0 files in pending clustering
operations
22/03/20 21:59:22 INFO HoodieTableMetadataUtil: Loading latest file slices
for metadata table partition files
22/03/20 21:59:22 INFO AbstractTableFileSystemView: Took 0 ms to read 0
instants, 0 replaced file groups
22/03/20 21:59:22 INFO ClusteringUtils: Found 0 files in pending clustering
operations
22/03/20 21:59:22 INFO AbstractTableFileSystemView: Building file system
view for partition (files)
22/03/20 21:59:22 INFO AbstractTableFileSystemView: addFilesToView:
NumFiles=14, NumFileGroups=1, FileGroupsCreationTime=2, StoreTimeTaken=0
22/03/20 21:59:22 INFO BaseHoodieClient: Embedded Timeline Server is
disabled. Not starting timeline service
22/03/20 21:59:22 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient
from file:///tmp/hudi_4635/.hoodie/metadata
22/03/20 21:59:22 INFO HoodieTableConfig: Loading table properties from
file:/tmp/hudi_4635/.hoodie/metadata/.hoodie/hoodie.properties
22/03/20 21:59:22 INFO HoodieTableMetaClient: Finished Loading Table of type
MERGE_ON_READ(version=1, baseFileFormat=HFILE) from
file:///tmp/hudi_4635/.hoodie/metadata
22/03/20 21:59:22 INFO HoodieTableMetaClient: Loading Active commit timeline
for file:///tmp/hudi_4635/.hoodie/metadata
22/03/20 21:59:22 INFO HoodieActiveTimeline: Loaded instants upto :
Option{val=[20220320215908164__deltacommit__COMPLETED]}
22/03/20 21:59:22 INFO FileSystemViewManager: Creating View Manager with
storage type :MEMORY
22/03/20 21:59:22 INFO FileSystemViewManager: Creating in-memory based Table
View
22/03/20 21:59:22 INFO HoodieActiveTimeline: Loaded instants upto :
Option{val=[20220320215908164__deltacommit__COMPLETED]}
22/03/20 21:59:22 INFO HoodieActiveTimeline: Loaded instants upto :
Option{val=[==>20220320215909174__commit__INFLIGHT]}
22/03/20 21:59:22 INFO BaseHoodieWriteClient: Scheduling table service
COMPACT
22/03/20 21:59:22 INFO BaseHoodieWriteClient: Scheduling compaction at
instant time :20220320215908164001
22/03/20 21:59:22 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient
from file:///tmp/hudi_4635/.hoodie/metadata
22/03/20 21:59:22 INFO HoodieTableConfig: Loading table properties from
file:/tmp/hudi_4635/.hoodie/metadata/.hoodie/hoodie.properties
22/03/20 21:59:22 INFO HoodieTableMetaClient: Finished Loading Table of type
MERGE_ON_READ(version=1, baseFileFormat=HFILE) from
file:///tmp/hudi_4635/.hoodie/metadata
22/03/20 21:59:22 INFO HoodieTableMetaClient: Loading Active commit timeline
for file:///tmp/hudi_4635/.hoodie/metadata
22/03/20 21:59:22 INFO HoodieActiveTimeline: Loaded instants upto :
Option{val=[20220320215908164__deltacommit__COMPLETED]}
22/03/20 21:59:22 INFO FileSystemViewManager: Creating View Manager with
storage type :MEMORY
22/03/20 21:59:22 INFO FileSystemViewManager: Creating in-memory based Table
View
22/03/20 21:59:22 INFO ScheduleCompactionActionExecutor: Checking if
compaction needs to be run on file:///tmp/hudi_4635/.hoodie/metadata
22/03/20 21:59:22 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient
from file:///tmp/hudi_4635/.hoodie/metadata
22/03/20 21:59:22 INFO HoodieTableConfig: Loading table properties from
file:/tmp/hudi_4635/.hoodie/metadata/.hoodie/hoodie.properties
22/03/20 21:59:22 INFO HoodieTableMetaClient: Finished Loading Table of type
MERGE_ON_READ(version=1, baseFileFormat=HFILE) from
file:///tmp/hudi_4635/.hoodie/metadata
22/03/20 21:59:22 INFO HoodieTableMetaClient: Loading Active commit timeline
for file:///tmp/hudi_4635/.hoodie/metadata
22/03/20 21:59:22 INFO HoodieActiveTimeline: Loaded instants upto :
Option{val=[20220320215908164__deltacommit__COMPLETED]}
22/03/20 21:59:22 INFO BaseHoodieWriteClient: Generate a new instant time:
20220320215909174 action: deltacommit
22/03/20 21:59:22 INFO HoodieHeartbeatClient: Received request to start
heartbeat for instant time 20220320215909174
22/03/20 21:59:22 INFO HoodieActiveTimeline: Creating a new instant
[==>20220320215909174__deltacommit__REQUESTED]
22/03/20 21:59:22 INFO BlockManagerInfo: Removed broadcast_848_piece0 on
rkalluri.attlocal.net:63252 in memory (size: 35.1 KiB, free: 364.0 MiB)
22/03/20 21:59:22 INFO BlockManagerInfo: Removed broadcast_847_piece0 on
rkalluri.attlocal.net:63252 in memory (size: 196.6 KiB, free: 364.2 MiB)
22/03/20 21:59:22 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient
from file:///tmp/hudi_4635/.hoodie/metadata
22/03/20 21:59:22 INFO HoodieTableConfig: Loading table properties from
file:/tmp/hudi_4635/.hoodie/metadata/.hoodie/hoodie.properties
22/03/20 21:59:22 INFO HoodieTableMetaClient: Finished Loading Table of type
MERGE_ON_READ(version=1, baseFileFormat=HFILE) from
file:///tmp/hudi_4635/.hoodie/metadata
22/03/20 21:59:22 INFO HoodieTableMetaClient: Loading Active commit timeline
for file:///tmp/hudi_4635/.hoodie/metadata
22/03/20 21:59:22 INFO HoodieActiveTimeline: Loaded instants upto :
Option{val=[==>20220320215909174__deltacommit__REQUESTED]}
22/03/20 21:59:22 INFO FileSystemViewManager: Creating View Manager with
storage type :MEMORY
22/03/20 21:59:22 INFO FileSystemViewManager: Creating in-memory based Table
View
22/03/20 21:59:22 INFO FileSystemViewManager: Creating InMemory based view
for basePath file:/tmp/hudi_4635/.hoodie/metadata
22/03/20 21:59:22 INFO AbstractTableFileSystemView: Took 0 ms to read 0
instants, 0 replaced file groups
22/03/20 21:59:22 INFO ClusteringUtils: Found 0 files in pending clustering
operations
22/03/20 21:59:22 INFO HoodieActiveTimeline: Loaded instants upto :
Option{val=[==>20220320215909174__deltacommit__REQUESTED]}
22/03/20 21:59:22 INFO AbstractTableFileSystemView: Took 0 ms to read 0
instants, 0 replaced file groups
22/03/20 21:59:22 INFO ClusteringUtils: Found 0 files in pending clustering
operations
22/03/20 21:59:22 INFO AsyncCleanerService: The HoodieWriteClient is not
configured to auto & async clean. Async clean service will not start.
22/03/20 21:59:22 INFO AsyncArchiveService: The HoodieWriteClient is not
configured to auto & async archive. Async archive service will not start.
22/03/20 21:59:22 INFO SparkContext: Starting job: countByKey at
BaseSparkCommitActionExecutor.java:190
22/03/20 21:59:22 INFO DAGScheduler: Registering RDD 2123 (countByKey at
BaseSparkCommitActionExecutor.java:190) as input to shuffle 182
22/03/20 21:59:22 INFO DAGScheduler: Got job 681 (countByKey at
BaseSparkCommitActionExecutor.java:190) with 1 output partitions
22/03/20 21:59:22 INFO DAGScheduler: Final stage: ResultStage 986
(countByKey at BaseSparkCommitActionExecutor.java:190)
22/03/20 21:59:22 INFO DAGScheduler: Parents of final stage:
List(ShuffleMapStage 985)
22/03/20 21:59:22 INFO DAGScheduler: Missing parents: List(ShuffleMapStage
985)
22/03/20 21:59:22 INFO DAGScheduler: Submitting ShuffleMapStage 985
(MapPartitionsRDD[2123] at countByKey at
BaseSparkCommitActionExecutor.java:190), which has no missing parents
22/03/20 21:59:22 INFO MemoryStore: Block broadcast_849 stored as values in
memory (estimated size 9.5 KiB, free 358.4 MiB)
22/03/20 21:59:22 INFO MemoryStore: Block broadcast_849_piece0 stored as
bytes in memory (estimated size 5.2 KiB, free 358.3 MiB)
22/03/20 21:59:22 INFO BlockManagerInfo: Added broadcast_849_piece0 in
memory on rkalluri.attlocal.net:63252 (size: 5.2 KiB, free: 364.2 MiB)
22/03/20 21:59:22 INFO SparkContext: Created broadcast 849 from broadcast at
DAGScheduler.scala:1478
22/03/20 21:59:22 INFO DAGScheduler: Submitting 1 missing tasks from
ShuffleMapStage 985 (MapPartitionsRDD[2123] at countByKey at
BaseSparkCommitActionExecutor.java:190) (first 15 tasks are for partitions
Vector(0))
22/03/20 21:59:22 INFO TaskSchedulerImpl: Adding task set 985.0 with 1 tasks
resource profile 0
22/03/20 21:59:22 INFO TaskSetManager: Starting task 0.0 in stage 985.0 (TID
2267) (rkalluri.attlocal.net, executor driver, partition 0, PROCESS_LOCAL, 4800
bytes) taskResourceAssignments Map()
22/03/20 21:59:22 INFO Executor: Running task 0.0 in stage 985.0 (TID 2267)
22/03/20 21:59:22 INFO MemoryStore: Block rdd_2121_0 stored as values in
memory (estimated size 367.0 B, free 358.3 MiB)
22/03/20 21:59:22 INFO BlockManagerInfo: Added rdd_2121_0 in memory on
rkalluri.attlocal.net:63252 (size: 367.0 B, free: 364.2 MiB)
22/03/20 21:59:22 INFO Executor: Finished task 0.0 in stage 985.0 (TID
2267). 1052 bytes result sent to driver
22/03/20 21:59:22 INFO TaskSetManager: Finished task 0.0 in stage 985.0 (TID
2267) in 5 ms on rkalluri.attlocal.net (executor driver) (1/1)
22/03/20 21:59:22 INFO TaskSchedulerImpl: Removed TaskSet 985.0, whose tasks
have all completed, from pool
22/03/20 21:59:22 INFO DAGScheduler: ShuffleMapStage 985 (countByKey at
BaseSparkCommitActionExecutor.java:190) finished in 0.008 s
22/03/20 21:59:22 INFO DAGScheduler: looking for newly runnable stages
22/03/20 21:59:22 INFO DAGScheduler: running: Set()
22/03/20 21:59:22 INFO DAGScheduler: waiting: Set(ResultStage 986)
22/03/20 21:59:22 INFO DAGScheduler: failed: Set()
22/03/20 21:59:22 INFO DAGScheduler: Submitting ResultStage 986
(ShuffledRDD[2124] at countByKey at BaseSparkCommitActionExecutor.java:190),
which has no missing parents
22/03/20 21:59:22 INFO MemoryStore: Block broadcast_850 stored as values in
memory (estimated size 5.5 KiB, free 358.3 MiB)
22/03/20 21:59:22 INFO MemoryStore: Block broadcast_850_piece0 stored as
bytes in memory (estimated size 3.1 KiB, free 358.3 MiB)
22/03/20 21:59:22 INFO BlockManagerInfo: Added broadcast_850_piece0 in
memory on rkalluri.attlocal.net:63252 (size: 3.1 KiB, free: 364.2 MiB)
22/03/20 21:59:22 INFO SparkContext: Created broadcast 850 from broadcast at
DAGScheduler.scala:1478
22/03/20 21:59:22 INFO DAGScheduler: Submitting 1 missing tasks from
ResultStage 986 (ShuffledRDD[2124] at countByKey at
BaseSparkCommitActionExecutor.java:190) (first 15 tasks are for partitions
Vector(0))
22/03/20 21:59:22 INFO TaskSchedulerImpl: Adding task set 986.0 with 1 tasks
resource profile 0
22/03/20 21:59:22 INFO TaskSetManager: Starting task 0.0 in stage 986.0 (TID
2268) (rkalluri.attlocal.net, executor driver, partition 0, NODE_LOCAL, 4271
bytes) taskResourceAssignments Map()
22/03/20 21:59:22 INFO Executor: Running task 0.0 in stage 986.0 (TID 2268)
22/03/20 21:59:22 INFO ShuffleBlockFetcherIterator: Getting 1 (156.0 B)
non-empty blocks including 1 (156.0 B) local and 0 (0.0 B) host-local and 0
(0.0 B) push-merged-local and 0 (0.0 B) remote blocks
22/03/20 21:59:22 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches
in 0 ms
22/03/20 21:59:22 INFO Executor: Finished task 0.0 in stage 986.0 (TID
2268). 1318 bytes result sent to driver
22/03/20 21:59:22 INFO TaskSetManager: Finished task 0.0 in stage 986.0 (TID
2268) in 4 ms on rkalluri.attlocal.net (executor driver) (1/1)
22/03/20 21:59:22 INFO TaskSchedulerImpl: Removed TaskSet 986.0, whose tasks
have all completed, from pool
22/03/20 21:59:22 INFO DAGScheduler: ResultStage 986 (countByKey at
BaseSparkCommitActionExecutor.java:190) finished in 0.005 s
22/03/20 21:59:22 INFO DAGScheduler: Job 681 is finished. Cancelling
potential speculative or zombie tasks for this job
22/03/20 21:59:22 INFO TaskSchedulerImpl: Killing all running tasks in stage
986: Stage finished
22/03/20 21:59:22 INFO DAGScheduler: Job 681 finished: countByKey at
BaseSparkCommitActionExecutor.java:190, took 0.014181 s
22/03/20 21:59:22 INFO BaseSparkCommitActionExecutor: Input workload profile
:WorkloadProfile {globalStat=WorkloadStat {numInserts=0, numUpdates=2},
InputPartitionStat={files=WorkloadStat {numInserts=0, numUpdates=2}},
OutputPartitionStat={}, operationType=UPSERT_PREPPED}
22/03/20 21:59:22 INFO UpsertPartitioner: AvgRecordSize => 1024
22/03/20 21:59:22 INFO SparkContext: Starting job: collectAsMap at
UpsertPartitioner.java:272
22/03/20 21:59:22 INFO DAGScheduler: Got job 682 (collectAsMap at
UpsertPartitioner.java:272) with 1 output partitions
22/03/20 21:59:22 INFO DAGScheduler: Final stage: ResultStage 987
(collectAsMap at UpsertPartitioner.java:272)
22/03/20 21:59:22 INFO DAGScheduler: Parents of final stage: List()
22/03/20 21:59:22 INFO DAGScheduler: Missing parents: List()
22/03/20 21:59:22 INFO DAGScheduler: Submitting ResultStage 987
(MapPartitionsRDD[2126] at mapToPair at UpsertPartitioner.java:271), which has
no missing parents
22/03/20 21:59:22 INFO MemoryStore: Block broadcast_851 stored as values in
memory (estimated size 328.6 KiB, free 358.0 MiB)
22/03/20 21:59:22 INFO MemoryStore: Block broadcast_851_piece0 stored as
bytes in memory (estimated size 116.9 KiB, free 357.9 MiB)
22/03/20 21:59:22 INFO BlockManagerInfo: Added broadcast_851_piece0 in
memory on rkalluri.attlocal.net:63252 (size: 116.9 KiB, free: 364.1 MiB)
22/03/20 21:59:22 INFO SparkContext: Created broadcast 851 from broadcast at
DAGScheduler.scala:1478
22/03/20 21:59:22 INFO DAGScheduler: Submitting 1 missing tasks from
ResultStage 987 (MapPartitionsRDD[2126] at mapToPair at
UpsertPartitioner.java:271) (first 15 tasks are for partitions Vector(0))
22/03/20 21:59:22 INFO TaskSchedulerImpl: Adding task set 987.0 with 1 tasks
resource profile 0
22/03/20 21:59:22 INFO TaskSetManager: Starting task 0.0 in stage 987.0 (TID
2269) (rkalluri.attlocal.net, executor driver, partition 0, PROCESS_LOCAL, 4337
bytes) taskResourceAssignments Map()
22/03/20 21:59:22 INFO Executor: Running task 0.0 in stage 987.0 (TID 2269)
22/03/20 21:59:22 INFO FileSystemViewManager: Creating View Manager with
storage type :MEMORY
22/03/20 21:59:22 INFO FileSystemViewManager: Creating in-memory based Table
View
22/03/20 21:59:22 INFO FileSystemViewManager: Creating InMemory based view
for basePath file:/tmp/hudi_4635/.hoodie/metadata
22/03/20 21:59:22 INFO AbstractTableFileSystemView: Took 0 ms to read 0
instants, 0 replaced file groups
22/03/20 21:59:22 INFO ClusteringUtils: Found 0 files in pending clustering
operations
22/03/20 21:59:22 INFO AbstractTableFileSystemView: Building file system
view for partition (files)
22/03/20 21:59:22 INFO AbstractTableFileSystemView: addFilesToView:
NumFiles=14, NumFileGroups=1, FileGroupsCreationTime=2, StoreTimeTaken=0
22/03/20 21:59:22 INFO Executor: Finished task 0.0 in stage 987.0 (TID
2269). 829 bytes result sent to driver
22/03/20 21:59:22 INFO TaskSetManager: Finished task 0.0 in stage 987.0 (TID
2269) in 19 ms on rkalluri.attlocal.net (executor driver) (1/1)
22/03/20 21:59:22 INFO TaskSchedulerImpl: Removed TaskSet 987.0, whose tasks
have all completed, from pool
22/03/20 21:59:22 INFO DAGScheduler: ResultStage 987 (collectAsMap at
UpsertPartitioner.java:272) finished in 0.074 s
22/03/20 21:59:22 INFO DAGScheduler: Job 682 is finished. Cancelling
potential speculative or zombie tasks for this job
22/03/20 21:59:22 INFO TaskSchedulerImpl: Killing all running tasks in stage
987: Stage finished
22/03/20 21:59:22 INFO DAGScheduler: Job 682 finished: collectAsMap at
UpsertPartitioner.java:272, took 0.074789 s
22/03/20 21:59:22 INFO AbstractTableFileSystemView: Took 0 ms to read 0
instants, 0 replaced file groups
22/03/20 21:59:22 INFO ClusteringUtils: Found 0 files in pending clustering
operations
22/03/20 21:59:22 INFO UpsertPartitioner: Total Buckets :1, buckets info =>
{0=BucketInfo {bucketType=UPDATE, fileIdPrefix=files-0000,
partitionPath=files}},
Partition to insert buckets => {},
UpdateLocations mapped to buckets =>{files-0000=0}
22/03/20 21:59:22 INFO HoodieActiveTimeline: Checking for file exists
?file:/tmp/hudi_4635/.hoodie/metadata/.hoodie/20220320215909174.deltacommit.requested
22/03/20 21:59:22 INFO FileIOUtils: Created a new file in meta path:
file:/tmp/hudi_4635/.hoodie/metadata/.hoodie/20220320215909174.deltacommit.inflight
22/03/20 21:59:22 INFO HoodieActiveTimeline: Create new file for toInstant
?file:/tmp/hudi_4635/.hoodie/metadata/.hoodie/20220320215909174.deltacommit.inflight
22/03/20 21:59:22 INFO AbstractTableFileSystemView: Took 0 ms to read 0
instants, 0 replaced file groups
22/03/20 21:59:22 INFO ClusteringUtils: Found 0 files in pending clustering
operations
22/03/20 21:59:22 INFO SparkContext: Starting job: collect at
BaseSparkUpdateStrategy.java:51
22/03/20 21:59:22 INFO DAGScheduler: Registering RDD 2129 (distinct at
BaseSparkUpdateStrategy.java:51) as input to shuffle 183
22/03/20 21:59:22 INFO DAGScheduler: Got job 683 (collect at
BaseSparkUpdateStrategy.java:51) with 1 output partitions
22/03/20 21:59:22 INFO DAGScheduler: Final stage: ResultStage 989 (collect
at BaseSparkUpdateStrategy.java:51)
22/03/20 21:59:22 INFO DAGScheduler: Parents of final stage:
List(ShuffleMapStage 988)
22/03/20 21:59:22 INFO DAGScheduler: Missing parents: List(ShuffleMapStage
988)
22/03/20 21:59:22 INFO DAGScheduler: Submitting ShuffleMapStage 988
(MapPartitionsRDD[2129] at distinct at BaseSparkUpdateStrategy.java:51), which
has no missing parents
22/03/20 21:59:22 INFO MemoryStore: Block broadcast_852 stored as values in
memory (estimated size 9.5 KiB, free 357.9 MiB)
22/03/20 21:59:22 INFO MemoryStore: Block broadcast_852_piece0 stored as
bytes in memory (estimated size 5.1 KiB, free 357.9 MiB)
22/03/20 21:59:22 INFO BlockManagerInfo: Added broadcast_852_piece0 in
memory on rkalluri.attlocal.net:63252 (size: 5.1 KiB, free: 364.1 MiB)
22/03/20 21:59:22 INFO SparkContext: Created broadcast 852 from broadcast at
DAGScheduler.scala:1478
22/03/20 21:59:22 INFO DAGScheduler: Submitting 1 missing tasks from
ShuffleMapStage 988 (MapPartitionsRDD[2129] at distinct at
BaseSparkUpdateStrategy.java:51) (first 15 tasks are for partitions Vector(0))
22/03/20 21:59:22 INFO TaskSchedulerImpl: Adding task set 988.0 with 1 tasks
resource profile 0
22/03/20 21:59:22 INFO TaskSetManager: Starting task 0.0 in stage 988.0 (TID
2270) (rkalluri.attlocal.net, executor driver, partition 0, PROCESS_LOCAL, 4800
bytes) taskResourceAssignments Map()
22/03/20 21:59:22 INFO Executor: Running task 0.0 in stage 988.0 (TID 2270)
22/03/20 21:59:22 INFO BlockManager: Found block rdd_2121_0 locally
22/03/20 21:59:22 INFO Executor: Finished task 0.0 in stage 988.0 (TID
2270). 1138 bytes result sent to driver
22/03/20 21:59:22 INFO TaskSetManager: Finished task 0.0 in stage 988.0 (TID
2270) in 5 ms on rkalluri.attlocal.net (executor driver) (1/1)
22/03/20 21:59:22 INFO TaskSchedulerImpl: Removed TaskSet 988.0, whose tasks
have all completed, from pool
22/03/20 21:59:22 INFO DAGScheduler: ShuffleMapStage 988 (distinct at
BaseSparkUpdateStrategy.java:51) finished in 0.007 s
22/03/20 21:59:22 INFO DAGScheduler: looking for newly runnable stages
22/03/20 21:59:22 INFO DAGScheduler: running: Set()
22/03/20 21:59:22 INFO DAGScheduler: waiting: Set(ResultStage 989)
22/03/20 21:59:22 INFO DAGScheduler: failed: Set()
22/03/20 21:59:22 INFO DAGScheduler: Submitting ResultStage 989
(MapPartitionsRDD[2131] at distinct at BaseSparkUpdateStrategy.java:51), which
has no missing parents
22/03/20 21:59:22 INFO MemoryStore: Block broadcast_853 stored as values in
memory (estimated size 6.3 KiB, free 357.9 MiB)
22/03/20 21:59:22 INFO MemoryStore: Block broadcast_853_piece0 stored as
bytes in memory (estimated size 3.4 KiB, free 357.9 MiB)
22/03/20 21:59:22 INFO BlockManagerInfo: Added broadcast_853_piece0 in
memory on rkalluri.attlocal.net:63252 (size: 3.4 KiB, free: 364.1 MiB)
22/03/20 21:59:22 INFO SparkContext: Created broadcast 853 from broadcast at
DAGScheduler.scala:1478
22/03/20 21:59:22 INFO DAGScheduler: Submitting 1 missing tasks from
ResultStage 989 (MapPartitionsRDD[2131] at distinct at
BaseSparkUpdateStrategy.java:51) (first 15 tasks are for partitions Vector(0))
22/03/20 21:59:22 INFO TaskSchedulerImpl: Adding task set 989.0 with 1 tasks
resource profile 0
22/03/20 21:59:22 INFO TaskSetManager: Starting task 0.0 in stage 989.0 (TID
2271) (rkalluri.attlocal.net, executor driver, partition 0, NODE_LOCAL, 4271
bytes) taskResourceAssignments Map()
22/03/20 21:59:22 INFO Executor: Running task 0.0 in stage 989.0 (TID 2271)
22/03/20 21:59:22 INFO ShuffleBlockFetcherIterator: Getting 1 (117.0 B)
non-empty blocks including 1 (117.0 B) local and 0 (0.0 B) host-local and 0
(0.0 B) push-merged-local and 0 (0.0 B) remote blocks
22/03/20 21:59:22 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches
in 0 ms
22/03/20 21:59:22 INFO Executor: Finished task 0.0 in stage 989.0 (TID
2271). 1249 bytes result sent to driver
22/03/20 21:59:22 INFO TaskSetManager: Finished task 0.0 in stage 989.0 (TID
2271) in 4 ms on rkalluri.attlocal.net (executor driver) (1/1)
22/03/20 21:59:22 INFO TaskSchedulerImpl: Removed TaskSet 989.0, whose tasks
have all completed, from pool
22/03/20 21:59:22 INFO DAGScheduler: ResultStage 989 (collect at
BaseSparkUpdateStrategy.java:51) finished in 0.005 s
22/03/20 21:59:22 INFO DAGScheduler: Job 683 is finished. Cancelling
potential speculative or zombie tasks for this job
22/03/20 21:59:22 INFO TaskSchedulerImpl: Killing all running tasks in stage
989: Stage finished
22/03/20 21:59:22 INFO DAGScheduler: Job 683 finished: collect at
BaseSparkUpdateStrategy.java:51, took 0.012885 s
22/03/20 21:59:22 INFO BaseSparkCommitActionExecutor: no validators
configured.
22/03/20 21:59:22 INFO BaseCommitActionExecutor: Auto commit enabled:
Committing 20220320215909174
22/03/20 21:59:22 INFO SparkContext: Starting job: collect at
BaseSparkCommitActionExecutor.java:275
22/03/20 21:59:22 INFO DAGScheduler: Registering RDD 2132 (mapToPair at
BaseSparkCommitActionExecutor.java:227) as input to shuffle 184
22/03/20 21:59:22 INFO DAGScheduler: Got job 684 (collect at
BaseSparkCommitActionExecutor.java:275) with 1 output partitions
22/03/20 21:59:22 INFO DAGScheduler: Final stage: ResultStage 991 (collect
at BaseSparkCommitActionExecutor.java:275)
22/03/20 21:59:22 INFO DAGScheduler: Parents of final stage:
List(ShuffleMapStage 990)
22/03/20 21:59:22 INFO DAGScheduler: Missing parents: List(ShuffleMapStage
990)
22/03/20 21:59:22 INFO DAGScheduler: Submitting ShuffleMapStage 990
(MapPartitionsRDD[2132] at mapToPair at
BaseSparkCommitActionExecutor.java:227), which has no missing parents
22/03/20 21:59:22 INFO MemoryStore: Block broadcast_854 stored as values in
memory (estimated size 332.8 KiB, free 357.6 MiB)
22/03/20 21:59:22 INFO MemoryStore: Block broadcast_854_piece0 stored as
bytes in memory (estimated size 119.4 KiB, free 357.4 MiB)
22/03/20 21:59:22 INFO BlockManagerInfo: Added broadcast_854_piece0 in
memory on rkalluri.attlocal.net:63252 (size: 119.4 KiB, free: 364.0 MiB)
22/03/20 21:59:22 INFO SparkContext: Created broadcast 854 from broadcast at
DAGScheduler.scala:1478
22/03/20 21:59:22 INFO DAGScheduler: Submitting 1 missing tasks from
ShuffleMapStage 990 (MapPartitionsRDD[2132] at mapToPair at
BaseSparkCommitActionExecutor.java:227) (first 15 tasks are for partitions
Vector(0))
22/03/20 21:59:22 INFO TaskSchedulerImpl: Adding task set 990.0 with 1 tasks
resource profile 0
22/03/20 21:59:22 INFO TaskSetManager: Starting task 0.0 in stage 990.0 (TID
2272) (rkalluri.attlocal.net, executor driver, partition 0, PROCESS_LOCAL, 4800
bytes) taskResourceAssignments Map()
22/03/20 21:59:22 INFO Executor: Running task 0.0 in stage 990.0 (TID 2272)
22/03/20 21:59:22 INFO BlockManager: Found block rdd_2121_0 locally
22/03/20 21:59:22 INFO Executor: Finished task 0.0 in stage 990.0 (TID
2272). 1052 bytes result sent to driver
22/03/20 21:59:22 INFO TaskSetManager: Finished task 0.0 in stage 990.0 (TID
2272) in 19 ms on rkalluri.attlocal.net (executor driver) (1/1)
22/03/20 21:59:22 INFO TaskSchedulerImpl: Removed TaskSet 990.0, whose tasks
have all completed, from pool
22/03/20 21:59:22 INFO DAGScheduler: ShuffleMapStage 990 (mapToPair at
BaseSparkCommitActionExecutor.java:227) finished in 0.073 s
22/03/20 21:59:22 INFO DAGScheduler: looking for newly runnable stages
22/03/20 21:59:22 INFO DAGScheduler: running: Set()
22/03/20 21:59:22 INFO DAGScheduler: waiting: Set(ResultStage 991)
22/03/20 21:59:22 INFO DAGScheduler: failed: Set()
22/03/20 21:59:22 INFO DAGScheduler: Submitting ResultStage 991
(MapPartitionsRDD[2137] at map at BaseSparkCommitActionExecutor.java:275),
which has no missing parents
22/03/20 21:59:23 INFO MemoryStore: Block broadcast_855 stored as values in
memory (estimated size 435.7 KiB, free 357.0 MiB)
22/03/20 21:59:23 INFO MemoryStore: Block broadcast_855_piece0 stored as
bytes in memory (estimated size 156.4 KiB, free 356.9 MiB)
22/03/20 21:59:23 INFO BlockManagerInfo: Added broadcast_855_piece0 in
memory on rkalluri.attlocal.net:63252 (size: 156.4 KiB, free: 363.8 MiB)
22/03/20 21:59:23 INFO SparkContext: Created broadcast 855 from broadcast at
DAGScheduler.scala:1478
22/03/20 21:59:23 INFO DAGScheduler: Submitting 1 missing tasks from
ResultStage 991 (MapPartitionsRDD[2137] at map at
BaseSparkCommitActionExecutor.java:275) (first 15 tasks are for partitions
Vector(0))
22/03/20 21:59:23 INFO TaskSchedulerImpl: Adding task set 991.0 with 1 tasks
resource profile 0
22/03/20 21:59:23 INFO TaskSetManager: Starting task 0.0 in stage 991.0 (TID
2273) (rkalluri.attlocal.net, executor driver, partition 0, NODE_LOCAL, 4271
bytes) taskResourceAssignments Map()
22/03/20 21:59:23 INFO Executor: Running task 0.0 in stage 991.0 (TID 2273)
22/03/20 21:59:23 INFO ShuffleBlockFetcherIterator: Getting 1 (445.0 B)
non-empty blocks including 1 (445.0 B) local and 0 (0.0 B) host-local and 0
(0.0 B) push-merged-local and 0 (0.0 B) remote blocks
22/03/20 21:59:23 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches
in 0 ms
22/03/20 21:59:23 INFO BaseSparkDeltaCommitActionExecutor: Merging updates
for commit 20220320215909174 for file files-0000
22/03/20 21:59:23 INFO FileSystemViewManager: Creating View Manager with
storage type :MEMORY
22/03/20 21:59:23 INFO FileSystemViewManager: Creating in-memory based Table
View
22/03/20 21:59:23 INFO FileSystemViewManager: Creating InMemory based view
for basePath file:/tmp/hudi_4635/.hoodie/metadata
22/03/20 21:59:23 INFO AbstractTableFileSystemView: Took 0 ms to read 0
instants, 0 replaced file groups
22/03/20 21:59:23 INFO ClusteringUtils: Found 0 files in pending clustering
operations
22/03/20 21:59:23 INFO AbstractTableFileSystemView: Building file system
view for partition (files)
22/03/20 21:59:23 INFO AbstractTableFileSystemView: addFilesToView:
NumFiles=14, NumFileGroups=1, FileGroupsCreationTime=1, StoreTimeTaken=0
22/03/20 21:59:23 INFO DirectWriteMarkers: Creating Marker
Path=file:/tmp/hudi_4635/.hoodie/metadata/.hoodie/.temp/20220320215909174/files/files-0000_0-991-2273_20220320215907162001.hfile.marker.APPEND
22/03/20 21:59:23 INFO DirectWriteMarkers: [direct] Created marker file
file:/tmp/hudi_4635/.hoodie/metadata/.hoodie/.temp/20220320215909174/files/files-0000_0-991-2273_20220320215907162001.hfile.marker.APPEND
in 23 ms
22/03/20 21:59:23 INFO HoodieLogFormat$WriterBuilder: Building
HoodieLogFormat Writer
22/03/20 21:59:23 INFO HoodieLogFormat$WriterBuilder: HoodieLogFile on path
file:/tmp/hudi_4635/.hoodie/metadata/files/.files-0000_20220320215907162001.log.1_0-975-2255
22/03/20 21:59:23 INFO HoodieLogFormatWriter: Append not supported.. Rolling
over to
HoodieLogFile{pathStr='file:/tmp/hudi_4635/.hoodie/metadata/files/.files-0000_20220320215907162001.log.2_0-991-2273',
fileLen=-1}
22/03/20 21:59:23 INFO CacheConfig: Created cacheConfig:
blockCache=LruBlockCache{blockCount=0, currentSize=392960, freeSize=381498432,
maxSize=381891392, heapSize=392960, minSize=362796832, minFactor=0.95,
multiSize=181398416, multiFactor=0.5, singleSize=90699208, singleFactor=0.25},
cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false,
cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false,
prefetchOnOpen=false
22/03/20 21:59:23 INFO CodecPool: Got brand-new compressor [.gz]
22/03/20 21:59:23 INFO CodecPool: Got brand-new compressor [.gz]
22/03/20 21:59:23 INFO HoodieAppendHandle: AppendHandle for partitionPath
files filePath files/.files-0000_20220320215907162001.log.2_0-991-2273, took 47
ms.
22/03/20 21:59:23 INFO MemoryStore: Block rdd_2136_0 stored as values in
memory (estimated size 339.0 B, free 356.9 MiB)
22/03/20 21:59:23 INFO BlockManagerInfo: Added rdd_2136_0 in memory on
rkalluri.attlocal.net:63252 (size: 339.0 B, free: 363.8 MiB)
22/03/20 21:59:23 INFO Executor: Finished task 0.0 in stage 991.0 (TID
2273). 1635 bytes result sent to driver
22/03/20 21:59:23 INFO TaskSetManager: Finished task 0.0 in stage 991.0 (TID
2273) in 72 ms on rkalluri.attlocal.net (executor driver) (1/1)
22/03/20 21:59:23 INFO TaskSchedulerImpl: Removed TaskSet 991.0, whose tasks
have all completed, from pool
22/03/20 21:59:23 INFO DAGScheduler: ResultStage 991 (collect at
BaseSparkCommitActionExecutor.java:275) finished in 0.145 s
22/03/20 21:59:23 INFO DAGScheduler: Job 684 is finished. Cancelling
potential speculative or zombie tasks for this job
22/03/20 21:59:23 INFO TaskSchedulerImpl: Killing all running tasks in stage
991: Stage finished
22/03/20 21:59:23 INFO DAGScheduler: Job 684 finished: collect at
BaseSparkCommitActionExecutor.java:275, took 0.219633 s
22/03/20 21:59:23 INFO CommitUtils: Creating metadata for UPSERT_PREPPED
numWriteStats:1numReplaceFileIds:0
22/03/20 21:59:23 INFO SparkContext: Starting job: collect at
BaseSparkCommitActionExecutor.java:283
22/03/20 21:59:23 INFO DAGScheduler: Got job 685 (collect at
BaseSparkCommitActionExecutor.java:283) with 1 output partitions
22/03/20 21:59:23 INFO DAGScheduler: Final stage: ResultStage 993 (collect
at BaseSparkCommitActionExecutor.java:283)
22/03/20 21:59:23 INFO DAGScheduler: Parents of final stage:
List(ShuffleMapStage 992)
22/03/20 21:59:23 INFO DAGScheduler: Missing parents: List()
22/03/20 21:59:23 INFO DAGScheduler: Submitting ResultStage 993
(MapPartitionsRDD[2138] at map at BaseSparkCommitActionExecutor.java:283),
which has no missing parents
22/03/20 21:59:23 INFO MemoryStore: Block broadcast_856 stored as values in
memory (estimated size 435.7 KiB, free 356.4 MiB)
22/03/20 21:59:23 INFO MemoryStore: Block broadcast_856_piece0 stored as
bytes in memory (estimated size 156.4 KiB, free 356.3 MiB)
22/03/20 21:59:23 INFO BlockManagerInfo: Added broadcast_856_piece0 in
memory on rkalluri.attlocal.net:63252 (size: 156.4 KiB, free: 363.7 MiB)
22/03/20 21:59:23 INFO SparkContext: Created broadcast 856 from broadcast at
DAGScheduler.scala:1478
22/03/20 21:59:23 INFO DAGScheduler: Submitting 1 missing tasks from
ResultStage 993 (MapPartitionsRDD[2138] at map at
BaseSparkCommitActionExecutor.java:283) (first 15 tasks are for partitions
Vector(0))
22/03/20 21:59:23 INFO TaskSchedulerImpl: Adding task set 993.0 with 1 tasks
resource profile 0
22/03/20 21:59:23 INFO TaskSetManager: Starting task 0.0 in stage 993.0 (TID
2274) (rkalluri.attlocal.net, executor driver, partition 0, PROCESS_LOCAL, 4271
bytes) taskResourceAssignments Map()
22/03/20 21:59:23 INFO Executor: Running task 0.0 in stage 993.0 (TID 2274)
22/03/20 21:59:23 INFO BlockManager: Found block rdd_2136_0 locally
22/03/20 21:59:23 INFO Executor: Finished task 0.0 in stage 993.0 (TID
2274). 1248 bytes result sent to driver
22/03/20 21:59:23 INFO TaskSetManager: Finished task 0.0 in stage 993.0 (TID
2274) in 20 ms on rkalluri.attlocal.net (executor driver) (1/1)
22/03/20 21:59:23 INFO TaskSchedulerImpl: Removed TaskSet 993.0, whose tasks
have all completed, from pool
22/03/20 21:59:23 INFO DAGScheduler: ResultStage 993 (collect at
BaseSparkCommitActionExecutor.java:283) finished in 0.092 s
22/03/20 21:59:23 INFO DAGScheduler: Job 685 is finished. Cancelling
potential speculative or zombie tasks for this job
22/03/20 21:59:23 INFO TaskSchedulerImpl: Killing all running tasks in stage
993: Stage finished
22/03/20 21:59:23 INFO DAGScheduler: Job 685 finished: collect at
BaseSparkCommitActionExecutor.java:283, took 0.093105 s
22/03/20 21:59:23 INFO BaseSparkCommitActionExecutor: Committing
20220320215909174, action Type deltacommit, operation Type UPSERT_PREPPED
22/03/20 21:59:23 INFO BlockManagerInfo: Removed broadcast_851_piece0 on
rkalluri.attlocal.net:63252 in memory (size: 116.9 KiB, free: 363.8 MiB)
22/03/20 21:59:23 INFO BlockManagerInfo: Removed broadcast_856_piece0 on
rkalluri.attlocal.net:63252 in memory (size: 156.4 KiB, free: 363.9 MiB)
22/03/20 21:59:23 INFO BlockManagerInfo: Removed broadcast_849_piece0 on
rkalluri.attlocal.net:63252 in memory (size: 5.2 KiB, free: 363.9 MiB)
22/03/20 21:59:23 INFO BlockManagerInfo: Removed broadcast_854_piece0 on
rkalluri.attlocal.net:63252 in memory (size: 119.4 KiB, free: 364.1 MiB)
22/03/20 21:59:23 INFO BlockManagerInfo: Removed broadcast_853_piece0 on
rkalluri.attlocal.net:63252 in memory (size: 3.4 KiB, free: 364.1 MiB)
22/03/20 21:59:23 INFO SparkContext: Starting job: collect at
HoodieSparkEngineContext.java:134
22/03/20 21:59:23 INFO DAGScheduler: Got job 686 (collect at
HoodieSparkEngineContext.java:134) with 1 output partitions
22/03/20 21:59:23 INFO DAGScheduler: Final stage: ResultStage 994 (collect
at HoodieSparkEngineContext.java:134)
22/03/20 21:59:23 INFO DAGScheduler: Parents of final stage: List()
22/03/20 21:59:23 INFO DAGScheduler: Missing parents: List()
22/03/20 21:59:23 INFO BlockManagerInfo: Removed broadcast_855_piece0 on
rkalluri.attlocal.net:63252 in memory (size: 156.4 KiB, free: 364.2 MiB)
22/03/20 21:59:23 INFO DAGScheduler: Submitting ResultStage 994
(MapPartitionsRDD[2140] at flatMap at HoodieSparkEngineContext.java:134), which
has no missing parents
22/03/20 21:59:23 INFO BlockManagerInfo: Removed broadcast_852_piece0 on
rkalluri.attlocal.net:63252 in memory (size: 5.1 KiB, free: 364.2 MiB)
22/03/20 21:59:23 INFO BlockManagerInfo: Removed broadcast_850_piece0 on
rkalluri.attlocal.net:63252 in memory (size: 3.1 KiB, free: 364.2 MiB)
22/03/20 21:59:23 INFO MemoryStore: Block broadcast_857 stored as values in
memory (estimated size 99.5 KiB, free 358.3 MiB)
22/03/20 21:59:23 INFO MemoryStore: Block broadcast_857_piece0 stored as
bytes in memory (estimated size 35.1 KiB, free 358.2 MiB)
22/03/20 21:59:23 INFO BlockManagerInfo: Added broadcast_857_piece0 in
memory on rkalluri.attlocal.net:63252 (size: 35.1 KiB, free: 364.2 MiB)
22/03/20 21:59:23 INFO SparkContext: Created broadcast 857 from broadcast at
DAGScheduler.scala:1478
22/03/20 21:59:23 INFO DAGScheduler: Submitting 1 missing tasks from
ResultStage 994 (MapPartitionsRDD[2140] at flatMap at
HoodieSparkEngineContext.java:134) (first 15 tasks are for partitions Vector(0))
22/03/20 21:59:23 INFO TaskSchedulerImpl: Adding task set 994.0 with 1 tasks
resource profile 0
22/03/20 21:59:23 INFO TaskSetManager: Starting task 0.0 in stage 994.0 (TID
2275) (rkalluri.attlocal.net, executor driver, partition 0, PROCESS_LOCAL, 4408
bytes) taskResourceAssignments Map()
22/03/20 21:59:23 INFO Executor: Running task 0.0 in stage 994.0 (TID 2275)
22/03/20 21:59:23 INFO Executor: Finished task 0.0 in stage 994.0 (TID
2275). 796 bytes result sent to driver
22/03/20 21:59:23 INFO TaskSetManager: Finished task 0.0 in stage 994.0 (TID
2275) in 15 ms on rkalluri.attlocal.net (executor driver) (1/1)
22/03/20 21:59:23 INFO TaskSchedulerImpl: Removed TaskSet 994.0, whose tasks
have all completed, from pool
22/03/20 21:59:23 INFO DAGScheduler: ResultStage 994 (collect at
HoodieSparkEngineContext.java:134) finished in 0.036 s
22/03/20 21:59:23 INFO DAGScheduler: Job 686 is finished. Cancelling
potential speculative or zombie tasks for this job
22/03/20 21:59:23 INFO TaskSchedulerImpl: Killing all running tasks in stage
994: Stage finished
22/03/20 21:59:23 INFO DAGScheduler: Job 686 finished: collect at
HoodieSparkEngineContext.java:134, took 0.035978 s
22/03/20 21:59:23 INFO HoodieActiveTimeline: Marking instant complete
[==>20220320215909174__deltacommit__INFLIGHT]
22/03/20 21:59:23 INFO HoodieActiveTimeline: Checking for file exists
?file:/tmp/hudi_4635/.hoodie/metadata/.hoodie/20220320215909174.deltacommit.inflight
22/03/20 21:59:23 INFO HoodieActiveTimeline: Create new file for toInstant
?file:/tmp/hudi_4635/.hoodie/metadata/.hoodie/20220320215909174.deltacommit
22/03/20 21:59:23 INFO HoodieActiveTimeline: Completed
[==>20220320215909174__deltacommit__INFLIGHT]
22/03/20 21:59:23 INFO BaseSparkCommitActionExecutor: Committed
20220320215909174
22/03/20 21:59:23 INFO SparkContext: Starting job: collectAsMap at
HoodieSparkEngineContext.java:148
22/03/20 21:59:23 INFO DAGScheduler: Got job 687 (collectAsMap at
HoodieSparkEngineContext.java:148) with 1 output partitions
22/03/20 21:59:23 INFO DAGScheduler: Final stage: ResultStage 995
(collectAsMap at HoodieSparkEngineContext.java:148)
22/03/20 21:59:23 INFO DAGScheduler: Parents of final stage: List()
22/03/20 21:59:23 INFO DAGScheduler: Missing parents: List()
22/03/20 21:59:23 INFO DAGScheduler: Submitting ResultStage 995
(MapPartitionsRDD[2142] at mapToPair at HoodieSparkEngineContext.java:145),
which has no missing parents
22/03/20 21:59:23 INFO MemoryStore: Block broadcast_858 stored as values in
memory (estimated size 99.7 KiB, free 358.1 MiB)
22/03/20 21:59:23 INFO MemoryStore: Block broadcast_858_piece0 stored as
bytes in memory (estimated size 35.2 KiB, free 358.1 MiB)
22/03/20 21:59:23 INFO BlockManagerInfo: Added broadcast_858_piece0 in
memory on rkalluri.attlocal.net:63252 (size: 35.2 KiB, free: 364.1 MiB)
22/03/20 21:59:23 INFO SparkContext: Created broadcast 858 from broadcast at
DAGScheduler.scala:1478
22/03/20 21:59:23 INFO DAGScheduler: Submitting 1 missing tasks from
ResultStage 995 (MapPartitionsRDD[2142] at mapToPair at
HoodieSparkEngineContext.java:145) (first 15 tasks are for partitions Vector(0))
22/03/20 21:59:23 INFO TaskSchedulerImpl: Adding task set 995.0 with 1 tasks
resource profile 0
22/03/20 21:59:23 INFO TaskSetManager: Starting task 0.0 in stage 995.0 (TID
2276) (rkalluri.attlocal.net, executor driver, partition 0, PROCESS_LOCAL, 4408
bytes) taskResourceAssignments Map()
22/03/20 21:59:23 INFO Executor: Running task 0.0 in stage 995.0 (TID 2276)
22/03/20 21:59:23 INFO Executor: Finished task 0.0 in stage 995.0 (TID
2276). 836 bytes result sent to driver
22/03/20 21:59:23 INFO TaskSetManager: Finished task 0.0 in stage 995.0 (TID
2276) in 6 ms on rkalluri.attlocal.net (executor driver) (1/1)
22/03/20 21:59:23 INFO TaskSchedulerImpl: Removed TaskSet 995.0, whose tasks
have all completed, from pool
22/03/20 21:59:23 INFO DAGScheduler: ResultStage 995 (collectAsMap at
HoodieSparkEngineContext.java:148) finished in 0.025 s
22/03/20 21:59:23 INFO DAGScheduler: Job 687 is finished. Cancelling
potential speculative or zombie tasks for this job
22/03/20 21:59:23 INFO TaskSchedulerImpl: Killing all running tasks in stage
995: Stage finished
22/03/20 21:59:23 INFO DAGScheduler: Job 687 finished: collectAsMap at
HoodieSparkEngineContext.java:148, took 0.026164 s
22/03/20 21:59:23 INFO FSUtils: Removed directory at
file:/tmp/hudi_4635/.hoodie/metadata/.hoodie/.temp/20220320215909174
22/03/20 21:59:23 INFO HoodieHeartbeatClient: Stopping heartbeat for instant
20220320215909174
22/03/20 21:59:23 INFO HoodieHeartbeatClient: Stopped heartbeat for instant
20220320215909174
22/03/20 21:59:23 INFO HeartbeatUtils: Deleted the heartbeat for instant
20220320215909174
22/03/20 21:59:23 INFO HoodieHeartbeatClient: Deleted heartbeat file for
instant 20220320215909174
22/03/20 21:59:23 INFO SparkContext: Starting job: collect at
SparkHoodieBackedTableMetadataWriter.java:154
22/03/20 21:59:23 INFO DAGScheduler: Got job 688 (collect at
SparkHoodieBackedTableMetadataWriter.java:154) with 1 output partitions
22/03/20 21:59:23 INFO DAGScheduler: Final stage: ResultStage 997 (collect
at SparkHoodieBackedTableMetadataWriter.java:154)
22/03/20 21:59:23 INFO DAGScheduler: Parents of final stage:
List(ShuffleMapStage 996)
22/03/20 21:59:23 INFO DAGScheduler: Missing parents: List()
22/03/20 21:59:23 INFO DAGScheduler: Submitting ResultStage 997
(MapPartitionsRDD[2136] at flatMap at BaseSparkCommitActionExecutor.java:175),
which has no missing parents
22/03/20 21:59:23 INFO MemoryStore: Block broadcast_859 stored as values in
memory (estimated size 435.3 KiB, free 357.7 MiB)
22/03/20 21:59:23 INFO MemoryStore: Block broadcast_859_piece0 stored as
bytes in memory (estimated size 156.3 KiB, free 357.5 MiB)
22/03/20 21:59:23 INFO BlockManagerInfo: Added broadcast_859_piece0 in
memory on rkalluri.attlocal.net:63252 (size: 156.3 KiB, free: 364.0 MiB)
22/03/20 21:59:23 INFO SparkContext: Created broadcast 859 from broadcast at
DAGScheduler.scala:1478
22/03/20 21:59:23 INFO DAGScheduler: Submitting 1 missing tasks from
ResultStage 997 (MapPartitionsRDD[2136] at flatMap at
BaseSparkCommitActionExecutor.java:175) (first 15 tasks are for partitions
Vector(0))
22/03/20 21:59:23 INFO TaskSchedulerImpl: Adding task set 997.0 with 1 tasks
resource profile 0
22/03/20 21:59:23 INFO TaskSetManager: Starting task 0.0 in stage 997.0 (TID
2277) (rkalluri.attlocal.net, executor driver, partition 0, PROCESS_LOCAL, 4271
bytes) taskResourceAssignments Map()
22/03/20 21:59:23 INFO Executor: Running task 0.0 in stage 997.0 (TID 2277)
22/03/20 21:59:23 INFO BlockManager: Found block rdd_2136_0 locally
22/03/20 21:59:23 INFO Executor: Finished task 0.0 in stage 997.0 (TID
2277). 1328 bytes result sent to driver
22/03/20 21:59:23 INFO TaskSetManager: Finished task 0.0 in stage 997.0 (TID
2277) in 20 ms on rkalluri.attlocal.net (executor driver) (1/1)
22/03/20 21:59:23 INFO TaskSchedulerImpl: Removed TaskSet 997.0, whose tasks
have all completed, from pool
22/03/20 21:59:23 INFO DAGScheduler: ResultStage 997 (collect at
SparkHoodieBackedTableMetadataWriter.java:154) finished in 0.091 s
22/03/20 21:59:23 INFO DAGScheduler: Job 688 is finished. Cancelling
potential speculative or zombie tasks for this job
22/03/20 21:59:23 INFO TaskSchedulerImpl: Killing all running tasks in stage
997: Stage finished
22/03/20 21:59:23 INFO DAGScheduler: Job 688 finished: collect at
SparkHoodieBackedTableMetadataWriter.java:154, took 0.091996 s
22/03/20 21:59:23 INFO HoodieActiveTimeline: Loaded instants upto :
Option{val=[20220320215909174__deltacommit__COMPLETED]}
22/03/20 21:59:23 INFO HoodieActiveTimeline: Loaded instants upto :
Option{val=[20220320215909174__deltacommit__COMPLETED]}
22/03/20 21:59:23 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient
from file:///tmp/hudi_4635/.hoodie/metadata
22/03/20 21:59:23 INFO HoodieTableConfig: Loading table properties from
file:/tmp/hudi_4635/.hoodie/metadata/.hoodie/hoodie.properties
22/03/20 21:59:23 INFO HoodieTableMetaClient: Finished Loading Table of type
MERGE_ON_READ(version=1, baseFileFormat=HFILE) from
file:///tmp/hudi_4635/.hoodie/metadata
22/03/20 21:59:23 INFO HoodieTableMetaClient: Loading Active commit timeline
for file:///tmp/hudi_4635/.hoodie/metadata
22/03/20 21:59:23 INFO HoodieActiveTimeline: Loaded instants upto :
Option{val=[20220320215909174__deltacommit__COMPLETED]}
22/03/20 21:59:23 INFO FileSystemViewManager: Creating View Manager with
storage type :MEMORY
22/03/20 21:59:23 INFO FileSystemViewManager: Creating in-memory based Table
View
22/03/20 21:59:23 INFO HoodieActiveTimeline: Loaded instants upto :
Option{val=[20220320215909174__deltacommit__COMPLETED]}
22/03/20 21:59:23 INFO HoodieTimelineArchiver: No Instants to archive
22/03/20 21:59:23 INFO HoodieActiveTimeline: Marking instant complete
[==>20220320215909174__commit__INFLIGHT]
22/03/20 21:59:23 INFO HoodieActiveTimeline: Checking for file exists
?file:/tmp/hudi_4635/.hoodie/20220320215909174.inflight
22/03/20 21:59:23 INFO HoodieActiveTimeline: Create new file for toInstant
?file:/tmp/hudi_4635/.hoodie/20220320215909174.commit
22/03/20 21:59:23 INFO HoodieActiveTimeline: Completed
[==>20220320215909174__commit__INFLIGHT]
22/03/20 21:59:23 INFO SparkContext: Starting job: collectAsMap at
HoodieSparkEngineContext.java:148
22/03/20 21:59:23 INFO DAGScheduler: Got job 689 (collectAsMap at
HoodieSparkEngineContext.java:148) with 1 output partitions
22/03/20 21:59:23 INFO DAGScheduler: Final stage: ResultStage 998
(collectAsMap at HoodieSparkEngineContext.java:148)
22/03/20 21:59:23 INFO DAGScheduler: Parents of final stage: List()
22/03/20 21:59:23 INFO DAGScheduler: Missing parents: List()
22/03/20 21:59:23 INFO DAGScheduler: Submitting ResultStage 998
(MapPartitionsRDD[2144] at mapToPair at HoodieSparkEngineContext.java:145),
which has no missing parents
22/03/20 21:59:23 INFO MemoryStore: Block broadcast_860 stored as values in
memory (estimated size 99.7 KiB, free 357.4 MiB)
22/03/20 21:59:23 INFO MemoryStore: Block broadcast_860_piece0 stored as
bytes in memory (estimated size 35.2 KiB, free 357.4 MiB)
22/03/20 21:59:23 INFO BlockManagerInfo: Added broadcast_860_piece0 in
memory on rkalluri.attlocal.net:63252 (size: 35.2 KiB, free: 364.0 MiB)
22/03/20 21:59:23 INFO SparkContext: Created broadcast 860 from broadcast at
DAGScheduler.scala:1478
22/03/20 21:59:23 INFO DAGScheduler: Submitting 1 missing tasks from
ResultStage 998 (MapPartitionsRDD[2144] at mapToPair at
HoodieSparkEngineContext.java:145) (first 15 tasks are for partitions Vector(0))
22/03/20 21:59:23 INFO TaskSchedulerImpl: Adding task set 998.0 with 1 tasks
resource profile 0
22/03/20 21:59:23 INFO TaskSetManager: Starting task 0.0 in stage 998.0 (TID
2278) (rkalluri.attlocal.net, executor driver, partition 0, PROCESS_LOCAL, 4387
bytes) taskResourceAssignments Map()
22/03/20 21:59:23 INFO Executor: Running task 0.0 in stage 998.0 (TID 2278)
22/03/20 21:59:23 INFO Executor: Finished task 0.0 in stage 998.0 (TID
2278). 858 bytes result sent to driver
22/03/20 21:59:23 INFO TaskSetManager: Finished task 0.0 in stage 998.0 (TID
2278) in 7 ms on rkalluri.attlocal.net (executor driver) (1/1)
22/03/20 21:59:23 INFO TaskSchedulerImpl: Removed TaskSet 998.0, whose tasks
have all completed, from pool
22/03/20 21:59:23 INFO DAGScheduler: ResultStage 998 (collectAsMap at
HoodieSparkEngineContext.java:148) finished in 0.026 s
22/03/20 21:59:23 INFO DAGScheduler: Job 689 is finished. Cancelling
potential speculative or zombie tasks for this job
22/03/20 21:59:23 INFO TaskSchedulerImpl: Killing all running tasks in stage
998: Stage finished
22/03/20 21:59:23 INFO DAGScheduler: Job 689 finished: collectAsMap at
HoodieSparkEngineContext.java:148, took 0.026346 s
22/03/20 21:59:23 INFO FSUtils: Removed directory at
file:/tmp/hudi_4635/.hoodie/.temp/20220320215909174
22/03/20 21:59:23 INFO BaseHoodieWriteClient: Start to clean synchronously.
22/03/20 21:59:23 INFO CleanerUtils: Cleaned failed attempts if any
22/03/20 21:59:23 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient
from file:///tmp/hudi_4635
22/03/20 21:59:23 INFO HoodieTableConfig: Loading table properties from
file:/tmp/hudi_4635/.hoodie/hoodie.properties
22/03/20 21:59:23 INFO HoodieTableMetaClient: Finished Loading Table of type
COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from file:///tmp/hudi_4635
22/03/20 21:59:23 INFO HoodieTableMetaClient: Loading Active commit timeline
for file:///tmp/hudi_4635
22/03/20 21:59:23 INFO HoodieActiveTimeline: Loaded instants upto :
Option{val=[20220320215909174__commit__COMPLETED]}
22/03/20 21:59:23 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient
from file:///tmp/hudi_4635
22/03/20 21:59:23 INFO HoodieTableConfig: Loading table properties from
file:/tmp/hudi_4635/.hoodie/hoodie.properties
22/03/20 21:59:23 INFO HoodieTableMetaClient: Finished Loading Table of type
COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from file:///tmp/hudi_4635
22/03/20 21:59:23 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient
from file:///tmp/hudi_4635/.hoodie/metadata
22/03/20 21:59:23 INFO HoodieTableConfig: Loading table properties from
file:/tmp/hudi_4635/.hoodie/metadata/.hoodie/hoodie.properties
22/03/20 21:59:23 INFO HoodieTableMetaClient: Finished Loading Table of type
MERGE_ON_READ(version=1, baseFileFormat=HFILE) from
file:///tmp/hudi_4635/.hoodie/metadata
22/03/20 21:59:23 INFO FileSystemViewManager: Creating View Manager with
storage type :REMOTE_FIRST
22/03/20 21:59:23 INFO FileSystemViewManager: Creating remote first table
view
22/03/20 21:59:23 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient
from file:///tmp/hudi_4635
22/03/20 21:59:23 INFO HoodieTableConfig: Loading table properties from
file:/tmp/hudi_4635/.hoodie/hoodie.properties
22/03/20 21:59:23 INFO HoodieTableMetaClient: Finished Loading Table of type
COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from file:///tmp/hudi_4635
22/03/20 21:59:23 INFO HoodieTableMetaClient: Loading Active commit timeline
for file:///tmp/hudi_4635
22/03/20 21:59:23 INFO HoodieActiveTimeline: Loaded instants upto :
Option{val=[20220320215909174__commit__COMPLETED]}
22/03/20 21:59:23 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient
from file:///tmp/hudi_4635
22/03/20 21:59:23 INFO HoodieTableConfig: Loading table properties from
file:/tmp/hudi_4635/.hoodie/hoodie.properties
22/03/20 21:59:23 INFO HoodieTableMetaClient: Finished Loading Table of type
COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from file:///tmp/hudi_4635
22/03/20 21:59:23 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient
from file:///tmp/hudi_4635/.hoodie/metadata
22/03/20 21:59:23 INFO HoodieTableConfig: Loading table properties from
file:/tmp/hudi_4635/.hoodie/metadata/.hoodie/hoodie.properties
22/03/20 21:59:23 INFO HoodieTableMetaClient: Finished Loading Table of type
MERGE_ON_READ(version=1, baseFileFormat=HFILE) from
file:///tmp/hudi_4635/.hoodie/metadata
22/03/20 21:59:23 INFO FileSystemViewManager: Creating View Manager with
storage type :REMOTE_FIRST
22/03/20 21:59:23 INFO FileSystemViewManager: Creating remote first table
view
22/03/20 21:59:23 INFO BaseHoodieWriteClient: Cleaner started
22/03/20 21:59:23 INFO BaseHoodieWriteClient: Scheduling cleaning at instant
time :20220320215923917
22/03/20 21:59:23 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient
from file:///tmp/hudi_4635
22/03/20 21:59:23 INFO HoodieTableConfig: Loading table properties from
file:/tmp/hudi_4635/.hoodie/hoodie.properties
22/03/20 21:59:23 INFO HoodieTableMetaClient: Finished Loading Table of type
COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from file:///tmp/hudi_4635
22/03/20 21:59:23 INFO HoodieTableMetaClient: Loading Active commit timeline
for file:///tmp/hudi_4635
22/03/20 21:59:23 INFO HoodieActiveTimeline: Loaded instants upto :
Option{val=[20220320215909174__commit__COMPLETED]}
22/03/20 21:59:23 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient
from file:///tmp/hudi_4635
22/03/20 21:59:23 INFO HoodieTableConfig: Loading table properties from
file:/tmp/hudi_4635/.hoodie/hoodie.properties
22/03/20 21:59:23 INFO HoodieTableMetaClient: Finished Loading Table of type
COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from file:///tmp/hudi_4635
22/03/20 21:59:23 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient
from file:///tmp/hudi_4635/.hoodie/metadata
22/03/20 21:59:23 INFO HoodieTableConfig: Loading table properties from
file:/tmp/hudi_4635/.hoodie/metadata/.hoodie/hoodie.properties
22/03/20 21:59:23 INFO HoodieTableMetaClient: Finished Loading Table of type
MERGE_ON_READ(version=1, baseFileFormat=HFILE) from
file:///tmp/hudi_4635/.hoodie/metadata
22/03/20 21:59:23 INFO FileSystemViewManager: Creating View Manager with
storage type :REMOTE_FIRST
22/03/20 21:59:23 INFO FileSystemViewManager: Creating remote first table
view
22/03/20 21:59:23 INFO FileSystemViewManager: Creating remote view for
basePath file:/tmp/hudi_4635. Server=rkalluri.attlocal.net:63594, Timeout=300
22/03/20 21:59:23 INFO FileSystemViewManager: Creating InMemory based view
for basePath file:/tmp/hudi_4635
22/03/20 21:59:23 INFO AbstractTableFileSystemView: Took 0 ms to read 0
instants, 0 replaced file groups
22/03/20 21:59:23 INFO ClusteringUtils: Found 0 files in pending clustering
operations
22/03/20 21:59:23 INFO HoodieActiveTimeline: Loaded instants upto :
Option{val=[20220320215909174__commit__COMPLETED]}
22/03/20 21:59:23 INFO RemoteHoodieTableFileSystemView: Sending request :
(http://rkalluri.attlocal.net:63594/v1/hoodie/view/refresh/?basepath=file%3A%2Ftmp%2Fhudi_4635&lastinstantts=20220320215909174&timelinehash=6d633da951dc97b80f9b1ab40bb28007857d183169e822cc8b2d05907b903876)
22/03/20 21:59:23 INFO HoodieActiveTimeline: Loaded instants upto :
Option{val=[20220320215909174__commit__COMPLETED]}
22/03/20 21:59:23 INFO AbstractTableFileSystemView: Took 0 ms to read 0
instants, 0 replaced file groups
22/03/20 21:59:23 INFO ClusteringUtils: Found 0 files in pending clustering
operations
22/03/20 21:59:23 INFO HoodieActiveTimeline: Loaded instants upto :
Option{val=[20220320215909174__commit__COMPLETED]}
22/03/20 21:59:23 INFO RemoteHoodieTableFileSystemView: Sending request :
(http://rkalluri.attlocal.net:63594/v1/hoodie/view/compactions/pending/?basepath=file%3A%2Ftmp%2Fhudi_4635&lastinstantts=20220320215909174&timelinehash=6d633da951dc97b80f9b1ab40bb28007857d183169e822cc8b2d05907b903876)
22/03/20 21:59:23 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient
from file:/tmp/hudi_4635
22/03/20 21:59:23 INFO HoodieTableConfig: Loading table properties from
file:/tmp/hudi_4635/.hoodie/hoodie.properties
22/03/20 21:59:23 INFO HoodieTableMetaClient: Finished Loading Table of type
COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from file:/tmp/hudi_4635
22/03/20 21:59:23 INFO FileSystemViewManager: Creating InMemory based view
for basePath file:/tmp/hudi_4635
22/03/20 21:59:23 INFO HoodieActiveTimeline: Loaded instants upto :
Option{val=[20220320215909174__commit__COMPLETED]}
22/03/20 21:59:23 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient
from file:///tmp/hudi_4635
22/03/20 21:59:23 INFO HoodieTableConfig: Loading table properties from
file:/tmp/hudi_4635/.hoodie/hoodie.properties
22/03/20 21:59:23 INFO HoodieTableMetaClient: Finished Loading Table of type
COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from file:///tmp/hudi_4635
22/03/20 21:59:23 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient
from file:///tmp/hudi_4635/.hoodie/metadata
22/03/20 21:59:23 INFO HoodieTableConfig: Loading table properties from
file:/tmp/hudi_4635/.hoodie/metadata/.hoodie/hoodie.properties
22/03/20 21:59:23 INFO HoodieTableMetaClient: Finished Loading Table of type
MERGE_ON_READ(version=1, baseFileFormat=HFILE) from
file:///tmp/hudi_4635/.hoodie/metadata
22/03/20 21:59:23 INFO AbstractTableFileSystemView: Took 0 ms to read 0
instants, 0 replaced file groups
22/03/20 21:59:23 INFO ClusteringUtils: Found 0 files in pending clustering
operations
22/03/20 21:59:23 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient
from file:///tmp/hudi_4635
22/03/20 21:59:23 INFO HoodieTableConfig: Loading table properties from
file:/tmp/hudi_4635/.hoodie/hoodie.properties
22/03/20 21:59:23 INFO HoodieTableMetaClient: Finished Loading Table of type
COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from file:///tmp/hudi_4635
22/03/20 21:59:23 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient
from file:///tmp/hudi_4635/.hoodie/metadata
22/03/20 21:59:23 INFO HoodieTableConfig: Loading table properties from
file:/tmp/hudi_4635/.hoodie/metadata/.hoodie/hoodie.properties
22/03/20 21:59:23 INFO HoodieTableMetaClient: Finished Loading Table of type
MERGE_ON_READ(version=1, baseFileFormat=HFILE) from
file:///tmp/hudi_4635/.hoodie/metadata
22/03/20 21:59:23 INFO HoodieTableMetadataUtil: Loading latest merged file
slices for metadata table partition files
22/03/20 21:59:23 INFO HoodieActiveTimeline: Loaded instants upto :
Option{val=[20220320215909174__deltacommit__COMPLETED]}
22/03/20 21:59:23 INFO AbstractTableFileSystemView: Took 0 ms to read 0
instants, 0 replaced file groups
22/03/20 21:59:23 INFO ClusteringUtils: Found 0 files in pending clustering
operations
22/03/20 21:59:23 INFO AbstractTableFileSystemView: Building file system
view for partition (files)
22/03/20 21:59:23 INFO AbstractTableFileSystemView: addFilesToView:
NumFiles=15, NumFileGroups=1, FileGroupsCreationTime=1, StoreTimeTaken=0
22/03/20 21:59:23 INFO CacheConfig: Created cacheConfig:
blockCache=LruBlockCache{blockCount=0, currentSize=392960, freeSize=381498432,
maxSize=381891392, heapSize=392960, minSize=362796832, minFactor=0.95,
multiSize=181398416, multiFactor=0.5, singleSize=90699208, singleFactor=0.25},
cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false,
cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false,
prefetchOnOpen=false
22/03/20 21:59:23 INFO CodecPool: Got brand-new decompressor [.gz]
22/03/20 21:59:23 INFO CodecPool: Got brand-new decompressor [.gz]
22/03/20 21:59:23 INFO CodecPool: Got brand-new decompressor [.gz]
22/03/20 21:59:23 INFO HoodieBackedTableMetadata: Opened metadata base file
from
file:/tmp/hudi_4635/.hoodie/metadata/files/files-0000_0-966-2246_20220320215907162001.hfile
at instant 20220320215907162001 in 1 ms
22/03/20 21:59:24 INFO HoodieActiveTimeline: Loaded instants upto :
Option{val=[20220320215909174__commit__COMPLETED]}
22/03/20 21:59:24 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient
from file:///tmp/hudi_4635/.hoodie/metadata
22/03/20 21:59:24 INFO HoodieTableConfig: Loading table properties from
file:/tmp/hudi_4635/.hoodie/metadata/.hoodie/hoodie.properties
22/03/20 21:59:24 INFO HoodieTableMetaClient: Finished Loading Table of type
MERGE_ON_READ(version=1, baseFileFormat=HFILE) from
file:///tmp/hudi_4635/.hoodie/metadata
22/03/20 21:59:24 INFO HoodieActiveTimeline: Loaded instants upto :
Option{val=[20220320215909174__deltacommit__COMPLETED]}
22/03/20 21:59:24 INFO AbstractHoodieLogRecordReader: Scanning log file
HoodieLogFile{pathStr='file:/tmp/hudi_4635/.hoodie/metadata/files/.files-0000_20220320215907162001.log.1_0-975-2255',
fileLen=-1}
22/03/20 21:59:24 INFO AbstractHoodieLogRecordReader: Reading a data block
from file
file:/tmp/hudi_4635/.hoodie/metadata/files/.files-0000_20220320215907162001.log.1_0-975-2255
at instant 20220320215908164
22/03/20 21:59:24 INFO HoodieLogFormatReader: Moving to the next reader for
logfile
HoodieLogFile{pathStr='file:/tmp/hudi_4635/.hoodie/metadata/files/.files-0000_20220320215907162001.log.2_0-991-2273',
fileLen=-1}
22/03/20 21:59:24 INFO AbstractHoodieLogRecordReader: Scanning log file
HoodieLogFile{pathStr='file:/tmp/hudi_4635/.hoodie/metadata/files/.files-0000_20220320215907162001.log.2_0-991-2273',
fileLen=-1}
22/03/20 21:59:24 INFO AbstractHoodieLogRecordReader: Reading a data block
from file
file:/tmp/hudi_4635/.hoodie/metadata/files/.files-0000_20220320215907162001.log.2_0-991-2273
at instant 20220320215909174
22/03/20 21:59:24 INFO AbstractHoodieLogRecordReader: Number of remaining
logblocks to merge 1
22/03/20 21:59:24 INFO CacheConfig: Created cacheConfig:
blockCache=LruBlockCache{blockCount=0, currentSize=392960, freeSize=381498432,
maxSize=381891392, heapSize=392960, minSize=362796832, minFactor=0.95,
multiSize=181398416, multiFactor=0.5, singleSize=90699208, singleFactor=0.25},
cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false,
cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false,
prefetchOnOpen=false
22/03/20 21:59:24 INFO CodecPool: Got brand-new decompressor [.gz]
22/03/20 21:59:24 INFO CodecPool: Got brand-new decompressor [.gz]
22/03/20 21:59:24 INFO CodecPool: Got brand-new decompressor [.gz]
22/03/20 21:59:24 INFO CodecPool: Got brand-new decompressor [.gz]
22/03/20 21:59:24 INFO ExternalSpillableMap: Estimated Payload size => 616
22/03/20 21:59:24 INFO AbstractHoodieLogRecordReader: Merging the final data
blocks
22/03/20 21:59:24 INFO AbstractHoodieLogRecordReader: Number of remaining
logblocks to merge 1
22/03/20 21:59:24 INFO CacheConfig: Created cacheConfig:
blockCache=LruBlockCache{blockCount=0, currentSize=392960, freeSize=381498432,
maxSize=381891392, heapSize=392960, minSize=362796832, minFactor=0.95,
multiSize=181398416, multiFactor=0.5, singleSize=90699208, singleFactor=0.25},
cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false,
cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false,
prefetchOnOpen=false
22/03/20 21:59:24 INFO CodecPool: Got brand-new decompressor [.gz]
22/03/20 21:59:24 INFO CodecPool: Got brand-new decompressor [.gz]
22/03/20 21:59:24 INFO CodecPool: Got brand-new decompressor [.gz]
22/03/20 21:59:24 INFO CodecPool: Got brand-new decompressor [.gz]
22/03/20 21:59:24 INFO HoodieMergedLogRecordScanner: Number of log files
scanned => 2
22/03/20 21:59:24 INFO HoodieMergedLogRecordScanner: MaxMemoryInBytes
allowed for compaction => 1073741824
22/03/20 21:59:24 INFO HoodieMergedLogRecordScanner: Number of entries in
MemoryBasedMap in ExternalSpillableMap => 3
22/03/20 21:59:24 INFO HoodieMergedLogRecordScanner: Total size in bytes of
MemoryBasedMap in ExternalSpillableMap => 1848
22/03/20 21:59:24 INFO HoodieMergedLogRecordScanner: Number of entries in
BitCaskDiskMap in ExternalSpillableMap => 0
22/03/20 21:59:24 INFO HoodieMergedLogRecordScanner: Size of file spilled to
disk => 0
22/03/20 21:59:24 INFO HoodieBackedTableMetadata: Opened 2 metadata log
files (dataset instant=20220320215909174, metadata instant=20220320215909174)
in 35 ms
22/03/20 21:59:24 INFO CodecPool: Got brand-new decompressor [.gz]
22/03/20 21:59:24 INFO BaseTableMetadata: Listed partitions from metadata:
#partitions=5
22/03/20 21:59:24 INFO CleanPlanner: Total Partitions to clean : 5, with
policy KEEP_LATEST_COMMITS
22/03/20 21:59:24 INFO CleanPlanner: Using cleanerParallelism: 5
22/03/20 21:59:24 INFO SparkContext: Starting job: collect at
HoodieSparkEngineContext.java:100
22/03/20 21:59:24 INFO DAGScheduler: Got job 690 (collect at
HoodieSparkEngineContext.java:100) with 5 output partitions
22/03/20 21:59:24 INFO DAGScheduler: Final stage: ResultStage 999 (collect
at HoodieSparkEngineContext.java:100)
22/03/20 21:59:24 INFO DAGScheduler: Parents of final stage: List()
22/03/20 21:59:24 INFO DAGScheduler: Missing parents: List()
22/03/20 21:59:24 INFO DAGScheduler: Submitting ResultStage 999
(MapPartitionsRDD[2146] at map at HoodieSparkEngineContext.java:100), which has
no missing parents
22/03/20 21:59:24 INFO MemoryStore: Block broadcast_861 stored as values in
memory (estimated size 556.0 KiB, free 356.8 MiB)
22/03/20 21:59:24 INFO MemoryStore: Block broadcast_861_piece0 stored as
bytes in memory (estimated size 196.7 KiB, free 356.7 MiB)
22/03/20 21:59:24 INFO BlockManagerInfo: Added broadcast_861_piece0 in
memory on rkalluri.attlocal.net:63252 (size: 196.7 KiB, free: 363.8 MiB)
22/03/20 21:59:24 INFO SparkContext: Created broadcast 861 from broadcast at
DAGScheduler.scala:1478
22/03/20 21:59:24 INFO DAGScheduler: Submitting 5 missing tasks from
ResultStage 999 (MapPartitionsRDD[2146] at map at
HoodieSparkEngineContext.java:100) (first 15 tasks are for partitions Vector(0,
1, 2, 3, 4))
22/03/20 21:59:24 INFO TaskSchedulerImpl: Adding task set 999.0 with 5 tasks
resource profile 0
22/03/20 21:59:24 INFO TaskSetManager: Starting task 0.0 in stage 999.0 (TID
2279) (rkalluri.attlocal.net, executor driver, partition 0, PROCESS_LOCAL, 4344
bytes) taskResourceAssignments Map()
22/03/20 21:59:24 INFO TaskSetManager: Starting task 1.0 in stage 999.0 (TID
2280) (rkalluri.attlocal.net, executor driver, partition 1, PROCESS_LOCAL, 4344
bytes) taskResourceAssignments Map()
22/03/20 21:59:24 INFO TaskSetManager: Starting task 2.0 in stage 999.0 (TID
2281) (rkalluri.attlocal.net, executor driver, partition 2, PROCESS_LOCAL, 4344
bytes) taskResourceAssignments Map()
22/03/20 21:59:24 INFO TaskSetManager: Starting task 3.0 in stage 999.0 (TID
2282) (rkalluri.attlocal.net, executor driver, partition 3, PROCESS_LOCAL, 4344
bytes) taskResourceAssignments Map()
22/03/20 21:59:24 INFO TaskSetManager: Starting task 4.0 in stage 999.0 (TID
2283) (rkalluri.attlocal.net, executor driver, partition 4, PROCESS_LOCAL, 4344
bytes) taskResourceAssignments Map()
22/03/20 21:59:24 INFO Executor: Running task 2.0 in stage 999.0 (TID 2281)
22/03/20 21:59:24 INFO Executor: Running task 0.0 in stage 999.0 (TID 2279)
22/03/20 21:59:24 INFO Executor: Running task 1.0 in stage 999.0 (TID 2280)
22/03/20 21:59:24 INFO Executor: Running task 4.0 in stage 999.0 (TID 2283)
22/03/20 21:59:24 INFO Executor: Running task 3.0 in stage 999.0 (TID 2282)
22/03/20 21:59:24 INFO CleanPlanner: Cleaning HEF/20211215, retaining latest
10 commits.
22/03/20 21:59:24 INFO CleanPlanner: Cleaning DEF/20211215, retaining latest
10 commits.
22/03/20 21:59:24 INFO RemoteHoodieTableFileSystemView: Sending request :
(http://rkalluri.attlocal.net:63594/v1/hoodie/view/filegroups/replaced/before/?partition=HEF%2F20211215&maxinstant=20220320215846736&basepath=file%3A%2Ftmp%2Fhudi_4635&lastinstantts=20220320215909174&timelinehash=6d633da951dc97b80f9b1ab40bb28007857d183169e822cc8b2d05907b903876)
22/03/20 21:59:24 INFO RemoteHoodieTableFileSystemView: Sending request :
(http://rkalluri.attlocal.net:63594/v1/hoodie/view/filegroups/replaced/before/?partition=DEF%2F20211215&maxinstant=20220320215846736&basepath=file%3A%2Ftmp%2Fhudi_4635&lastinstantts=20220320215909174&timelinehash=6d633da951dc97b80f9b1ab40bb28007857d183169e822cc8b2d05907b903876)
22/03/20 21:59:24 INFO AbstractTableFileSystemView: Building file system
view for partition (HEF/20211215)
22/03/20 21:59:24 INFO HoodieTableMetadataUtil: Loading latest merged file
slices for metadata table partition files
22/03/20 21:59:24 INFO CleanPlanner: Cleaning GEF/20211215, retaining latest
10 commits.
22/03/20 21:59:24 INFO RemoteHoodieTableFileSystemView: Sending request :
(http://rkalluri.attlocal.net:63594/v1/hoodie/view/filegroups/replaced/before/?partition=GEF%2F20211215&maxinstant=20220320215846736&basepath=file%3A%2Ftmp%2Fhudi_4635&lastinstantts=20220320215909174&timelinehash=6d633da951dc97b80f9b1ab40bb28007857d183169e822cc8b2d05907b903876)
22/03/20 21:59:24 INFO AbstractTableFileSystemView: Building file system
view for partition (DEF/20211215)
22/03/20 21:59:24 INFO HoodieTableMetadataUtil: Loading latest merged file
slices for metadata table partition files
22/03/20 21:59:24 INFO AbstractTableFileSystemView: Building file system
view for partition (GEF/20211215)
22/03/20 21:59:24 INFO HoodieTableMetadataUtil: Loading latest merged file
slices for metadata table partition files
22/03/20 21:59:24 INFO CleanPlanner: Cleaning EEF/20211215, retaining latest
10 commits.
22/03/20 21:59:24 INFO RemoteHoodieTableFileSystemView: Sending request :
(http://rkalluri.attlocal.net:63594/v1/hoodie/view/filegroups/replaced/before/?partition=EEF%2F20211215&maxinstant=20220320215846736&basepath=file%3A%2Ftmp%2Fhudi_4635&lastinstantts=20220320215909174&timelinehash=6d633da951dc97b80f9b1ab40bb28007857d183169e822cc8b2d05907b903876)
22/03/20 21:59:24 INFO AbstractTableFileSystemView: Building file system
view for partition (EEF/20211215)
22/03/20 21:59:24 INFO HoodieTableMetadataUtil: Loading latest merged file
slices for metadata table partition files
22/03/20 21:59:24 INFO CleanPlanner: Cleaning FEF/20211215, retaining latest
10 commits.
22/03/20 21:59:24 INFO RemoteHoodieTableFileSystemView: Sending request :
(http://rkalluri.attlocal.net:63594/v1/hoodie/view/filegroups/replaced/before/?partition=FEF%2F20211215&maxinstant=20220320215846736&basepath=file%3A%2Ftmp%2Fhudi_4635&lastinstantts=20220320215909174&timelinehash=6d633da951dc97b80f9b1ab40bb28007857d183169e822cc8b2d05907b903876)
22/03/20 21:59:24 INFO AbstractTableFileSystemView: Building file system
view for partition (FEF/20211215)
22/03/20 21:59:24 INFO HoodieTableMetadataUtil: Loading latest merged file
slices for metadata table partition files
22/03/20 21:59:24 INFO HoodieActiveTimeline: Loaded instants upto :
Option{val=[20220320215909174__deltacommit__COMPLETED]}
22/03/20 21:59:24 INFO AbstractTableFileSystemView: Took 1 ms to read 0
instants, 0 replaced file groups
22/03/20 21:59:24 INFO AbstractTableFileSystemView: Took 1 ms to read 0
instants, 0 replaced file groups
22/03/20 21:59:24 INFO AbstractTableFileSystemView: Took 1 ms to read 0
instants, 0 replaced file groups
22/03/20 21:59:24 INFO AbstractTableFileSystemView: Took 0 ms to read 0
instants, 0 replaced file groups
22/03/20 21:59:24 INFO AbstractTableFileSystemView: Took 0 ms to read 0
instants, 0 replaced file groups
22/03/20 21:59:24 INFO ClusteringUtils: Found 0 files in pending clustering
operations
22/03/20 21:59:24 INFO ClusteringUtils: Found 0 files in pending clustering
operations
22/03/20 21:59:24 INFO ClusteringUtils: Found 0 files in pending clustering
operations
22/03/20 21:59:24 INFO ClusteringUtils: Found 0 files in pending clustering
operations
22/03/20 21:59:24 INFO AbstractTableFileSystemView: Building file system
view for partition (files)
22/03/20 21:59:24 INFO ClusteringUtils: Found 0 files in pending clustering
operations
22/03/20 21:59:24 INFO AbstractTableFileSystemView: Building file system
view for partition (files)
22/03/20 21:59:24 INFO AbstractTableFileSystemView: Building file system
view for partition (files)
22/03/20 21:59:24 INFO AbstractTableFileSystemView: Building file system
view for partition (files)
22/03/20 21:59:24 INFO AbstractTableFileSystemView: Building file system
view for partition (files)
22/03/20 21:59:24 INFO AbstractTableFileSystemView: addFilesToView:
NumFiles=15, NumFileGroups=1, FileGroupsCreationTime=2, StoreTimeTaken=0
22/03/20 21:59:24 INFO CacheConfig: Created cacheConfig:
blockCache=LruBlockCache{blockCount=0, currentSize=392960, freeSize=381498432,
maxSize=381891392, heapSize=392960, minSize=362796832, minFactor=0.95,
multiSize=181398416, multiFactor=0.5, singleSize=90699208, singleFactor=0.25},
cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false,
cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false,
prefetchOnOpen=false
22/03/20 21:59:24 INFO CodecPool: Got brand-new decompressor [.gz]
22/03/20 21:59:24 INFO CodecPool: Got brand-new decompressor [.gz]
22/03/20 21:59:24 INFO CodecPool: Got brand-new decompressor [.gz]
22/03/20 21:59:24 INFO HoodieBackedTableMetadata: Opened metadata base file
from
file:/tmp/hudi_4635/.hoodie/metadata/files/files-0000_0-966-2246_20220320215907162001.hfile
at instant 20220320215907162001 in 1 ms
22/03/20 21:59:24 INFO AbstractTableFileSystemView: addFilesToView:
NumFiles=15, NumFileGroups=1, FileGroupsCreationTime=3, StoreTimeTaken=0
22/03/20 21:59:24 INFO AbstractTableFileSystemView: addFilesToView:
NumFiles=15, NumFileGroups=1, FileGroupsCreationTime=3, StoreTimeTaken=0
22/03/20 21:59:24 INFO AbstractTableFileSystemView: addFilesToView:
NumFiles=15, NumFileGroups=1, FileGroupsCreationTime=3, StoreTimeTaken=0
22/03/20 21:59:24 INFO AbstractTableFileSystemView: addFilesToView:
NumFiles=15, NumFileGroups=1, FileGroupsCreationTime=3, StoreTimeTaken=0
22/03/20 21:59:24 INFO BlockManagerInfo: Removed broadcast_859_piece0 on
rkalluri.attlocal.net:63252 in memory (size: 156.3 KiB, free: 363.9 MiB)
22/03/20 21:59:24 INFO BlockManagerInfo: Removed broadcast_857_piece0 on
rkalluri.attlocal.net:63252 in memory (size: 35.1 KiB, free: 364.0 MiB)
22/03/20 21:59:24 INFO BlockManagerInfo: Removed broadcast_860_piece0 on
rkalluri.attlocal.net:63252 in memory (size: 35.2 KiB, free: 364.0 MiB)
22/03/20 21:59:24 INFO BlockManagerInfo: Removed broadcast_858_piece0 on
rkalluri.attlocal.net:63252 in memory (size: 35.2 KiB, free: 364.0 MiB)
22/03/20 21:59:24 INFO HoodieActiveTimeline: Loaded instants upto :
Option{val=[20220320215909174__commit__COMPLETED]}
22/03/20 21:59:24 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient
from file:///tmp/hudi_4635/.hoodie/metadata
22/03/20 21:59:24 INFO HoodieTableConfig: Loading table properties from
file:/tmp/hudi_4635/.hoodie/metadata/.hoodie/hoodie.properties
22/03/20 21:59:24 INFO HoodieTableMetaClient: Finished Loading Table of type
MERGE_ON_READ(version=1, baseFileFormat=HFILE) from
file:///tmp/hudi_4635/.hoodie/metadata
22/03/20 21:59:24 INFO HoodieActiveTimeline: Loaded instants upto :
Option{val=[20220320215909174__deltacommit__COMPLETED]}
22/03/20 21:59:24 INFO AbstractHoodieLogRecordReader: Scanning log file
HoodieLogFile{pathStr='file:/tmp/hudi_4635/.hoodie/metadata/files/.files-0000_20220320215907162001.log.1_0-975-2255',
fileLen=-1}
22/03/20 21:59:24 INFO AbstractHoodieLogRecordReader: Reading a data block
from file
file:/tmp/hudi_4635/.hoodie/metadata/files/.files-0000_20220320215907162001.log.1_0-975-2255
at instant 20220320215908164
22/03/20 21:59:24 INFO HoodieLogFormatReader: Moving to the next reader for
logfile
HoodieLogFile{pathStr='file:/tmp/hudi_4635/.hoodie/metadata/files/.files-0000_20220320215907162001.log.2_0-991-2273',
fileLen=-1}
22/03/20 21:59:24 INFO AbstractHoodieLogRecordReader: Scanning log file
HoodieLogFile{pathStr='file:/tmp/hudi_4635/.hoodie/metadata/files/.files-0000_20220320215907162001.log.2_0-991-2273',
fileLen=-1}
22/03/20 21:59:24 INFO AbstractHoodieLogRecordReader: Reading a data block
from file
file:/tmp/hudi_4635/.hoodie/metadata/files/.files-0000_20220320215907162001.log.2_0-991-2273
at instant 20220320215909174
22/03/20 21:59:24 INFO AbstractHoodieLogRecordReader: Number of remaining
logblocks to merge 1
22/03/20 21:59:24 INFO CacheConfig: Created cacheConfig:
blockCache=LruBlockCache{blockCount=0, currentSize=392960, freeSize=381498432,
maxSize=381891392, heapSize=392960, minSize=362796832, minFactor=0.95,
multiSize=181398416, multiFactor=0.5, singleSize=90699208, singleFactor=0.25},
cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false,
cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false,
prefetchOnOpen=false
22/03/20 21:59:24 INFO CodecPool: Got brand-new decompressor [.gz]
22/03/20 21:59:24 INFO CodecPool: Got brand-new decompressor [.gz]
22/03/20 21:59:24 INFO CodecPool: Got brand-new decompressor [.gz]
22/03/20 21:59:24 INFO CodecPool: Got brand-new decompressor [.gz]
22/03/20 21:59:24 INFO ExternalSpillableMap: Estimated Payload size => 616
22/03/20 21:59:24 INFO AbstractHoodieLogRecordReader: Merging the final data
blocks
22/03/20 21:59:24 INFO AbstractHoodieLogRecordReader: Number of remaining
logblocks to merge 1
22/03/20 21:59:24 INFO CacheConfig: Created cacheConfig:
blockCache=LruBlockCache{blockCount=0, currentSize=392960, freeSize=381498432,
maxSize=381891392, heapSize=392960, minSize=362796832, minFactor=0.95,
multiSize=181398416, multiFactor=0.5, singleSize=90699208, singleFactor=0.25},
cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false,
cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false,
prefetchOnOpen=false
22/03/20 21:59:24 INFO CodecPool: Got brand-new decompressor [.gz]
22/03/20 21:59:24 INFO CodecPool: Got brand-new decompressor [.gz]
22/03/20 21:59:24 INFO CodecPool: Got brand-new decompressor [.gz]
22/03/20 21:59:24 INFO CodecPool: Got brand-new decompressor [.gz]
22/03/20 21:59:24 INFO HoodieMergedLogRecordScanner: Number of log files
scanned => 2
22/03/20 21:59:24 INFO HoodieMergedLogRecordScanner: MaxMemoryInBytes
allowed for compaction => 1073741824
22/03/20 21:59:24 INFO HoodieMergedLogRecordScanner: Number of entries in
MemoryBasedMap in ExternalSpillableMap => 3
22/03/20 21:59:24 INFO HoodieMergedLogRecordScanner: Total size in bytes of
MemoryBasedMap in ExternalSpillableMap => 1848
22/03/20 21:59:24 INFO HoodieMergedLogRecordScanner: Number of entries in
BitCaskDiskMap in ExternalSpillableMap => 0
22/03/20 21:59:24 INFO HoodieMergedLogRecordScanner: Size of file spilled to
disk => 0
22/03/20 21:59:24 INFO HoodieBackedTableMetadata: Opened 2 metadata log
files (dataset instant=20220320215909174, metadata instant=20220320215909174)
in 36 ms
22/03/20 21:59:24 INFO CodecPool: Got brand-new decompressor [.gz]
22/03/20 21:59:24 INFO BaseTableMetadata: Listed file in partition from
metadata: partition=GEF/20211215, #files=9
22/03/20 21:59:24 INFO BaseTableMetadata: Listed file in partition from
metadata: partition=HEF/20211215, #files=9
22/03/20 21:59:24 INFO BaseTableMetadata: Listed file in partition from
metadata: partition=FEF/20211215, #files=9
22/03/20 21:59:24 INFO BaseTableMetadata: Listed file in partition from
metadata: partition=EEF/20211215, #files=9
22/03/20 21:59:24 INFO BaseTableMetadata: Listed file in partition from
metadata: partition=DEF/20211215, #files=10
22/03/20 21:59:24 INFO AbstractTableFileSystemView: addFilesToView:
NumFiles=9, NumFileGroups=9, FileGroupsCreationTime=0, StoreTimeTaken=0
22/03/20 21:59:24 INFO AbstractTableFileSystemView: addFilesToView:
NumFiles=9, NumFileGroups=9, FileGroupsCreationTime=0, StoreTimeTaken=0
22/03/20 21:59:24 INFO AbstractTableFileSystemView: addFilesToView:
NumFiles=9, NumFileGroups=9, FileGroupsCreationTime=1, StoreTimeTaken=0
22/03/20 21:59:24 INFO AbstractTableFileSystemView: addFilesToView:
NumFiles=9, NumFileGroups=9, FileGroupsCreationTime=1, StoreTimeTaken=0
22/03/20 21:59:24 INFO AbstractTableFileSystemView: addFilesToView:
NumFiles=10, NumFileGroups=10, FileGroupsCreationTime=1, StoreTimeTaken=0
22/03/20 21:59:24 INFO RemoteHoodieTableFileSystemView: Sending request :
(http://rkalluri.attlocal.net:63594/v1/hoodie/view/filegroups/all/partition/?partition=GEF%2F20211215&basepath=file%3A%2Ftmp%2Fhudi_4635&lastinstantts=20220320215909174&timelinehash=6d633da951dc97b80f9b1ab40bb28007857d183169e822cc8b2d05907b903876)
22/03/20 21:59:24 INFO RemoteHoodieTableFileSystemView: Sending request :
(http://rkalluri.attlocal.net:63594/v1/hoodie/view/filegroups/all/partition/?partition=FEF%2F20211215&basepath=file%3A%2Ftmp%2Fhudi_4635&lastinstantts=20220320215909174&timelinehash=6d633da951dc97b80f9b1ab40bb28007857d183169e822cc8b2d05907b903876)
22/03/20 21:59:24 INFO RemoteHoodieTableFileSystemView: Sending request :
(http://rkalluri.attlocal.net:63594/v1/hoodie/view/filegroups/all/partition/?partition=DEF%2F20211215&basepath=file%3A%2Ftmp%2Fhudi_4635&lastinstantts=20220320215909174&timelinehash=6d633da951dc97b80f9b1ab40bb28007857d183169e822cc8b2d05907b903876)
22/03/20 21:59:24 INFO RemoteHoodieTableFileSystemView: Sending request :
(http://rkalluri.attlocal.net:63594/v1/hoodie/view/filegroups/all/partition/?partition=EEF%2F20211215&basepath=file%3A%2Ftmp%2Fhudi_4635&lastinstantts=20220320215909174&timelinehash=6d633da951dc97b80f9b1ab40bb28007857d183169e822cc8b2d05907b903876)
22/03/20 21:59:24 INFO RemoteHoodieTableFileSystemView: Sending request :
(http://rkalluri.attlocal.net:63594/v1/hoodie/view/filegroups/all/partition/?partition=HEF%2F20211215&basepath=file%3A%2Ftmp%2Fhudi_4635&lastinstantts=20220320215909174&timelinehash=6d633da951dc97b80f9b1ab40bb28007857d183169e822cc8b2d05907b903876)
22/03/20 21:59:24 INFO CleanPlanner: 0 patterns used to delete in partition
path:DEF/20211215
22/03/20 21:59:24 INFO CleanPlanner: 0 patterns used to delete in partition
path:GEF/20211215
22/03/20 21:59:24 INFO CleanPlanner: 0 patterns used to delete in partition
path:EEF/20211215
22/03/20 21:59:24 INFO Executor: Finished task 0.0 in stage 999.0 (TID
2279). 931 bytes result sent to driver
22/03/20 21:59:24 INFO Executor: Finished task 3.0 in stage 999.0 (TID
2282). 931 bytes result sent to driver
22/03/20 21:59:24 INFO CleanPlanner: 0 patterns used to delete in partition
path:HEF/20211215
22/03/20 21:59:24 INFO Executor: Finished task 1.0 in stage 999.0 (TID
2280). 931 bytes result sent to driver
22/03/20 21:59:24 INFO CleanPlanner: 0 patterns used to delete in partition
path:FEF/20211215
22/03/20 21:59:24 INFO Executor: Finished task 4.0 in stage 999.0 (TID
2283). 931 bytes result sent to driver
22/03/20 21:59:24 INFO TaskSetManager: Finished task 0.0 in stage 999.0 (TID
2279) in 125 ms on rkalluri.attlocal.net (executor driver) (1/5)
22/03/20 21:59:24 INFO TaskSetManager: Finished task 3.0 in stage 999.0 (TID
2282) in 125 ms on rkalluri.attlocal.net (executor driver) (2/5)
22/03/20 21:59:24 INFO Executor: Finished task 2.0 in stage 999.0 (TID
2281). 931 bytes result sent to driver
22/03/20 21:59:24 INFO TaskSetManager: Finished task 1.0 in stage 999.0 (TID
2280) in 125 ms on rkalluri.attlocal.net (executor driver) (3/5)
22/03/20 21:59:24 INFO TaskSetManager: Finished task 4.0 in stage 999.0 (TID
2283) in 125 ms on rkalluri.attlocal.net (executor driver) (4/5)
22/03/20 21:59:24 INFO TaskSetManager: Finished task 2.0 in stage 999.0 (TID
2281) in 125 ms on rkalluri.attlocal.net (executor driver) (5/5)
22/03/20 21:59:24 INFO TaskSchedulerImpl: Removed TaskSet 999.0, whose tasks
have all completed, from pool
22/03/20 21:59:24 INFO DAGScheduler: ResultStage 999 (collect at
HoodieSparkEngineContext.java:100) finished in 0.213 s
22/03/20 21:59:24 INFO DAGScheduler: Job 690 is finished. Cancelling
potential speculative or zombie tasks for this job
22/03/20 21:59:24 INFO TaskSchedulerImpl: Killing all running tasks in stage
999: Stage finished
22/03/20 21:59:24 INFO DAGScheduler: Job 690 finished: collect at
HoodieSparkEngineContext.java:100, took 0.213358 s
22/03/20 21:59:24 INFO HoodieActiveTimeline: Loaded instants upto :
Option{val=[20220320215909174__commit__COMPLETED]}
22/03/20 21:59:24 INFO BaseHoodieWriteClient: Start to archive synchronously.
22/03/20 21:59:24 INFO HoodieActiveTimeline: Loaded instants upto :
Option{val=[20220320215909174__commit__COMPLETED]}
22/03/20 21:59:24 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient
from file:///tmp/hudi_4635
22/03/20 21:59:24 INFO HoodieTableConfig: Loading table properties from
file:/tmp/hudi_4635/.hoodie/hoodie.properties
22/03/20 21:59:24 INFO HoodieTableMetaClient: Finished Loading Table of type
COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from file:///tmp/hudi_4635
22/03/20 21:59:24 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient
from file:///tmp/hudi_4635/.hoodie/metadata
22/03/20 21:59:24 INFO HoodieTableConfig: Loading table properties from
file:/tmp/hudi_4635/.hoodie/metadata/.hoodie/hoodie.properties
22/03/20 21:59:24 INFO HoodieTableMetaClient: Finished Loading Table of type
MERGE_ON_READ(version=1, baseFileFormat=HFILE) from
file:///tmp/hudi_4635/.hoodie/metadata
22/03/20 21:59:24 INFO HoodieActiveTimeline: Loaded instants upto :
Option{val=[20220320215909174__deltacommit__COMPLETED]}
22/03/20 21:59:24 INFO HoodieTimelineArchiver: Limiting archiving of
instants to latest compaction on metadata table at 20220320215907162001
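The archiver line above caps the archive window at the metadata table's latest compaction instant (20220320215907162001). Because Hudi instant timestamps are strings that sort lexicographically, such a cutoff reduces to a plain string comparison; here is a minimal sketch of that rule (hypothetical helper name and data, not Hudi's actual code):

```python
def instants_to_archive(instants, cutoff):
    """Hypothetical sketch: keep only instants whose timestamp string sorts
    at or before the cutoff. Hudi instant timestamps (yyyyMMddHHmmssSSS)
    order lexicographically, so string comparison matches time order."""
    return [ts for ts in instants if ts <= cutoff]
```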
22/03/20 21:59:24 INFO HoodieHeartbeatClient: Stopping heartbeat for instant
20220320215909174
22/03/20 21:59:24 INFO HoodieHeartbeatClient: Stopped heartbeat for instant
20220320215909174
22/03/20 21:59:24 INFO HeartbeatUtils: Deleted the heartbeat for instant
20220320215909174
22/03/20 21:59:24 INFO HoodieHeartbeatClient: Deleted heartbeat file for
instant 20220320215909174
22/03/20 21:59:24 INFO TransactionManager: Transaction ending with
transaction owner Option{val=[==>20220320215909174__commit__INFLIGHT]}
22/03/20 21:59:24 INFO ZookeeperBasedLockProvider: RELEASING lock
atZkBasePath = /hudi, lock key = None
22/03/20 21:59:24 INFO ZookeeperBasedLockProvider: RELEASED lock
atZkBasePath = /hudi, lock key = None
22/03/20 21:59:24 INFO TransactionManager: Transaction ended with
transaction owner Option{val=[==>20220320215909174__commit__INFLIGHT]}
An error occurred while calling o1843.save.
: java.lang.NullPointerException
    at org.apache.hudi.client.HoodieTimelineArchiver.lambda$getInstantsToArchive$10(HoodieTimelineArchiver.java:452)
    at java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:267)
    at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
    at java.util.stream.SliceOps$1$1.accept(SliceOps.java:204)
    at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
    at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
    at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
    at java.util.ArrayList$ArrayListSpliterator.tryAdvance(ArrayList.java:1351)
    at java.util.stream.ReferencePipeline.forEachWithCancel(ReferencePipeline.java:126)
    at java.util.stream.AbstractPipeline.copyIntoWithCancel(AbstractPipeline.java:498)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:485)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
    at java.util.stream.StreamSpliterators$WrappingSpliterator.forEachRemaining(StreamSpliterators.java:312)
    at java.util.stream.Streams$ConcatSpliterator.forEachRemaining(Streams.java:743)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
    at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
    at org.apache.hudi.client.HoodieTimelineArchiver.archiveIfRequired(HoodieTimelineArchiver.java:147)
    at org.apache.hudi.client.BaseHoodieWriteClient.archive(BaseHoodieWriteClient.java:818)
    at org.apache.hudi.client.BaseHoodieWriteClient.autoArchiveOnCommit(BaseHoodieWriteClient.java:572)
    at org.apache.hudi.client.BaseHoodieWriteClient.postCommit(BaseHoodieWriteClient.java:477)
    at org.apache.hudi.client.BaseHoodieWriteClient.commitStats(BaseHoodieWriteClient.java:212)
    at org.apache.hudi.client.SparkRDDWriteClient.commit(SparkRDDWriteClient.java:119)
    at org.apache.hudi.HoodieSparkSqlWriter$.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:667)
    at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:299)
    at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:162)
    at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:75)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:73)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:84)
    at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:110)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
    at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:110)
    at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:106)
    at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:481)
    at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:481)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:457)
    at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:106)
    at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:93)
    at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:91)
    at org.apache.spark.sql.execution.QueryExecution.assertCommandExecuted(QueryExecution.scala:128)
    at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:848)
    at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:382)
    at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:355)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:247)
    at sun.reflect.GeneratedMethodAccessor224.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
    at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
    at java.lang.Thread.run(Thread.java:745)
22/03/20 21:59:24 INFO SparkContext: Invoking stop() from shutdown hook
22/03/20 21:59:24 INFO SparkUI: Stopped Spark web UI at
http://rkalluri.attlocal.net:4040
22/03/20 21:59:24 INFO MapOutputTrackerMasterEndpoint:
MapOutputTrackerMasterEndpoint stopped!
22/03/20 21:59:24 INFO MemoryStore: MemoryStore cleared
22/03/20 21:59:24 INFO BlockManager: BlockManager stopped
22/03/20 21:59:24 INFO BlockManagerMaster: BlockManagerMaster stopped
22/03/20 21:59:24 INFO
OutputCommitCoordinator$OutputCommitCoordinatorEndpoint:
OutputCommitCoordinator stopped!
22/03/20 21:59:24 INFO SparkContext: Successfully stopped SparkContext
22/03/20 21:59:24 INFO ShutdownHookManager: Shutdown hook called
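The NullPointerException above is thrown from inside a lazy Java `Stream` pipeline in `getInstantsToArchive` and only surfaces at the terminal `collect` in `archiveIfRequired`: a null produced by one `map` stage blows up when a later stage dereferences it during collection. A minimal sketch of that failure shape (hypothetical instant data; Python's `AttributeError` on `None` standing in for Java's NPE, since Python's `map` is equally lazy):

```python
def instants():
    # Hypothetical instants; one lookup came back empty (None),
    # analogous to a missing entry the archiver did not guard against.
    return ["20220320215846736__commit", None, "20220320215909174__commit"]

def to_timestamp(instant):
    # A map stage that assumes every element is non-null; the None
    # slipping through only raises when the pipeline is consumed.
    return instant.split("__")[0]

def archive_candidates():
    pipeline = map(to_timestamp, instants())  # lazy: nothing runs yet
    try:
        return list(pipeline)  # terminal "collect" forces evaluation
    except AttributeError as e:
        return f"failed at terminal collect: {type(e).__name__}"
```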