rshanmugam1 opened a new issue, #7244:
URL: https://github.com/apache/hudi/issues/7244

   This is my dbt model. If I run it twice, it creates duplicate rows. Am I missing 
any obvious configuration?
   
   ```
   {{ config(
       materialized = 'incremental',
       incremental_strategy = 'merge',
       file_format = 'hudi',
       options={
         'type': 'cow',
         'primaryKey': 'id',
       },
       unique_key = 'id',
   ) }}
   {% if not is_incremental() %}
   
   select cast(1 as bigint) as id, 'yo' as msg
   union all
   select cast(2 as bigint) as id, 'anyway' as msg
   union all
   select cast(3 as bigint) as id, 'yo' as msg
   
   {% else %}
   
   select cast(1 as bigint) as id, 'yo2' as msg
   union all
   select cast(2 as bigint) as id, 'anyway1' as msg
   union all
   select cast(3 as bigint) as id, 'yo3' as msg
   
   {% endif %}
   
   ```
   
   <img width="1257" alt="Screen Shot 2022-11-18 at 3 38 33 PM" src="https://user-images.githubusercontent.com/42749351/202820863-67721eb4-8c4d-4736-9fae-966f5dbf51c4.png">
   
   
   The dbt merge model produces duplicates.
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1. Run the model twice.
   
   **Expected behavior**
   Duplicates should not be present after re-running the model.
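
   For reference, a quick way to confirm the duplicates (table and key column names taken from the create query below):
   ```
   -- With a correct merge this should return no rows; after the
   -- second dbt run it instead returns the duplicated ids.
   select id, count(*) as cnt
   from dbt_tmp.test_merge_2
   group by id
   having count(*) > 1;
   ```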
   
   **Environment Description**
   
   * Hudi version :  0.10.1
   
   * Spark version : 3.2.0
   
   * Hive version : 2.3.3
   
   * Hadoop version :
   
   * Storage (HDFS/S3/GCS..) : S3
   
   * Running on Docker? (yes/no) : no
   
   
   **Additional context**
   This is on EMR release emr-6.6.0, with the Spark Thrift Server running on it.
   
   ```
   /usr/lib/spark/sbin/start-thriftserver.sh --conf 
'spark.serializer=org.apache.spark.serializer.KryoSerializer' --conf 
'spark.sql.hive.convertMetastoreParquet=false'  --conf 
'spark.jars=hdfs:///apps/hudi/lib/hudi-spark-bundle.jar' --conf 
'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension'
   ```
   
   table properties
   ```
   hoodie.table.partition.fields=
   hoodie.table.type=COPY_ON_WRITE
   hoodie.archivelog.folder=archived
   hoodie.timeline.layout.version=1
   hoodie.table.version=3
   hoodie.table.recordkey.fields=id
   hoodie.datasource.write.partitionpath.urlencode=false
   hoodie.table.keygenerator.class=org.apache.hudi.keygen.ComplexKeyGenerator
   hoodie.table.name=test_merge_2
   hoodie.datasource.write.hive_style_partitioning=true
   
hoodie.table.create.schema={"type"\:"record","name"\:"topLevelRecord","fields"\:[{"name"\:"_hoodie_commit_time","type"\:["string","null"]},{"name"\:"_hoodie_commit_seqno","type"\:["string","null"]},{"name"\:"_hoodie_record_key","type"\:["string","null"]},{"name"\:"_hoodie_partition_path","type"\:["string","null"]},{"name"\:"_hoodie_file_name","type"\:["string","null"]},{"name"\:"id","type"\:"long"},{"name"\:"msg","type"\:"string"}]}
   
   ```
   
   Create query
   ```
   create table dbt_tmp.test_merge_2
   using hudi
   options (type "cow", primaryKey "id")
   as
   select cast(1 as bigint) as id, 'yo' as msg
   union all
   select cast(2 as bigint) as id, 'anyway' as msg
   union all
   select cast(3 as bigint) as id, 'yo' as msg
   ```
   
   Merge, second run
   ```
   22/11/18 23:35:18 INFO SparkExecuteStatementOperation: Submitting query '/* {"app": "dbt", "dbt_version": "1.1.0", "profile_name": "spark_emr", "target_name": "prod_test", "node_id": "model.transform_spark.test_merge_2"} */
   
       create temporary view test_merge_2__dbt_tmp as
       
   
   
   select  cast(1 as bigint) as id, 'yo2' as msg
   union all
   select cast(2 as bigint) as id, 'anyway1' as msg
   union all
   select  cast(3 as bigint) as id, 'yo3' as msg
   
   
   
     ' with 897bf6f7-352f-4762-a135-e693c133c5bc
   22/11/18 23:35:18 INFO SparkExecuteStatementOperation: Running query with 
897bf6f7-352f-4762-a135-e693c133c5bc
   22/11/18 23:35:18 INFO SparkExecuteStatementOperation: Received 
getNextRowSet request order=FETCH_NEXT and maxRowsL=1000 with 
897bf6f7-352f-4762-a135-e693c133c5bc
   22/11/18 23:35:19 INFO SparkExecuteStatementOperation: Submitting query '/* 
{"app": "dbt", "dbt_version": "1.1.0", "profile_name": "spark_emr", 
"target_name": "prod_test", "node_id": "model.transform_spark.test_merge_2"
   merge into dbt_tmp.test_merge_2 as DBT_INTERNAL_DEST
         using test_merge_2__dbt_tmp as DBT_INTERNAL_SOURCE
         on 
                 DBT_INTERNAL_SOURCE.id = DBT_INTERNAL_DEST.id
             
         
         when matched then update set
            * 
       
         when not matched then insert *
   ' with bda690e6-a81b-4cc2-830a-0654e4f43f14
   22/11/18 23:35:19 INFO SparkExecuteStatementOperation: Running query with 
bda690e6-a81b-4cc2-830a-0654e4f43f14
   22/11/18 23:35:19 INFO SparkContext: Starting job: collect at 
HoodieSparkEngineContext.java:100
   22/11/18 23:35:19 INFO DAGScheduler: Got job 165 (collect at 
HoodieSparkEngineContext.java:100) with 1 output partitions
   22/11/18 23:35:19 INFO DAGScheduler: Final stage: ResultStage 216 (collect 
at HoodieSparkEngineContext.java:100)
   22/11/18 23:35:19 INFO DAGScheduler: Parents of final stage: List()
   22/11/18 23:35:19 INFO DAGScheduler: Missing parents: List()
   22/11/18 23:35:19 INFO DAGScheduler: Submitting ResultStage 216 
(MapPartitionsRDD[562] at map at HoodieSparkEngineContext.java:100), which has 
no missing parents
   22/11/18 23:35:19 INFO MemoryStore: Block broadcast_192 stored as values in 
memory (estimated size 96.2 KiB, free 1048.7 MiB)
   22/11/18 23:35:19 INFO MemoryStore: Block broadcast_192_piece0 stored as 
bytes in memory (estimated size 35.5 KiB, free 1048.7 MiB)
   22/11/18 23:35:19 INFO BlockManagerInfo: Added broadcast_192_piece0 in 
memory on ip-172-31-22-180.us-west-1.compute.internal:44351 (size: 35.5 KiB, 
free: 1048.8 MiB)
   22/11/18 23:35:19 INFO SparkContext: Created broadcast 192 from broadcast at 
DAGScheduler.scala:1467
   22/11/18 23:35:19 INFO DAGScheduler: Submitting 1 missing tasks from 
ResultStage 216 (MapPartitionsRDD[562] at map at 
HoodieSparkEngineContext.java:100) (first 15 tasks are for partitions Vector(0))
   22/11/18 23:35:19 INFO YarnScheduler: Adding task set 216.0 with 1 tasks 
resource profile 0
   22/11/18 23:35:19 INFO TaskSetManager: Starting task 0.0 in stage 216.0 (TID 
330) (ip-172-31-22-163.us-west-1.compute.internal, executor 22, partition 0, 
PROCESS_LOCAL, 4417 bytes) taskResourceAssignments Map()
   22/11/18 23:35:19 INFO BlockManagerInfo: Added broadcast_192_piece0 in 
memory on ip-172-31-22-163.us-west-1.compute.internal:35253 (size: 35.5 KiB, 
free: 10.9 GiB)
   22/11/18 23:35:19 INFO TaskSetManager: Finished task 0.0 in stage 216.0 (TID 
330) in 99 ms on ip-172-31-22-163.us-west-1.compute.internal (executor 22) (1/1)
   22/11/18 23:35:19 INFO YarnScheduler: Removed TaskSet 216.0, whose tasks 
have all completed, from pool 
   22/11/18 23:35:19 INFO DAGScheduler: ResultStage 216 (collect at 
HoodieSparkEngineContext.java:100) finished in 0.109 s
   22/11/18 23:35:19 INFO DAGScheduler: Job 165 is finished. Cancelling 
potential speculative or zombie tasks for this job
   22/11/18 23:35:19 INFO YarnScheduler: Killing all running tasks in stage 
216: Stage finished
   22/11/18 23:35:19 INFO DAGScheduler: Job 165 finished: collect at 
HoodieSparkEngineContext.java:100, took 0.109971 s
   22/11/18 23:35:19 INFO SparkContext: Starting job: collect at 
HoodieSparkEngineContext.java:100
   22/11/18 23:35:19 INFO DAGScheduler: Got job 166 (collect at 
HoodieSparkEngineContext.java:100) with 1 output partitions
   22/11/18 23:35:19 INFO DAGScheduler: Final stage: ResultStage 217 (collect 
at HoodieSparkEngineContext.java:100)
   22/11/18 23:35:19 INFO DAGScheduler: Parents of final stage: List()
   22/11/18 23:35:19 INFO DAGScheduler: Missing parents: List()
   22/11/18 23:35:19 INFO DAGScheduler: Submitting ResultStage 217 
(MapPartitionsRDD[564] at map at HoodieSparkEngineContext.java:100), which has 
no missing parents
   22/11/18 23:35:19 INFO MemoryStore: Block broadcast_193 stored as values in 
memory (estimated size 96.2 KiB, free 1048.6 MiB)
   22/11/18 23:35:19 INFO MemoryStore: Block broadcast_193_piece0 stored as 
bytes in memory (estimated size 35.5 KiB, free 1048.5 MiB)
   22/11/18 23:35:19 INFO BlockManagerInfo: Added broadcast_193_piece0 in 
memory on ip-172-31-22-180.us-west-1.compute.internal:44351 (size: 35.5 KiB, 
free: 1048.7 MiB)
   22/11/18 23:35:19 INFO SparkContext: Created broadcast 193 from broadcast at 
DAGScheduler.scala:1467
   22/11/18 23:35:19 INFO DAGScheduler: Submitting 1 missing tasks from 
ResultStage 217 (MapPartitionsRDD[564] at map at 
HoodieSparkEngineContext.java:100) (first 15 tasks are for partitions Vector(0))
   22/11/18 23:35:19 INFO YarnScheduler: Adding task set 217.0 with 1 tasks 
resource profile 0
   22/11/18 23:35:19 INFO TaskSetManager: Starting task 0.0 in stage 217.0 (TID 
331) (ip-172-31-22-163.us-west-1.compute.internal, executor 22, partition 0, 
PROCESS_LOCAL, 4390 bytes) taskResourceAssignments Map()
   22/11/18 23:35:19 INFO BlockManagerInfo: Added broadcast_193_piece0 in 
memory on ip-172-31-22-163.us-west-1.compute.internal:35253 (size: 35.5 KiB, 
free: 10.9 GiB)
   22/11/18 23:35:19 INFO TaskSetManager: Finished task 0.0 in stage 217.0 (TID 
331) in 84 ms on ip-172-31-22-163.us-west-1.compute.internal (executor 22) (1/1)
   22/11/18 23:35:19 INFO YarnScheduler: Removed TaskSet 217.0, whose tasks 
have all completed, from pool 
   22/11/18 23:35:19 INFO DAGScheduler: ResultStage 217 (collect at 
HoodieSparkEngineContext.java:100) finished in 0.092 s
   22/11/18 23:35:19 INFO DAGScheduler: Job 166 is finished. Cancelling 
potential speculative or zombie tasks for this job
   22/11/18 23:35:19 INFO YarnScheduler: Killing all running tasks in stage 
217: Stage finished
   22/11/18 23:35:19 INFO DAGScheduler: Job 166 finished: collect at 
HoodieSparkEngineContext.java:100, took 0.093236 s
   22/11/18 23:35:20 INFO Javalin: 
              __                      __ _
             / /____ _ _   __ ____ _ / /(_)____
        __  / // __ `/| | / // __ `// // // __ \
       / /_/ // /_/ / | |/ // /_/ // // // / / /
       \____/ \__,_/  |___/ \__,_//_//_//_/ /_/
   
           https://javalin.io/documentation
   
   22/11/18 23:35:20 INFO Javalin: Starting Javalin ...
   22/11/18 23:35:20 INFO Javalin: Listening on http://localhost:44599/
   22/11/18 23:35:20 INFO Javalin: Javalin started in 5ms \o/
   22/11/18 23:35:20 INFO SparkContext: Starting job: countByKey at 
BaseSparkCommitActionExecutor.java:193
   22/11/18 23:35:20 INFO DAGScheduler: Registering RDD 575 (countByKey at 
BaseSparkCommitActionExecutor.java:193) as input to shuffle 27
   22/11/18 23:35:20 INFO DAGScheduler: Got job 167 (countByKey at 
BaseSparkCommitActionExecutor.java:193) with 3 output partitions
   22/11/18 23:35:20 INFO DAGScheduler: Final stage: ResultStage 219 
(countByKey at BaseSparkCommitActionExecutor.java:193)
   22/11/18 23:35:20 INFO DAGScheduler: Parents of final stage: 
List(ShuffleMapStage 218)
   22/11/18 23:35:20 INFO DAGScheduler: Missing parents: List(ShuffleMapStage 
218)
   22/11/18 23:35:20 INFO DAGScheduler: Submitting ShuffleMapStage 218 
(MapPartitionsRDD[575] at countByKey at 
BaseSparkCommitActionExecutor.java:193), which has no missing parents
   22/11/18 23:35:20 INFO MemoryStore: Block broadcast_194 stored as values in 
memory (estimated size 45.8 KiB, free 1048.5 MiB)
   22/11/18 23:35:20 INFO MemoryStore: Block broadcast_194_piece0 stored as 
bytes in memory (estimated size 19.6 KiB, free 1048.5 MiB)
   22/11/18 23:35:20 INFO BlockManagerInfo: Added broadcast_194_piece0 in 
memory on ip-172-31-22-180.us-west-1.compute.internal:44351 (size: 19.6 KiB, 
free: 1048.7 MiB)
   22/11/18 23:35:20 INFO SparkContext: Created broadcast 194 from broadcast at 
DAGScheduler.scala:1467
   22/11/18 23:35:20 INFO DAGScheduler: Submitting 3 missing tasks from 
ShuffleMapStage 218 (MapPartitionsRDD[575] at countByKey at 
BaseSparkCommitActionExecutor.java:193) (first 15 tasks are for partitions 
Vector(0, 1, 2))
   22/11/18 23:35:20 INFO YarnScheduler: Adding task set 218.0 with 3 tasks 
resource profile 0
   22/11/18 23:35:20 INFO TaskSetManager: Starting task 0.0 in stage 218.0 (TID 
332) (ip-172-31-22-163.us-west-1.compute.internal, executor 22, partition 0, 
PROCESS_LOCAL, 4592 bytes) taskResourceAssignments Map()
   22/11/18 23:35:20 INFO TaskSetManager: Starting task 1.0 in stage 218.0 (TID 
333) (ip-172-31-22-163.us-west-1.compute.internal, executor 22, partition 1, 
PROCESS_LOCAL, 4592 bytes) taskResourceAssignments Map()
   22/11/18 23:35:20 INFO TaskSetManager: Starting task 2.0 in stage 218.0 (TID 
334) (ip-172-31-22-163.us-west-1.compute.internal, executor 22, partition 2, 
PROCESS_LOCAL, 4592 bytes) taskResourceAssignments Map()
   22/11/18 23:35:20 INFO BlockManagerInfo: Added broadcast_194_piece0 in 
memory on ip-172-31-22-163.us-west-1.compute.internal:35253 (size: 19.6 KiB, 
free: 10.9 GiB)
   22/11/18 23:35:20 INFO BlockManagerInfo: Added rdd_573_2 in memory on 
ip-172-31-22-163.us-west-1.compute.internal:35253 (size: 170.0 B, free: 10.9 
GiB)
   22/11/18 23:35:20 INFO BlockManagerInfo: Added rdd_573_1 in memory on 
ip-172-31-22-163.us-west-1.compute.internal:35253 (size: 174.0 B, free: 10.9 
GiB)
   22/11/18 23:35:20 INFO BlockManagerInfo: Added rdd_573_0 in memory on 
ip-172-31-22-163.us-west-1.compute.internal:35253 (size: 170.0 B, free: 10.9 
GiB)
   22/11/18 23:35:20 INFO TaskSetManager: Finished task 0.0 in stage 218.0 (TID 
332) in 53 ms on ip-172-31-22-163.us-west-1.compute.internal (executor 22) (1/3)
   22/11/18 23:35:20 INFO TaskSetManager: Finished task 1.0 in stage 218.0 (TID 
333) in 53 ms on ip-172-31-22-163.us-west-1.compute.internal (executor 22) (2/3)
   22/11/18 23:35:20 INFO TaskSetManager: Finished task 2.0 in stage 218.0 (TID 
334) in 54 ms on ip-172-31-22-163.us-west-1.compute.internal (executor 22) (3/3)
   22/11/18 23:35:20 INFO YarnScheduler: Removed TaskSet 218.0, whose tasks 
have all completed, from pool 
   22/11/18 23:35:20 INFO DAGScheduler: ShuffleMapStage 218 (countByKey at 
BaseSparkCommitActionExecutor.java:193) finished in 0.057 s
   22/11/18 23:35:20 INFO DAGScheduler: looking for newly runnable stages
   22/11/18 23:35:20 INFO DAGScheduler: running: Set()
   22/11/18 23:35:20 INFO DAGScheduler: waiting: Set(ResultStage 219)
   22/11/18 23:35:20 INFO DAGScheduler: failed: Set()
   22/11/18 23:35:20 INFO DAGScheduler: Submitting ResultStage 219 
(ShuffledRDD[576] at countByKey at BaseSparkCommitActionExecutor.java:193), 
which has no missing parents
   22/11/18 23:35:20 INFO MemoryStore: Block broadcast_195 stored as values in 
memory (estimated size 5.7 KiB, free 1048.5 MiB)
   22/11/18 23:35:20 INFO MemoryStore: Block broadcast_195_piece0 stored as 
bytes in memory (estimated size 3.3 KiB, free 1048.5 MiB)
   22/11/18 23:35:20 INFO BlockManagerInfo: Added broadcast_195_piece0 in 
memory on ip-172-31-22-180.us-west-1.compute.internal:44351 (size: 3.3 KiB, 
free: 1048.7 MiB)
   22/11/18 23:35:20 INFO SparkContext: Created broadcast 195 from broadcast at 
DAGScheduler.scala:1467
   22/11/18 23:35:20 INFO DAGScheduler: Submitting 3 missing tasks from 
ResultStage 219 (ShuffledRDD[576] at countByKey at 
BaseSparkCommitActionExecutor.java:193) (first 15 tasks are for partitions 
Vector(0, 1, 2))
   22/11/18 23:35:20 INFO YarnScheduler: Adding task set 219.0 with 3 tasks 
resource profile 0
   22/11/18 23:35:20 INFO TaskSetManager: Starting task 1.0 in stage 219.0 (TID 
335) (ip-172-31-22-163.us-west-1.compute.internal, executor 22, partition 1, 
NODE_LOCAL, 4282 bytes) taskResourceAssignments Map()
   22/11/18 23:35:20 INFO TaskSetManager: Starting task 0.0 in stage 219.0 (TID 
336) (ip-172-31-22-163.us-west-1.compute.internal, executor 22, partition 0, 
PROCESS_LOCAL, 4282 bytes) taskResourceAssignments Map()
   22/11/18 23:35:20 INFO TaskSetManager: Starting task 2.0 in stage 219.0 (TID 
337) (ip-172-31-22-163.us-west-1.compute.internal, executor 22, partition 2, 
PROCESS_LOCAL, 4282 bytes) taskResourceAssignments Map()
   22/11/18 23:35:20 INFO BlockManagerInfo: Added broadcast_195_piece0 in 
memory on ip-172-31-22-163.us-west-1.compute.internal:35253 (size: 3.3 KiB, 
free: 10.9 GiB)
   22/11/18 23:35:20 INFO MapOutputTrackerMasterEndpoint: Asked to send map 
output locations for shuffle 27 to 172.31.22.163:53582
   22/11/18 23:35:20 INFO TaskSetManager: Finished task 0.0 in stage 219.0 (TID 
336) in 20 ms on ip-172-31-22-163.us-west-1.compute.internal (executor 22) (1/3)
   22/11/18 23:35:20 INFO TaskSetManager: Finished task 1.0 in stage 219.0 (TID 
335) in 20 ms on ip-172-31-22-163.us-west-1.compute.internal (executor 22) (2/3)
   22/11/18 23:35:20 INFO TaskSetManager: Finished task 2.0 in stage 219.0 (TID 
337) in 20 ms on ip-172-31-22-163.us-west-1.compute.internal (executor 22) (3/3)
   22/11/18 23:35:20 INFO YarnScheduler: Removed TaskSet 219.0, whose tasks 
have all completed, from pool 
   22/11/18 23:35:20 INFO DAGScheduler: ResultStage 219 (countByKey at 
BaseSparkCommitActionExecutor.java:193) finished in 0.022 s
   22/11/18 23:35:20 INFO DAGScheduler: Job 167 is finished. Cancelling 
potential speculative or zombie tasks for this job
   22/11/18 23:35:20 INFO YarnScheduler: Killing all running tasks in stage 
219: Stage finished
   22/11/18 23:35:20 INFO DAGScheduler: Job 167 finished: countByKey at 
BaseSparkCommitActionExecutor.java:193, took 0.081495 s
   22/11/18 23:35:21 INFO SparkContext: Starting job: collect at 
SparkRejectUpdateStrategy.java:52
   22/11/18 23:35:21 INFO DAGScheduler: Registering RDD 579 (distinct at 
SparkRejectUpdateStrategy.java:52) as input to shuffle 28
   22/11/18 23:35:21 INFO DAGScheduler: Got job 168 (collect at 
SparkRejectUpdateStrategy.java:52) with 3 output partitions
   22/11/18 23:35:21 INFO DAGScheduler: Final stage: ResultStage 221 (collect 
at SparkRejectUpdateStrategy.java:52)
   22/11/18 23:35:21 INFO DAGScheduler: Parents of final stage: 
List(ShuffleMapStage 220)
   22/11/18 23:35:21 INFO DAGScheduler: Missing parents: List(ShuffleMapStage 
220)
   22/11/18 23:35:21 INFO DAGScheduler: Submitting ShuffleMapStage 220 
(MapPartitionsRDD[579] at distinct at SparkRejectUpdateStrategy.java:52), which 
has no missing parents
   22/11/18 23:35:21 INFO MemoryStore: Block broadcast_196 stored as values in 
memory (estimated size 46.1 KiB, free 1048.4 MiB)
   22/11/18 23:35:21 INFO MemoryStore: Block broadcast_196_piece0 stored as 
bytes in memory (estimated size 19.7 KiB, free 1048.4 MiB)
   22/11/18 23:35:21 INFO BlockManagerInfo: Added broadcast_196_piece0 in 
memory on ip-172-31-22-180.us-west-1.compute.internal:44351 (size: 19.7 KiB, 
free: 1048.7 MiB)
   22/11/18 23:35:21 INFO SparkContext: Created broadcast 196 from broadcast at 
DAGScheduler.scala:1467
   22/11/18 23:35:21 INFO DAGScheduler: Submitting 3 missing tasks from 
ShuffleMapStage 220 (MapPartitionsRDD[579] at distinct at 
SparkRejectUpdateStrategy.java:52) (first 15 tasks are for partitions Vector(0, 
1, 2))
   22/11/18 23:35:21 INFO YarnScheduler: Adding task set 220.0 with 3 tasks 
resource profile 0
   22/11/18 23:35:21 INFO TaskSetManager: Starting task 0.0 in stage 220.0 (TID 
338) (ip-172-31-22-163.us-west-1.compute.internal, executor 22, partition 0, 
PROCESS_LOCAL, 4592 bytes) taskResourceAssignments Map()
   22/11/18 23:35:21 INFO TaskSetManager: Starting task 1.0 in stage 220.0 (TID 
339) (ip-172-31-22-163.us-west-1.compute.internal, executor 22, partition 1, 
PROCESS_LOCAL, 4592 bytes) taskResourceAssignments Map()
   22/11/18 23:35:21 INFO TaskSetManager: Starting task 2.0 in stage 220.0 (TID 
340) (ip-172-31-22-163.us-west-1.compute.internal, executor 22, partition 2, 
PROCESS_LOCAL, 4592 bytes) taskResourceAssignments Map()
   22/11/18 23:35:21 INFO BlockManagerInfo: Added broadcast_196_piece0 in 
memory on ip-172-31-22-163.us-west-1.compute.internal:35253 (size: 19.7 KiB, 
free: 10.9 GiB)
   22/11/18 23:35:21 INFO TaskSetManager: Finished task 0.0 in stage 220.0 (TID 
338) in 19 ms on ip-172-31-22-163.us-west-1.compute.internal (executor 22) (1/3)
   22/11/18 23:35:21 INFO TaskSetManager: Finished task 1.0 in stage 220.0 (TID 
339) in 19 ms on ip-172-31-22-163.us-west-1.compute.internal (executor 22) (2/3)
   22/11/18 23:35:21 INFO TaskSetManager: Finished task 2.0 in stage 220.0 (TID 
340) in 19 ms on ip-172-31-22-163.us-west-1.compute.internal (executor 22) (3/3)
   22/11/18 23:35:21 INFO YarnScheduler: Removed TaskSet 220.0, whose tasks 
have all completed, from pool 
   22/11/18 23:35:21 INFO DAGScheduler: ShuffleMapStage 220 (distinct at 
SparkRejectUpdateStrategy.java:52) finished in 0.023 s
   22/11/18 23:35:21 INFO DAGScheduler: looking for newly runnable stages
   22/11/18 23:35:21 INFO DAGScheduler: running: Set()
   22/11/18 23:35:21 INFO DAGScheduler: waiting: Set(ResultStage 221)
   22/11/18 23:35:21 INFO DAGScheduler: failed: Set()
   22/11/18 23:35:21 INFO DAGScheduler: Submitting ResultStage 221 
(MapPartitionsRDD[581] at distinct at SparkRejectUpdateStrategy.java:52), which 
has no missing parents
   22/11/18 23:35:21 INFO MemoryStore: Block broadcast_197 stored as values in 
memory (estimated size 6.5 KiB, free 1048.4 MiB)
   22/11/18 23:35:21 INFO MemoryStore: Block broadcast_197_piece0 stored as 
bytes in memory (estimated size 3.6 KiB, free 1048.4 MiB)
   22/11/18 23:35:21 INFO BlockManagerInfo: Added broadcast_197_piece0 in 
memory on ip-172-31-22-180.us-west-1.compute.internal:44351 (size: 3.6 KiB, 
free: 1048.7 MiB)
   22/11/18 23:35:21 INFO SparkContext: Created broadcast 197 from broadcast at 
DAGScheduler.scala:1467
   22/11/18 23:35:21 INFO DAGScheduler: Submitting 3 missing tasks from 
ResultStage 221 (MapPartitionsRDD[581] at distinct at 
SparkRejectUpdateStrategy.java:52) (first 15 tasks are for partitions Vector(0, 
1, 2))
   22/11/18 23:35:21 INFO YarnScheduler: Adding task set 221.0 with 3 tasks 
resource profile 0
   22/11/18 23:35:21 INFO TaskSetManager: Starting task 0.0 in stage 221.0 (TID 
341) (ip-172-31-22-163.us-west-1.compute.internal, executor 22, partition 0, 
PROCESS_LOCAL, 4282 bytes) taskResourceAssignments Map()
   22/11/18 23:35:21 INFO TaskSetManager: Starting task 1.0 in stage 221.0 (TID 
342) (ip-172-31-22-163.us-west-1.compute.internal, executor 22, partition 1, 
PROCESS_LOCAL, 4282 bytes) taskResourceAssignments Map()
   22/11/18 23:35:21 INFO TaskSetManager: Starting task 2.0 in stage 221.0 (TID 
343) (ip-172-31-22-163.us-west-1.compute.internal, executor 22, partition 2, 
PROCESS_LOCAL, 4282 bytes) taskResourceAssignments Map()
   22/11/18 23:35:21 INFO BlockManagerInfo: Added broadcast_197_piece0 in 
memory on ip-172-31-22-163.us-west-1.compute.internal:35253 (size: 3.6 KiB, 
free: 10.9 GiB)
   22/11/18 23:35:21 INFO MapOutputTrackerMasterEndpoint: Asked to send map 
output locations for shuffle 28 to 172.31.22.163:53582
   22/11/18 23:35:21 INFO BlockManagerInfo: Removed broadcast_193_piece0 on 
ip-172-31-22-180.us-west-1.compute.internal:44351 in memory (size: 35.5 KiB, 
free: 1048.7 MiB)
   22/11/18 23:35:21 INFO TaskSetManager: Finished task 0.0 in stage 221.0 (TID 
341) in 25 ms on ip-172-31-22-163.us-west-1.compute.internal (executor 22) (1/3)
   22/11/18 23:35:21 INFO TaskSetManager: Finished task 1.0 in stage 221.0 (TID 
342) in 25 ms on ip-172-31-22-163.us-west-1.compute.internal (executor 22) (2/3)
   22/11/18 23:35:21 INFO BlockManagerInfo: Removed broadcast_193_piece0 on 
ip-172-31-22-163.us-west-1.compute.internal:35253 in memory (size: 35.5 KiB, 
free: 10.9 GiB)
   22/11/18 23:35:21 INFO TaskSetManager: Finished task 2.0 in stage 221.0 (TID 
343) in 25 ms on ip-172-31-22-163.us-west-1.compute.internal (executor 22) (3/3)
   22/11/18 23:35:21 INFO YarnScheduler: Removed TaskSet 221.0, whose tasks 
have all completed, from pool 
   22/11/18 23:35:21 INFO DAGScheduler: ResultStage 221 (collect at 
SparkRejectUpdateStrategy.java:52) finished in 0.028 s
   22/11/18 23:35:21 INFO DAGScheduler: Job 168 is finished. Cancelling 
potential speculative or zombie tasks for this job
   22/11/18 23:35:21 INFO YarnScheduler: Killing all running tasks in stage 
221: Stage finished
   22/11/18 23:35:21 INFO DAGScheduler: Job 168 finished: collect at 
SparkRejectUpdateStrategy.java:52, took 0.053273 s
   22/11/18 23:35:21 INFO BlockManagerInfo: Removed broadcast_195_piece0 on 
ip-172-31-22-180.us-west-1.compute.internal:44351 in memory (size: 3.3 KiB, 
free: 1048.7 MiB)
   22/11/18 23:35:21 INFO BlockManagerInfo: Removed broadcast_195_piece0 on 
ip-172-31-22-163.us-west-1.compute.internal:35253 in memory (size: 3.3 KiB, 
free: 10.9 GiB)
   22/11/18 23:35:21 INFO BlockManagerInfo: Removed broadcast_194_piece0 on 
ip-172-31-22-180.us-west-1.compute.internal:44351 in memory (size: 19.6 KiB, 
free: 1048.7 MiB)
   22/11/18 23:35:21 INFO BlockManagerInfo: Removed broadcast_194_piece0 on 
ip-172-31-22-163.us-west-1.compute.internal:35253 in memory (size: 19.6 KiB, 
free: 10.9 GiB)
   22/11/18 23:35:21 INFO BlockManagerInfo: Removed broadcast_192_piece0 on 
ip-172-31-22-180.us-west-1.compute.internal:44351 in memory (size: 35.5 KiB, 
free: 1048.8 MiB)
   22/11/18 23:35:21 INFO BlockManagerInfo: Removed broadcast_192_piece0 on 
ip-172-31-22-163.us-west-1.compute.internal:35253 in memory (size: 35.5 KiB, 
free: 10.9 GiB)
   22/11/18 23:35:21 INFO BlockManagerInfo: Removed broadcast_196_piece0 on 
ip-172-31-22-180.us-west-1.compute.internal:44351 in memory (size: 19.7 KiB, 
free: 1048.8 MiB)
   22/11/18 23:35:21 INFO BlockManagerInfo: Removed broadcast_196_piece0 on 
ip-172-31-22-163.us-west-1.compute.internal:35253 in memory (size: 19.7 KiB, 
free: 10.9 GiB)
   22/11/18 23:35:21 INFO SparkContext: Starting job: collectAsMap at 
UpsertPartitioner.java:256
   22/11/18 23:35:21 INFO DAGScheduler: Got job 169 (collectAsMap at 
UpsertPartitioner.java:256) with 1 output partitions
   22/11/18 23:35:21 INFO DAGScheduler: Final stage: ResultStage 222 
(collectAsMap at UpsertPartitioner.java:256)
   22/11/18 23:35:21 INFO DAGScheduler: Parents of final stage: List()
   22/11/18 23:35:21 INFO DAGScheduler: Missing parents: List()
   22/11/18 23:35:21 INFO DAGScheduler: Submitting ResultStage 222 
(MapPartitionsRDD[583] at mapToPair at UpsertPartitioner.java:255), which has 
no missing parents
   22/11/18 23:35:21 INFO MemoryStore: Block broadcast_198 stored as values in 
memory (estimated size 314.3 KiB, free 1048.5 MiB)
   22/11/18 23:35:21 INFO MemoryStore: Block broadcast_198_piece0 stored as 
bytes in memory (estimated size 116.2 KiB, free 1048.4 MiB)
   22/11/18 23:35:21 INFO BlockManagerInfo: Added broadcast_198_piece0 in 
memory on ip-172-31-22-180.us-west-1.compute.internal:44351 (size: 116.2 KiB, 
free: 1048.7 MiB)
   22/11/18 23:35:21 INFO SparkContext: Created broadcast 198 from broadcast at 
DAGScheduler.scala:1467
   22/11/18 23:35:21 INFO DAGScheduler: Submitting 1 missing tasks from 
ResultStage 222 (MapPartitionsRDD[583] at mapToPair at 
UpsertPartitioner.java:255) (first 15 tasks are for partitions Vector(0))
   22/11/18 23:35:21 INFO YarnScheduler: Adding task set 222.0 with 1 tasks 
resource profile 0
   22/11/18 23:35:21 INFO TaskSetManager: Starting task 0.0 in stage 222.0 (TID 
344) (ip-172-31-22-163.us-west-1.compute.internal, executor 22, partition 0, 
PROCESS_LOCAL, 4344 bytes) taskResourceAssignments Map()
   22/11/18 23:35:21 INFO BlockManagerInfo: Added broadcast_198_piece0 in 
memory on ip-172-31-22-163.us-west-1.compute.internal:35253 (size: 116.2 KiB, 
free: 10.9 GiB)
   22/11/18 23:35:21 INFO TaskSetManager: Finished task 0.0 in stage 222.0 (TID 
344) in 419 ms on ip-172-31-22-163.us-west-1.compute.internal (executor 22) 
(1/1)
   22/11/18 23:35:21 INFO YarnScheduler: Removed TaskSet 222.0, whose tasks 
have all completed, from pool 
   22/11/18 23:35:21 INFO DAGScheduler: ResultStage 222 (collectAsMap at 
UpsertPartitioner.java:256) finished in 0.440 s
   22/11/18 23:35:21 INFO DAGScheduler: Job 169 is finished. Cancelling 
potential speculative or zombie tasks for this job
   22/11/18 23:35:21 INFO YarnScheduler: Killing all running tasks in stage 
222: Stage finished
   22/11/18 23:35:21 INFO DAGScheduler: Job 169 finished: collectAsMap at 
UpsertPartitioner.java:256, took 0.440639 s
   22/11/18 23:35:21 INFO SparkContext: Starting job: isEmpty at 
HoodieSparkSqlWriter.scala:657
   22/11/18 23:35:21 INFO DAGScheduler: Registering RDD 584 (mapToPair at 
BaseSparkCommitActionExecutor.java:227) as input to shuffle 29
   22/11/18 23:35:21 INFO DAGScheduler: Got job 170 (isEmpty at 
HoodieSparkSqlWriter.scala:657) with 1 output partitions
   22/11/18 23:35:21 INFO DAGScheduler: Final stage: ResultStage 224 (isEmpty 
at HoodieSparkSqlWriter.scala:657)
   22/11/18 23:35:21 INFO DAGScheduler: Parents of final stage: 
List(ShuffleMapStage 223)
   22/11/18 23:35:21 INFO DAGScheduler: Missing parents: List(ShuffleMapStage 
223)
   22/11/18 23:35:21 INFO DAGScheduler: Submitting ShuffleMapStage 223 
(MapPartitionsRDD[584] at mapToPair at BaseSparkCommitActionExecutor.java:227), 
which has no missing parents
   22/11/18 23:35:21 INFO MemoryStore: Block broadcast_199 stored as values in 
memory (estimated size 349.0 KiB, free 1048.0 MiB)
   22/11/18 23:35:21 INFO MemoryStore: Block broadcast_199_piece0 stored as 
bytes in memory (estimated size 130.1 KiB, free 1047.9 MiB)
   22/11/18 23:35:21 INFO BlockManagerInfo: Added broadcast_199_piece0 in 
memory on ip-172-31-22-180.us-west-1.compute.internal:44351 (size: 130.1 KiB, 
free: 1048.6 MiB)
   22/11/18 23:35:21 INFO SparkContext: Created broadcast 199 from broadcast at 
DAGScheduler.scala:1467
   22/11/18 23:35:21 INFO DAGScheduler: Submitting 3 missing tasks from 
ShuffleMapStage 223 (MapPartitionsRDD[584] at mapToPair at 
BaseSparkCommitActionExecutor.java:227) (first 15 tasks are for partitions 
Vector(0, 1, 2))
   22/11/18 23:35:21 INFO YarnScheduler: Adding task set 223.0 with 3 tasks 
resource profile 0
   22/11/18 23:35:21 INFO TaskSetManager: Starting task 0.0 in stage 223.0 (TID 
345) (ip-172-31-22-163.us-west-1.compute.internal, executor 22, partition 0, 
PROCESS_LOCAL, 4592 bytes) taskResourceAssignments Map()
   22/11/18 23:35:21 INFO TaskSetManager: Starting task 1.0 in stage 223.0 (TID 
346) (ip-172-31-22-163.us-west-1.compute.internal, executor 22, partition 1, 
PROCESS_LOCAL, 4592 bytes) taskResourceAssignments Map()
   22/11/18 23:35:21 INFO TaskSetManager: Starting task 2.0 in stage 223.0 (TID 
347) (ip-172-31-22-163.us-west-1.compute.internal, executor 22, partition 2, 
PROCESS_LOCAL, 4592 bytes) taskResourceAssignments Map()
   22/11/18 23:35:21 INFO BlockManagerInfo: Added broadcast_199_piece0 in 
memory on ip-172-31-22-163.us-west-1.compute.internal:35253 (size: 130.1 KiB, 
free: 10.9 GiB)
   22/11/18 23:35:21 INFO TaskSetManager: Finished task 2.0 in stage 223.0 (TID 
347) in 37 ms on ip-172-31-22-163.us-west-1.compute.internal (executor 22) (1/3)
   22/11/18 23:35:21 INFO TaskSetManager: Finished task 1.0 in stage 223.0 (TID 
346) in 38 ms on ip-172-31-22-163.us-west-1.compute.internal (executor 22) (2/3)
   22/11/18 23:35:21 INFO TaskSetManager: Finished task 0.0 in stage 223.0 (TID 
345) in 41 ms on ip-172-31-22-163.us-west-1.compute.internal (executor 22) (3/3)
   22/11/18 23:35:21 INFO YarnScheduler: Removed TaskSet 223.0, whose tasks 
have all completed, from pool 
   22/11/18 23:35:21 INFO DAGScheduler: ShuffleMapStage 223 (mapToPair at 
BaseSparkCommitActionExecutor.java:227) finished in 0.063 s
   22/11/18 23:35:21 INFO DAGScheduler: looking for newly runnable stages
   22/11/18 23:35:21 INFO DAGScheduler: running: Set()
   22/11/18 23:35:21 INFO DAGScheduler: waiting: Set(ResultStage 224)
   22/11/18 23:35:21 INFO DAGScheduler: failed: Set()
   22/11/18 23:35:21 INFO DAGScheduler: Submitting ResultStage 224 
(MapPartitionsRDD[589] at filter at HoodieSparkSqlWriter.scala:657), which has 
no missing parents
   22/11/18 23:35:21 INFO MemoryStore: Block broadcast_200 stored as values in 
memory (estimated size 456.3 KiB, free 1047.5 MiB)
   22/11/18 23:35:21 INFO MemoryStore: Block broadcast_200_piece0 stored as 
bytes in memory (estimated size 171.1 KiB, free 1047.3 MiB)
   22/11/18 23:35:21 INFO BlockManagerInfo: Added broadcast_200_piece0 in 
memory on ip-172-31-22-180.us-west-1.compute.internal:44351 (size: 171.1 KiB, 
free: 1048.4 MiB)
   22/11/18 23:35:21 INFO SparkContext: Created broadcast 200 from broadcast at 
DAGScheduler.scala:1467
   22/11/18 23:35:21 INFO DAGScheduler: Submitting 1 missing tasks from 
ResultStage 224 (MapPartitionsRDD[589] at filter at 
HoodieSparkSqlWriter.scala:657) (first 15 tasks are for partitions Vector(0))
   22/11/18 23:35:21 INFO YarnScheduler: Adding task set 224.0 with 1 tasks 
resource profile 0
   22/11/18 23:35:21 INFO TaskSetManager: Starting task 0.0 in stage 224.0 (TID 
348) (ip-172-31-22-163.us-west-1.compute.internal, executor 22, partition 0, 
NODE_LOCAL, 4282 bytes) taskResourceAssignments Map()
   22/11/18 23:35:21 INFO BlockManagerInfo: Added broadcast_200_piece0 in 
memory on ip-172-31-22-163.us-west-1.compute.internal:35253 (size: 171.1 KiB, 
free: 10.9 GiB)
   22/11/18 23:35:21 INFO MapOutputTrackerMasterEndpoint: Asked to send map 
output locations for shuffle 29 to 172.31.22.163:53582
   22/11/18 23:35:23 INFO BlockManagerInfo: Added rdd_588_0 in memory on 
ip-172-31-22-163.us-west-1.compute.internal:35253 (size: 367.0 B, free: 10.9 
GiB)
   22/11/18 23:35:23 INFO TaskSetManager: Finished task 0.0 in stage 224.0 (TID 
348) in 1911 ms on ip-172-31-22-163.us-west-1.compute.internal (executor 22) 
(1/1)
   22/11/18 23:35:23 INFO YarnScheduler: Removed TaskSet 224.0, whose tasks 
have all completed, from pool 
   22/11/18 23:35:23 INFO DAGScheduler: ResultStage 224 (isEmpty at 
HoodieSparkSqlWriter.scala:657) finished in 1.938 s
   22/11/18 23:35:23 INFO DAGScheduler: Job 170 is finished. Cancelling 
potential speculative or zombie tasks for this job
   22/11/18 23:35:23 INFO YarnScheduler: Killing all running tasks in stage 
224: Stage finished
   22/11/18 23:35:23 INFO DAGScheduler: Job 170 finished: isEmpty at 
HoodieSparkSqlWriter.scala:657, took 2.003411 s
   22/11/18 23:35:23 INFO SparkContext: Starting job: collect at 
SparkRDDWriteClient.java:124
   22/11/18 23:35:23 INFO DAGScheduler: Got job 171 (collect at 
SparkRDDWriteClient.java:124) with 1 output partitions
   22/11/18 23:35:23 INFO DAGScheduler: Final stage: ResultStage 226 (collect 
at SparkRDDWriteClient.java:124)
   22/11/18 23:35:23 INFO DAGScheduler: Parents of final stage: 
List(ShuffleMapStage 225)
   22/11/18 23:35:23 INFO DAGScheduler: Missing parents: List()
   22/11/18 23:35:23 INFO DAGScheduler: Submitting ResultStage 226 
(MapPartitionsRDD[590] at map at SparkRDDWriteClient.java:124), which has no 
missing parents
   22/11/18 23:35:23 INFO MemoryStore: Block broadcast_201 stored as values in 
memory (estimated size 456.8 KiB, free 1046.8 MiB)
   22/11/18 23:35:23 INFO MemoryStore: Block broadcast_201_piece0 stored as 
bytes in memory (estimated size 171.3 KiB, free 1046.7 MiB)
   22/11/18 23:35:23 INFO BlockManagerInfo: Added broadcast_201_piece0 in 
memory on ip-172-31-22-180.us-west-1.compute.internal:44351 (size: 171.3 KiB, 
free: 1048.2 MiB)
   22/11/18 23:35:23 INFO SparkContext: Created broadcast 201 from broadcast at 
DAGScheduler.scala:1467
   22/11/18 23:35:23 INFO DAGScheduler: Submitting 1 missing tasks from 
ResultStage 226 (MapPartitionsRDD[590] at map at SparkRDDWriteClient.java:124) 
(first 15 tasks are for partitions Vector(0))
   22/11/18 23:35:23 INFO YarnScheduler: Adding task set 226.0 with 1 tasks 
resource profile 0
   22/11/18 23:35:23 INFO TaskSetManager: Starting task 0.0 in stage 226.0 (TID 
349) (ip-172-31-22-163.us-west-1.compute.internal, executor 22, partition 0, 
PROCESS_LOCAL, 4282 bytes) taskResourceAssignments Map()
   22/11/18 23:35:23 INFO BlockManagerInfo: Added broadcast_201_piece0 in 
memory on ip-172-31-22-163.us-west-1.compute.internal:35253 (size: 171.3 KiB, 
free: 10.9 GiB)
   22/11/18 23:35:23 INFO TaskSetManager: Finished task 0.0 in stage 226.0 (TID 
349) in 30 ms on ip-172-31-22-163.us-west-1.compute.internal (executor 22) (1/1)
   22/11/18 23:35:23 INFO YarnScheduler: Removed TaskSet 226.0, whose tasks 
have all completed, from pool 
   22/11/18 23:35:23 INFO DAGScheduler: ResultStage 226 (collect at 
SparkRDDWriteClient.java:124) finished in 0.058 s
   22/11/18 23:35:23 INFO DAGScheduler: Job 171 is finished. Cancelling 
potential speculative or zombie tasks for this job
   22/11/18 23:35:23 INFO YarnScheduler: Killing all running tasks in stage 
226: Stage finished
   22/11/18 23:35:23 INFO DAGScheduler: Job 171 finished: collect at 
SparkRDDWriteClient.java:124, took 0.059674 s
   22/11/18 23:35:24 INFO SparkContext: Starting job: collectAsMap at 
HoodieSparkEngineContext.java:148
   22/11/18 23:35:24 INFO DAGScheduler: Got job 172 (collectAsMap at 
HoodieSparkEngineContext.java:148) with 2 output partitions
   22/11/18 23:35:24 INFO DAGScheduler: Final stage: ResultStage 227 
(collectAsMap at HoodieSparkEngineContext.java:148)
   22/11/18 23:35:24 INFO DAGScheduler: Parents of final stage: List()
   22/11/18 23:35:24 INFO DAGScheduler: Missing parents: List()
   22/11/18 23:35:24 INFO DAGScheduler: Submitting ResultStage 227 
(MapPartitionsRDD[592] at mapToPair at HoodieSparkEngineContext.java:145), 
which has no missing parents
   22/11/18 23:35:24 INFO MemoryStore: Block broadcast_202 stored as values in 
memory (estimated size 106.2 KiB, free 1046.6 MiB)
   22/11/18 23:35:24 INFO MemoryStore: Block broadcast_202_piece0 stored as 
bytes in memory (estimated size 39.7 KiB, free 1046.5 MiB)
   22/11/18 23:35:24 INFO BlockManagerInfo: Added broadcast_202_piece0 in 
memory on ip-172-31-22-180.us-west-1.compute.internal:44351 (size: 39.7 KiB, 
free: 1048.2 MiB)
   22/11/18 23:35:24 INFO SparkContext: Created broadcast 202 from broadcast at 
DAGScheduler.scala:1467
   22/11/18 23:35:24 INFO DAGScheduler: Submitting 2 missing tasks from 
ResultStage 227 (MapPartitionsRDD[592] at mapToPair at 
HoodieSparkEngineContext.java:145) (first 15 tasks are for partitions Vector(0, 
1))
   22/11/18 23:35:24 INFO YarnScheduler: Adding task set 227.0 with 2 tasks 
resource profile 0
   22/11/18 23:35:24 INFO TaskSetManager: Starting task 0.0 in stage 227.0 (TID 
350) (ip-172-31-22-163.us-west-1.compute.internal, executor 22, partition 0, 
PROCESS_LOCAL, 4437 bytes) taskResourceAssignments Map()
   22/11/18 23:35:24 INFO TaskSetManager: Starting task 1.0 in stage 227.0 (TID 
351) (ip-172-31-22-163.us-west-1.compute.internal, executor 22, partition 1, 
PROCESS_LOCAL, 4433 bytes) taskResourceAssignments Map()
   22/11/18 23:35:24 INFO BlockManagerInfo: Added broadcast_202_piece0 in 
memory on ip-172-31-22-163.us-west-1.compute.internal:35253 (size: 39.7 KiB, 
free: 10.9 GiB)
   22/11/18 23:35:24 INFO TaskSetManager: Finished task 0.0 in stage 227.0 (TID 
350) in 240 ms on ip-172-31-22-163.us-west-1.compute.internal (executor 22) 
(1/2)
   22/11/18 23:35:24 INFO TaskSetManager: Finished task 1.0 in stage 227.0 (TID 
351) in 240 ms on ip-172-31-22-163.us-west-1.compute.internal (executor 22) 
(2/2)
   22/11/18 23:35:24 INFO YarnScheduler: Removed TaskSet 227.0, whose tasks 
have all completed, from pool 
   22/11/18 23:35:24 INFO DAGScheduler: ResultStage 227 (collectAsMap at 
HoodieSparkEngineContext.java:148) finished in 0.249 s
   22/11/18 23:35:24 INFO DAGScheduler: Job 172 is finished. Cancelling 
potential speculative or zombie tasks for this job
   22/11/18 23:35:24 INFO YarnScheduler: Killing all running tasks in stage 
227: Stage finished
   22/11/18 23:35:24 INFO DAGScheduler: Job 172 finished: collectAsMap at 
HoodieSparkEngineContext.java:148, took 0.250474 s
   22/11/18 23:35:25 INFO BlockManagerInfo: Removed broadcast_202_piece0 on 
ip-172-31-22-180.us-west-1.compute.internal:44351 in memory (size: 39.7 KiB, 
free: 1048.2 MiB)
   22/11/18 23:35:25 INFO BlockManagerInfo: Removed broadcast_202_piece0 on 
ip-172-31-22-163.us-west-1.compute.internal:35253 in memory (size: 39.7 KiB, 
free: 10.9 GiB)
   22/11/18 23:35:25 INFO BlockManagerInfo: Removed broadcast_197_piece0 on 
ip-172-31-22-180.us-west-1.compute.internal:44351 in memory (size: 3.6 KiB, 
free: 1048.2 MiB)
   22/11/18 23:35:25 INFO BlockManagerInfo: Removed broadcast_197_piece0 on 
ip-172-31-22-163.us-west-1.compute.internal:35253 in memory (size: 3.6 KiB, 
free: 10.9 GiB)
   22/11/18 23:35:25 INFO BlockManagerInfo: Removed broadcast_198_piece0 on 
ip-172-31-22-180.us-west-1.compute.internal:44351 in memory (size: 116.2 KiB, 
free: 1048.3 MiB)
   22/11/18 23:35:25 INFO BlockManagerInfo: Removed broadcast_198_piece0 on 
ip-172-31-22-163.us-west-1.compute.internal:35253 in memory (size: 116.2 KiB, 
free: 10.9 GiB)
   22/11/18 23:35:25 INFO BlockManagerInfo: Removed broadcast_199_piece0 on 
ip-172-31-22-180.us-west-1.compute.internal:44351 in memory (size: 130.1 KiB, 
free: 1048.5 MiB)
   22/11/18 23:35:25 INFO BlockManagerInfo: Removed broadcast_199_piece0 on 
ip-172-31-22-163.us-west-1.compute.internal:35253 in memory (size: 130.1 KiB, 
free: 10.9 GiB)
   22/11/18 23:35:25 INFO BlockManagerInfo: Removed broadcast_200_piece0 on 
ip-172-31-22-180.us-west-1.compute.internal:44351 in memory (size: 171.1 KiB, 
free: 1048.6 MiB)
   22/11/18 23:35:25 INFO BlockManagerInfo: Removed broadcast_200_piece0 on 
ip-172-31-22-163.us-west-1.compute.internal:35253 in memory (size: 171.1 KiB, 
free: 10.9 GiB)
   22/11/18 23:35:25 INFO BlockManagerInfo: Removed broadcast_201_piece0 on 
ip-172-31-22-180.us-west-1.compute.internal:44351 in memory (size: 171.3 KiB, 
free: 1048.8 MiB)
   22/11/18 23:35:25 INFO BlockManagerInfo: Removed broadcast_201_piece0 on 
ip-172-31-22-163.us-west-1.compute.internal:35253 in memory (size: 171.3 KiB, 
free: 10.9 GiB)
   22/11/18 23:35:25 INFO MapPartitionsRDD: Removing RDD 588 from persistence 
list
   22/11/18 23:35:25 INFO MapPartitionsRDD: Removing RDD 573 from persistence 
list
   22/11/18 23:35:25 INFO BlockManager: Removing RDD 588
   22/11/18 23:35:25 INFO BlockManager: Removing RDD 573
   22/11/18 23:35:25 INFO metastore: Trying to connect to metastore with URI 
thrift://hive-writable-metastore.prod.branch.io:9083
   22/11/18 23:35:25 INFO metastore: Opened a connection to metastore, current 
connections: 1
   22/11/18 23:35:25 INFO metastore: Connected to metastore.
   22/11/18 23:35:26 INFO metastore: Closed a connection to metastore, current 
connections: 0
   22/11/18 23:35:26 INFO SparkContext: Starting job: collect at 
HoodieSparkEngineContext.java:100
   22/11/18 23:35:26 INFO DAGScheduler: Got job 173 (collect at 
HoodieSparkEngineContext.java:100) with 1 output partitions
   22/11/18 23:35:26 INFO DAGScheduler: Final stage: ResultStage 228 (collect 
at HoodieSparkEngineContext.java:100)
   22/11/18 23:35:26 INFO DAGScheduler: Parents of final stage: List()
   22/11/18 23:35:26 INFO DAGScheduler: Missing parents: List()
   22/11/18 23:35:26 INFO DAGScheduler: Submitting ResultStage 228 
(MapPartitionsRDD[594] at map at HoodieSparkEngineContext.java:100), which has 
no missing parents
   22/11/18 23:35:26 INFO MemoryStore: Block broadcast_203 stored as values in 
memory (estimated size 96.2 KiB, free 1048.7 MiB)
   22/11/18 23:35:26 INFO MemoryStore: Block broadcast_203_piece0 stored as 
bytes in memory (estimated size 35.5 KiB, free 1048.7 MiB)
   22/11/18 23:35:26 INFO BlockManagerInfo: Added broadcast_203_piece0 in 
memory on ip-172-31-22-180.us-west-1.compute.internal:44351 (size: 35.5 KiB, 
free: 1048.8 MiB)
   22/11/18 23:35:26 INFO SparkContext: Created broadcast 203 from broadcast at 
DAGScheduler.scala:1467
   22/11/18 23:35:26 INFO DAGScheduler: Submitting 1 missing tasks from 
ResultStage 228 (MapPartitionsRDD[594] at map at 
HoodieSparkEngineContext.java:100) (first 15 tasks are for partitions Vector(0))
   22/11/18 23:35:26 INFO YarnScheduler: Adding task set 228.0 with 1 tasks 
resource profile 0
   22/11/18 23:35:26 INFO TaskSetManager: Starting task 0.0 in stage 228.0 (TID 
352) (ip-172-31-22-163.us-west-1.compute.internal, executor 22, partition 0, 
PROCESS_LOCAL, 4417 bytes) taskResourceAssignments Map()
   22/11/18 23:35:26 INFO BlockManagerInfo: Added broadcast_203_piece0 in 
memory on ip-172-31-22-163.us-west-1.compute.internal:35253 (size: 35.5 KiB, 
free: 10.9 GiB)
   22/11/18 23:35:26 INFO TaskSetManager: Finished task 0.0 in stage 228.0 (TID 
352) in 104 ms on ip-172-31-22-163.us-west-1.compute.internal (executor 22) 
(1/1)
   22/11/18 23:35:26 INFO YarnScheduler: Removed TaskSet 228.0, whose tasks 
have all completed, from pool 
   22/11/18 23:35:26 INFO DAGScheduler: ResultStage 228 (collect at 
HoodieSparkEngineContext.java:100) finished in 0.113 s
   22/11/18 23:35:26 INFO DAGScheduler: Job 173 is finished. Cancelling 
potential speculative or zombie tasks for this job
   22/11/18 23:35:26 INFO YarnScheduler: Killing all running tasks in stage 
228: Stage finished
   22/11/18 23:35:26 INFO DAGScheduler: Job 173 finished: collect at 
HoodieSparkEngineContext.java:100, took 0.113273 s
   22/11/18 23:35:26 INFO SparkContext: Starting job: collect at 
HoodieSparkEngineContext.java:100
   22/11/18 23:35:26 INFO DAGScheduler: Got job 174 (collect at 
HoodieSparkEngineContext.java:100) with 1 output partitions
   22/11/18 23:35:26 INFO DAGScheduler: Final stage: ResultStage 229 (collect 
at HoodieSparkEngineContext.java:100)
   22/11/18 23:35:26 INFO DAGScheduler: Parents of final stage: List()
   22/11/18 23:35:26 INFO DAGScheduler: Missing parents: List()
   22/11/18 23:35:26 INFO DAGScheduler: Submitting ResultStage 229 
(MapPartitionsRDD[596] at map at HoodieSparkEngineContext.java:100), which has 
no missing parents
   22/11/18 23:35:26 INFO MemoryStore: Block broadcast_204 stored as values in 
memory (estimated size 96.2 KiB, free 1048.6 MiB)
   22/11/18 23:35:26 INFO MemoryStore: Block broadcast_204_piece0 stored as 
bytes in memory (estimated size 35.5 KiB, free 1048.5 MiB)
   22/11/18 23:35:26 INFO BlockManagerInfo: Added broadcast_204_piece0 in 
memory on ip-172-31-22-180.us-west-1.compute.internal:44351 (size: 35.5 KiB, 
free: 1048.7 MiB)
   22/11/18 23:35:26 INFO SparkContext: Created broadcast 204 from broadcast at 
DAGScheduler.scala:1467
   22/11/18 23:35:26 INFO DAGScheduler: Submitting 1 missing tasks from 
ResultStage 229 (MapPartitionsRDD[596] at map at 
HoodieSparkEngineContext.java:100) (first 15 tasks are for partitions Vector(0))
   22/11/18 23:35:26 INFO YarnScheduler: Adding task set 229.0 with 1 tasks 
resource profile 0
   22/11/18 23:35:26 INFO TaskSetManager: Starting task 0.0 in stage 229.0 (TID 
353) (ip-172-31-22-163.us-west-1.compute.internal, executor 22, partition 0, 
PROCESS_LOCAL, 4390 bytes) taskResourceAssignments Map()
   22/11/18 23:35:26 INFO BlockManagerInfo: Added broadcast_204_piece0 in 
memory on ip-172-31-22-163.us-west-1.compute.internal:35253 (size: 35.5 KiB, 
free: 10.9 GiB)
   22/11/18 23:35:26 INFO TaskSetManager: Finished task 0.0 in stage 229.0 (TID 
353) in 79 ms on ip-172-31-22-163.us-west-1.compute.internal (executor 22) (1/1)
   22/11/18 23:35:26 INFO YarnScheduler: Removed TaskSet 229.0, whose tasks 
have all completed, from pool 
   22/11/18 23:35:26 INFO DAGScheduler: ResultStage 229 (collect at 
HoodieSparkEngineContext.java:100) finished in 0.087 s
   22/11/18 23:35:26 INFO DAGScheduler: Job 174 is finished. Cancelling 
potential speculative or zombie tasks for this job
   22/11/18 23:35:26 INFO YarnScheduler: Killing all running tasks in stage 
229: Stage finished
   22/11/18 23:35:26 INFO DAGScheduler: Job 174 finished: collect at 
HoodieSparkEngineContext.java:100, took 0.088012 s
   22/11/18 23:35:26 INFO Javalin: Stopping Javalin ...
   22/11/18 23:35:26 INFO Javalin: Javalin has stopped
   22/11/18 23:35:26 INFO SparkContext: Starting job: collect at 
HoodieSparkEngineContext.java:100
   22/11/18 23:35:26 INFO DAGScheduler: Got job 175 (collect at 
HoodieSparkEngineContext.java:100) with 1 output partitions
   22/11/18 23:35:26 INFO DAGScheduler: Final stage: ResultStage 230 (collect 
at HoodieSparkEngineContext.java:100)
   22/11/18 23:35:26 INFO DAGScheduler: Parents of final stage: List()
   22/11/18 23:35:26 INFO DAGScheduler: Missing parents: List()
   22/11/18 23:35:26 INFO DAGScheduler: Submitting ResultStage 230 
(MapPartitionsRDD[598] at map at HoodieSparkEngineContext.java:100), which has 
no missing parents
   22/11/18 23:35:26 INFO BlockManagerInfo: Removed broadcast_204_piece0 on 
ip-172-31-22-180.us-west-1.compute.internal:44351 in memory (size: 35.5 KiB, 
free: 1048.8 MiB)
   22/11/18 23:35:26 INFO BlockManagerInfo: Removed broadcast_204_piece0 on 
ip-172-31-22-163.us-west-1.compute.internal:35253 in memory (size: 35.5 KiB, 
free: 10.9 GiB)
   22/11/18 23:35:26 INFO BlockManagerInfo: Removed broadcast_203_piece0 on 
ip-172-31-22-180.us-west-1.compute.internal:44351 in memory (size: 35.5 KiB, 
free: 1048.8 MiB)
   22/11/18 23:35:26 INFO MemoryStore: Block broadcast_205 stored as values in 
memory (estimated size 96.2 KiB, free 1048.7 MiB)
   22/11/18 23:35:26 INFO BlockManagerInfo: Removed broadcast_203_piece0 on 
ip-172-31-22-163.us-west-1.compute.internal:35253 in memory (size: 35.5 KiB, 
free: 10.9 GiB)
   22/11/18 23:35:26 INFO MemoryStore: Block broadcast_205_piece0 stored as 
bytes in memory (estimated size 35.5 KiB, free 1048.7 MiB)
   22/11/18 23:35:26 INFO BlockManagerInfo: Added broadcast_205_piece0 in 
memory on ip-172-31-22-180.us-west-1.compute.internal:44351 (size: 35.5 KiB, 
free: 1048.8 MiB)
   22/11/18 23:35:26 INFO SparkContext: Created broadcast 205 from broadcast at 
DAGScheduler.scala:1467
   22/11/18 23:35:26 INFO DAGScheduler: Submitting 1 missing tasks from 
ResultStage 230 (MapPartitionsRDD[598] at map at 
HoodieSparkEngineContext.java:100) (first 15 tasks are for partitions Vector(0))
   22/11/18 23:35:26 INFO YarnScheduler: Adding task set 230.0 with 1 tasks 
resource profile 0
   22/11/18 23:35:26 INFO TaskSetManager: Starting task 0.0 in stage 230.0 (TID 
354) (ip-172-31-22-163.us-west-1.compute.internal, executor 22, partition 0, 
PROCESS_LOCAL, 4417 bytes) taskResourceAssignments Map()
   22/11/18 23:35:26 INFO BlockManagerInfo: Added broadcast_205_piece0 in 
memory on ip-172-31-22-163.us-west-1.compute.internal:35253 (size: 35.5 KiB, 
free: 10.9 GiB)
   22/11/18 23:35:26 INFO TaskSetManager: Finished task 0.0 in stage 230.0 (TID 
354) in 73 ms on ip-172-31-22-163.us-west-1.compute.internal (executor 22) (1/1)
   22/11/18 23:35:26 INFO YarnScheduler: Removed TaskSet 230.0, whose tasks 
have all completed, from pool 
   22/11/18 23:35:26 INFO DAGScheduler: ResultStage 230 (collect at 
HoodieSparkEngineContext.java:100) finished in 0.083 s
   22/11/18 23:35:26 INFO DAGScheduler: Job 175 is finished. Cancelling 
potential speculative or zombie tasks for this job
   22/11/18 23:35:26 INFO YarnScheduler: Killing all running tasks in stage 
230: Stage finished
   22/11/18 23:35:26 INFO DAGScheduler: Job 175 finished: collect at 
HoodieSparkEngineContext.java:100, took 0.096327 s
   22/11/18 23:35:27 INFO SparkContext: Starting job: collect at 
HoodieSparkEngineContext.java:100
   22/11/18 23:35:27 INFO DAGScheduler: Got job 176 (collect at 
HoodieSparkEngineContext.java:100) with 1 output partitions
   22/11/18 23:35:27 INFO DAGScheduler: Final stage: ResultStage 231 (collect 
at HoodieSparkEngineContext.java:100)
   22/11/18 23:35:27 INFO DAGScheduler: Parents of final stage: List()
   22/11/18 23:35:27 INFO DAGScheduler: Missing parents: List()
   22/11/18 23:35:27 INFO DAGScheduler: Submitting ResultStage 231 
(MapPartitionsRDD[600] at map at HoodieSparkEngineContext.java:100), which has 
no missing parents
   22/11/18 23:35:27 INFO MemoryStore: Block broadcast_206 stored as values in 
memory (estimated size 96.2 KiB, free 1048.6 MiB)
   22/11/18 23:35:27 INFO MemoryStore: Block broadcast_206_piece0 stored as 
bytes in memory (estimated size 35.5 KiB, free 1048.5 MiB)
   22/11/18 23:35:27 INFO BlockManagerInfo: Added broadcast_206_piece0 in 
memory on ip-172-31-22-180.us-west-1.compute.internal:44351 (size: 35.5 KiB, 
free: 1048.7 MiB)
   22/11/18 23:35:27 INFO SparkContext: Created broadcast 206 from broadcast at 
DAGScheduler.scala:1467
   22/11/18 23:35:27 INFO DAGScheduler: Submitting 1 missing tasks from 
ResultStage 231 (MapPartitionsRDD[600] at map at 
HoodieSparkEngineContext.java:100) (first 15 tasks are for partitions Vector(0))
   22/11/18 23:35:27 INFO YarnScheduler: Adding task set 231.0 with 1 tasks 
resource profile 0
   22/11/18 23:35:27 INFO TaskSetManager: Starting task 0.0 in stage 231.0 (TID 
355) (ip-172-31-22-163.us-west-1.compute.internal, executor 22, partition 0, 
PROCESS_LOCAL, 4390 bytes) taskResourceAssignments Map()
   22/11/18 23:35:27 INFO BlockManagerInfo: Added broadcast_206_piece0 in 
memory on ip-172-31-22-163.us-west-1.compute.internal:35253 (size: 35.5 KiB, 
free: 10.9 GiB)
   22/11/18 23:35:27 INFO TaskSetManager: Finished task 0.0 in stage 231.0 (TID 
355) in 76 ms on ip-172-31-22-163.us-west-1.compute.internal (executor 22) (1/1)
   22/11/18 23:35:27 INFO YarnScheduler: Removed TaskSet 231.0, whose tasks 
have all completed, from pool 
   22/11/18 23:35:27 INFO DAGScheduler: ResultStage 231 (collect at 
HoodieSparkEngineContext.java:100) finished in 0.084 s
   22/11/18 23:35:27 INFO DAGScheduler: Job 176 is finished. Cancelling 
potential speculative or zombie tasks for this job
   22/11/18 23:35:27 INFO YarnScheduler: Killing all running tasks in stage 
231: Stage finished
   22/11/18 23:35:27 INFO DAGScheduler: Job 176 finished: collect at 
HoodieSparkEngineContext.java:100, took 0.085097 s
   22/11/18 23:35:27 INFO SparkContext: Starting job: collect at 
HoodieSparkEngineContext.java:100
   22/11/18 23:35:27 INFO DAGScheduler: Got job 177 (collect at 
HoodieSparkEngineContext.java:100) with 1 output partitions
   22/11/18 23:35:27 INFO DAGScheduler: Final stage: ResultStage 232 (collect 
at HoodieSparkEngineContext.java:100)
   22/11/18 23:35:27 INFO DAGScheduler: Parents of final stage: List()
   22/11/18 23:35:27 INFO DAGScheduler: Missing parents: List()
   22/11/18 23:35:27 INFO DAGScheduler: Submitting ResultStage 232 
(MapPartitionsRDD[602] at map at HoodieSparkEngineContext.java:100), which has 
no missing parents
   22/11/18 23:35:27 INFO MemoryStore: Block broadcast_207 stored as values in 
memory (estimated size 96.2 KiB, free 1048.4 MiB)
   22/11/18 23:35:27 INFO MemoryStore: Block broadcast_207_piece0 stored as 
bytes in memory (estimated size 35.5 KiB, free 1048.4 MiB)
   22/11/18 23:35:27 INFO BlockManagerInfo: Added broadcast_207_piece0 in 
memory on ip-172-31-22-180.us-west-1.compute.internal:44351 (size: 35.5 KiB, 
free: 1048.7 MiB)
   22/11/18 23:35:27 INFO SparkContext: Created broadcast 207 from broadcast at 
DAGScheduler.scala:1467
   22/11/18 23:35:27 INFO DAGScheduler: Submitting 1 missing tasks from 
ResultStage 232 (MapPartitionsRDD[602] at map at 
HoodieSparkEngineContext.java:100) (first 15 tasks are for partitions Vector(0))
   22/11/18 23:35:27 INFO YarnScheduler: Adding task set 232.0 with 1 tasks 
resource profile 0
   22/11/18 23:35:27 INFO TaskSetManager: Starting task 0.0 in stage 232.0 (TID 
356) (ip-172-31-22-163.us-west-1.compute.internal, executor 22, partition 0, 
PROCESS_LOCAL, 4417 bytes) taskResourceAssignments Map()
   22/11/18 23:35:27 INFO BlockManagerInfo: Added broadcast_207_piece0 in 
memory on ip-172-31-22-163.us-west-1.compute.internal:35253 (size: 35.5 KiB, 
free: 10.9 GiB)
   22/11/18 23:35:27 INFO TaskSetManager: Finished task 0.0 in stage 232.0 (TID 
356) in 75 ms on ip-172-31-22-163.us-west-1.compute.internal (executor 22) (1/1)
   22/11/18 23:35:27 INFO YarnScheduler: Removed TaskSet 232.0, whose tasks 
have all completed, from pool 
   22/11/18 23:35:27 INFO DAGScheduler: ResultStage 232 (collect at 
HoodieSparkEngineContext.java:100) finished in 0.083 s
   22/11/18 23:35:27 INFO DAGScheduler: Job 177 is finished. Cancelling 
potential speculative or zombie tasks for this job
   22/11/18 23:35:27 INFO YarnScheduler: Killing all running tasks in stage 
232: Stage finished
   22/11/18 23:35:27 INFO DAGScheduler: Job 177 finished: collect at 
HoodieSparkEngineContext.java:100, took 0.084346 s
   ```
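For comparison, here is a sketch of the same model config with an explicit precombine field. `preCombineField` is a standard Hudi table option that tells Hudi which column to use when deduplicating records on upsert; whether dbt-spark forwards it correctly for `file_format = 'hudi'` on this Hudi/EMR version is an assumption, and the `updated_at` column is hypothetical (not in the model above):

```sql
{{ config(
    materialized = 'incremental',
    incremental_strategy = 'merge',
    file_format = 'hudi',
    options={
      'type': 'cow',
      'primaryKey': 'id',
      -- hypothetical column; Hudi resolves duplicate keys by keeping the
      -- row with the largest preCombineField value
      'preCombineField': 'updated_at',
    },
    unique_key = 'id',
) }}
```

Without a precombine field, whether the second run upserts or plain-inserts depends on how the dbt merge statement is translated by the Hudi Spark SQL extension, so this is one variable worth ruling out.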
   
   

