[ https://issues.apache.org/jira/browse/SPARK-27663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16836044#comment-16836044 ]
Fan Yunbo edited comment on SPARK-27663 at 5/9/19 3:21 AM:
-----------------------------------------------------------

The incomplete task's id is 17.0 in stage 98517.0

!incomplte-task-1.png!

The input size is 23.5 MB, and the task finished in 1 s

!incomplte-task-2.png!

and the log shows the input split size is about 300 MB

{code:java}
Input split: hdfs://cqocdc/user/hive/warehouse/dw_user_useage_privilege_dt_yyyymmdd/month_id=201904/day_id=20190422/000017_0.snappy:0+326992763{code}
{code:java}
19/04/23 12:09:18 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 6835988
19/04/23 12:09:18 INFO executor.Executor: Running task 17.0 in stage 98517.0 (TID 6835988)
19/04/23 12:09:18 INFO broadcast.TorrentBroadcast: Started reading broadcast variable 173456
19/04/23 12:09:18 INFO memory.MemoryStore: Block broadcast_173456_piece0 stored as bytes in memory (estimated size 13.4 KB, free 15.2 GB)
19/04/23 12:09:18 INFO broadcast.TorrentBroadcast: Reading broadcast variable 173456 took 4 ms
19/04/23 12:09:18 INFO memory.MemoryStore: Block broadcast_173456 stored as values in memory (estimated size 30.3 KB, free 15.2 GB)
19/04/23 12:09:18 INFO rdd.HadoopRDD: Input split: hdfs://cqocdc/user/hive/warehouse/dw_user_useage_privilege_dt_yyyymmdd/month_id=201904/day_id=20190422/000017_0.snappy:0+326992763
19/04/23 12:09:18 INFO broadcast.TorrentBroadcast: Started reading broadcast variable 173452
19/04/23 12:09:18 INFO memory.MemoryStore: Block broadcast_173452_piece0 stored as bytes in memory (estimated size 30.8 KB, free 15.2 GB)
19/04/23 12:09:18 INFO broadcast.TorrentBroadcast: Reading broadcast variable 173452 took 3 ms
19/04/23 12:09:18 INFO memory.MemoryStore: Block broadcast_173452 stored as values in memory (estimated size 365.1 KB, free 15.3 GB)
19/04/23 12:09:18 INFO codegen.CodeGenerator: Code generated in 6.949728 ms
19/04/23 12:09:18 INFO codegen.CodeGenerator: Code generated in 20.909883 ms
19/04/23 12:09:18 INFO output.FileOutputCommitter: Saved output of task 'attempt_20190423120856_98508_m_000047_0' to hdfs://cqocdc/tmp/.staging/hive_hive_2019-04-23_12-08-56_154_3110404551071203558-1370/-ext-10000/_temporary/0/task_20190423120856_98508_m_000047
19/04/23 12:09:18 INFO mapred.SparkHadoopMapRedUtil: attempt_20190423120856_98508_m_000047_0: Committed
19/04/23 12:09:18 INFO executor.Executor: Finished task 47.0 in stage 98508.0 (TID 6835975). 3217 bytes result sent to driver
19/04/23 12:09:19 ERROR executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL TERM
19/04/23 12:09:19 INFO storage.DiskBlockManager: Shutdown hook called
19/04/23 12:09:19 INFO util.ShutdownHookManager: Shutdown hook called
19/04/23 12:09:19 INFO executor.Executor: Finished task 17.0 in stage 98517.0 (TID 6835988). 3188 bytes result sent to driver
{code}

The file size and last modified time:

!image-2019-05-09-11-10-04-602.png!

The total input of the query's stage is 14.9 G:

!incomplte-task-0.png!


was (Author: fanyunbojerry):

The incomplete task's id is 17.0 in stage 98517.0

!incomplte-task-1.png!

The input size is 23.5 MB, and the task finished in 1 s

!incomplte-task-2.png!
and the log shows the input split size is

{code:java}
Input split: hdfs://cqocdc/user/hive/warehouse/dw_user_useage_privilege_dt_yyyymmdd/month_id=201904/day_id=20190422/000017_0.snappy:0+326992763{code}
{code:java}
19/04/23 12:09:18 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 6835988
19/04/23 12:09:18 INFO executor.Executor: Running task 17.0 in stage 98517.0 (TID 6835988)
19/04/23 12:09:18 INFO broadcast.TorrentBroadcast: Started reading broadcast variable 173456
19/04/23 12:09:18 INFO memory.MemoryStore: Block broadcast_173456_piece0 stored as bytes in memory (estimated size 13.4 KB, free 15.2 GB)
19/04/23 12:09:18 INFO broadcast.TorrentBroadcast: Reading broadcast variable 173456 took 4 ms
19/04/23 12:09:18 INFO memory.MemoryStore: Block broadcast_173456 stored as values in memory (estimated size 30.3 KB, free 15.2 GB)
19/04/23 12:09:18 INFO rdd.HadoopRDD: Input split: hdfs://cqocdc/user/hive/warehouse/dw_user_useage_privilege_dt_yyyymmdd/month_id=201904/day_id=20190422/000017_0.snappy:0+326992763
19/04/23 12:09:18 INFO broadcast.TorrentBroadcast: Started reading broadcast variable 173452
19/04/23 12:09:18 INFO memory.MemoryStore: Block broadcast_173452_piece0 stored as bytes in memory (estimated size 30.8 KB, free 15.2 GB)
19/04/23 12:09:18 INFO broadcast.TorrentBroadcast: Reading broadcast variable 173452 took 3 ms
19/04/23 12:09:18 INFO memory.MemoryStore: Block broadcast_173452 stored as values in memory (estimated size 365.1 KB, free 15.3 GB)
19/04/23 12:09:18 INFO codegen.CodeGenerator: Code generated in 6.949728 ms
19/04/23 12:09:18 INFO codegen.CodeGenerator: Code generated in 20.909883 ms
19/04/23 12:09:18 INFO output.FileOutputCommitter: Saved output of task 'attempt_20190423120856_98508_m_000047_0' to hdfs://cqocdc/tmp/.staging/hive_hive_2019-04-23_12-08-56_154_3110404551071203558-1370/-ext-10000/_temporary/0/task_20190423120856_98508_m_000047
19/04/23 12:09:18 INFO mapred.SparkHadoopMapRedUtil: attempt_20190423120856_98508_m_000047_0: Committed
19/04/23 12:09:18 INFO executor.Executor: Finished task 47.0 in stage 98508.0 (TID 6835975). 3217 bytes result sent to driver
19/04/23 12:09:19 ERROR executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL TERM
19/04/23 12:09:19 INFO storage.DiskBlockManager: Shutdown hook called
19/04/23 12:09:19 INFO util.ShutdownHookManager: Shutdown hook called
19/04/23 12:09:19 INFO executor.Executor: Finished task 17.0 in stage 98517.0 (TID 6835988). 3188 bytes result sent to driver
{code}

The file size and last modified time:

!image-2019-05-09-11-10-04-602.png!

The total input of the query's stage is 14.9 G:

!incomplte-task-0.png!

> Task accomplished incompletely but marked as success
> ----------------------------------------------------
>
>                 Key: SPARK-27663
>                 URL: https://issues.apache.org/jira/browse/SPARK-27663
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, SQL
>    Affects Versions: 2.1.0
>            Reporter: Fan Yunbo
>            Priority: Major
>         Attachments: image-2019-05-09-11-10-04-602.png, incomplte-task-0.png, incomplte-task-1.png, incomplte-task-2.png, reran-0.png, reran-1.png
>
>
> It happens when running SQL queries using Spark SQL.
> The task completed only partially but was marked as successful, because there were no exceptions and no failed or killed tasks.
> When I checked the query result, it was missing about 4000 records.
> The history web UI shows that the task input size is 23.5 MB, but the executor log shows the split size is 326992763 bytes, about 300 MB.
> This task finished in 1 second, while the others took about 15 seconds.
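
As a rough cross-check of the sizes quoted above, the split length can be pulled out of the `rdd.HadoopRDD` log line and set against the input size shown in the history web UI. This is only a sketch: the helper names (`parse_input_split`, `bytes_to_mib`) are hypothetical, the regular expression assumes the `path:start+length` form seen in this log, and it assumes the UI's "MB" and the split length both count bytes of the same (compressed) file.

```python
import re

# Assumed log form: "Input split: <path>:<start>+<length>", as seen in the executor log.
SPLIT_RE = re.compile(r"Input split: (?P<path>\S+):(?P<start>\d+)\+(?P<length>\d+)")


def parse_input_split(log_line):
    """Return (path, start, length_in_bytes) from a HadoopRDD log line, or None."""
    match = SPLIT_RE.search(log_line)
    if match is None:
        return None
    return match.group("path"), int(match.group("start")), int(match.group("length"))


def bytes_to_mib(num_bytes):
    """Convert a byte count to mebibytes."""
    return num_bytes / (1024 * 1024)


log_line = (
    "19/04/23 12:09:18 INFO rdd.HadoopRDD: Input split: "
    "hdfs://cqocdc/user/hive/warehouse/dw_user_useage_privilege_dt_yyyymmdd/"
    "month_id=201904/day_id=20190422/000017_0.snappy:0+326992763"
)

path, start, length = parse_input_split(log_line)
split_mib = bytes_to_mib(length)

# Input size reported for task 17.0 in the history web UI (taken from the report).
reported_mib = 23.5

print(f"split file:    {path}")
print(f"split length:  {split_mib:.1f} MiB")
print(f"fraction read: {reported_mib / split_mib:.1%}")
```

With the values from this report, the split works out to roughly 312 MiB, so the 23.5 MB shown in the UI would be well under a tenth of the split, which is consistent with the task having stopped reading early.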