[ 
https://issues.apache.org/jira/browse/SPARK-27663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16836044#comment-16836044
 ] 

Fan Yunbo edited comment on SPARK-27663 at 5/9/19 3:21 AM:
-----------------------------------------------------------

The incomplete task is task 17.0 in stage 98517.0:

!incomplte-task-1.png!

Its input size is reported as 23.5 MB, and it finished in 1 s:

!incomplte-task-2.png!

However, the executor log shows that the input split size is 326992763 bytes, about 300 MB:
{code:java}
Input split: hdfs://cqocdc/user/hive/warehouse/dw_user_useage_privilege_dt_yyyymmdd/month_id=201904/day_id=20190422/000017_0.snappy:0+326992763{code}
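To make the size mismatch concrete, here is a minimal sketch that parses the split string from the log and compares its length with the input size shown in the UI. It assumes the split is printed as `path:start+length` and that the UI's "MB" is a binary unit (2^20 bytes); the helper name `parse_split` is made up for illustration.

```python
import re

def parse_split(split: str):
    """Break a 'path:start+length' split string into its three parts."""
    m = re.fullmatch(r"(.+):(\d+)\+(\d+)", split)
    if m is None:
        raise ValueError(f"unrecognized split string: {split!r}")
    return m.group(1), int(m.group(2)), int(m.group(3))

# Split string copied from the executor log above.
split = ("hdfs://cqocdc/user/hive/warehouse/dw_user_useage_privilege_dt_yyyymmdd/"
         "month_id=201904/day_id=20190422/000017_0.snappy:0+326992763")

path, start, length = parse_split(split)

split_mib = length / 2**20       # split length in MiB
reported_mib = 23.5              # input size from the UI (assumed to be MiB)
fraction = reported_mib / split_mib

print(f"split starts at byte {start} and covers {length} bytes ({split_mib:.1f} MiB)")
print(f"reported input is {fraction:.1%} of the split")
```

Under those assumptions the split covers roughly 312 MiB, so the reported 23.5 MB is well under a tenth of what the task was assigned to read.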
{code:java}
19/04/23 12:09:18 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 6835988
19/04/23 12:09:18 INFO executor.Executor: Running task 17.0 in stage 98517.0 (TID 6835988)
19/04/23 12:09:18 INFO broadcast.TorrentBroadcast: Started reading broadcast variable 173456
19/04/23 12:09:18 INFO memory.MemoryStore: Block broadcast_173456_piece0 stored as bytes in memory (estimated size 13.4 KB, free 15.2 GB)
19/04/23 12:09:18 INFO broadcast.TorrentBroadcast: Reading broadcast variable 173456 took 4 ms
19/04/23 12:09:18 INFO memory.MemoryStore: Block broadcast_173456 stored as values in memory (estimated size 30.3 KB, free 15.2 GB)
19/04/23 12:09:18 INFO rdd.HadoopRDD: Input split: hdfs://cqocdc/user/hive/warehouse/dw_user_useage_privilege_dt_yyyymmdd/month_id=201904/day_id=20190422/000017_0.snappy:0+326992763
19/04/23 12:09:18 INFO broadcast.TorrentBroadcast: Started reading broadcast variable 173452
19/04/23 12:09:18 INFO memory.MemoryStore: Block broadcast_173452_piece0 stored as bytes in memory (estimated size 30.8 KB, free 15.2 GB)
19/04/23 12:09:18 INFO broadcast.TorrentBroadcast: Reading broadcast variable 173452 took 3 ms
19/04/23 12:09:18 INFO memory.MemoryStore: Block broadcast_173452 stored as values in memory (estimated size 365.1 KB, free 15.3 GB)
19/04/23 12:09:18 INFO codegen.CodeGenerator: Code generated in 6.949728 ms
19/04/23 12:09:18 INFO codegen.CodeGenerator: Code generated in 20.909883 ms
19/04/23 12:09:18 INFO output.FileOutputCommitter: Saved output of task 'attempt_20190423120856_98508_m_000047_0' to hdfs://cqocdc/tmp/.staging/hive_hive_2019-04-23_12-08-56_154_3110404551071203558-1370/-ext-10000/_temporary/0/task_20190423120856_98508_m_000047
19/04/23 12:09:18 INFO mapred.SparkHadoopMapRedUtil: attempt_20190423120856_98508_m_000047_0: Committed
19/04/23 12:09:18 INFO executor.Executor: Finished task 47.0 in stage 98508.0 (TID 6835975). 3217 bytes result sent to driver
19/04/23 12:09:19 ERROR executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL TERM
19/04/23 12:09:19 INFO storage.DiskBlockManager: Shutdown hook called
19/04/23 12:09:19 INFO util.ShutdownHookManager: Shutdown hook called
19/04/23 12:09:19 INFO executor.Executor: Finished task 17.0 in stage 98517.0 (TID 6835988). 3188 bytes result sent to driver
{code}
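One detail in this log is the order of events: the executor receives SIGTERM and runs its shutdown hooks before it reports task 17.0 as finished. The sketch below pulls out that ordering for TID 6835988 from a few lines copied from the log above. The function name `events` is made up for illustration, and because the timestamps only have one-second resolution, the order within a second is taken from line order. This only shows the sequence; it does not by itself establish a cause.

```python
from datetime import datetime

# Lines copied from the executor log above (unwrapped).
LOG = """\
19/04/23 12:09:18 INFO executor.Executor: Running task 17.0 in stage 98517.0 (TID 6835988)
19/04/23 12:09:19 ERROR executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL TERM
19/04/23 12:09:19 INFO storage.DiskBlockManager: Shutdown hook called
19/04/23 12:09:19 INFO util.ShutdownHookManager: Shutdown hook called
19/04/23 12:09:19 INFO executor.Executor: Finished task 17.0 in stage 98517.0 (TID 6835988). 3188 bytes result sent to driver
"""

def events(log: str, tid: int):
    """Map event name -> (line index, timestamp) for one task id."""
    found = {}
    tag = f"(TID {tid})"
    for lineno, line in enumerate(log.splitlines()):
        # The first 17 characters are the 'yy/MM/dd HH:mm:ss' timestamp.
        ts = datetime.strptime(line[:17], "%y/%m/%d %H:%M:%S")
        if tag in line and "Running task" in line:
            found["start"] = (lineno, ts)
        elif tag in line and "Finished task" in line:
            found["finish"] = (lineno, ts)
        elif "RECEIVED SIGNAL TERM" in line:
            found["sigterm"] = (lineno, ts)
    return found

ev = events(LOG, 6835988)
duration = (ev["finish"][1] - ev["start"][1]).total_seconds()
sigterm_before_finish = ev["sigterm"][0] < ev["finish"][0]

print(f"task duration: {duration:.0f} s")
print(f"SIGTERM logged before task finish: {sigterm_before_finish}")
```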
The file size and last modified time:

!image-2019-05-09-11-10-04-602.png!

The total input of the query's stage is 14.9 GB:

!incomplte-task-0.png!

 



> Task accomplished incompletely but marked as success
> ----------------------------------------------------
>
>                 Key: SPARK-27663
>                 URL: https://issues.apache.org/jira/browse/SPARK-27663
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, SQL
>    Affects Versions: 2.1.0
>            Reporter: Fan Yunbo
>            Priority: Major
>         Attachments: image-2019-05-09-11-10-04-602.png, incomplte-task-0.png, 
> incomplte-task-1.png, incomplte-task-2.png, reran-0.png, reran-1.png
>
>
> This happens when running SQL queries with Spark SQL.
> The task completed only partially but was marked as successful, because there 
> were no exceptions and no failed or killed tasks.
> When I checked the query result, about 4000 records were missing.
> The history web UI shows that the task's input size is 23.5 MB, but the 
> executor log shows a split size of 326992763 bytes, about 300 MB.
> The task also finished in 1 second, whereas the other tasks took about 15 
> seconds.


