[ 
https://issues.apache.org/jira/browse/KUDU-2892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16884077#comment-16884077
 ] 

Andrew Wong commented on KUDU-2892:
-----------------------------------

Also, [~helifu] it seems like the logs you posted are specific to tablet 
2278f736bf6548e2b773003c1ba7ed66. If that's the case, could you attach the full 
logs, including any warnings from the log block manager? That would also help 
us get to the bottom of how this happened.
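
For anyone collecting those lines, here is a minimal sketch of one way to pull 
warning-and-above entries out of a glog-format file. It assumes only the line 
format stated in the header of the quoted log ([IWEF]mmdd hh:mm:ss.uuuuuu 
threadid file:line] msg). The helper names parse_glog and select, and the 
default year, are hypothetical and not part of Kudu.

```python
import re
from datetime import datetime

# Line format from the quoted log header:
#   [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
GLOG_LINE = re.compile(
    r"^(?P<level>[IWEF])(?P<month>\d{2})(?P<day>\d{2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+"
    r"(?P<thread>\d+) (?P<file>[^:\s]+):(?P<line>\d+)\] (?P<msg>.*)$"
)


def parse_glog(lines, year=2019):
    """Parse glog lines into dicts. The year is an assumption: glog lines omit it."""
    entries = []
    for raw in lines:
        line = raw.rstrip("\n")
        m = GLOG_LINE.match(line)
        if m is None:
            # Not a log header line: treat it as a continuation of the
            # previous entry, or skip it if no entry has been seen yet.
            if entries and line.strip():
                entries[-1]["msg"] += " " + line.strip()
            continue
        stamp = "{}{}{} {}".format(
            year, m.group("month"), m.group("day"), m.group("time")
        )
        entries.append({
            "level": m.group("level"),
            "time": datetime.strptime(stamp, "%Y%m%d %H:%M:%S.%f"),
            "thread": int(m.group("thread")),
            "source": "{}:{}".format(m.group("file"), m.group("line")),
            "msg": m.group("msg").strip(),
        })
    return entries


def select(entries, levels="WEF", needle=None):
    """Keep entries at the given severity levels, optionally matching a substring."""
    return [
        e for e in entries
        if e["level"] in levels and (needle is None or needle in e["msg"])
    ]
```

With the default levels this keeps W, E, and F lines for every tablet, and 
passing needle="2278f736bf6548e2b773003c1ba7ed66" narrows the result to the 
tablet in question.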

> tserver crashed while dropping range partition
> ----------------------------------------------
>
>                 Key: KUDU-2892
>                 URL: https://issues.apache.org/jira/browse/KUDU-2892
>             Project: Kudu
>          Issue Type: Bug
>          Components: tablet
>    Affects Versions: 1.9.0
>            Reporter: HeLifu
>            Priority: Major
>         Attachments: tserver-INFO.log
>
>
> On one of our production clusters, a tserver crashed yesterday morning while 
> dropping a range partition. The error messages are below:
> {code:java}
> Log file created at: 2019/07/11 01:51:30
> Running on machine: kudu31.jd.163.org
> Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
> E0711 01:51:30.331185 11840 env_posix.cc:316] I/O error, context: 
> /mnt/dfs/0/kudu/tserver/data/data/9305dce18e6f4100b486b605617122b3.data
> E0711 01:51:30.337604 11840 data_dirs.cc:1120] Directory 
> /mnt/dfs/0/kudu/tserver/data/data marked as failed
> F0711 04:00:51.835958 68948 ts_tablet_manager.cc:940] Failed to delete tablet 
> data for 2278f736bf6548e2b773003c1ba7ed66: Invalid argument: Unable to delete 
> on-disk data from tablet 2278f736bf6548e2b773003c1ba7ed66: The metadata for 
> tablet 2278f736bf6548e2b773003c1ba7ed66 still references orphaned blocks. 
> Call DeleteTabletData() first
> {code}
> It seems that new orphaned blocks which were never deleted caused this 
> problem after a disk was marked as bad. I attached an INFO log file covering 
> tablet '2278f736bf6548e2b773003c1ba7ed66'. Our Kudu version is 1.9.x 6a9cf4.
> For brevity, here is a quick summary of the timeline:
>  # 01:51:30.331185: bad disk /mnt/dfs/0 was detected
>  # 01:51:30.344581: failing tablet
>  # 01:51:30.870059: Initiating tablet copy
>  # 04:00:51.820354: Processing DeleteTablet
>  # 04:00:51.835958: Crashed.
>  



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
