[ 
https://issues.apache.org/jira/browse/KUDU-1768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15795631#comment-15795631
 ] 

Juan Yu commented on KUDU-1768:
-------------------------------

BTW, the tablet server can also crash:
{code}
Log file created at: 2016/12/30 20:16:58
Running on machine: impala4135-2.vpc.cloudera.com
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
F1230 20:16:58.748709 30089 tablet_peer_mm_ops.cc:128] Check failed: _s.ok() 
FlushMRS failed on e8b5861a24fb4ada9bf2e219b87b73ef: IO error: Failed to open 
DiskRowSet for flush: Unable to Start() writer for column double_col181[double 
NULLABLE]: Couldn't write header: 
/dataroot/dataroot/kudu/data/data/b0096794642b40c1b0ff7587d5082b7c.data: No 
space left on device (error 28)
{code}

> Tablet server cannot recover after run out of disk space
> --------------------------------------------------------
>
>                 Key: KUDU-1768
>                 URL: https://issues.apache.org/jira/browse/KUDU-1768
>             Project: Kudu
>          Issue Type: Bug
>    Affects Versions: 1.0.0
>            Reporter: Juan Yu
>         Attachments: tserver.log.tar.gz
>
>
> My node has a small disk. After upserting into a large table several times, it ran
> out of disk space. I dropped the table. All nodes dropped the tablet
> successfully except one, which is stuck in a bad state; its tablet data is not
> removed even after a restart.
> For the dropped table, it shows:
> {code}
> tmp_store_sales       d88f8a7eba974cba8d9c547f69168c6b        hash buckets: 
> (0)       FAILED (TABLET_DATA_DELETED): Invalid argument: Unable to delete 
> on-disk data from tablet d88f8a7eba974cba8d9c547f69168c6b: The metadata for 
> tablet d88f8a7eba974cba8d9c547f69168c6b still references orphaned blocks. 
> Call DeleteTabletData() first                    Invalid argument: Unable to 
> delete on-disk data from tablet d88f8a7eba974cba8d9c547f69168c6b: The 
> metadata for tablet d88f8a7eba974cba8d9c547f69168c6b still references 
> orphaned blocks. Call DeleteTabletData() first
> {code}
> For other tables, it shows:
> {code}
> kudu_store_sales      f539ce77657e431aa5ce4ab7f7375f84        hash buckets: 
> (0, 7)    FAILED (TABLET_DATA_READY): IO error: Failed log replay. Reason: 
> Failed to open new log: Insufficient disk space to allocate 67108864 bytes 
> under path 
> /dataroot/dataroot/tserver/wal/wals/f539ce77657e431aa5ce4ab7f7375f84/.tmp.newsegmentFdOSTp
>  (30613504 bytes free vs 0 bytes reserved) (error 28)                  IO 
> error: Failed log replay. Reason: Failed to open new log: Insufficient disk 
> space to allocate 67108864 bytes under path 
> /dataroot/dataroot/tserver/wal/wals/f539ce77657e431aa5ce4ab7f7375f84/.tmp.newsegmentFdOSTp
>  (30613504 bytes free vs 0 bytes reserved) (error 28)
> {code}
> The tablet server log from this node is attached.



