[ 
https://issues.apache.org/jira/browse/CASSANDRA-1597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Schuller resolved CASSANDRA-1597.
---------------------------------------

    Resolution: Invalid

> cassandra start-up seek-bound in uninterruptable sleep
> ------------------------------------------------------
>
>                 Key: CASSANDRA-1597
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1597
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Peter Schuller
>            Priority: Minor
>
> (Cassandra trunk from a few months ago, Linux 2.6.32)
> I just restarted a Cassandra node with a large (a few TB:s) column family. I 
> was expecting index sampling to take a while, but in this case I'm seeing 
> something unexpected. First, the log has:
> 10/10/09 11:15:49 INFO config.DatabaseDescriptor: Auto DiskAccessMode 
> determined to be mmap
> 10/10/09 11:15:50 INFO sstable.SSTableReader: Sampling index for 
> /var/lib/cassandra/data/system/Schema-c-1-<>
> 10/10/09 11:15:50 INFO sstable.SSTableReader: Sampling index for 
> /var/lib/cassandra/data/system/Migrations-c-1-<>
> 10/10/09 11:15:50 INFO sstable.SSTableReader: Sampling index for 
> /var/lib/cassandra/data/system/LocationInfo-c-1-<>
> 10/10/09 11:15:50 INFO config.DatabaseDescriptor: Loading schema version 
> d8ad1485-a6e4-11df-beb1-d1273d6ae3d0
> And it is currently stuck with iostat looking like this:
> Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz 
> avgqu-sz   await  svctm  %util
> sdb               0.00    67.00  146.00    2.00   584.00   276.00    11.62    
>  0.99    6.68   6.68  98.80
> I re-checked the code and the index sampling really should not be seek bound 
> since it is sequential. At first I figured that perhaps some aspect of the 
> I/O pattern is causing the kernel (and I guess underlying RAID+drives) to not 
> do read-ahead. This could explain why lots of small reads, even though they 
> are sequential in nature, could end up being seek bound.
> mmap() has the problem that you cannot strace to see what is doing I/O, so I 
> wanted to confirm with jstack. At this point I notice jstack won't attach at 
> all (after a timeout).
> Further, a SIGKILL fails to kill Java, with iostat proceeding, still seek 
> bound, and the process in uninterruptable sleep (presumably constantly, or 
> else the SIGKILL should have worked).
> Somewhere along the way, strace has begun failing to attach as well but from 
> my remembrance I didn't see any I/O calls when it did attach properly, which 
> is consistent with whatever I/O is happening being triggered by mmap():ed 
> memory access or else by some other long-running syscall.
> A very speculative hypothesis is that perhaps the initial mmap() call can be 
> long and blocking, seek-bound, due to inodes being consulted on disk (compare 
> with the cost of "rm:ing" large files on ext3fs). This would probably have 
> the potential to take quite a long time, if it is the case. On the other 
> hand, if this was the behavior of mmap() on ext3fs, it seems unlikely that it 
> would not be widely known already. It also seems unlikely in terms of prior 
> probability since I don't see a good reason why mmap() would have a need to 
> scan inodes.
> I am currently waiting it out to see if/when it eventually gets killed.
> I will then try restarting the node in standard I/O mode and see if that 
> works. If it does, I'll re-try with mmap():ed mode to see that I can trigger 
> it again.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to