I am using CQL 3 to create a table to store images, and every image is about 200K ~ 500K.
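(For reference, the table is created with CQL 3 roughly like the sketch below; the column names are only illustrative, not my exact schema. Each image is stored as a single blob value.)

    -- Illustrative sketch only: one row per image, each blob about 200K ~ 500K
    CREATE TABLE mydb.images (
        id uuid PRIMARY KEY,
        name text,
        data blob
    );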
I have 6 hard disks per node, and Cassandra is configured with 6 data directories:

data_file_directories:
    - /data1/cass
    - /data2/cass
    - /data3/cass
    - /data4/cass
    - /data5/cass
    - /data6/cass

Every directory is on a standalone disk. But this is what I found when the error occurred:

[root@node5 images]# ll -hl
total 3.6T
drwxr-xr-x 4 root root 4.0K Jan 20 09:44 snapshots
-rw-r--r-- 1 root root 456M Apr 30 13:42 mydb-images-tmp-jb-91068-CompressionInfo.db
-rw-r--r-- 1 root root 3.5T Apr 30 13:42 mydb-images-tmp-jb-91068-Data.db
-rw-r--r-- 1 root root    0 Apr 30 13:42 mydb-images-tmp-jb-91068-Filter.db
-rw-r--r-- 1 root root 2.0G Apr 30 13:42 mydb-images-tmp-jb-91068-Index.db

[root@node5 images]# df -hl
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1        49G  7.5G   39G  17% /
tmpfs           7.8G     0  7.8G   0% /dev/shm
/dev/sda3       3.6T  1.3T  2.1T  38% /data1
/dev/sdb1       3.6T  1.4T  2.1T  39% /data2
/dev/sdc1       3.6T  466G  3.0T  14% /data3
/dev/sdd1       3.6T  1.3T  2.2T  38% /data4
/dev/sde1       3.6T  1.3T  2.2T  38% /data5
/dev/sdf1       3.6T  3.6T     0 100% /data6

mydb-images-tmp-jb-91068-Data.db occupied almost all the disk space (a 4T hard disk with about 3.6T actually usable). After I restarted Cassandra, everything seemed to be fine again:

-rw-r--r-- 1 root root  19K Apr 30 13:58 mydb_oe-images-tmp-jb-96242-CompressionInfo.db
-rw-r--r-- 1 root root 145M Apr 30 13:58 mydb_oe-images-tmp-jb-96242-Data.db
-rw-r--r-- 1 root root  64K Apr 30 13:58 mydb_oe-images-tmp-jb-96242-Index.db

[root@node5 images]# df -hl
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1        49G  7.5G   39G  17% /
tmpfs           7.8G     0  7.8G   0% /dev/shm
/dev/sda3       3.6T  1.3T  2.1T  38% /data1
/dev/sdb1       3.6T  1.4T  2.1T  39% /data2
/dev/sdc1       3.6T  466G  3.0T  14% /data3
/dev/sdd1       3.6T  1.3T  2.2T  38% /data4
/dev/sde1       3.6T  1.3T  2.2T  38% /data5
/dev/sdf1       3.6T  662M  3.4T   1% /data6

So my questions are:

1. Is there a size limit for tables created with CQL 3?
2. I specified 6 data directories, each on a standalone disk. Is that OK?
3. Why is the tmp db file so large? Is this normal, or a bug?

Could anyone please help solve this issue? Any help is greatly appreciated. Thanks a lot!

On Wed, Apr 30, 2014 at 12:04 PM, Yatong Zhang <bluefl...@gmail.com> wrote:

> Thanks for the response. I've checked the system logs and hard disk smartd
> info, and no errors were found. Any hints to locate the problem?
>
>
> On Wed, Apr 30, 2014 at 9:26 AM, Michael Shuler <mich...@pbandjelly.org> wrote:
>
>> Then you likely need to fix your I/O problem. The most recent error you
>> posted is an EOFException - the file being read ended unexpectedly.
>> Probably when you ran out of disk space.
>>
>> --
>> Michael
>>
>>
>> On 04/29/2014 07:48 PM, Yatong Zhang wrote:
>>
>>> Here is another type of exception; it seems they are all I/O related:
>>>
>>> INFO [SSTableBatchOpen:1] 2014-04-29 14:44:35,548 SSTableReader.java (line 223) Opening /data2/cass/system/compaction_history/system-compaction_history-jb-6956 (447252 bytes)
>>> INFO [SSTableBatchOpen:2] 2014-04-29 14:44:35,553 SSTableReader.java (line 223) Opening /data2/cass/system/compaction_history/system-compaction_history-jb-6958 (257 bytes)
>>> INFO [SSTableBatchOpen:3] 2014-04-29 14:44:35,554 SSTableReader.java (line 223) Opening /data2/cass/system/compaction_history/system-compaction_history-jb-6957 (257 bytes)
>>> INFO [main] 2014-04-29 14:44:35,592 ColumnFamilyStore.java (line 248) Initializing system.batchlog
>>> INFO [main] 2014-04-29 14:44:35,596 ColumnFamilyStore.java (line 248) Initializing system.sstable_activity
>>> INFO [SSTableBatchOpen:1] 2014-04-29 14:44:35,601 SSTableReader.java (line 223) Opening /data2/cass/system/sstable_activity/system-sstable_activity-jb-8084 (1562 bytes)
>>> INFO [SSTableBatchOpen:2] 2014-04-29 14:44:35,604 SSTableReader.java (line 223) Opening /data2/cass/system/sstable_activity/system-sstable_activity-jb-8083 (2075 bytes)
>>> INFO [SSTableBatchOpen:3] 2014-04-29 14:44:35,605 SSTableReader.java (line 223) Opening /data2/cass/system/sstable_activity/system-sstable_activity-jb-8085 (1555 bytes)
>>> INFO [main] 2014-04-29 14:44:35,687 AutoSavingCache.java (line 114) reading saved cache /data1/saved_caches/system-sstable_activity-KeyCache-b.db
>>> INFO [main] 2014-04-29 14:44:35,696 ColumnFamilyStore.java (line 248) Initializing system.peer_events
>>> INFO [SSTableBatchOpen:1] 2014-04-29 14:44:35,697 SSTableReader.java (line 223) Opening /data4/cass/system/peer_events/system-peer_events-jb-181 (12342 bytes)
>>> INFO [main] 2014-04-29 14:44:35,717 ColumnFamilyStore.java (line 248) Initializing system.compactions_in_progress
>>> INFO [SSTableBatchOpen:1] 2014-04-29 14:44:35,718 SSTableReader.java (line 223) Opening /data5/cass/system/compactions_in_progress/system-compactions_in_progress-jb-36448 (167 bytes)
>>> ERROR [SSTableBatchOpen:1] 2014-04-29 14:44:35,730 CassandraDaemon.java (line 198) Exception in thread Thread[SSTableBatchOpen:1,5,main]
>>> org.apache.cassandra.io.sstable.CorruptSSTableException: java.io.EOFException
>>>         at org.apache.cassandra.io.compress.CompressionMetadata.<init>(CompressionMetadata.java:110)
>>>         at org.apache.cassandra.io.compress.CompressionMetadata.create(CompressionMetadata.java:64)
>>>         at org.apache.cassandra.io.util.CompressedPoolingSegmentedFile$Builder.complete(CompressedPoolingSegmentedFile.java:42)
>>>         at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:458)
>>>         at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:422)
>>>         at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:203)
>>>         at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:184)
>>>         at org.apache.cassandra.io.sstable.SSTableReader$1.run(SSTableReader.java:264)
>>>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>         at java.lang.Thread.run(Thread.java:744)
>>> Caused by: java.io.EOFException
>>>         at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:340)
>>>         at java.io.DataInputStream.readUTF(DataInputStream.java:589)
>>>         at java.io.DataInputStream.readUTF(DataInputStream.java:564)
>>>         at org.apache.cassandra.io.compress.CompressionMetadata.<init>(CompressionMetadata.java:85)
>>>         ... 12 more
>>> INFO [main] 2014-04-29 14:44:35,733 ColumnFamilyStore.java (line 248) Initializing system.hints
>>> INFO [main] 2014-04-29 14:44:35,734 AutoSavingCache.java (line 114) reading saved cache /data1/saved_caches/system-hints-KeyCache-b.db
>>> INFO [main] 2014-04-29 14:44:35,737 ColumnFamilyStore.java (line 248) Initializing system.schema_keyspaces
>>>
>>>
>>> On Tue, Apr 29, 2014 at 6:07 PM, Yatong Zhang <bluefl...@gmail.com> wrote:
>>>
>>>> I am pretty sure the disk has plenty of space, I am sure of that. I
>>>> restarted Cassandra and everything went fine again.
>>>>
>>>> It's really weird.
>>>>
>>>>
>>>> On Tue, Apr 29, 2014 at 5:58 PM, Sylvain Lebresne <sylv...@datastax.com> wrote:
>>>>
>>>>> The important part of that stack trace is "java.io.IOException: No space
>>>>> left on device": your disks are full (and it's not really a bug that
>>>>> Cassandra errors out in that case).
>>>>>
>>>>> --
>>>>> Sylvain
>>>>>
>>>>>
>>>>> On Tue, Apr 29, 2014 at 11:09 AM, Yatong Zhang <bluefl...@gmail.com> wrote:
>>>>>
>>>>>> Hi there,
>>>>>>
>>>>>> Sorry if this is not the right place to report bugs. I am using 2.0.7 and I
>>>>>> have a 10-box cluster with about 200TB capacity. I just found that 3 of the
>>>>>> boxes had error exceptions. With DataStax OpsCenter I can see these three
>>>>>> nodes lost connections (no response), but after I sshed to these servers,
>>>>>> Cassandra was still running, and 'system.log' still had logs.
>>>>>>
>>>>>> I think this might be a bug, so would anyone kindly help to investigate it?
>>>>>> Thanks~
>>>>>>
>>>>>> ERROR [CompactionExecutor:1] 2014-04-29 05:55:15,249 CassandraDaemon.java (line 198) Exception in thread Thread[CompactionExecutor:1,1,main]
>>>>>> FSWriteError in /data2/cass/mydb/images/mydb-images-tmp-jb-98219-Filter.db
>>>>>>         at org.apache.cassandra.io.sstable.SSTableWriter$IndexWriter.close(SSTableWriter.java:475)
>>>>>>         at org.apache.cassandra.io.util.FileUtils.closeQuietly(FileUtils.java:212)
>>>>>>         at org.apache.cassandra.io.sstable.SSTableWriter.abort(SSTableWriter.java:301)
>>>>>>         at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:209)
>>>>>>         at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
>>>>>>         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
>>>>>>         at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:60)
>>>>>>         at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59)
>>>>>>         at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:197)
>>>>>>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>>>>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>>>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>>         at java.lang.Thread.run(Thread.java:744)
>>>>>> Caused by: java.io.IOException: No space left on device
>>>>>>         at java.io.FileOutputStream.write(Native Method)
>>>>>>         at java.io.FileOutputStream.write(FileOutputStream.java:295)
>>>>>>         at java.io.DataOutputStream.writeInt(DataOutputStream.java:197)
>>>>>>         at org.apache.cassandra.utils.BloomFilterSerializer.serialize(BloomFilterSerializer.java:34)
>>>>>>         at org.apache.cassandra.utils.Murmur3BloomFilter$Murmur3BloomFilterSerializer.serialize(Murmur3BloomFilter.java:44)
>>>>>>         at org.apache.cassandra.utils.FilterFactory.serialize(FilterFactory.java:41)
>>>>>>         at org.apache.cassandra.io.sstable.SSTableWriter$IndexWriter.close(SSTableWriter.java:468)
>>>>>>         ... 13 more
>>>>>> ERROR [CompactionExecutor:1] 2014-04-29 05:55:15,406 StorageService.java (line 367) Stopping gossiper
>>>>>> WARN [CompactionExecutor:1] 2014-04-29 05:55:15,406 StorageService.java (line 281) Stopping gossip by operator request
>>>>>> INFO [CompactionExecutor:1] 2014-04-29 05:55:15,406 Gossiper.java (line 1271) Announcing shutdown
>>>>>> ERROR [CompactionExecutor:1] 2014-04-29 05:55:17,406 StorageService.java (line 372) Stopping RPC server
>>>>>> INFO [CompactionExecutor:1] 2014-04-29 05:55:17,406 ThriftServer.java (line 141) Stop listening to thrift clients
>>>>>> ERROR [CompactionExecutor:1] 2014-04-29 05:55:17,417 StorageService.java (line 377) Stopping native transport
>>>>>> INFO [CompactionExecutor:1] 2014-04-29 05:55:17,504 Server.java (line 181) Stop listening for CQL clients