I am using CQL 3 to create a table to store images, and every image is about 200K ~ 500K.
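(For reference, the table is created with CQL 3 roughly like the sketch below; the column names are only illustrative, not my exact schema. Each image is stored as a single blob value.)

    -- Illustrative sketch only: one row per image, each blob about 200K ~ 500K
    CREATE TABLE mydb.images (
        id uuid PRIMARY KEY,
        name text,
        data blob
    );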
I have 6 hard disks per node, and Cassandra is configured with 6 data directories:

data_file_directories:
    - /data1/cass
    - /data2/cass
    - /data3/cass
    - /data4/cass
    - /data5/cass
    - /data6/cass

Every directory is on a standalone disk. But this is what I found when the error occurred:

[root@node5 images]# ll -hl
total 3.6T
drwxr-xr-x 4 root root 4.0K Jan 20 09:44 snapshots
-rw-r--r-- 1 root root 456M Apr 30 13:42 mydb-images-tmp-jb-91068-CompressionInfo.db
-rw-r--r-- 1 root root 3.5T Apr 30 13:42 mydb-images-tmp-jb-91068-Data.db
-rw-r--r-- 1 root root    0 Apr 30 13:42 mydb-images-tmp-jb-91068-Filter.db
-rw-r--r-- 1 root root 2.0G Apr 30 13:42 mydb-images-tmp-jb-91068-Index.db

[root@node5 images]# df -hl
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1        49G  7.5G   39G  17% /
tmpfs           7.8G     0  7.8G   0% /dev/shm
/dev/sda3       3.6T  1.3T  2.1T  38% /data1
/dev/sdb1       3.6T  1.4T  2.1T  39% /data2
/dev/sdc1       3.6T  466G  3.0T  14% /data3
/dev/sdd1       3.6T  1.3T  2.2T  38% /data4
/dev/sde1       3.6T  1.3T  2.2T  38% /data5
/dev/sdf1       3.6T  3.6T     0 100% /data6

mydb-images-tmp-jb-91068-Data.db occupied almost all the disk space (a 4T hard disk with about 3.6T actually usable). After I restarted Cassandra, everything seemed to be fine again:

-rw-r--r-- 1 root root  19K Apr 30 13:58 mydb_oe-images-tmp-jb-96242-CompressionInfo.db
-rw-r--r-- 1 root root 145M Apr 30 13:58 mydb_oe-images-tmp-jb-96242-Data.db
-rw-r--r-- 1 root root  64K Apr 30 13:58 mydb_oe-images-tmp-jb-96242-Index.db

[root@node5 images]# df -hl
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1        49G  7.5G   39G  17% /
tmpfs           7.8G     0  7.8G   0% /dev/shm
/dev/sda3       3.6T  1.3T  2.1T  38% /data1
/dev/sdb1       3.6T  1.4T  2.1T  39% /data2
/dev/sdc1       3.6T  466G  3.0T  14% /data3
/dev/sdd1       3.6T  1.3T  2.2T  38% /data4
/dev/sde1       3.6T  1.3T  2.2T  38% /data5
/dev/sdf1       3.6T  662M  3.4T   1% /data6

So my questions are:

1. Is there a size limit for tables created with CQL 3?
2. I specified 6 data directories, each on a standalone disk. Is that OK?
3. Why is the tmp db file so large? Is this normal, or a bug?

Could anyone please help solve this issue? Any help is greatly appreciated. Thanks a lot!

On Wed, Apr 30, 2014 at 12:04 PM, Yatong Zhang <bluefl...@gmail.com> wrote:

> Thanks for the response. I've checked the system logs and hard disk smartd
> info, and no errors were found. Any hints to locate the problem?
>
>
> On Wed, Apr 30, 2014 at 9:26 AM, Michael Shuler <mich...@pbandjelly.org> wrote:
>
>> Then you likely need to fix your I/O problem. The most recent error you
>> posted is an EOFException - the file being read ended unexpectedly.
>> Probably when you ran out of disk space.
>>
>> --
>> Michael
>>
>>
>> On 04/29/2014 07:48 PM, Yatong Zhang wrote:
>>
>>> Here is another type of exception; it seems they are all I/O related:
>>>
>>> INFO [SSTableBatchOpen:1] 2014-04-29 14:44:35,548 SSTableReader.java (line 223) Opening /data2/cass/system/compaction_history/system-compaction_history-jb-6956 (447252 bytes)
>>> INFO [SSTableBatchOpen:2] 2014-04-29 14:44:35,553 SSTableReader.java (line 223) Opening /data2/cass/system/compaction_history/system-compaction_history-jb-6958 (257 bytes)
>>> INFO [SSTableBatchOpen:3] 2014-04-29 14:44:35,554 SSTableReader.java (line 223) Opening /data2/cass/system/compaction_history/system-compaction_history-jb-6957 (257 bytes)
>>> INFO [main] 2014-04-29 14:44:35,592 ColumnFamilyStore.java (line 248) Initializing system.batchlog
>>> INFO [main] 2014-04-29 14:44:35,596 ColumnFamilyStore.java (line 248) Initializing system.sstable_activity
>>> INFO [SSTableBatchOpen:1] 2014-04-29 14:44:35,601 SSTableReader.java (line 223) Opening /data2/cass/system/sstable_activity/system-sstable_activity-jb-8084 (1562 bytes)
>>> INFO [SSTableBatchOpen:2] 2014-04-29 14:44:35,604 SSTableReader.java (line 223) Opening /data2/cass/system/sstable_activity/system-sstable_activity-jb-8083 (2075 bytes)
>>> INFO [SSTableBatchOpen:3] 2014-04-29 14:44:35,605 SSTableReader.java (line 223) Opening /data2/cass/system/sstable_activity/system-sstable_activity-jb-8085 (1555 bytes)
>>> INFO [main] 2014-04-29 14:44:35,687 AutoSavingCache.java (line 114) reading saved cache /data1/saved_caches/system-sstable_activity-KeyCache-b.db
>>> INFO [main] 2014-04-29 14:44:35,696 ColumnFamilyStore.java (line 248) Initializing system.peer_events
>>> INFO [SSTableBatchOpen:1] 2014-04-29 14:44:35,697 SSTableReader.java (line 223) Opening /data4/cass/system/peer_events/system-peer_events-jb-181 (12342 bytes)
>>> INFO [main] 2014-04-29 14:44:35,717 ColumnFamilyStore.java (line 248) Initializing system.compactions_in_progress
>>> INFO [SSTableBatchOpen:1] 2014-04-29 14:44:35,718 SSTableReader.java (line 223) Opening /data5/cass/system/compactions_in_progress/system-compactions_in_progress-jb-36448 (167 bytes)
>>> ERROR [SSTableBatchOpen:1] 2014-04-29 14:44:35,730 CassandraDaemon.java (line 198) Exception in thread Thread[SSTableBatchOpen:1,5,main]
>>> org.apache.cassandra.io.sstable.CorruptSSTableException: java.io.EOFException
>>>         at org.apache.cassandra.io.compress.CompressionMetadata.<init>(CompressionMetadata.java:110)
>>>         at org.apache.cassandra.io.compress.CompressionMetadata.create(CompressionMetadata.java:64)
>>>         at org.apache.cassandra.io.util.CompressedPoolingSegmentedFile$Builder.complete(CompressedPoolingSegmentedFile.java:42)
>>>         at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:458)
>>>         at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:422)
>>>         at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:203)
>>>         at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:184)
>>>         at org.apache.cassandra.io.sstable.SSTableReader$1.run(SSTableReader.java:264)
>>>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>         at java.lang.Thread.run(Thread.java:744)
>>> Caused by: java.io.EOFException
>>>         at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:340)
>>>         at java.io.DataInputStream.readUTF(DataInputStream.java:589)
>>>         at java.io.DataInputStream.readUTF(DataInputStream.java:564)
>>>         at org.apache.cassandra.io.compress.CompressionMetadata.<init>(CompressionMetadata.java:85)
>>>         ... 12 more
>>> INFO [main] 2014-04-29 14:44:35,733 ColumnFamilyStore.java (line 248) Initializing system.hints
>>> INFO [main] 2014-04-29 14:44:35,734 AutoSavingCache.java (line 114) reading saved cache /data1/saved_caches/system-hints-KeyCache-b.db
>>> INFO [main] 2014-04-29 14:44:35,737 ColumnFamilyStore.java (line 248) Initializing system.schema_keyspaces
>>>
>>>
>>> On Tue, Apr 29, 2014 at 6:07 PM, Yatong Zhang <bluefl...@gmail.com> wrote:
>>>
>>>> I am pretty sure the disk has plenty of space, I am sure of that. I
>>>> restarted Cassandra and everything went fine again.
>>>>
>>>> It's really weird.
>>>>
>>>>
>>>> On Tue, Apr 29, 2014 at 5:58 PM, Sylvain Lebresne <sylv...@datastax.com> wrote:
>>>>
>>>>> The important part of that stack trace is "java.io.IOException: No space
>>>>> left on device": your disks are full (and it's not really a bug that
>>>>> Cassandra errors out in that case).
>>>>>
>>>>> --
>>>>> Sylvain
>>>>>
>>>>>
>>>>> On Tue, Apr 29, 2014 at 11:09 AM, Yatong Zhang <bluefl...@gmail.com> wrote:
>>>>>
>>>>>> Hi there,
>>>>>>
>>>>>> Sorry if this is not the right place to report bugs. I am using 2.0.7 and I
>>>>>> have a 10-box cluster with about 200TB capacity. I just found that 3 of the
>>>>>> boxes had error exceptions. With DataStax OpsCenter I can see these three
>>>>>> nodes lost connections (no response), but after I sshed to these servers,
>>>>>> Cassandra was still running, and 'system.log' still had logs.
>>>>>>
>>>>>> I think this might be a bug, so would anyone kindly help to investigate it?
>>>>>> Thanks~
>>>>>>
>>>>>> ERROR [CompactionExecutor:1] 2014-04-29 05:55:15,249 CassandraDaemon.java (line 198) Exception in thread Thread[CompactionExecutor:1,1,main]
>>>>>> FSWriteError in /data2/cass/mydb/images/mydb-images-tmp-jb-98219-Filter.db
>>>>>>         at org.apache.cassandra.io.sstable.SSTableWriter$IndexWriter.close(SSTableWriter.java:475)
>>>>>>         at org.apache.cassandra.io.util.FileUtils.closeQuietly(FileUtils.java:212)
>>>>>>         at org.apache.cassandra.io.sstable.SSTableWriter.abort(SSTableWriter.java:301)
>>>>>>         at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:209)
>>>>>>         at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
>>>>>>         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
>>>>>>         at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:60)
>>>>>>         at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59)
>>>>>>         at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:197)
>>>>>>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>>>>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>>>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>>         at java.lang.Thread.run(Thread.java:744)
>>>>>> Caused by: java.io.IOException: No space left on device
>>>>>>         at java.io.FileOutputStream.write(Native Method)
>>>>>>         at java.io.FileOutputStream.write(FileOutputStream.java:295)
>>>>>>         at java.io.DataOutputStream.writeInt(DataOutputStream.java:197)
>>>>>>         at org.apache.cassandra.utils.BloomFilterSerializer.serialize(BloomFilterSerializer.java:34)
>>>>>>         at org.apache.cassandra.utils.Murmur3BloomFilter$Murmur3BloomFilterSerializer.serialize(Murmur3BloomFilter.java:44)
>>>>>>         at org.apache.cassandra.utils.FilterFactory.serialize(FilterFactory.java:41)
>>>>>>         at org.apache.cassandra.io.sstable.SSTableWriter$IndexWriter.close(SSTableWriter.java:468)
>>>>>>         ... 13 more
>>>>>> ERROR [CompactionExecutor:1] 2014-04-29 05:55:15,406 StorageService.java (line 367) Stopping gossiper
>>>>>> WARN [CompactionExecutor:1] 2014-04-29 05:55:15,406 StorageService.java (line 281) Stopping gossip by operator request
>>>>>> INFO [CompactionExecutor:1] 2014-04-29 05:55:15,406 Gossiper.java (line 1271) Announcing shutdown
>>>>>> ERROR [CompactionExecutor:1] 2014-04-29 05:55:17,406 StorageService.java (line 372) Stopping RPC server
>>>>>> INFO [CompactionExecutor:1] 2014-04-29 05:55:17,406 ThriftServer.java (line 141) Stop listening to thrift clients
>>>>>> ERROR [CompactionExecutor:1] 2014-04-29 05:55:17,417 StorageService.java (line 377) Stopping native transport
>>>>>> INFO [CompactionExecutor:1] 2014-04-29 05:55:17,504 Server.java (line 181) Stop listening for CQL clients