Re: Cleanup understanding
Thanks for the answers. I got it. I was using cleanup because I thought it would delete the tombstones.

But that is still awkward. Does cleanup really need that much disk space to complete the compaction operation? In other words, twice the size?

Regards,
Víctor Hugo Molinar - @vhmolinar http://twitter.com/#!/vhmolinar

On Tue, May 28, 2013 at 9:55 PM, Takenori Sato (Cloudian) ts...@cloudian.com wrote:
Hi Victor, As Andrey said, running cleanup doesn't work as you expect.
Re: Cleanup understanding
On Wed, May 29, 2013 at 10:43 PM, Víctor Hugo Oliveira Molinar vhmoli...@gmail.com wrote:
But that is still awkward. Does cleanup really need that much disk space to complete the compaction operation? In other words, twice the size?

Not really, but logically yes. According to the 1.0.7 source, cleanup checks whether there is enough free space for the worst-case scenario, calculated as below. If there is not, the exception you got is thrown.

    /*
     * Add up all the files sizes this is the worst case file
     * size for compaction of all the list of files given.
     */
    public long getExpectedCompactedFileSize(Iterable<SSTableReader> sstables)
    {
        long expectedFileSize = 0;
        for (SSTableReader sstable : sstables)
        {
            long size = sstable.onDiskLength();
            expectedFileSize = expectedFileSize + size;
        }
        return expectedFileSize;
    }
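The worst-case check discussed in this message can be illustrated with a small standalone Java sketch. It is not Cassandra's code: the class CleanupSpaceCheck, its methods, and the sizes used in main are hypothetical, and it assumes the check simply compares the free bytes of a data directory against the summed on-disk size of the input SSTables.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative sketch only; names and numbers are assumptions, not Cassandra's API. */
final class CleanupSpaceCheck {

    /** Worst case: nothing is dropped, so the output is as large as all inputs combined. */
    static long expectedCompactedSize(List<Long> sstableSizesBytes) {
        long total = 0;
        for (long size : sstableSizesBytes) {
            total += size;
        }
        return total;
    }

    /** Assumed rule: the rewrite may start only if the free space covers the worst case. */
    static boolean hasRoomFor(List<Long> sstableSizesBytes, long freeBytes) {
        return freeBytes >= expectedCompactedSize(sstableSizesBytes);
    }

    public static void main(String[] args) {
        long gb = 1024L * 1024L * 1024L;
        // Hypothetical case: one 13 GB SSTable is about to be rewritten.
        List<Long> sstables = Arrays.asList(13 * gb);
        System.out.println(hasRoomFor(sstables, 10 * gb)); // false: would surface as "disk full"
        System.out.println(hasRoomFor(sstables, 14 * gb)); // true
    }
}
```

Under these assumptions the old file and its rewritten copy coexist until the old one is deleted, which is the sense in which cleanup "logically" needs up to twice the space.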
Cleanup understanding
Hello everyone.

I have a daily maintenance task on C* which does:
- truncate cfs
- clearsnapshots
- repair
- cleanup

The reason I need to clean things up is that I won't need most of my inserted data on the next day. It's kind of a business requirement.

The problem I'm running into is my misunderstanding of the cleanup operation. I have 2 nodes, each using less than half of its disk, which is more or less 13GB. But over the last few days each node has, seemingly at random, reported a cleanup error indicating that the disk was full, which is not true.

Error occured during cleanup
java.util.concurrent.ExecutionException: java.io.IOException: disk full

So I'd like to know more about what happens in a cleanup operation. I appreciate any help.
Re: Cleanup understanding
cleanup removes data which doesn't belong to the current node. You have to run it only if you move (or add new) nodes. In your case there is no reason to do it.
Re: Cleanup understanding
On Tue, May 28, 2013 at 7:39 AM, Víctor Hugo Oliveira Molinar vhmoli...@gmail.com wrote:
So I'd like to know more about what happens in a cleanup operation. I appreciate any help.

./src/java/org/apache/cassandra/db/compaction/CompactionManager.java line 591 of 1175

    logger.info("Cleaning up " + sstable);
    // Calculate the expected compacted filesize
    long expectedRangeFileSize = cfs.getExpectedCompactedFileSize(Arrays.asList(sstable), OperationType.CLEANUP);
    File compactionFileLocation = cfs.directories.getDirectoryForNewSSTables(expectedRangeFileSize);
    if (compactionFileLocation == null)
        throw new IOException("disk full");

It looks like it is actually saying that your disk is too full to complete the compaction, not that it is actually full right now.

That said, a cleanup compaction does a 1:1 traversal of all SSTables, writing out a new one without any data that no longer belongs on the node due to range ownership changes. There is some lag in Cassandra before the JVM is able to actually delete files from disk; perhaps you are hitting this race condition?

=Rob
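The 1:1 rewrite described above can be pictured with a toy Java model. Everything in it is a hypothetical simplification rather than Cassandra's implementation: an SSTable is reduced to a map from token to row value, Range is a made-up half-open interval with no wrap-around handling, and CleanupSketch.cleanup is an invented name.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of a cleanup pass; all types and names here are assumptions. */
final class CleanupSketch {

    /** Hypothetical token range (left, right]; ring wrap-around is ignored. */
    static final class Range {
        final long left;
        final long right;

        Range(long left, long right) {
            this.left = left;
            this.right = right;
        }

        boolean contains(long token) {
            return token > left && token <= right;
        }
    }

    /** Copies one input "SSTable" to a new one, keeping only rows in an owned range. */
    static Map<Long, String> cleanup(Map<Long, String> sstable, List<Range> ownedRanges) {
        Map<Long, String> rewritten = new LinkedHashMap<>();
        for (Map.Entry<Long, String> row : sstable.entrySet()) {
            for (Range range : ownedRanges) {
                if (range.contains(row.getKey())) {
                    rewritten.put(row.getKey(), row.getValue());
                    break;
                }
            }
        }
        return rewritten;
    }

    public static void main(String[] args) {
        Map<Long, String> sstable = new LinkedHashMap<>();
        sstable.put(5L, "row-a");
        sstable.put(50L, "row-b");
        sstable.put(150L, "row-c"); // token outside the owned range
        List<Range> owned = Arrays.asList(new Range(0L, 100L));
        System.out.println(cleanup(sstable, owned).keySet()); // [5, 50]
    }
}
```

Note that the filter looks only at range ownership. In this model, as in the explanation above, nothing is dropped when the node's ranges have not changed, so the rewritten file is as large as the original.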
Re: Cleanup understanding
Hi Victor,

As Andrey said, running cleanup doesn't work as you expect.

"The reason I need to clean things is that I won't need most of my inserted data on the next day."

Deleted objects (columns/records) become deletable from an sstable file once they have expired (after gc_grace_seconds). Such deletable objects are actually removed by compaction. The tricky part is that a deletable object remains unless all of its older objects (those with the same row key) are contained in the set of sstable files involved in the compaction.

- Takenori

(2013/05/29 3:01), Andrey Ilinykh wrote:
cleanup removes data which doesn't belong to the current node. You have to run it only if you move (or add new) nodes. In your case there is no reason to do it.
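The two conditions in Takenori's message can be written down as a small Java sketch. It is only a model of the rule as stated in this thread: the SSTable class (reduced to a set of row keys), the canPurge method, and the treatment of the gc_grace_seconds boundary are assumptions, not Cassandra's code.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Model of the purge rule described in the thread; all names are hypothetical. */
final class TombstonePurgeSketch {

    /** Hypothetical stand-in for an SSTable: a name plus the row keys it holds. */
    static final class SSTable {
        final String name;
        final Set<String> rowKeys;

        SSTable(String name, String... rowKeys) {
            this.name = name;
            this.rowKeys = new HashSet<>(Arrays.asList(rowKeys));
        }
    }

    /**
     * A tombstone for rowKey may be dropped only if gc_grace_seconds has elapsed
     * and no SSTable outside the compaction still holds data for the same row key.
     */
    static boolean canPurge(String rowKey, long deletedAtSeconds, long nowSeconds,
                            long gcGraceSeconds, List<SSTable> allSSTables,
                            Set<SSTable> compacting) {
        if (nowSeconds < deletedAtSeconds + gcGraceSeconds) {
            return false; // not yet expired
        }
        for (SSTable sstable : allSSTables) {
            if (!compacting.contains(sstable) && sstable.rowKeys.contains(rowKey)) {
                return false; // older data for this row lives outside the compaction
            }
        }
        return true;
    }

    public static void main(String[] args) {
        SSTable a = new SSTable("a", "k1");
        SSTable b = new SSTable("b", "k1", "k2");
        List<SSTable> all = Arrays.asList(a, b);
        Set<SSTable> onlyA = new HashSet<>(Arrays.asList(a));
        Set<SSTable> both = new HashSet<>(Arrays.asList(a, b));
        System.out.println(canPurge("k1", 0, 1000, 100, all, onlyA)); // false
        System.out.println(canPurge("k1", 0, 1000, 100, all, both));  // true
    }
}
```

In this model a compaction that covers only part of the SSTables holding a row keeps that row's tombstone, which matches the "tricky part" described above.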