Re: Cleanup understanding

2013-05-29 Thread Víctor Hugo Oliveira Molinar
Thanks for the answers.

I got it. I was using cleanup because I thought it would delete the
tombstones.
But that is still awkward. Does cleanup really take that much disk space to
complete the compaction operation? In other words, twice the size?


*Best regards,*
*Víctor Hugo Molinar* - @vhmolinar http://twitter.com/#!/vhmolinar



Re: Cleanup understanding

2013-05-29 Thread Takenori Sato
 But that is still awkward. Does cleanup really take that much disk space
 to complete the compaction operation? In other words, twice the size?

Not really, but logically yes.

According to the 1.0.7 source, cleanup checks whether there is enough free
space for the worst case, as computed below. If not, the exception you got
is thrown. For example, if the sstables being cleaned up total 13GB, cleanup
needs a data directory with at least 13GB of free space, even though the
cleaned output is usually much smaller.

/*
 * Add up all the files sizes; this is the worst case file
 * size for compaction of all the list of files given.
 */
public long getExpectedCompactedFileSize(Iterable<SSTableReader> sstables)
{
    long expectedFileSize = 0;
    for (SSTableReader sstable : sstables)
    {
        long size = sstable.onDiskLength();
        expectedFileSize = expectedFileSize + size;
    }
    return expectedFileSize;
}
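
For illustration, here is a minimal sketch of how such a worst-case
estimate gets used, with hypothetical names (the real call site is quoted
in Rob's reply below): pick a data directory that has at least that much
usable space, or fail with the same exception.

import java.io.File;
import java.io.IOException;

// Hypothetical helper (illustrative names, not Cassandra's API): choose a
// data directory that can hold the worst-case compaction output, or fail.
static File directoryWithRoom(File[] dataDirectories, long expectedBytes) throws IOException
{
    for (File dir : dataDirectories)
        if (dir.getUsableSpace() >= expectedBytes)
            return dir;                     // enough usable space here
    throw new IOException("disk full");     // the exception Victor is seeing
}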





Cleanup understanding

2013-05-28 Thread Víctor Hugo Oliveira Molinar
Hello everyone.
I have a daily maintenance task on C* which does:

-truncate cfs
-clearsnapshots
-repair
-cleanup

The reason I need to clean things is that I won't need most of my inserted
data on the next day. It's kind of a business requirement.

Well, the problem I'm running into is my misunderstanding of the cleanup
operation.
I have 2 nodes, each using less than half of its disk, which is more or
less 13GB.

But over the last few days, each node has arbitrarily reported a cleanup
error indicating that the disk was full, which is not true.

*Error occurred during cleanup*
*java.util.concurrent.ExecutionException: java.io.IOException: disk full*


So I'd like to know more about what happens during a cleanup operation.
I appreciate any help.


Re: Cleanup understanding

2013-05-28 Thread Andrey Ilinykh
Cleanup removes data which doesn't belong to the current node. You have to
run it only if you move nodes (or add new ones). In your case there is no
reason to do it.
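
To make that concrete, here is a minimal, self-contained sketch
(hypothetical types and names, not Cassandra's API): after a token range
moves to another node, rows whose tokens fall outside this node's owned
ranges are exactly the data cleanup rewrites away.

import java.util.List;

// Illustrative only: a token range and an ownership test.
record Range(long left, long right)
{
    boolean contains(long token) { return left < token && token <= right; }
}

// A row still belongs to this node only if an owned range covers its token.
static boolean belongsToNode(long token, List<Range> ownedRanges)
{
    for (Range range : ownedRanges)
        if (range.contains(token))
            return true;
    return false;
}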





Re: Cleanup understanding

2013-05-28 Thread Robert Coli
On Tue, May 28, 2013 at 7:39 AM, Víctor Hugo Oliveira Molinar
vhmoli...@gmail.com wrote:
 So I'd like to know more about what happens during a cleanup operation.
 I appreciate any help.

./src/java/org/apache/cassandra/db/compaction/CompactionManager.java
line 591 of 1175

logger.info("Cleaning up " + sstable);
// Calculate the expected compacted filesize
long expectedRangeFileSize =
        cfs.getExpectedCompactedFileSize(Arrays.asList(sstable), OperationType.CLEANUP);
File compactionFileLocation =
        cfs.directories.getDirectoryForNewSSTables(expectedRangeFileSize);
if (compactionFileLocation == null)
    throw new IOException("disk full");


It looks like it is actually saying your disk is too full to complete the
compaction, not that the disk is full right now.

That said, a cleanup compaction does a 1:1 traversal of all SSTables,
writing out a new one without any data that no longer belongs on the
node due to range ownership changes. There is some lag in Cassandra
before the JVM is able to actually delete files from disk; perhaps you
are hitting this race condition?
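
Here is a self-contained sketch of that 1:1 traversal (illustrative types
and names, not the actual implementation): each sstable is rewritten into a
new one that keeps only the rows this node still owns, and the old file can
only be deleted once its references are released, which is where the lag
comes in.

import java.util.ArrayList;
import java.util.List;
import java.util.function.LongPredicate;

// Illustrative only: one pass of a cleanup compaction over a single sstable.
final class CleanupSketch
{
    record Row(long token, byte[] data) {}

    // Rewrite one sstable 1:1, keeping only tokens this node still owns.
    static List<Row> cleanupOne(List<Row> sstable, LongPredicate ownsToken)
    {
        List<Row> rewritten = new ArrayList<>();
        for (Row row : sstable)
            if (ownsToken.test(row.token))   // drop rows lost to ownership changes
                rewritten.add(row);
        return rewritten;                    // the old file is deleted later, not immediately
    }
}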

=Rob


Re: Cleanup understanding

2013-05-28 Thread Takenori Sato(Cloudian)

Hi Victor,

As Andrey said, running cleanup doesn't work as you expect.

 The reason I need to clean things is that I won't need most of my
 inserted data on the next day.


Deleted objects (columns/records) become deletable from an sstable file
when they expire (after gc_grace_seconds).

Such deletable objects are actually gotten rid of by compaction.

The tricky part is that a deletable object remains unless all of the old
objects for the same row key are contained in the set of sstable files
involved in the compaction.
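
As a minimal sketch of that rule, with hypothetical names (not Cassandra
source): an expired tombstone may be purged only if every sstable that
could still hold older data for its row key takes part in the compaction.

import java.util.Map;
import java.util.Set;

// Illustrative only: when may an expired tombstone actually be dropped?
final class TombstonePurgeSketch
{
    static boolean purgeable(String rowKey,
                             long deletedAt,          // seconds
                             long gcGraceSeconds,
                             long now,                // seconds
                             Set<String> sstablesInCompaction,
                             Map<String, Set<String>> sstablesContainingRow)
    {
        if (now < deletedAt + gcGraceSeconds)
            return false;   // not expired yet; the tombstone must be kept
        // Every sstable that may hold older versions of this row must be
        // part of this compaction, or the tombstone has to survive it.
        Set<String> holders = sstablesContainingRow.getOrDefault(rowKey, Set.of());
        return sstablesInCompaction.containsAll(holders);
    }
}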


- Takenori
