Re: New contribution - Burst Hour Compaction Strategy

2017-06-15 Thread Pedro Gordo
Hi, thanks for engaging in this discussion! Cameron, regarding the benchmark, I need to spend some time exploring the stress tool options, but I aim to create a stress test that runs for at least 48 hours, and then run it for all strategies (with a 24-hour burst for BHCS). I want

Re: New contribution - Burst Hour Compaction Strategy

2017-06-14 Thread Jeff Jirsa
Hi Pedro, I did a quick read through of your strategy, and have a few personal thoughts: First, writing a compaction strategy is a lot of work, and it's great to see new contributors take on ambitious projects. There are even a handful of ideas in here that may be useful to other strategies.

Re: New contribution - Burst Hour Compaction Strategy

2017-06-14 Thread Cameron Zemek
The main issue I see with this is "Read all the SSTables and detect which partition keys are present in more than the compaction minimum threshold value". This is quite expensive and will use a lot of I/O to calculate. What makes writing a compaction strategy so difficult is
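
To make the quoted step concrete, here is a minimal sketch, not the BHCS implementation. It assumes each SSTable can be modelled as an in-memory set of partition keys; the function name and the sample data are hypothetical. In a real cluster the keys would have to be read from disk, which is where the I/O cost mentioned above comes from.

```python
from collections import Counter
from typing import Iterable, List, Set


def keys_over_threshold(sstables: Iterable[Set[str]], min_threshold: int) -> List[str]:
    """Return partition keys present in more than min_threshold tables.

    Hypothetical helper: each "SSTable" is modelled as a set of keys.
    """
    counts: Counter = Counter()
    for keys in sstables:
        # A set holds each key once, so a table adds at most 1 per key.
        counts.update(keys)
    return sorted(key for key, n in counts.items() if n > min_threshold)


# Illustrative data only: four tiny "SSTables".
sstables = [{"a", "b"}, {"a", "c"}, {"a", "b"}, {"b"}]
hot_keys = keys_over_threshold(sstables, 2)
```

Every key in every table is visited once, so the work grows with the total number of keys on disk rather than with the number of keys that end up being compacted.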

Re: New contribution - Burst Hour Compaction Strategy

2017-06-14 Thread Pedro Gordo
Hi, I've addressed the issues with Git. I believe this is what Stefan was asking for: https://github.com/sedulam/cassandra/tree/12201 I've also added more tests for BHCS, including more for wide rows following Jeff's suggestion. Thanks for the directions so far! If there's something else you would

Re: New contribution - Burst Hour Compaction Strategy

2017-06-13 Thread Pedro Gordo
Hi all, although a couple of people engaged with me directly to talk about BHCS, I would also like to get the community's opinion on this, so I thought I could get the discussion started by saying what the advantages would be and in which type of tables BHCS would do a good job. Please keep in mind

Re: New contribution - Burst Hour Compaction Strategy

2017-06-10 Thread J. D. Jordan
GitHub has some good guides on how to use Git and make a pull request for a project: https://guides.github.com/introduction/flow/ and https://guides.github.com/activities/forking/

Re: New contribution - Burst Hour Compaction Strategy

2017-06-10 Thread Pedro Gordo
Hi all, I've added a document to JIRA explaining how BHCS works, with code snippets, and the motivation behind it. Because I'm not sure we can send attachments to the mailing list, please get the document from JIRA: https://issues.apache.org/jira/browse/CASSANDRA-12201 I'll check how to address

Re: New contribution - Burst Hour Compaction Strategy

2017-06-09 Thread Pedro Gordo
Hi Stefan, thanks for pointing these out. So far, I've only worked collaboratively with SVN, so I wasn't sure how best to address this part of the development with Git. I'll create a document explaining what I've done, hopefully by the end of this week, so that people at least can discuss the

Re: New contribution - Burst Hour Compaction Strategy

2017-06-09 Thread Stefan Podkowinski
Hello Pedro, thanks for your interest in contributing to Apache Cassandra. Creating a new compaction strategy is not an easy task, and there are several things you can do to make it easier for other developers to understand what you're up to. First of all, if using GitHub, changes to the

New contribution - Burst Hour Compaction Strategy

2017-06-08 Thread Pedro Gordo
Hi all, as part of my MSc project, I've written a new compaction strategy for Cassandra, called Burst Hour Compaction Strategy. You can find the JIRA ticket here: https://issues.apache.org/jira/browse/CASSANDRA-12201 In a nutshell, the background compaction for this strategy is only triggered during