Hello Pedro

Thanks for being interested in contributing to Apache Cassandra.
Creating a new compaction strategy is not an easy task, and there are
several things you can do to make it easier for other developers to
understand what you're up to.

First of all, if you are using GitHub, changes to the code base should
be made on a separate branch in your own fork of the Apache repository.
This makes it possible for others to quickly compare your changes to
the current code base using the web interface. Technically a new repo
works as well, but it isn't as convenient for others; for a start, it
doesn't communicate which Cassandra branch was used as the basis for
your changes.

Speaking of git, I'd also suggest learning more about creating a git
history for your code that is easy to review. For example, you may want
to squash some of the "code clean up" style commits.

As mentioned, implementing a new compaction strategy is quite an effort,
and the theories and motivations behind it are at least as interesting
as the actual implementation. Therefore it could be a good idea to have
a design document describing your work at a different level of abstraction.
It will also make it more likely to get other people involved in the
discussion, as not everyone will have to check the source code for the
details.

-Stefan


On 08.06.2017 09:31, Pedro Gordo wrote:
> Hi all
> 
> As part of my MSc project, I've done a new compaction strategy for
> Cassandra, called Burst Hour Compaction Strategy. You can find the JIRA
> ticket here: https://issues.apache.org/jira/browse/CASSANDRA-12201
> 
> In a nutshell, the background compaction for this strategy is only
> triggered during a predefined interval, freeing the resources during other
> times of the day. It also tries to make keys unique across all the
> SSTables when those keys are present in more than a configurable
> number of tables. Please check the JIRA ticket for a full description.
> 
> The code can be found here: https://github.com/sedulam/CASSANDRA-12201
> 
> Please let me know what you think, or what improvements could be made (some
> ideas are in the ticket description). Since I'm new to Cassandra, I imagine
> that a lot of my assumptions might not be the best, e.g. 100MB for the maximum
> table size.
> 
> I'm looking forward to working with this community!
> 
> All the best
> Pedro Gordo
> 
