Hello Pedro,

Thanks for your interest in contributing to Apache Cassandra. Creating a new compaction strategy is not an easy task, and there are several things you can do to make it easier for other developers to understand what you're up to.

First of all, if you are using GitHub, changes to the code base should be made on a separate branch in your own fork of the Apache repository. This makes it possible for others to quickly compare your changes to the current code base using the web interface. Technically, using a new repo works as well, but it isn't as convenient for others; for a start, it doesn't communicate which Cassandra branch was used as the basis for your changes.

Speaking of git, I'd also suggest learning more about creating a git history for your code that is easy to review. E.g. you may want to squash some of the "code clean up" style commits.

As mentioned, implementing a new compaction strategy is quite an effort, and the theories and motivations behind it are at least as interesting as the actual implementation. Therefore it could be a good idea to have a design document describing your work at a different abstraction level. It will also make it more likely that other people get involved in the discussion, as not everyone will have to check the source code for the details.

-Stefan

On 08.06.2017 09:31, Pedro Gordo wrote:
> Hi all
>
> As part of my MSc project, I've done a new compaction strategy for
> Cassandra, called Burst Hour Compaction Strategy. You can find the JIRA
> ticket here: https://issues.apache.org/jira/browse/CASSANDRA-12201
>
> In a nutshell, the background compaction for this strategy is only
> triggered during a predefined interval, freeing the resources during other
> times of the day. It also tries to make keys unique across all the
> SSTables, when these keys that are present in more than a configurable
> number of tables. Please check the JIRA ticket for a full description.
>
> The code can be found here: https://github.com/sedulam/CASSANDRA-12201
>
> Please let me know what you think, or improvements that can be done (some
> ideas are in the ticket description). Since I'm new to Cassandra, I imagine
> that a lot of assumptions might not be the best, e.g. 100MB for the maximum
> table size.
>
> I'm looking forward to working with this community!
>
> All the best
> Pedro Gordo