Re: Paying off tech debt and correctly naming things

Sylvain Lebresne Wed, 21 Mar 2018 01:49:28 -0700

I really don't think anyone has been recently against such renaming, and in
fact, a _lot_ of renaming *has* already happen over time. The problem, as
you carefully noted, is that it's such a big task that there is still a lot
to do. Anyway, I've yet to see a patch renaming things to match the CQL
naming scheme be rejected, so I'd personally encourage such submission. But
maybe with a few caveats (already mentioned largely, so repeating here to
signify my personal agreement with them):
- renaming with large surface area can be painful for ongoing patches or
even future merge. That's not a reason for not doing them, but that's imo a
good enough reason to do things incrementally/in as-small-as-reasonable
steps. Making sure a renaming commit only does renaming and doesn't change
the logic is also pretty nice when you rebase such things.
- breaking hundreds of tests is obviously not ok :)
- pure code renaming is one reasonably simple aspect, but quite a few
renaming may have user visible impact. Particularly around JMX where many
things are name based on their class, and to a lesser extend some of our
tools still use "old" naming. We can't and shouldn't ignore those impact:
such user visible changes should imo be documented, and we should make sure
we have a reasonably painless (and thus incremental) upgrade path. My hunch
is the latter isn't as simple as it seems.



--
Sylvain


On Wed, Mar 21, 2018 at 9:06 AM kurt greaves <[email protected]> wrote:

> As someone who came to the codebase post CQL but prior to thrift being
> removed, +1 to refactor. The current mixing of terminology is a complete
> nightmare. This would also give a good opportunity document a lot of code
> that simply isn't documented (or incorrect). I'd say it's worth doing it in
> multiple steps though, such as refactor of a single class at a time, then
> followed by refactor of variable names. We've already done one pretty big
> refactor (InetAddressAndPort) for 4.0, I don't see how a few more could
> make it any worse (lol).
>
> Row vs partition vs key vs PK is killing me
>
> On 20 March 2018 at 22:04, Jon Haddad <[email protected]> wrote:
>
> > Whenever I hop around in the codebase, one thing that always manages to
> > slow me down is needing to understand the context of the variable names
> > that I’m looking at.  We’ve now removed thrift the transport, but the
> > variables, classes and comments still remain.  Personally, I’d like to go
> > in and pay off as much technical debt as possible by refactoring the code
> > to be as close to CQL as possible.  Rows should be rows, not partitions,
> > I’d love to see the term column family removed forever in favor of always
> > using tables.  That said, it’s a big task.  I did a quick refactor in a
> > branch, simply changing the ColumnFamilyStore class to TableStore, and
> > pushed it up to GitHub. [1]
> >
> > Didn’t click on the link?  That’s ok.  The TL;DR is that it’s almost 2K
> > LOC changed across 275 files.  I’ll note that my branch doesn’t change
> any
> > of the almost 1000 search results of “columnfamilystore” found in the
> > codebase and hundreds of tests failed on my branch in CircleCI, so that
> 2K
> > LOC change would probably be quite a bit bigger.  There is, of course, a
> > lot more than just renaming this one class, there’s thousands of variable
> > names using any manner of “cf”, “cfs”, “columnfamily”, names plus
> comments
> > and who knows what else.  There’s lots of references in probably every
> file
> > that would have to get updated.
> >
> > What are people’s thoughts on this?  We should be honest with ourselves
> > and know this isn’t going to get any easier over time.  It’s only going
> to
> > get more confusing for new people to the project, and having to figure
> out
> > “what kind of row am i even looking at” is a waste of time.  There’s
> > obviously a much bigger impact than just renaming a bunch of files,
> there’s
> > any number of patches and branches that would become outdated, plus
> anyone
> > pulling in Cassandra as a dependency would be affected.  I don’t really
> > have a solution for the disruption other than “leave it in place”, but in
> > my mind that’s not a great (or even good) solution.
> >
> > Anyways, enough out of me.  My concern for ergonomics and naming might be
> > significantly higher than the rest of the folks working in the code, and
> I
> > wanted to put a feeler out there before I decided to dig into this in a
> > more serious manner.
> >
> > Jon
> >
> > [1]
> >
> https://github.com/apache/cassandra/compare/trunk...rustyrazorblade:refactor_column_family_store?expand=1
> > <
> >
> https://github.com/apache/cassandra/compare/trunk...rustyrazorblade:refactor_column_family_store?expand=1
> > >
>

Re: Paying off tech debt and correctly naming things

Reply via email to