Re: Paying off tech debt and correctly naming things

Jeremiah D Jordan Wed, 21 Mar 2018 17:50:22 -0700

+1 if you are willing to take it on.  As the person who performed the 
Table->Keyspace rename of 2.0, I say good luck!  From hindsight of doing that, 
as others suggested, I would come at this in multiple tickets.
I would suggest a simple class rename with intellij refactoring tools or 
something as the first ticket.  This is going to touch the most files at once, 
but will be mechanical, and for the most part if it compiles it was right :).
After you have done that you can take on other renaming of things with a 
smaller scope.
Also as others have said the main things to be wary of are the naming of things 
in JMX metrics.  Ideally we would keep around deprecated aliases of the old JMX 
names for a release before removing them.  The other thing is to watch out for 
class names in byte man scripts in dtest.


-Jeremiah

> On Mar 21, 2018, at 4:48 AM, Sylvain Lebresne <lebre...@gmail.com> wrote:
> 
> I really don't think anyone has been recently against such renaming, and in
> fact, a _lot_ of renaming *has* already happen over time. The problem, as
> you carefully noted, is that it's such a big task that there is still a lot
> to do. Anyway, I've yet to see a patch renaming things to match the CQL
> naming scheme be rejected, so I'd personally encourage such submission. But
> maybe with a few caveats (already mentioned largely, so repeating here to
> signify my personal agreement with them):
> - renaming with large surface area can be painful for ongoing patches or
> even future merge. That's not a reason for not doing them, but that's imo a
> good enough reason to do things incrementally/in as-small-as-reasonable
> steps. Making sure a renaming commit only does renaming and doesn't change
> the logic is also pretty nice when you rebase such things.
> - breaking hundreds of tests is obviously not ok :)
> - pure code renaming is one reasonably simple aspect, but quite a few
> renaming may have user visible impact. Particularly around JMX where many
> things are name based on their class, and to a lesser extend some of our
> tools still use "old" naming. We can't and shouldn't ignore those impact:
> such user visible changes should imo be documented, and we should make sure
> we have a reasonably painless (and thus incremental) upgrade path. My hunch
> is the latter isn't as simple as it seems.
> 
> 
> --
> Sylvain
> 
> 
> On Wed, Mar 21, 2018 at 9:06 AM kurt greaves <k...@instaclustr.com> wrote:
> 
>> As someone who came to the codebase post CQL but prior to thrift being
>> removed, +1 to refactor. The current mixing of terminology is a complete
>> nightmare. This would also give a good opportunity document a lot of code
>> that simply isn't documented (or incorrect). I'd say it's worth doing it in
>> multiple steps though, such as refactor of a single class at a time, then
>> followed by refactor of variable names. We've already done one pretty big
>> refactor (InetAddressAndPort) for 4.0, I don't see how a few more could
>> make it any worse (lol).
>> 
>> Row vs partition vs key vs PK is killing me
>> 
>> On 20 March 2018 at 22:04, Jon Haddad <j...@jonhaddad.com> wrote:
>> 
>>> Whenever I hop around in the codebase, one thing that always manages to
>>> slow me down is needing to understand the context of the variable names
>>> that I’m looking at.  We’ve now removed thrift the transport, but the
>>> variables, classes and comments still remain.  Personally, I’d like to go
>>> in and pay off as much technical debt as possible by refactoring the code
>>> to be as close to CQL as possible.  Rows should be rows, not partitions,
>>> I’d love to see the term column family removed forever in favor of always
>>> using tables.  That said, it’s a big task.  I did a quick refactor in a
>>> branch, simply changing the ColumnFamilyStore class to TableStore, and
>>> pushed it up to GitHub. [1]
>>> 
>>> Didn’t click on the link?  That’s ok.  The TL;DR is that it’s almost 2K
>>> LOC changed across 275 files.  I’ll note that my branch doesn’t change
>> any
>>> of the almost 1000 search results of “columnfamilystore” found in the
>>> codebase and hundreds of tests failed on my branch in CircleCI, so that
>> 2K
>>> LOC change would probably be quite a bit bigger.  There is, of course, a
>>> lot more than just renaming this one class, there’s thousands of variable
>>> names using any manner of “cf”, “cfs”, “columnfamily”, names plus
>> comments
>>> and who knows what else.  There’s lots of references in probably every
>> file
>>> that would have to get updated.
>>> 
>>> What are people’s thoughts on this?  We should be honest with ourselves
>>> and know this isn’t going to get any easier over time.  It’s only going
>> to
>>> get more confusing for new people to the project, and having to figure
>> out
>>> “what kind of row am i even looking at” is a waste of time.  There’s
>>> obviously a much bigger impact than just renaming a bunch of files,
>> there’s
>>> any number of patches and branches that would become outdated, plus
>> anyone
>>> pulling in Cassandra as a dependency would be affected.  I don’t really
>>> have a solution for the disruption other than “leave it in place”, but in
>>> my mind that’s not a great (or even good) solution.
>>> 
>>> Anyways, enough out of me.  My concern for ergonomics and naming might be
>>> significantly higher than the rest of the folks working in the code, and
>> I
>>> wanted to put a feeler out there before I decided to dig into this in a
>>> more serious manner.
>>> 
>>> Jon
>>> 
>>> [1]
>>> 
>> https://github.com/apache/cassandra/compare/trunk...rustyrazorblade:refactor_column_family_store?expand=1
>>> <
>>> 
>> https://github.com/apache/cassandra/compare/trunk...rustyrazorblade:refactor_column_family_store?expand=1
>>>> 
>> 


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org

Re: Paying off tech debt and correctly naming things

Reply via email to