Thanks so much (both of you) for your thoughtful posts. I have
cherry-picked out of your message and added some comments and stupid
questions below.

On Thu, Nov 14, 2013 at 9:51 AM, stewart mackenzie <[email protected]> wrote:
> Hi All,
>
> my 2 cents worth:

More like $2, and worth every penny. ;-)

> In response to the chap sitting to the right of Matt. You won't carry
> around a stabilization repo. It's becoming legacy; your attention is
> focused on development. Besides, as a maintainer you don't carry the
> codebase around with you on your laptop. You'll just click the green
> merge button if it follows coding standards and has an associated
> issue on stabilization. Then you move on to the bleeding edge where
> it's exciting.

This is probably obvious, but any PR merged into the stabilization
branch/fork must also be applied to the bleeding-edge master.

> C4 is meant to keep software projects alive past the lifespan of the
> main backing companies. It also makes sure that vendors can't lock the
> code down. This includes Grok, as strange as that sounds. In a way, now
> don't misunderstand me. Grok is NuPIC's greatest possible enemy, yet
> most loved one. NuPIC needs to take on its own identity, one that can
> nurture the world, and the world it. If it becomes the world nurturing
> Grok then trouble will start to boil. Let's not have an Open Office, or
> D Language complication. What a pity that would be. It hurts everyone.
> NuPIC is reaching adulthood. It'll soon take its own identity
> independent of Grok. Grok one day may die, but NuPIC will live on.

As a Numenta / Grok employee, I want to emphasize how aware we are of
this. Jeff wanted to give NuPIC to the community for exactly the
reasons you're implying. We all understand what the important
technology is here. In the future, Numenta could very well create more
products on top of NuPIC, so we're trying hard to do the right thing
with NuPIC, for our own welfare and the welfare of the world.

> Now I want to encourage everyone to think about something.
> * Imagine all those hundreds of package managers out there for
> windows, linux, macos, they can easily handle an explicit URL to
> automatically pull and build the code. Who wants to debug strange
> scripting languages to make it 'git checkout numenta-v1.3-stable'?
> It's good to make it easy so that those package maintainers can spread
> nupic. Package maintainers most likely will just want to download a
> zip file. This way they can point to a stabilization fork download URL
> without needing git dependencies that might or might not be supported
> in their scripting language. Github doesn't allow downloading zipped
> branches. (this point cannot be stressed enough)

Can't we assume any scripting environment would have to do a "git
clone" against a target URL denoting a stable release? If so, there's
not much of a difference between a "git clone VERSION-SLUG" vs "git
clone nupic; git checkout VERSION-SLUG". Is this really an issue, or
am I missing your point?
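
To make the comparison concrete, here's a local sketch of the "clone,
then check out a release tag" path (repo and tag names are hypothetical
placeholders, not real Numenta URLs):

```shell
# Stand in for the canonical repo with a local bare repository.
git init --bare upstream.git
git clone upstream.git work
cd work
git config user.email "dev@example.com"
git config user.name "Dev"
# Simulate development history, then mark a stable release with a tag.
git commit --allow-empty -m "development history"
git tag numenta-v1.3-stable
git push origin HEAD --tags
cd ..
# A consumer reaches the stable code with two commands:
git clone upstream.git consumer
cd consumer
git checkout numenta-v1.3-stable   # detached HEAD at the stable tag
```

With a stabilization fork, the consumer's second and third commands
collapse into a single `git clone FORK-URL`, which is the difference
being debated.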

> * The branching model encourages too many stabilizations at the wrong
> time or even worse stabilization by date releases.

I don't see how it does. Are you suggesting that it is easier to
branch, therefore we'll be more likely to create release branches? So
the additional overhead of forked releases will prevent gratuitous
releases? If so, I don't believe we'll allow that to happen. Releases
should happen, as you said, when it "feels right", not because it's
convenient. Whether we branch or fork makes no difference.

> * Stabilization repos will settle down over time to the point that you
> will forget about what is going on with it - especially when you're
> ten years on. If you follow the branching versioning model you'll get
> a strange pull request to an old branch that you as a new developer
> know nothing about. What will you make of that Pull Request?
> Especially when development ten years on has completely whacked that
> section of code. You have to adjust your head.

So you're saying this would be less likely to happen with release
forks because devs will be less likely to submit patches to old
releases if they are forks vs. branches? Are you talking about
accidental pushes to the wrong branch or intentional ones? If they are
intentional, they may be necessary changes for a release in either
case, and we'll need to make the same decision whether it's a fork or
branch.

> * Stabilizations encourage money- and time-pressed individuals to use
> your stable versions. Just because you the developer know how to change
> branches doesn't mean other people know how to get to them.
> Imagine someone from CVS needing to get a stable version. They'll
> start swearing; using git forks allows you to get it with a simple
> 'git clone'. Most likely this person will want to download a zip file.

I think this is a good point.

> * Don't worry about release dates; stabilize when it "feels" like it's time.

+1

> * When moving on past stabilization, issues are essentially for the
> development repo. You don't want issues being created for a bit of
> code that was removed 10 years ago, then require everyone to shift
> gears, check out the code base and apply the patch.
> * Bugfix issues should be created on the stabilization fork. It gives
> a clear history of bugfixes associated with it. (then you can cherry
> pick if it's pertinent to development and vice versa)

I like this, it provides a clear record of the bugfixes applied to a
stabilization.
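
For anyone unfamiliar with the mechanics, that cherry-pick flow can be
sketched locally like this (repo and branch names are hypothetical):

```shell
# Local sketch: carry a stabilization bugfix back onto development.
git init cherry-demo
cd cherry-demo
git config user.email "dev@example.com"
git config user.name "Dev"
git commit --allow-empty -m "development base"
# The stabilization line receives a bugfix commit.
git checkout -b stabilization
echo "fix" > fix.txt
git add fix.txt
git commit -m "bugfix: patch on stabilization"
FIX=$(git rev-parse HEAD)   # record the bugfix commit's SHA
# Back on development, pick only that bugfix across.
git checkout -
git cherry-pick "$FIX"
```

The issue attached to the stabilization fork documents the bugfix, and
the cherry-pick carries just that commit into development when it's
still pertinent there.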

> Regarding development talk in the video:
> Matt understands that C4 isn't just random people throwing crazy
> code at master. Issues *have* to be created *before* the pull request
> is created. This gets the community's eyeballs on the topic. Days and
> weeks can go by with the wisdom of the crowds mulling over the problem
> in the shower, till a eureka moment is had, this gets communicated on
> the issue page and the group moves on. That's when the *expected* pull
> request comes in.
> This process frees up maintainers who aren't supposed to do deep code
> review nor bugfixing at all (travis makes sure it builds). Maintainers
> just make sure the Pull Request and coding standards are met. *If*
> there is a bug in a Pull Request it doesn't matter, the maintainer
> commits it, another issue is created and another pull-request is
> submitted to fix it (or it can just be on the same issue). This
> creates a kind of wikipedia (in the early days) type of effect.
> Whereby people fix others work. A hive of ants working together all
> looking up to the conversations on the issues board for the path ahead.
> This gets everyone on the same page and doesn't wear out maintainers
> who think wtf is that new thing touching such a core bit of code?

Agreed. If we enforce the C4 guidelines, there *should* never be a PR
that has not been already discussed and vetted on the mailing list or
issue tracker, or both. We will work towards this model, which would
enable me as a maintainer to more quickly approve PRs, and let the
core developers argue about the implementation details elsewhere.

> Regarding algorithm changes in the video towards the end:
> Don't conflate the conversation on the issues with the conversation on
> the pull request.
> master will take the 'best' algorithm. There was talk about easily
> flipping a switch to be able to test the differences. This stage is
> still at the testing phase, there shouldn't even be a pull request in
> the pipeline. The developers should have remotes of those
> non-canonical forked branches. This allows you to easily flip that
> switch (change branches) to determine which is the best algorithm.
> Once there is community consensus that one of the three potential
> solutions is the right way to go, then this will be elaborated on in
> the issue. The correct pull request (plus modifications?) is used and
> the other branches get whacked by their owners.

The core problem today is that we DON'T have the regression tests in
place to allow this. Once we do, it should be easy for devs to run
their specific PR through some regression suite to validate it. Until
we have a regression suite in place, we either reject PRs that
introduce any major algorithmic changes, or we allow them to get into
the codebase via feature flag, or we closely scrutinize each one and
run manual regression testing (dangerous). I don't like any of these
options. The right answer is to establish a regression test suite,
which is going to take a lot of work and time.

> Issues are where the above
> conversation should take place *not* the pull request page.

Yes, +1. I'm going to start enforcing this as soon as I get the docs
in order for reference. PRs are for code review and comments.

> I personally advocate a rolling release whereby the APIs don't change
> as the C4 states. It simplifies things greatly and keeps mindshare in
> one place, yet keeps the stable seekers happy as they believe in the
> maintainers who are hard as nails and disciplined about contribution
> guidelines. (you need a German on the maintainer list btw - I'm not
> joking) This way legacy software can still use the tight optimized
> always improving code base. It makes life simple for _everyone_ given
> maintainers and contributors know their shit. I do not advocate a
> branching model, and prefer a forked model if I can only choose between
> the two. NuPIC being a very algorithmic type project that does 2 (soon 3)
> things can keep a very simple API. It has a small API but mega stuff
> hidden away on the inside. This is perfect for a rolling release.

When you say "rolling release", what exactly do you mean? I understand
that there can be no feature additions within the stability branch,
only bugfixes. Let's use a concrete theoretical example. This is what
I *think* you mean, so please correct me:

- NuPIC is at v1.0
- devs add bug fixes and new features on master
- devs decide it's time to release
- a stability (branch|fork) is created as v1.1
- a bug is reported against v1.1
- devs work to correct and patch bug on v1.1
  - stability branch stays v1.1
  - if bug is severe enough, fix is applied to master as well (without
waiting for a merge from the official release into master)
- v1.1 is declared stable!
- v1.1 is tagged as official and merged into master

From what I imagine above, we would throw out the PATCH in the
semantic versioning scheme of vMAJOR.MINOR.PATCH. All MINOR releases
would simply be new features and bugfixes. MAJOR releases would break
the API.

Right?
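
If I've read you right, the branch-based version of that flow looks
roughly like this locally (repo, branch, and tag names are placeholders):

```shell
# Sketch of the release flow above using a stability branch.
git init nupic-demo
cd nupic-demo
git config user.email "dev@example.com"
git config user.name "Dev"
# NuPIC is at v1.0; devs add fixes and features on master.
git commit --allow-empty -m "v1.0 plus new work on master"
# Devs decide it's time to release: cut a stability branch.
git checkout -b stability-1.1
# A bug reported against v1.1 is fixed on the stability branch.
git commit --allow-empty -m "bugfix on v1.1"
git tag v1.1                # v1.1 declared stable and tagged official
# Merge the official release back into the development branch.
git checkout -
git merge --no-ff v1.1 -m "merge official v1.1 into master"
```

In the fork model the `stability-1.1` branch would instead live in a
separate repository, but the tag-and-merge-back steps are the same.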

> Though! I would only declare NuPIC official and ready to build on once
> the C++ core is stripped out, regression tests are in, language
> bindings have their own repo (one for each language binding). I
> wouldn't bother with considering using stabilizations or not till the
> core is out and some _serious_ talk about API is had. That would be
> the beginning for me.

You don't think we should at least establish this stabilization
process beforehand? I think it would be useful to start off creating
v0.1, v0.2 "beta" releases now, building up towards something we all
agree should be flagged as v1.0. Do you think we're getting ahead of
ourselves by having this discussion too early? Would you rather we
dedicate our resources to working on those issues you describe above
before we carry on this dialogue?

Numenta has received a few independent inquiries about commercial
licenses for NuPIC. So the demand for an "official" API could be
closer than we think.

---------
Matt Taylor
OS Community Flag-Bearer
Numenta

_______________________________________________
nupic mailing list
[email protected]
http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org
