Re: [Gnu-arch-users] 2.0 fill in the blanks

Matteo Settenvini Fri, 21 Apr 2006 13:53:19 -0700

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

All of this will be highly subjective (what isn't?), and please bear with
me if I have made some wrong assumptions... I'm here to learn, not to lead.
So, my 2 €-cents.

Thomas Lord wrote:
> 
> 1. Who should be in the target market for Arch 2.0?

Some scenarios come to mind:

  i. As a young and inexperienced developer (still a student), I make
more mistakes than should be allowed by law :-). Mostly, I work on small
prototypes with the aim of teaching myself how to work with some specific
technology, or to experiment with newly learned algorithms. It makes
little or no sense for me to publish my work on a public server, like
Savannah, Gna! or Sourceforge. Like me, there are a lot of other
students and professionals out there who could find it useful to store
revisions of their work and back them up regularly.

  ii. A lot of people have access to an FTP server with a registered
domain and unlimited storage (it's cheap enough). However, there's no
way you can get a generic ISP to set up, say, an SVN server listening
for you. So being able to commit to a ``plain'' filesystem, without a
daemon listening on the other side of the connection, is perfect for
them.

  iii. A distributed system is also perfect for people other than
developers: if it has a friendly user interface, you can teach them to
commit the changes they make to documents to an RCS, so they can revert
their work and start again from a save-point if something goes wrong.
For example, students may keep track of different versions of the
reports they write for school, and merge changes independently. Another
example is that of system administrators who prepare quick-and-dirty
layouts of configuration directories, like /etc.

  iv. In critical environments, the integrity of saved data is
essential. If your hard disk gets damaged, for example, and the whole
repository is stored in a single binary file, you may well end up
losing the whole history of changes.

  v. Developers in small or medium-sized projects need a simple way to
merge changes between them, minimizing conflicts. If everyone commits to
the same repository, they will have to deal with conflicts very often,
making it difficult to find and revert a particular commit if it
introduced a bug.

  vi. There are some projects in which only one or a small number of
``gatekeepers'' can commit to the repository. ``Wine'' comes to mind.
In this scenario, people may want to start their own branch and make a
lot of small atomic commits to it. Then they may want to replay them,
grouping them by the functionality added, and build a `set of
changesets'. The next step would be to send it to the maintainer of the
``official'' repository for merging. So, what they need is a simple way
to update their working copy with changes from mainstream, and to leave
these patches out of the changesets they're going to send to the lead
maintainer, in order to avoid duplication.

  vii. Huge projects work through a lot of branching. Keeping people
from getting in one another's way by _forcing_ them to branch isn't bad.
Moreover, only _good_ changes would be merged into the ``official''
archive, thus (hopefully) leaving out badly written code or incomplete
features. For example, take a look at http://cvs.gnome.org/viewcvs/.
Half of those modules have never seen the light of day. There are
modules like ``anjuta2'', which has a deceptive name since the real
development for anjuta2 is being done on anjuta-HEAD. Keeping
incomplete/bad contributions around just makes it harder for other
people to understand what's happening, and contributes to a great waste
of space.

 viii. Developers of large projects may choose Arch 2.0 because it is a
tool that helps them in their work, instead of being something that
constrains the way they program. Unfortunately, the latter happens with
a lot of RCSs, especially centralized ones: you cannot branch without
the admin's permission, you cannot commit to certain modules, sometimes
the file you need to commit to is locked by someone else, and so on...

Thus, I think there are at least three categories of people that may be
interested in a DRCS like Arch 2.0:
 1. Non-developers who need a way to easily store revisions of documents
and other (sensitive?) data
 2. Developers and non-developers who work on small projects still at an
embryonic stage and not ready to be made public
 3. Developers of medium-to-large projects that could benefit from a
distributed development model in which only a handful of the branches
started are then merged back to mainline

> 
> 2. What are the needs of that target market and how will Arch 2.0 win
> for them?
> 

Some random remarks:
 i. What to keep from GNU Arch 1.x?
    a) being able to publish archives without a daemon listening on the
other side, and accessing resources in a ``VFS-fashion'', maybe
extending it further
    b) a good merging system. After having tried branching and merging
in svn (one of the worst systems I've had occasion to work with), I
really started appreciating arch's naming scheme and its way of solving
conflicts. SVN makes merging difficult mostly because of an ugly user
interface, but its standard way of solving conflicts also makes it easy
for you to make a mistake.
    c) signing of archives

 ii. What to improve starting from GNU Arch 1.x?
    b) there are a lot of tla commands, and they are long and quite
difficult to remember (for example: "make-archive" and "archive-mirror"
are very descriptive, but I think they would be easier to remember if
they were made uniform: like "make-archive" and "mirror-archive", or
"archive-make" and "archive-mirror"); a complex UI raises the entry
barrier for non-developers or people in a hurry -- and if you aim for
the corporate world this is not a secondary matter
    c) external module configuration could be worked on to be more user
friendly
    d) probably a minor issue, but I always wondered why, when
publishing to an FTP site, username and password couldn't be asked for
interactively when not provided on the command line, instead of always
having to be passed as part of the URI -- with obvious problems for
whoever has a username containing a char like "@".
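
To make point ii.d concrete, here is a minimal Python sketch of the
behaviour asked for: take credentials from the URI when they are there,
prompt for them otherwise, and percent-encode them if they ever have to
be written back into a URI (so "@" becomes "%40"). It is only an
illustration, not how tla handles FTP locations; the function names and
the host ftp.example.org are made up for the example.

```python
import getpass
from urllib.parse import quote, unquote, urlsplit


def split_ftp_location(url):
    """Return (user, password, host, path); missing credentials are None."""
    parts = urlsplit(url)
    # urlsplit leaves percent-escapes in place, so decode them here.
    user = unquote(parts.username) if parts.username else None
    password = unquote(parts.password) if parts.password else None
    return user, password, parts.hostname, parts.path


def resolve_credentials(url, ask_user=input, ask_password=getpass.getpass):
    """Use credentials from the URL if present, otherwise ask interactively.

    ask_user and ask_password are hypothetical hooks so that a GUI (or a
    test) can replace the terminal prompts.
    """
    user, password, host, path = split_ftp_location(url)
    if user is None:
        user = ask_user(f"Username for {host}: ")
    if password is None:
        password = ask_password(f"Password for {user} on {host}: ")
    return user, password, host, path


def embed_credentials(user, password, host, path):
    """Build an ftp:// URL with percent-encoded credentials."""
    safe_user = quote(user, safe="")
    safe_password = quote(password, safe="")
    return f"ftp://{safe_user}:{safe_password}@{host}{path}"
```

Calling resolve_credentials("ftp://ftp.example.org/archives/foo") would
prompt for both values, while a URI that already carries
"joe%40example.org" as its user part is decoded to "joe@example.org"
without asking anything.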

  iii. Things new / things particularly important
     a) low bandwidth usage when updating / getting a snapshot of the
repository. You may think that almost everybody has ADSL nowadays, and
it may even be true in the U.S.; but without even mentioning third-world
countries, you can simply come to Italy -- a fairly rich nation -- and
see how many people still surf the web with a 56kbps modem.
     b) a plugin system; maybe Arch 2 should become little more than a
specification and a framework, in the sense that gstreamer is; there
could be plugins for a plethora of things, ranging from merging
algorithms to supported media / transport protocols; and maybe even a
plugin for special treatment of certain mime-types -- for example,
imagine an .odt OpenOffice.org document: it's mostly a bunch of XML
files compressed into a single archive. Instead of doing a binary diff
of it, knowing its ``semantics'' we could diff just its plain-text
contents
     c) using XML for manifest files and to store other Arch metadata
could make it easier to extend an archive format over time without
breaking compatibility, and could probably also help when ``web
viewers'' of the repository are developed (which is a must-have by now)
     d) I know that C gives higher portability, but personally I think
that an object oriented approach could help in engineering and
maintaining Arch 2 over time. IMHO, nowadays C++ is quite portable, if
you don't go ``too exotic'' with templates and such :-) -- and it's fast.
     e) it should be possible (not compulsory) to include the full
history of a branch in a repository when merging; otherwise it would be
difficult to revert a particular changeset at some time in the future,
once the merging has taken place. This can be important if you exploit
Arch's decentralization idea fully and you have a ``tree'' structure of
contributors. So, say that instead of:

  +-------------------------+
  |       FOOBAR(tm)        |
  | ``Official'' repository |
  +-------------------------+
   ^       ^       ^       ^
   |       |       |       |
  Adam    Bob     Clara   Demetrio

you've got for example:

  +-------------------------+
  |        FOOBAR(tm)       |
  | ``Official'' repository |
  +-------------------------+
   ^               ^
   |               |
  LNHC <--.       HKL <-----.
   ^      |        ^        |
   |      |        |        |
  Adam   Bob     Clara   Demetrio

Now say that Adam added a feature to product FooBar(tm) that
HongKongLuna doesn't want. So HKL tells Demetrio to revert it.
However, LuNoHoCo has implemented twenty new features at once, thanks
to the combined work of Adam and Bob, and these were merged into the
official repository just before a new FooBar(tm) version release.
So, how can Demetrio revert the commit that merged the unwanted
feature, without asking LNHC?
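
One way to picture the answer is the toy Python model below. It is
purely illustrative and describes neither Arch 1.x nor any planned
Arch 2 format: the Changeset and Archive classes and the file names are
invented. The only point it makes is that if a merge carries the
individual changesets along, rather than one squashed patch, then anyone
holding the official archive can build the inverse of a single changeset
and apply it.

```python
from dataclasses import dataclass, field


@dataclass
class Changeset:
    ident: str
    author: str
    # path -> (old_content, new_content); None stands for "file absent"
    changes: dict


def apply_changeset(tree, cs):
    """Return a new tree with cs applied; refuse if the tree has diverged."""
    new_tree = dict(tree)
    for path, (old, new) in cs.changes.items():
        if new_tree.get(path) != old:
            raise ValueError(f"{cs.ident}: conflict on {path}")
        if new is None:
            new_tree.pop(path, None)
        else:
            new_tree[path] = new
    return new_tree


def inverse(cs):
    """Build the changeset that undoes cs by swapping old and new."""
    swapped = {path: (new, old) for path, (old, new) in cs.changes.items()}
    return Changeset("revert-" + cs.ident, cs.author, swapped)


@dataclass
class Archive:
    tree: dict = field(default_factory=dict)
    history: list = field(default_factory=list)

    def merge(self, changesets):
        """Merge a branch while keeping each of its changesets in history."""
        for cs in changesets:
            self.tree = apply_changeset(self.tree, cs)
            self.history.append(cs)

    def revert(self, ident):
        """Undo one changeset found in the local history, by its id."""
        target = next(cs for cs in self.history if cs.ident == ident)
        self.merge([inverse(target)])
```

With this model, LNHC's merge of Adam's and Bob's work leaves both
changesets in the official history, so Demetrio can call
revert("adam-1") on his copy without contacting LNHC. If the merge had
been recorded as a single squashed changeset, there would be nothing
smaller than the whole merge to invert.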

  iv. Minor things / maybe-not-so-useful items:
     a) say you have a repository that is a ``library'' of code: you and
your mates have committed a lot of prototypes to it. You may want to
find a particular function or keyword inside it. Although grepping is
always possible, giving people a way to associate keywords or semantics
with files could be interesting: e.g. you may want to search the whole
KDE repository for ``functions written in C++ that have to do with
network sockets''.
This feature seems to be much-hyped these days, when there's a lot of
talk going on about things like Google Desktop Search, Apple's Spotlight
and Beagle.
     b) a way to automatically advertise changes. Probably this could
be achieved by a third-party plugin. Think about Arch 2 offering an RSS
feed of commit logs when asked, or feeding ``dot'' a graph showing how
known branches relate. Although the actual changes wouldn't be
automatically committed to a central archive, it could be interesting
for, e.g., a company to keep an eye on what employees are working on at
the moment. This lowers the possibility of duplicate work. I know
there's nothing better than ``human communication'' for this, for
example on an ML, but a way to batch this, if so desired, could be a
useful extension.
     c) a GUI isn't necessarily a bad thing. I know quite a lot of users
who prefer to use Subversion over other alternatives because it has
TortoiseSVN, or who love Cervisia for CVS, and so on. I ain't saying it
should be the core of the application, but Arch 2 should be thought out
in a way that makes it reasonably easy to write a GUI on top of it.
    d) no problem with filenames beginning with ",," or "++", but
filenames beginning with "=" screw up my bash-completion thingie
    e) is there a way to reconcile p2p technologies with an RCS? Does it
make sense? Mmmh... probably not :-). Although if you have a large
number of people updating their snapshots at the same time, as may
happen with the Linux kernel sources, something bittorrent-like wouldn't
be totally unjustified.
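
As a small illustration of the ``dot'' idea in iv.b, the Python sketch
below turns a table of "who merges into whom" into Graphviz DOT text.
The table simply reuses the names from the second diagram above; how a
real Arch 2 plugin would discover the known branches is left open, and
the function name is invented for the example.

```python
def branches_to_dot(merges_into):
    """Render a {branch: parent} mapping as a Graphviz digraph."""
    lines = ["digraph branches {", "    rankdir=BT;"]
    for child, parent in sorted(merges_into.items()):
        # One edge per branch, pointing at the archive it is merged into.
        lines.append(f'    "{child}" -> "{parent}";')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical input, copied from the LNHC / HKL diagram.
BRANCHES = {
    "Adam": "LNHC",
    "Bob": "LNHC",
    "Clara": "HKL",
    "Demetrio": "HKL",
    "LNHC": "FOOBAR",
    "HKL": "FOOBAR",
}
```

Writing the result of branches_to_dot(BRANCHES) to a file and running
"dot -Tpng" on it would draw the same tree, with the official
repository at the top because of rankdir=BT.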

Sorry if it was difficult to read... English isn't my mother tongue and I
only studied it from schoolbooks. :-|

Cheers,
- --
Matteo Settenvini
FSF Associated Member
Email : [EMAIL PROTECTED]

- -----BEGIN GEEK CODE BLOCK-----
Version: 3.12
GCS d--(-) s+:- a-- C++ UL+++
P?>++ L+++>$ E+>+++ W+++ N++ o?
w--- O- M++ PS++ PE- Y+>++
PGP+++ t+ 5 X- R tv-- b+++ DI+
D++ G++ e h+ r-- y?
- ------END GEEK CODE BLOCK------
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFESUYLUDehq0srSdYRApt1AJwLL2VY3LJ23oIUatQ8ezVWbxWTLACgjLKt
Pwf+RM93odN07vACkn4Vcso=
=JQVh
-----END PGP SIGNATURE-----
