Re: [racket-dev] proposal for moving to packages: repository

2013-05-29 Thread Eli Barzilay
On Friday, Matthew Flatt wrote:
 At Fri, 24 May 2013 12:44:35 -0400, Eli Barzilay wrote:
* The script should also take care to deal with files that got
  removed in the past.
   
   Ditto.
  
  I don't believe that it's *not* doing this, so I did the
  double-check in the form of a test.
 
 You're right --- I misunderstood your example.

(BTW, in case it wasn't obvious -- that was a typo, since the script
is not doing it...)


 Still, I'm happy enough with the result in your example. The
 conversion does preserve `git log --follow' results for the files
 that survive, which was my intended spec. And maybe it's better to
 explain my interest as `git blame', since my main interest in the
 history of a file is often how/why a particular bit of code ended up
 as it is.

Ah, yes -- in that case, I think that it's doing that (= maintaining
the blame information) fine, but there are still things that you'll
want to keep.  (At least for some value of you...)

(I'll get back to this later, since it's the main content.)


* filter-branch one time using your script to reorganize the files
  according to packages
* use filter-branch with a subdirectory filter 5 times to create
  each repository
Total runtime: about 21 hours
  
  This latter use would end up with the final tree being exactly the
  same (since you're talking about doing the reorganization within
  git), but the history would be different since it's as if the
  files were there the whole time.
 
 I don't see how that works. Since my script leaves each file in its
 original location for old commits, I expect a subdirectory
 `filter-branch' to still drop history for the old locations. In any
 case, I'm happy to sort out that detail later.

Ah yes, keeping the files in-place instead of shuffling them around is
definitely much better.  And yes, it means that it *will* take that
large chunk of time for each extracted repository, but I think that
it's definitely worth the effort.  (Once there is a good way to do the
whole trimming thing, I can easily script that onto a bunch of lab
machines to do it all in parallel.)


 If we agree that `git mv' before splitting is practical, though,
 that's all I need for now.

Yes -- with all of the above, and with the additional improvements
that I'll suggest below.  Actually, I'll just send that in a new email
since it's long enough.


 From my perspective, the important thing is to have the ability to
 just edit and move files around to sort out packages, instead of
 having the indirection of a script that edits and moves files
 around.

OK -- but I still think that it's worth it to save a second major
change for people, and given that you've started with a suggestion for
package splitting, maybe just go on with revising that for a short
time and then just do the splitting without an intermediate period?
For people who want to keep dealing with the whole tree, the layout
is going to be the same so there won't be much difference anyway, and
people who want to deal with just their corner will get more time to
adjust and enjoy the benefits of dealing with just their corner
quicker.

BTW, it will potentially lead to more problems where my change to my
own repo goes fine and I don't know that it got broke because of a
change elsewhere since I didn't keep the other files in git form --
but this makes me think that the next release might be prone to such
issues, so it's better to start earlier with the segregation rather
than doing it later.  (But OTOH, the builds and drdr will keep a high
level of problem prevention, I hope.)

-- 
  ((lambda (x) (x x)) (lambda (x) (x x)))  Eli Barzilay:
http://barzilay.org/   Maze is Life!
_
  Racket Developers list:
  http://lists.racket-lang.org/dev


Re: [racket-dev] proposal for moving to packages: repository

2013-05-29 Thread Eli Barzilay
Now for the problems that are likely worth paying attention to, and
suggestions for improving things...

The quick summary of what I'm going to say is that I think that
there's a significant improvement that can be done with some more
work, one that requires some minimal manual intervention.  Because of
this, I think that it's best to work with a whole repository database
of file movements, which will be made automatically, but revise-able
manually to fix things.  Your scripts will change to parse this file
instead of running git directly, but since the format will be uniform,
this should be easy to adjust.

And a point of clarification: as you noted, these problems are not
things that you'll see in blames now.  For example, cases of
misidentification are in many places obviously nonsense, and real
cases are rare.  Another example is if there's a commit that removed a
bunch of code that you want to go over: currently, you'll see the
commit that removed a file in your history and the removed file is
visible in that commit but it won't be if it's truncated away.

I'll repeat here that I'm personally fine with not doing any of this,
but I think that most people do care about losing these bits.  Also,
note that some of these problems are likely to go away in some future
git (for example, search for fractions in the below problems to see
a feature that git doesn't have now but might improve in the future),
so an improved future blame will actually produce better output when
things are fixed manually even though currently the result won't
differ as much with these fixes.


A good starting point for the whole-repo database of file movements
is:

  git log --date-order --format='%n%h %ai %s' \
  --name-status -M -C --find-copies-harder -l2 -B

For reference, I've put this output here:

  http://tmp.barzilay.org/git-log.txt

I'm thinking of starting with this text, and manually fixing things
like removing bogus copies/moves, and adding ones that git missed.  In
addition, there should be some enrichment to the format, to specify
where deleted files go -- so it's possible to go over removed files
(everything that starts with D) and assign them to package repos.
(Many of them are easy to do since their destination package is
obvious.)
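As a rough sketch of that "go over removed files" step: the helper below (a hypothetical name, `list_deleted`, not part of any script in this thread) reads the real tab-separated `--name-status` output on stdin and prints each deleted path once, so the list can be annotated by hand with a destination package.

```shell
# list_deleted: hypothetical helper, not from the thread's scripts.
# Reads `git log --name-status` output on stdin (status and path are
# separated by a tab) and prints every deleted path once, in the
# order it is first seen.
list_deleted() {
  awk -F '\t' '$1 == "D" && !seen[$2]++ { print $2 }'
}
```

It would be used as `git log --name-status --format='%n%h %ai %s' | list_deleted`, or on a saved copy of the log, provided the tabs are intact.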

The following is a list of problem examples, which can be addressed as
above.


Here is a problem where some potentially useful history is lost:

2a94ca9 Eric Dobson (3 weeks ago) Cleanup tc-lambda-unit.
  M collects/typed-racket/typecheck/tc-lambda-unit.rkt
  D collects/typed-racket/typecheck/parse-cl.rkt

c25ed74 Stephen Bloch (7 weeks ago) Moved error-message tests into a module+ in 
main source file.
  M collects/picturing-programs/private/tiles.rkt
  D collects/picturing-programs/tests/tiles-error-tests.rkt

1838953 Vincent St-Amour (5 months ago) Move define-inline to 
racket/performance-hint.
  M collects/scribblings/reference/syntax.scrbl
  D collects/unstable/scribblings/inline.scrbl

= In these cases, the second file got cleaned up into the first, but
   git considers them unrelated by default, so the history of the first
   is lost if it is not kept explicitly.

9f337c6 Jay McCarthy (10 weeks ago) Removing the planet2 name from the code
  A collects/tests/pkg/tests-checksums.rkt
  A collects/tests/pkg/tests-conflicts.rkt
  A collects/tests/pkg/tests-deps.rkt
  D collects/tests/planet2/tests-checksums.rkt
  D collects/tests/planet2/tests-conflicts.rkt
  D collects/tests/planet2/tests-deps.rkt
  ... lots of these ...

= In these cases files got renamed with enough changes to a point where
   git misses the fact that they were renamed.  (BTW, for this reason I
   recommended that renames are done without other modifications, and
   instead do them in a separate commit.)
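One way to surface such missed renames for manual review is a heuristic that pairs a deleted path with an added path sharing the same file stem inside one commit. The sketch below assumes its input is the tab-separated name-status lines of a single commit; `pair_candidates` is a made-up name, and the output is only a list of candidates to check by hand, not a decision.

```shell
# pair_candidates: hypothetical review aid, not from the thread's scripts.
# stdin: the name-status lines of ONE commit (tab-separated).
# Prints "old -> new" for every deleted/added pair whose file name,
# ignoring directories and the last extension, is the same.
pair_candidates() {
  awk -F '\t' '
    function stem(path,   base) {
      base = path
      sub(/.*\//, "", base)        # drop leading directories
      sub(/\.[^.]*$/, "", base)    # drop the last extension
      return base
    }
    $1 == "D" { deleted[stem($2)] = $2 }
    $1 == "A" { added[stem($2)] = $2 }
    END {
      for (s in deleted)
        if (s in added)
          print deleted[s] " -> " added[s]
    }
  ' | sort
}
```

Lines that git already reports as renames or copies (status `R...` or `C...`) are ignored, since only the plain `A` and `D` entries need a second look.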

It might help the above to lower the similarity threshold, but the first
problem is that git measures changes in relation to the overall file
size, so if the second file is big enough, it will not help.  Also,
there are these problems:

198a65a Matthew Flatt (13 days ago) raco pkg create: support source and 
binary bundling
  C100  collects/launcher/sh   collects/tests/pkg/test-pkgs/pkg-x/nobin-top.txt
  ...

6c1e163 Matthew Flatt (1 year, 2 months ago) add missing jfp.css
  C100  collects/launcher/sh   collects/scribble/jfp/jfp.css

= Empty files are an obvious problem here, since they are 100% similar,
   and therefore considered a copy of some random empty file.  Cannot
   just ignore empty files, since it happens in other files too:

fae660b Jay McCarthy (7 months ago) Release Planet 2 (beta)
  C056  collects/meta/drdr2/analyzer/analyzer.rkt   
collects/tests/planet2/test-pkgs/planet2-test1-conflict/planet2-test1/conflict.rkt
  ... many more ...

b2b5875 Blake Johnson (2 years, 7 months ago) replacing self modidx refs and 
tests
  C092  collects/meta/drdr2/analyzer/analyzer.rkt   
collects/tests/compiler/demodularizer/tests/racket-5.rkt

= 

Re: [racket-dev] proposal for moving to packages: repository

2013-05-24 Thread Eli Barzilay
Yesterday, Robby Findler wrote:
 Hi Eli: I'm trying to understand your point. Do I have this right?
 
 Background: The git history consists of a series of checkpoints in time
 of the entire repository, not a collection of individual files.

Yes, although the difference between entire repository and
individual files is mostly theoretical.  The main point is that the
log history is made from changes to *content* -- you can't have some
made up history planted artificially for a file.  (And it is the same
for most CMSs if not all; the main difference here is that git doesn't
keep meta information about copying and renaming.)


 So, when I do git log x.rkt then what I get is essentially a
 filtered list (except where people didn't properly rebase, but let's
 ignore that) of those checkpoints: all the ones where x.rkt
 changed.

Exactly.  (I don't get the rebase comment though -- even without
rebasing what you get is this filtered history.)


 Big Question: The issue is, then, when we split up the current repo
 into smaller repos, what are the series of checkpoints that we're
 going to make up for the individual repos? Right? 

Yes, but it can get a bit subtle.  Like I said, the `filter-branch'
tool is basically replaying the entire history, giving you points to
inject hooks that can modify the tree or the commits, etc.  Note that
in all uses that were mentioned, there was a --prune-empty flag,
which means that commits that didn't have any change are dropped.  I'm
mentioning this because some people might have an illusion that it's
better to *not* do that and keep these commits.  Here's an example why
this is not useful: say that you have this edit sequence:

  foo@somewhere creates A/x, with log message created x
  bar@somewhere edits it, with log message edited x
  baz@somewhere moved it into B/x, with log renamed A to B

If at this point you use any git tools, they can see the real history.
For example, you can use `blame' to see which lines were written by
which user.  Also, assuming that these are all the changes, a git log
will show the three commits as they appear above.
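The three-commit sequence can be rebuilt in a throwaway repository to see this. The sketch below is only an illustration: the temporary directory, the `commit` helper and the demo identity are assumptions, while the paths and log messages are the ones from the example. It counts the commits that a path-limited log reports for B/x, with and without `--follow'.

```shell
#!/bin/sh
# Rebuild the A/x -> B/x example in a scratch repository.
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q
# Identity is passed per command so the sketch does not rely on global config.
commit() { git -c user.name=demo -c user.email=demo@example.invalid commit -q -m "$1"; }

mkdir A
printf 'line 1\n' > A/x
git add A/x && commit 'created x'
printf 'line 2\n' >> A/x
git add A/x && commit 'edited x'
git mv A B && commit 'renamed A to B'

# Without --follow, only commits that changed the path B/x are listed.
plain=$(git log --format=%s -- B/x | wc -l | tr -d ' ')
# With --follow, git detects the rename from the content and continues
# into the history of A/x.
followed=$(git log --format=%s --follow -- B/x | wc -l | tr -d ' ')
echo "plain: $plain, followed: $followed"
```

The rename is found only because the content of A/x and B/x matches; nothing in the commit itself records that a move happened.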

Now, if you use filter-branch to modify the repository and keep
only the B directory, but you *don't* use the --prune-empty flag:
the fact that you want to keep these other commits won't help -- the
full history would have the three commits with the same three
messages, but doing a log for just the file would show only the
commits for the file, so the first two commits won't be shown.
Similarly, blame can't show anything useful -- you'll only see
baz@somewhere as the author of the entire file.  And the reason this
makes sense is that the full commit history has the first two commits,
but they had no change -- so there's nothing that ties them to the
file in the trimmed repository, let alone something that relates them
to specific lines in the file...

(Two notes: (a) This is just a demonstration -- obviously, this is a
trimming that is done in a bad way since it dropped A even though it's
part of the history of B.  (b) Actually, it looks like the
--subdirectory-filter drops empty commits anyway, but the above
explains why it makes sense to do that.)


 Your Advice: And, IIUC, you're suggesting that the best way to deal
 with this question is to defer it until we are more sure of the
 actual split we want to make. So we don't mess with the history at
 all

The point is that every such messing-with-history should be done very
carefully and checked thoroughly, since the chance to mess things up is
very real.  In the above, it's obvious that I should not have dropped A
in the filter -- but if it's some random single file which you had in
the framework collection, out of tons of other files in the drracket
package, then it's unlikely that I will catch it -- which is why I
prefer using tools for these things and resolve all such issues with
the people who know about the code.


 and instead just work at the level of some script that we can run to
 just use mv and company to move things around.  When we know
 exactly what ends up going where, then we can figure out how to make
 up a new, useful history for the separate repositories.
 
 Is that the point?

The thing is that having two such filters (one to restructure the big
repository and one to split it) is both increasing chances for making
mistakes, and making the job of the second restructure much harder to
do.  To the point where doing it manually is infeasible, which is why
I said that it will guarantee losing history.

(And I'll reply to Matthew's suggested tool next.)

-- 
  ((lambda (x) (x x)) (lambda (x) (x x)))  Eli Barzilay:
http://barzilay.org/   Maze is Life!



Re: [racket-dev] proposal for moving to packages: repository

2013-05-24 Thread Eli Barzilay
8 hours ago, Matthew Flatt wrote:
 At Thu, 23 May 2013 07:09:17 -0400, Eli Barzilay wrote:
  Relevant history is vague.
 
 The history I want corresponds to `git log --follow' on each of the
 files that end up in a repository.

(In this context this is clear; the problem in Carl's post is that it
seemed like he was suggesting keeping the whole repository and doing
the split by removing material from clones -- which is an even fuller
history, but one that has large parts that are irrelevant.)


 That's true if you use `git filter-branch' in a particular way. I'll
 suggest an alternative way, which involves filtering the set of
 files in a commit-specific way. That is, the right set of files to
 keep for each commit are not the ones in the final place, but the
 ones whose history we need at each commit.

If that can be done reliably, then of course it makes it possible to
do the split reliably after the first restructure.  It does come with
a set of issues though...

 [... scripts description ...]

Here are a bunch of things that I thought about as I went over this.
In no particular order, probably not exhaustive, and possibly
repetitive:

* Minor: better to use `find-executable-path' since it's common to
  find systems (like mine) with an antique git in /usr/bin and a
  modern one elsewhere.  (In my case, both scripts failed since
  /usr/bin has an antique version.)

* There is an important point of fragility here: you're relying on git
  to be able to find all of the relevant file movements (renames and
  copies), which might not always be correct.  On one hand, you don't
  want to miss these operations, and on the other you don't want to
  have a low-enough threshold to identify bogus copies and renames.

* Because of this, I think that it's really best to inspect the
  results manually.  The danger of bogus copies, for example, is real,
  especially with small and very boilerplate-ish files like info.rkt
  files.  If there's a mistaken identification of such a copy you can
  end up with a bogus directory kept in the trimmed repo.  In
  addition, consider this information that the script detects via git
  for a specific commit:

A/f1.ss renamed to B/f1.rkt
A/f2.ss renamed to B/f2.rkt
...
A/f47.ss renamed to B/f47.rkt
A/f48.ss renamed to B/f48.rkt
A/f49.ss deleted
A/f50.ss deleted
B/f49.rkt created
B/f50.rkt created

  For a human reviewer, it's pretty clear that this is just a
  misidentification of two more moves (likely to happen with the kind
  of restructures that we did in the past, where a single commit both
  moves a file, and changes its contents).  This is why on one hand I
  *really* like to use such scripts (to make sure that I don't miss
  such things), but OTOH I want to review the analysis results to see
  potential problems and either fix them manually or figure out a way
  to improve the analysis and run it again.

* Also, I'd worry about file movements on top of paths that existed
  under a different final path at some point, and exactly situations
  like you described, where a file was left behind, but that file is
  completely new and should be considered separate (as in the case of
  a file move and a stub created in its place).

* The script should also take care to deal with files that got removed
  in the past.  For example, the drscheme collection has some file
  which gets removed, and later (completely unrelated) most of the
  contents migrated to drracket.  If the result of the analysis is
  that most of the material moved this way, and because of that you
  decide to keep the old drscheme collection -- you'd also want to
  keep that file that disappeared before the move, since it's part of
  the relevant history.

  So I'd modify this script to run on the *complete* repository -- the
  whole tree and all commits -- and generate information about
  movements.  Possibly do what your script does for the whole tree,
  then add a second step that runs and looks for such files that are
  unaccounted for in the results, and decide what to do with them.

  I think that this also means that it makes sense to create a global
  database of all file movements in a single scan, instead of running
  it for each package.

* Technical: I thought that it might make sense to use a racket server
  (with netcat for the actual command), or have it compile a /bin/sh
  script to do the actual work instead of using `racket/kernel' for
  speed.  However, when I tried it on the plt tree, it started with
  spitting out new commits rapidly, but eventually slowed down to more
  than a second between commits, so probably even the kernel trick is
  not helping much...

* Actually, given the huge amount of time it's running (see next
  bullet), it's probably best to make it do the movements from all
  paths at the same time.  In this specific context, this means that
  it scans the package-restructured repo (from the first step) into a
  package-restructured repo 

Re: [racket-dev] proposal for moving to packages: binary vs source

2013-05-24 Thread Eli Barzilay
[Note subject change...]

Two days ago, Eric Dobson wrote:
 For binary vs source, I think you are providing a good argument for
 the usefulness of a no source distribution. Some people want to use
 tools written in Racket, and the fact that the tools are written in
 Racket is immaterial to them. They should be able to have just the
 binary versions.

There have been a bunch of concerns expressed about the question of
distributing sources or not -- but I think that generally speaking,
there shouldn't be any problems at all.  Here's a list of things that
contribute to not having such concerns:

1. The eventual goal would be to have very easy selection of packages
   that you want to install.  Either with (a) a bunch of installers,
   (b) possibly doing this by just a different URL that will have the
   installers listed in it as arguments, or (c) with a post-install
   dialog that will ask you for additional packages to install.  (In
   the (c) case, it could also detect packages that you had decided to
   install previously, and re-use the same list.)

   The bottom line is that if *you* want to get the sources, then it
   should be extremely easy to just have them installed (c), or create
   installers that include the sources (a;b) which you'll use.  The
   main point here is that using packages will make such variations
   very easy to implement, and make it easy for you to add sources or
   provide popular options based on demand.

2. With the geiser/drracket concern about reduced functionality
   because there are no sources: the information about the source of
   bindings is still there.  (Ie, things work fine if you remove a
   random source file from a current installation -- the only
   difference is that the actual source file is not there.)

   Now, I'm assuming that there is some way with the package system to
   know for any given file which package it came from.  With this
   information, I think that it would be easy to do something like
   this:

 * In drr, if you try to jump to a definition for a function whose
   source is not included, you get a popup telling you that you
   don't have the source, and list an on-line URL where the source
   can be found (which is inferrable from the package information)
   as well as a one-button-click option to install the source and
   then open the file.

 * Geiser could do exactly the same, and also use something like
   `url-handler-mode' to visit the source file directly from the
   on-line source in addition to offering to install the sources.

3. I think that there should be an option for package owners to decide
   how their package gets installed, so for example, if realm must be
   distributed with its sources, it can just specify that and avoid
   the stripping that other packages would go through.

-- 
  ((lambda (x) (x x)) (lambda (x) (x x)))  Eli Barzilay:
http://barzilay.org/   Maze is Life!


Re: [racket-dev] proposal for moving to packages: repository

2013-05-24 Thread Matthew Flatt
At Fri, 24 May 2013 03:26:45 -0400, Eli Barzilay wrote:
 If that can be done reliably, then of course it makes it possible to
 do the split reliably after the first restructure.

Great! Let's do that, because I remain convinced that it's going to be
a lot easier.


 * Also, I'd worry about file movements on top of paths that existed
   under a different final path at some point

I believe the file-lifetime computation in slice.rkt takes care of that.

 * The script should also take care to deal with files that got removed
   in the past.

Ditto.

 * Actually, given the huge amount of time it's running (see next
   bullet), it's probably best to make it do the movements from all
   paths at the same time.

There's no need to move anything while extracting a repository slice;
the movements happen before.

 * It's not clear to me what you want to do at this point, [...]
   Alternatively, do the first restructure with in-repo moves instead,

Yes, that's what I suggested.



Re: [racket-dev] proposal for moving to packages: repository

2013-05-24 Thread Eli Barzilay
Four hours ago, Matthew Flatt wrote:
 At Fri, 24 May 2013 03:26:45 -0400, Eli Barzilay wrote:
  If that can be done reliably, then of course it makes it possible to
  do the split reliably after the first restructure.
 
 Great! Let's do that, because I remain convinced that it's going to
 be a lot easier.

I'm really surprised.  Given that you consider this a *lot* easier,
and that I consider it (reorganization + split) a lot messier, I think
that I'm still not getting something.


  * Also, I'd worry about file movements on top of paths that
existed under a different final path at some point
 
 I believe the file-lifetime computation in slice.rkt takes care of
 that.

That's what it looks like, but I'd double-check to make sure that it
happens.


  * The script should also take care to deal with files that got
removed in the past.
 
 Ditto.

I don't believe that it's *not* doing this, so I did the double-check
in the form of a test.  When I run it, I see these bad things (which I
expected to happen, so wrote it as a test):

* The c file got completely lost (this is the pre-reorganization
  file deletion scenario)

* The b file got lost too (post-reorg deletion)

* The history of e during the A days got lost, since it was not
  recognized as a rename in the A-B move due to being edited too.

= The first two are things that a script can deal with doing some
   kind of scan like I mentioned (go over the full history of the full
   tree).

= The third one is something that requires human judgment *but* if
   the A/e historic file is considered as deleted, and if deleted
   files from the original directories are included with doing the
   above, then it should still be there in the rewritten repo.

Test file attached; probably need to do very little other than
adjusting the paths to the two racket scripts.





  * Actually, given the huge amount of time it's running (see next
bullet), it's probably best to make it do the movements from all
paths at the same time.
 
 There's no need to move anything while extracting a repository
 slice; the movements happen before.

What I'm saying is that if filter-branch using your script takes 20
hours, and you want to use it to split the repo to 5 packages, and if
a simple filter-branch with a subdirectory filter takes a few minutes,
then instead of:

  * filter-branch using your script 5 times to create each repository
  Total runtime: more than 4 days

you do this:

  * filter-branch one time using your script to reorganize the files
according to packages
  * use filter-branch with a subdirectory filter 5 times to create
each repository
  Total runtime: about 21 hours

This latter use would end up with the final tree being exactly the
same (since you're talking about doing the reorganization within git),
but the history would be different since it's as if the files were
there the whole time.
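The runtime comparison above works out as follows. The 20 hours and the 5 repositories come from the discussion; the 12 minutes per subdirectory filter is an assumed stand-in for "a few minutes", chosen only because it matches the 21-hour total.

```shell
script_hours=20     # one filter-branch run with the script (from the thread)
repos=5             # number of repositories to extract (from the thread)
subdir_minutes=12   # ASSUMED cost of one plain subdirectory filter

# Option 1: run the slow script once per repository.
per_repo_total=$(( script_hours * repos ))
# Option 2: run the slow script once, then one cheap filter per repository.
split_total_minutes=$(( script_hours * 60 + repos * subdir_minutes ))

echo "script per repository: ${per_repo_total}h (about $(( per_repo_total / 24 )) days)"
echo "reorganize once, then split: $(( split_total_minutes / 60 ))h"
```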

-- 
  ((lambda (x) (x x)) (lambda (x) (x x)))  Eli Barzilay:
http://barzilay.org/   Maze is Life!


Re: [racket-dev] proposal for moving to packages: repository

2013-05-24 Thread Matthew Flatt
At Fri, 24 May 2013 12:44:35 -0400, Eli Barzilay wrote:
   * The script should also take care to deal with files that got
 removed in the past.
  
  Ditto.
 
 I don't believe that it's *not* doing this, so I did the double-check
 in the form of a test. 

You're right --- I misunderstood your example.

Still, I'm happy enough with the result in your example. The conversion
does preserve `git log --follow' results for the files that survive,
which was my intended spec. And maybe it's better to explain my
interest as `git blame', since my main interest in the history of a
file is often how/why a particular bit of code ended up as it is.

 What I'm saying is that if filter-branch using your script takes 20
 hours

Just to confirm, my experiment on the main repo completed in right at
20 hours. (The `git log --follow's and `git blame's that I tried look
good to me.)

   * filter-branch one time using your script to reorganize the files
 according to packages
   * use filter-branch with a subdirectory filter 5 times to create
 each repository
   Total runtime: about 21 hours
 
 This latter use would end up with the final tree being exactly the
 same (since you're talking about doing the reorganization within git),
 but the history would be different since it's as if the files were
 there the whole time.

I don't see how that works. Since my script leaves each file in its
original location for old commits, I expect a subdirectory
`filter-branch' to still drop history for the old locations. In any
case, I'm happy to sort out that detail later.

If we agree that `git mv' before splitting is practical, though, that's
all I need for now.

From my perspective, the important thing is to have the ability to just
edit and move files around to sort out packages, instead of having the
indirection of a script that edits and moves files around.



Re: [racket-dev] proposal for moving to packages: repository

2013-05-23 Thread Eli Barzilay
9 hours ago, Matthew Flatt wrote:
 At Wed, 22 May 2013 14:50:41 -0400, Eli Barzilay wrote:
  That's true, but the downside of changing the structure and having
  files and directories move post structure change will completely
  destroy the relevant edit history of the files, since it will not
  be carried over to the repos once it's split.
 
 It's possible that we're talking past each other due to me not getting
 this point.

(Obligatory re-disclaimer: I consider the problem with forcing people
to change their working environment much more severe.)


 Why is it not possible to carry over history?
 
 The history I want corresponds to `git log --follow' on each of the
 files that end up in a repository. I'm pretty sure that such a
 history of commits can be generated for any given set of files, even
 if no ready-made tool exists already (i.e., 'git' is plenty flexible
 that I can script it myself).
 
 Or maybe I'm missing some larger reason?

The thing to remember is just how simple git is...  There's no magical
way to carry over a history artificially -- it's whatever is in the
commits.

To make this more concrete (and more verbose), in this context the
point is that git filter-branch is a simple tool that basically
replays the complete history, allowing you to plant various hooks to
change the directory structure, commit messages or whatever.  The new
history is whatever new commits are in the revised repository, with no
way to make up a history with anything else.

Now, to make my first point about the potential loss of history that
is inherent in the process -- say that you want to split out a
drracket repo in a naive way: taking just that one directory.  Since
it's done naively, the resulting repository will not have the
drscheme directory and its contents, which means that you lose all
history of files that happened there.  To try that (in a fresh clone,
of course) -- first, look at the history of a random file in it:

  F=collects/drracket/private/app.rkt
  git log --format='%n%h %s' --name-only --follow -- $F

Now do the revision:

  S=collects/drracket
  git filter-branch --prune-empty --subdirectory-filter $S -- --all

And look at the same log line again, the history is gone:

  git log --format='%n%h %s' --name-only --follow -- $F

If you look at the *new* file, you do see the history, but the
revisions made in drscheme are gone:

  git log --format='%n%h %s' --name-only --follow -- private/app.rkt

In any case, this danger is there no matter what, especially in our
case since code has been moving around in the racket switch.  I
*hope* that most of it will be simple: like carrying along the
drscheme directory with drracket, the scheme and mzlib with
racket, etc.  Later on, if these things move to compat packages,
the irrelevant directories get removed from the repo without
surgeries, so the history will still be there.  This shows some of the
tricks that might be involved in the current switch: if you'd want to
have some compat package *now*, the right thing to do would be:

  * do a simple filter-branch to extract drscheme (and other such
collections) in a new repository for compat

  * for drracket: do a filter-branch that keeps *both* directories
in, then commit a removal of drscheme.  (Optionally, use rebase
to move the deletion backward...)

Going back to the repo structure change that you want and the reason
that I said that doing moves between the package directories
post-restructure is destructive should be clear now: say that you move
collects/A/x into foo/A/x as part of the restructure.  Later you
realize that A/x should go into the bar package instead so you just
move it to bar/A/x.  The history is now in, including the rename, but
later on when bar is split into a separate repo, the history of the
file is gone.  Instead, it appears in the foo repository, ending up
being deleted.
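
As a rough sketch of that scenario in a throwaway repository (the
names foo, bar and A/x are just the placeholders from the text, and
the commit messages and demo identity are made up), something like
the following should show the effect.  It only uses the same
filter-branch invocation as above, applied to the bar directory:

```shell
#!/bin/sh
# Throwaway demo: a file that was moved between package directories
# loses its earlier history when the destination directory is split out.
set -e
# Hypothetical identity so that commits work in a fresh environment.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid
export FILTER_BRANCH_SQUELCH_WARNING=1

repo=$(mktemp -d)
cd "$repo"
git init -q .

# 1. the file starts out in the old layout
mkdir -p collects/A
echo 'v1' > collects/A/x
git add -A
git commit -q -m 'add collects/A/x'

# 2. the restructure puts it in the foo package
mkdir -p foo/A
git mv collects/A/x foo/A/x
git commit -q -m 'restructure: collects/A/x -> foo/A/x'

# 3. later it turns out that it belongs in bar
mkdir -p bar/A
git mv foo/A/x bar/A/x
git commit -q -m 'move A/x from foo to bar'

# history that --follow can see before the split
before=$(git log --format=%h --follow -- bar/A/x | wc -l | tr -d ' ')

# split bar out, the same way as the drracket example
git filter-branch --prune-empty --subdirectory-filter bar -- --all \
  >/dev/null 2>&1

# history that --follow can see after the split
after=$(git log --format=%h --follow -- A/x | wc -l | tr -d ' ')
echo "commits seen by --follow: before=$before after=$after"
```

The second count should come out smaller than the first: only the
commit that touched bar survives the split.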

One way to get around this is to avoid moving the file -- instead, do
another filter-branch surgery.  This will be a mess since each such
change will mean rebuilding the repository with all the pain that this
implies.  Another way to get around it is to keep track of these
moving commits, and when the time comes to split into package repos,
you first do another surgery on the whole repo which moves foo/A/x to
bar/A/x for all of the commits before the move (not after, since that
could lead to other problems), and then do the split.
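
A minimal sketch of that second approach, again in a throwaway
repository.  It assumes that a plain --tree-filter is good enough for
the pre-split surgery and that nothing re-creates foo/A/x after the
move (the filter only fires on commits where foo/A/x exists); the
paths, messages and demo identity are hypothetical:

```shell
#!/bin/sh
# Throwaway demo: rewrite the pre-move commits so that the file already
# lives in bar/, and only then split bar out.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid
export FILTER_BRANCH_SQUELCH_WARNING=1

repo=$(mktemp -d)
cd "$repo"
git init -q .

mkdir -p foo/A
echo 'v1' > foo/A/x
git add -A
git commit -q -m 'add foo/A/x'

echo 'v2' > foo/A/x
git commit -q -a -m 'edit x while in foo'

mkdir -p bar/A
git mv foo/A/x bar/A/x
git commit -q -m 'move A/x from foo to bar'

# Surgery: in every commit that still has foo/A/x, put it in bar/A/x.
git filter-branch --tree-filter \
  'if [ -e foo/A/x ]; then mkdir -p bar/A && mv foo/A/x bar/A/x; fi' \
  HEAD >/dev/null 2>&1

# Split: -f is needed because the first run left a backup under
# refs/original.
git filter-branch -f --prune-empty --subdirectory-filter bar HEAD \
  >/dev/null 2>&1

kept=$(git rev-list --count HEAD)
echo "commits kept in the bar repo: $kept"
git log --format='%h %s'
```

With this, the edit that was made while the file lived in foo should
still be in the log of the split repository, which is the point of
tracking the move.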

This might work, but besides being very error-prone, it means doing
the same kind of file-movement tracking that I'm talking about anyway.
So take this all as saying that the movement of files between packages
needs to be tracked anyway -- but with my suggestion the movement is
delayed until it's known to be final before the repo split, which
makes it more robust overall.



But really, the much more tempting aspect for me is that this can be
done now -- if you give me a list of packages and files, I can already
do the movement script.

Actually, in an 

Re: [racket-dev] proposal for moving to packages: repository

2013-05-23 Thread Eli Barzilay
9 hours ago, Carl Eastlund wrote:
 I was going to comment on the same thing.  While a naive use of git
 filter-branch might not retain the history, it should be entirely
 possible to do something a little more intelligent and keep that
 history.

Just to be clear, this is exactly what you can't get with
filter-branch.


 Essentially each of the new repositories could keep the entire
 history of the original repository, followed by a massive
 move/rename, then moving forward with an individual package.

This can work, but it is unrelated to filter-branch: it's basically
starting each package repository from a clone of the monolithic repo,
then move & shuffle things around.

This seems wrong to me in all kinds of ways -- but if someone wants to
do this with *their* package (ie, not a package that I need to deal
with), then it's certainly an option.

(That's one of the big appeals of moving to packages for me: some code
moves to packages which I can let myself Not Care About™.  Knock
yourself out with tabs, spaces at ends of lines, braces in code, two
spaces between bindings and values in `let's, and make sure that no
file ends with a newline...)

-- 
  ((lambda (x) (x x)) (lambda (x) (x x)))  Eli Barzilay:
http://barzilay.org/   Maze is Life!

_
  Racket Developers list:
  http://lists.racket-lang.org/dev


Re: [racket-dev] proposal for moving to packages: repository

2013-05-23 Thread Carl Eastlund
On Thu, May 23, 2013 at 5:49 AM, Eli Barzilay e...@barzilay.org wrote:

 9 hours ago, Carl Eastlund wrote:
  I was going to comment on the same thing.  While a naive use of git
  filter-branch might not retain the history, it should be entirely
  possible to do something a little more intelligent and keep that
  history.

 Just to be clear, this is exactly what you can't get with
 filter-branch.

  Essentially each of the new repositories could keep the entire
  history of the original repository, followed by a massive
  move/rename, then moving forward with an individual package.

 This can work, but it is unrelated to filter-branch: it's basically
 starting each package repository from a clone of the monolithic repo,
 then move & shuffle things around.

 This seems wrong to me in all kinds of ways -- but if someone wants to
 do this with *their* package (ie, not a package that I need to deal
 with), then it's certainly an option.


It doesn't seem wrong to me.  It's an accurate representation of the
history of the project, which is exactly what git is for retaining.  Where
does the problem come from?  If git filter-branch doesn't maintain the
history we need, it's not the right tool for the job.

--Carl


Re: [racket-dev] proposal for moving to packages: repository

2013-05-23 Thread Eli Barzilay
A few minutes ago, Carl Eastlund wrote:
 On Thu, May 23, 2013 at 5:49 AM, Eli Barzilay e...@barzilay.org wrote:
 
 9 hours ago, Carl Eastlund wrote:
  I was going to comment on the same thing.  While a naive use
  of git filter-branch might not retain the history, it should
  be entirely possible to do something a little more intelligent
  and keep that history.

 Just to be clear, this is exactly what you can't get with
 filter-branch.

  Essentially each of the new repositories could keep the entire
  history of the original repository, followed by a massive
  move/rename, then moving forward with an individual package.

 This can work, but it is unrelated to filter-branch: it's
 basically starting each package repository from a clone of the
 monolithic repo, then move & shuffle things around.

 This seems wrong to me in all kinds of ways -- but if someone
 wants to do this with *their* package (ie, not a package that I
 need to deal with), then it's certainly an option.
 
 It doesn't seem wrong to me.  It's an accurate representation of the
 history of the project, which is exactly what git is for retaining. 
 Where does the problem come from?

The problem of filter-branch?  It has no problems, it does exactly
what it is supposed to do.


 If git filter-branch doesn't maintain the history we need, it's not
 the right tool for the job.

If the drracket files are irrelevant for the swindle package then they
shouldn't be in the swindle repository -- and on the exact same token,
the development history of drracket shouldn't be there either.

(This is not new, BTW, I think that there was general consensus right
from the start of the package talk that the monolithic repo is just a
host for a bunch of separate projects.)

-- 
  ((lambda (x) (x x)) (lambda (x) (x x)))  Eli Barzilay:
http://barzilay.org/   Maze is Life!



Re: [racket-dev] proposal for moving to packages: repository

2013-05-23 Thread Carl Eastlund
On Thu, May 23, 2013 at 6:57 AM, Eli Barzilay e...@barzilay.org wrote:

 A few minutes ago, Carl Eastlund wrote:
  On Thu, May 23, 2013 at 5:49 AM, Eli Barzilay e...@barzilay.org wrote:
 
  9 hours ago, Carl Eastlund wrote:
   I was going to comment on the same thing.  While a naive use
   of git filter-branch might not retain the history, it should
   be entirely possible to do something a little more intelligent
   and keep that history.
 
  Just to be clear, this is exactly what you can't get with
  filter-branch.
 
   Essentially each of the new repositories could keep the entire
   history of the original repository, followed by a massive
   move/rename, then moving forward with an individual package.
 
  This can work, but it is unrelated to filter-branch: it's
  basically starting each package repository from a clone of the
  monolithic repo, then move & shuffle things around.
 
  This seems wrong to me in all kinds of ways -- but if someone
  wants to do this with *their* package (ie, not a package that I
  need to deal with), then it's certainly an option.
 
  It doesn't seem wrong to me.  It's an accurate representation of the
  history of the project, which is exactly what git is for retaining.
  Where does the problem come from?

 The problem of filter-branch?  It has no problems, it does exactly
 what it is supposed to do.


It has no problems?  Above, you stated that "this is exactly what you
can't get with filter-branch", in reference to keeping our packages'
relevant history.  That sounds like a problem to me, in our current context.

But filter-branch is not what I was talking about.  I was talking about
_not_ using filter-branch, and instead doing something that does keep
history.


   If git filter-branch doesn't maintain the history we need, it's not
  the right tool for the job.

 If the drracket files are irrelevant for the swindle package then they
 shouldn't be in the swindle repository -- and on the exact same token,
 the development history of drracket shouldn't be there either.

 (This is not new, BTW, I think that there was general consensus right
 from the start of the package talk that the monolithic repo is just a
 host for a bunch of separate projects.)


Okay, then let's purge the history of irrelevant files, but keep the
history of relevant files even if they weren't in the right directory.
If the monolithic repo is just a host for a bunch of separate projects,
shouldn't it be possible to tease out their more-or-less separate histories?

--Carl


Re: [racket-dev] proposal for moving to packages: repository

2013-05-23 Thread Eli Barzilay
Just now, Carl Eastlund wrote:
 On Thu, May 23, 2013 at 6:57 AM, Eli Barzilay e...@barzilay.org wrote:
 
 A few minutes ago, Carl Eastlund wrote:
 
  It doesn't seem wrong to me.  It's an accurate representation
  of the history of the project, which is exactly what git is
  for retaining.   Where does the problem come from?

 The problem of filter-branch?  It has no problems, it does
 exactly what it is supposed to do.
 
 It has no problems?  Where above you stated this is exactly what
 you can't get with filter-branch in reference to keeping our
 packages' relevant history.

"Relevant history" is vague.  The thing that you can't do with
filter-branch is keep the complete history if you remove files from
the history -- the files that are gone go with their history.


 But filter-branch is not what I was talking about.  I was talking
 about _not_ using filter-branch, and instead doing something that
 does keep history.

Like I said: what you're suggesting means keeping the full monolithic
history of development in the main repo, including all of the
irrelevant files (which will be removed in the tip, but included in
the repo).

  If git filter-branch doesn't maintain the history we need, it's not
  the right tool for the job.

 If the drracket files are irrelevant for the swindle package then they
 shouldn't be in the swindle repository -- and on the exact same token,
 the development history of drracket shouldn't be there either.

 (This is not new, BTW, I think that there was general consensus right
 from the start of the package talk that the monolithic repo is just a
 host for a bunch of separate projects.)
 
 Okay, then let's purge the history of irrelevant files, but keep the
 history of relevant files even if they weren't in the right
 directory.  If the monolithic repo is just a host for a bunch of
 separate projects, shouldn't it be possible to tease out their
 more-or-less separate histories?

(*sigh*; please read the other email, where I went over this
thoroughly.)

-- 
  ((lambda (x) (x x)) (lambda (x) (x x)))  Eli Barzilay:
http://barzilay.org/   Maze is Life!



Re: [racket-dev] proposal for moving to packages: repository

2013-05-23 Thread Carl Eastlund
On Thu, May 23, 2013 at 7:09 AM, Eli Barzilay e...@barzilay.org wrote:

 Just now, Carl Eastlund wrote:
  On Thu, May 23, 2013 at 6:57 AM, Eli Barzilay e...@barzilay.org wrote:
 
  A few minutes ago, Carl Eastlund wrote:
  
   It doesn't seem wrong to me.  It's an accurate representation
   of the history of the project, which is exactly what git is
   for retaining.   Where does the problem come from?
 
  The problem of filter-branch?  It has no problems, it does
  exactly what it is supposed to do.
 
  It has no problems?  Where above you stated this is exactly what
  you can't get with filter-branch in reference to keeping our
  packages' relevant history.

 Relevant history is vague.  The thing that you can't do with
 filter-branch is keep the complete history if you remove files from
 the history -- the files that are gone go with their history.


  But filter-branch is not what I was talking about.  I was talking
  about _not_ using filter-branch, and instead doing something that
  does keep history.

 Like I said: what you're suggesting means keeping the full monolithic
 history of development in the main repo, including all of the
 irrelevant files (which will be removed in the tip, but included in
 the repo).

   If git filter-branch doesn't maintain the history we need, it's not
   the right tool for the job.
 
  If the drracket files are irrelevant for the swindle package then
  they shouldn't be in the swindle repository -- and on the exact same
  token, the development history of drracket shouldn't be there
  either.
 
  (This is not new, BTW, I think that there was general consensus right
  from the start of the package talk that the monolithic repo is just a
  host for a bunch of separate projects.)
 
  Okay, then let's purge the history of irrelevant files, but keep the
  history of relevant files even if they weren't in the right
  directory.  If the monolithic repo is just a host for a bunch of
  separate projects, shouldn't it be possible to tease out their
  more-or-less separate histories?

 (*sigh*; please read the other email, where I went over this
 thoroughly.)


I just went over all your emails on this topic, and I can't find a single
one where you addressed this specific proposal at all.  I don't know which
one of us is misunderstanding another on this point.

--Carl


Re: [racket-dev] proposal for moving to packages: repository

2013-05-23 Thread Robby Findler
Hi Eli: I'm trying to understand your point. Do I have this right?

Background: The git history consists of a series of checkpoints in time of
the entire repository, not a collection of individual files. So, when I do
`git log x.rkt', what I get is essentially a filtered list (except where
people didn't properly rebase, but let's ignore that) of those checkpoints:
all the ones where x.rkt changed.

Big Question: The issue is, then, when we split up the current repo into
smaller repos, what are the series of checkpoints that we're going to make
up for the individual repos? Right?

Your Advice: And, IIUC, you're suggesting that the best way to deal with
this question is to defer it until we are more sure of the actual split we
want to make. So we don't mess with the history at all and instead just
work at the level of some script that we can run to just use mv and
company to move things around. When we know exactly what ends up going
where, then we can figure out how to make up a new, useful history for the
separate repositories.

Is that the point?

Robby



On Thu, May 23, 2013 at 4:41 AM, Eli Barzilay e...@barzilay.org wrote:

 9 hours ago, Matthew Flatt wrote:
  At Wed, 22 May 2013 14:50:41 -0400, Eli Barzilay wrote:
   That's true, but the downside of changing the structure and having
   files and directories move post structure change will completely
   destroy the relevant edit history of the files, since it will not
   be carried over to the repos once it's split.
 
  It's possible that we're talking past each other due to me not getting
  this point.

 (Obligatory re-disclaimer: I consider the problem with forcing people
 to change their working environment much more severe.)


  Why is it not possible to carry over history?
 
  The history I want corresponds to `git log --follow' on each of the
  files that end up in a repository. I'm pretty sure that such a
  history of commits can be generated for any given set of files, even
  if no ready-made tool exists already (i.e., 'git' is plenty flexible
  that I can script it myself).
 
  Or maybe I'm missing some larger reason?

 The thing to remember is just how simple git is...  There's no magical
 way to carry over a history artificially -- it's whatever is in the
 commits.

 To make this more concrete (and more verbose), in this context the
 point is that git filter-branch is a simple tool that basically
 replays the complete history, allowing you to plant various hooks to
 change the directory structure, commit messages or whatever.  The new
 history is whatever new commits are in the revised repository, with no
 way to make up a history with anything else.

 Now, to make my first point about the potential loss of history that
 is inherent in the process -- say that you want to split out a
 drracket repo in a naive way: taking just that one directory.  Since
 it's done naively, the resulting repository will not have the
 drscheme directory and its contents, which means that you lose all
 history of files that happened there.  To try that (in a fresh clone,
 of course) -- first, look at the history of a random file in it:

   F=collects/drracket/private/app.rkt
   git log --format='%n%h %s' --name-only --follow -- $F

 Now do the revision:

   S=collects/drracket
   git filter-branch --prune-empty --subdirectory-filter $S -- --all

 And look at the same log line again, the history is gone:

   git log --format='%n%h %s' --name-only --follow -- $F

 If you look at the *new* file, you do see the history, but the
 revisions made in drscheme are gone:

   git log --format='%n%h %s' --name-only --follow -- private/app.rkt

 In any case, this danger is there no matter what, especially in our
 case since code has been moving around in the racket switch.  I
 *hope* that most of it will be simple: like carrying along the
 drscheme directory with drracket, the scheme and mzlib with
 racket, etc.  Later on, if these things move to compat packages,
 the irrelevant directories get removed from the repo without
 surgeries, so the history will still be there.  This shows some of the
 tricks that might be involved in the current switch: if you'd want to
 have some compat package *now*, the right thing to do would be:

   * do a simple filter-branch to extract drscheme (and other such
 collections) in a new repository for compat

   * for drracket: do a filter-branch that keeps *both* directories
 in, then commit a removal of drscheme.  (Optionally, use rebase
 to move the deletion backward...)

 Going back to the repo structure change that you want and the reason
 that I said that doing moves between the package directories
 post-restructure is destructive should be clear now: say that you move
 collects/A/x into foo/A/x as part of the restructure.  Later you
 realize that A/x should go into the bar package instead so you just
 move it to bar/A/x.  The history is now in, including the rename, but
 later on when bar is split into a separate 

Re: [racket-dev] proposal for moving to packages: repository

2013-05-23 Thread Matthew Flatt
At Thu, 23 May 2013 07:09:17 -0400, Eli Barzilay wrote:
 Relevant history is vague.

The history I want corresponds to `git log --follow' on each of the
files that end up in a repository.

 The thing that you can't do with
 filter-branch is keep the complete history if you remove files from
 the history -- the files that are gone go with their history.

That's true if you use `git filter-branch' in a particular way. I'll
suggest an alternative way, which involves filtering the set of files
in a commit-specific way. That is, the right set of files to keep for
each commit are not the ones in the final place, but the ones whose
history we need at each commit.


To make sure I'm not confused, I've implemented this idea. My
implementation is unlikely to be exactly right, yet, but I think it
works as a proof of concept.


The enclosed slice.rkt script takes a subdirectory and a destination
directory. Run it in the top directory of a git repository, and it
finds all the files in the given subdirectory, and then it closes over
the history of each file via `git log --follow'.
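
For illustration only, the "closing over the history" step could look
roughly like this in plain shell.  This is not the enclosed slice.rkt,
just a sketch of the idea under the assumption that collecting every
path reported by `git log --follow --name-only' is close enough; the
file name a.rkt and the demo identity are made up, while the web and
ex directories mirror the iplt test described below:

```shell
#!/bin/sh
# Throwaway demo: compute the set of paths that the files of one
# subdirectory have ever lived under.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

repo=$(mktemp -d)
cd "$repo"
git init -q .

mkdir -p web/internal
echo '#lang racket' > web/internal/a.rkt
git add -A
git commit -q -m 'add web/internal/a.rkt'

mkdir -p ex/internal
git mv web/internal/a.rkt ex/internal/a.rkt
git commit -q -m 'move internal from web to ex'

# For each file now in ex, follow it back through renames and collect
# every path name that shows up along the way.
closure=$(git ls-files -- ex | while IFS= read -r f; do
  git log --follow --name-only --format= -- "$f"
done | grep . | sort -u)

echo "$closure"
```

The resulting list is the candidate set of paths to keep; it includes
both the current location and the location before the move.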

From that point, we could use the computed set of paths as the ones to
keep during a `git filter-branch' on every commit, but that's not
ideal. For example, a file in collection "a" that is destined for
package "a" may have originated in "b" (think "mzlib"), where the
same-named file sticks around in "b" after the copy. It's nicer and
cleaner to have irrelevant files disappear after the relevant copy/move
is made.

So, I took one more step: slice.rkt constructs a range of commits
during which the file should exist, based on when it was moved or
copied. (Forks and merges are a minor obstacle, which the script works
around by enlarging ranges to hit commits in the `--first-parent'
traversal.) Conceptually, the result is a mapping from commit ids to
paths, but that would be a big table to read on every `filter-branch'
step, so it's reported as a table of commits with enter/leave
transitions. The output of slice.rkt is two files: state.rktd for the
set of files to be kept in the initial commit, and actions.rktd to
specify the transitions.

The enclosed prune.rkt script works with `git filter-branch
--index-filter'. It uses actions.rktd (read-only) and state.rktd
(which it updates via transitions).


The Racket git repo is large, so I've only tried the `git
filter-branch' step so far on smaller repos, such as the iplt
repository. In my clone of iplt, I `git mv'ed web/internal to
ex/internal. Then, with the scripts in /tmp,

 racket /tmp/slice.rkt ex /tmp
 git filter-branch --index-filter "racket /tmp/prune.rkt /tmp" --prune-empty

leaves the repo with only the files of ex, and `git log --follow'
on various files looks right.

I'll try on a clone of the Racket repo and report back.

FWIW, before doing this for real, I'd want to add a `--msg-filter' that
extends each commit message to add the original commit id, since we
have references to the old ids in various places (and so it would be
handy to have them in the new repos).
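
A sketch of what such a `--msg-filter' could look like, tried on a
throwaway repository.  filter-branch passes the old message on stdin
and sets GIT_COMMIT to the id of the commit being rewritten; the
"original commit:" label and the demo identity are just assumptions
for the example:

```shell
#!/bin/sh
# Throwaway demo: append the pre-rewrite commit id to each message.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid
export FILTER_BRANCH_SQUELCH_WARNING=1

repo=$(mktemp -d)
cd "$repo"
git init -q .
echo 'v1' > x
git add -A
git commit -q -m 'first'

old=$(git rev-parse HEAD)

# cat reproduces the old message; the echos add a blank line and the
# old id.
git filter-branch \
  --msg-filter 'cat; echo; echo "original commit: $GIT_COMMIT"' \
  HEAD >/dev/null 2>&1

git log -1 --format=%B
```

In a real conversion this option would be combined with the
`--index-filter' run above rather than done as a separate pass.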


slice.rkt
Description: Binary data


prune.rkt
Description: Binary data


Re: [racket-dev] proposal for moving to packages: repository

2013-05-22 Thread Eli Barzilay
Yesterday, Matthew Flatt wrote:
 We already have a system for constructing a script that can move
 files around and adjust content as needed: git.

The script that I'm talking about *would* be in the repository, of
course.  It will essentially become a replacement for the distribution
specs -- with the following differences:

  * Much less sophisticated, since it'll be just verbatim paths

  * Enforced via a package-aware build.

  * Easily translated into a git operation to split the monolithic
repo.

And with all of that, it is a truly gradual change -- allowing work on
the package front to proceed without disturbing anyone's work
environment until the repositories are physically split.


 As long as some of us are trying to write that script while others
 are changing the existing directories and files, there will be
 collisions.

That's true, but the downside of changing the structure and then
having files and directories move after the structure change is that
it will completely destroy the relevant edit history of those files,
since it will not be carried over once the monolithic repo is split.

Meta-note: I'm not arguing this as something that I strongly care
about personally.  I'm fine with nuking the whole history and starting
from fresh repositories post-split.  I'm just trying to make the
damage explicit for those who do care about keeping that history.

In addition, I'm trying to make the move to packages as painless as
possible for people -- your suggestion introduces three big changes:
(a) structure change, (b) packages, (c) repository+structure change;
and my suggestion eliminates (a), and a large part of (c) which will
be a byproduct of (a).  The reason that I think it makes more sense is
that it allows package-based builds to start as soon as possible (even
now, if the build is working with it), without waiting for anyone to
adapt anything.


 I want to minimize conflicts and maximize the number of people who
 can help refine the package structure.

The only point of loss that I see is the equivalent of the
check-dists as a test in drdr -- but even that is completely minor,
since drdr itself would also switch to package-based builds, and
therefore dependency problems would still get reported by drdr.

What other conflicts (ones that won't be detected by nightly or drdr
builds) do you see?


 I think a lot of people on this list are eager to contribute to the
 shift into packages. As someone close to the new structure, I'm
 telling you my best guess at how you can help and be in a
 position to help more: let us switch the repo sooner rather than
 later.

As another meta-point: I'm probably at the top 2% of eagerness to
switch.  The current distribution thing is full of stuff that I would
be very happy to see gone; the package-level dependency problems are
things that I have been complaining about for years (and usually I'd
be the only one to do so, and get some weak support only after huge
emails trying to explain the future damage).  In addition to that,
back when the general direction was to keep the single repository as a
place for all of the main package sources I sighed at the prospect
of having the distribution-spec linger on as a specification of
package splitting -- and I preferred to move into a split-by-directory
structure to simplify things; so the move to separate repositories is
something that is way more appealing to me.  In short, I *very* much
want this to happen, and I want it to happen as soon as possible.

And this is exactly why I've made this suggestion: it allows an
immediate switch.  No need for any kind of convincing or discussion.
As long as people agree on the end result of splitting into
repositories, the package work continues as planned, unstoppable and
undelay-able by people who are not dealing with packages.

(And as a side note: even in the imaginary case that eventually
there's some anti-package or anti-repo-split revolution, nothing is
lost, since the result is still a better build + distribution
process.)

-- 
  ((lambda (x) (x x)) (lambda (x) (x x)))  Eli Barzilay:
http://barzilay.org/   Maze is Life!


Re: [racket-dev] proposal for moving to packages

2013-05-22 Thread Eli Barzilay
Yesterday, Eric Dobson wrote:
 On Tue, May 21, 2013 at 4:29 AM, Jay McCarthy jay.mccar...@gmail.com wrote:
  In my tree, I have 20M of compiled code and 13M of source. I like
  the idea of a reduction of about 50% in size of downloads.
 
 I'm not sure if something on the order of 10M is something to worry
 about optimizing, that takes like 5-6 seconds to download on a
 15Mbit connection. And a minute on a much slower connection.

I don't know how Jay got those numbers, but I have a very different
picture:

  363M Current installed tree
  278M No-source tree (with docs)
   56M Installed textual tree (has no docs and scrbl files)
   42M Same minus sources

If a package based installation is roughly like the textual thing, and
given that it's easy to extend it to a full installation by adding
packages, then we're talking about going from a 363M tree down to a
42M thing.  I think that the minimal core racket would be even smaller
than the textual thing: once I remove things that look like they
shouldn't be there, it goes down to 28M.

The impact of having a huge tree currently is pretty big, IMO.  One
example is that it is impractical to have random linux utilities
implemented in Racket if you need to drag in a 363M working
environment.  It's true that you could in theory use the textual
thing, but the monolithic tree makes it hard for linux distro
packagers to split things into a small core -- hard enough that nobody
did it so far.  Another example is the few brave people who tried to
make things work on small devices, which usually starts with a huge
effort to get rid of unnecessary stuff.

Finally -- consider J. Random User -- installing a 360M thing on your
computer is something that you'd worry about much more than a 28M
thing.  The smaller thing is at a point where you won't worry about it
being left somewhere, and at a point where it's fine to install it as a
kind of a shared runtime thing for someone who wants to distribute
racket-based applications.

-- 
  ((lambda (x) (x x)) (lambda (x) (x x)))  Eli Barzilay:
http://barzilay.org/   Maze is Life!


Re: [racket-dev] proposal for moving to packages: repository

2013-05-22 Thread Matthew Flatt
At Wed, 22 May 2013 14:50:41 -0400, Eli Barzilay wrote:
 That's true, but the downside of changing the structure and having
 files and directories move post structure change will completely
 destroy the relevant edit history of the files, since it will not be
 carried over to the repos once it's split.

It's possible that we're talking past each other due to me not getting
this point.

Why is it not possible to carry over history?

The history I want corresponds to `git log --follow' on each of the
files that end up in a repository. I'm pretty sure that such a history
of commits can be generated for any given set of files, even if no
ready-made tool exists already (i.e., 'git' is plenty flexible that I
can script it myself).

Or maybe I'm missing some larger reason?



Re: [racket-dev] proposal for moving to packages: repository

2013-05-22 Thread Carl Eastlund
On Wed, May 22, 2013 at 8:21 PM, Matthew Flatt mfl...@cs.utah.edu wrote:

 At Wed, 22 May 2013 14:50:41 -0400, Eli Barzilay wrote:
  That's true, but the downside of changing the structure and having
  files and directories move post structure change will completely
  destroy the relevant edit history of the files, since it will not be
  carried over to the repos once it's split.

 It's possible that we're talking past each other due to me not getting
 this point.

 Why is it not possible to carry over history?

 The history I want corresponds to `git log --follow' on each of the
 files that end up in a repository. I'm pretty sure that such a history
 of commits can be generated for any given set of files, even if no
 ready-made tool exists already (i.e., 'git' is plenty flexible that I
 can script it myself).

 Or maybe I'm missing some larger reason?


I was going to comment on the same thing.  While a naive use of git
filter-branch might not retain the history, it should be entirely possible
to do something a little more intelligent and keep that history.
Essentially each of the new repositories could keep the entire history of
the original repository, followed by a massive move/rename, then moving
forward with an individual package.

--Carl


Re: [racket-dev] proposal for moving to packages

2013-05-22 Thread Eric Dobson
I agree that 363 to 28 would be a great win. But you seem to be
describing the difference between Full Racket and core racket, not the
difference between binary and source.

For binary vs source, I think you are providing a good argument for
the usefulness of a no source distribution. Some people want to use
tools written in Racket, and the fact that the tools are written in
Racket is immaterial to them. They should be able to have just the
binary versions.



On Wed, May 22, 2013 at 12:30 PM, Eli Barzilay e...@barzilay.org wrote:
 Yesterday, Eric Dobson wrote:
 On Tue, May 21, 2013 at 4:29 AM, Jay McCarthy jay.mccar...@gmail.com wrote:
  In my tree, I have 20M of compiled code and 13M of source. I like
  the idea of a reduction of about 50% in size of downloads.

 I'm not sure if something on the order of 10M is something to worry
 about optimizing, that takes like 5-6 seconds to download on a
 15Mbit connection. And a minute on a much slower connection.

 I don't know how Jay got those numbers, but I have a very different
 picture:

   363M Current installed tree
   278M No-source tree (with docs)
    56M Installed textual tree (has no docs and scrbl files)
    42M Same minus sources

 If a package based installation is roughly like the textual thing, and
 given that it's easy to extend it to a full installation by adding
 packages, then we're talking about going from a 363M tree down to a
 42M thing.  I think that the minimal core racket would be even smaller
 than the textual thing: once I remove things that look like they
 shouldn't be there, it goes down to 28M.

 The impact of having a huge tree currently is pretty big, IMO.  One
 example is that it is impractical to have random linux utilities
 implemented in Racket if you need to drag in a 363M working
 environment.  It's true that you could in theory use the textual
 thing, but the monolithic tree makes it hard for linux distro
 packagers to split things into a small core -- hard enough that nobody
 did it so far.  Another example is the few brave people who tried to
 make things work on small devices, which usually starts with a huge
 effort to get rid of unnecessary stuff.

 Finally -- consider J. Random User -- installing a 360M thing on your
 computer is something that you'd worry about much more than a 28M
 thing.  The smaller thing is at a point where you won't worry about it
 being left somewhere, and at a point where it's fine to install it as a
 kind of a shared runtime thing for someone who wants to distribute
 racket-based applications.

 --
   ((lambda (x) (x x)) (lambda (x) (x x)))  Eli Barzilay:
 http://barzilay.org/   Maze is Life!


Re: [racket-dev] proposal for moving to packages

2013-05-21 Thread Antonio Menezes Leitao
I've been using Racket (and DrRacket) to teach programming
to architecture students. These are not sophisticated users, so any
move that makes it more difficult for them to use Racket is not good
news.

What happened to the batteries included motto?

Just my 0.1 cents.

Best,
António.


Re: [racket-dev] proposal for moving to packages

2013-05-21 Thread Sam Tobin-Hochstadt
On Mon, May 20, 2013 at 2:23 PM, Jose A. Ortega Ruiz j...@gnu.org wrote:

 Here's hope that down the line there'll be binary+source packages that
 end users can install with the same ease as today.

Matthew's email mentioned this a little, but the plan is that:

$ raco pkg install drracket

will install source as well as binaries.  The big change is that the
distribution you get from http://racket-lang.org/download/ won't
include all of that stuff.

Sam


Re: [racket-dev] proposal for moving to packages: repository

2013-05-21 Thread Sam Tobin-Hochstadt
On Mon, May 20, 2013 at 6:07 PM, Matthew Flatt mfl...@cs.utah.edu wrote:

 To put it another way and overstate a little: I'm trying to get buy-in
 from dev to make the switch to packages wholesale. The little bit of
 staging in the plan is to make the conversion itself easier, and not to
 simplify the switch for developers.

Can you spell out how the directory movement you described will make
the conversion easier?

Here's what I think the simplest move to multiple repositories would be:

1. Use `git filter-branch` to create a new repository for the
drracket package from the current git repository. [1]
2-N. Repeat step 1 for all the other packages we plan to split out.
N+1. Use `git rm` to remove everything that's been split out from the
main repository.

I think the key piece of information that makes this work is that `git
filter-branch` lets you do the subdirectory manipulation that you seem
to be planning to do manually. In particular, see the last example in
the `git filter-branch` man page [2], which is about moving things to
a subdirectory.

For example, here's the `realm` collect split out, using `git
filter-branch` twice: https://github.com/samth/realm-split

The commands I used are here: https://gist.github.com/samth/5618014
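
For reference, a minimal sketch of step 1, assuming a throwaway clone and one subdirectory per package. The function name `split_subdirectory` is hypothetical, and this is not a copy of the commands in the gist above; it only shows the documented `--subdirectory-filter` option applied to the checked-out branch.

```shell
# split_subdirectory SOURCE_REPO SUBDIR DEST
# Clone SOURCE_REPO into DEST and rewrite the current branch so that
# SUBDIR becomes the repository root, dropping commits that become empty.
split_subdirectory() {
  src=$1 subdir=$2 dest=$3
  git clone --quiet "$src" "$dest" || return 1
  (
    cd "$dest" || exit 1
    # Newer git versions print a warning and pause before filter-branch
    # runs; this variable suppresses that.
    FILTER_BRANCH_SQUELCH_WARNING=1 \
      git filter-branch --prune-empty --subdirectory-filter "$subdir" HEAD
  )
}

# Example (paths are placeholders):
#   split_subdirectory /path/to/plt collects/realm /path/to/realm-split
```

Note that this keeps only the commits that touched the subdirectory at that path, which is the history-loss concern raised elsewhere in the thread for files that moved over time.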

[1] https://help.github.com/articles/splitting-a-subpath-out-into-a-new-repository
[2] https://www.kernel.org/pub/software/scm/git/docs/git-filter-branch.html

Sam


Re: [racket-dev] proposal for moving to packages

2013-05-21 Thread Carl Eastlund
On Mon, May 20, 2013 at 11:20 PM, Juan Francisco Cantero Hurtado 
i...@juanfra.info wrote:

 On 05/20/13 23:24, Carl Eastlund wrote:

 On Mon, May 20, 2013 at 4:58 PM, Asumu Takikawa as...@ccs.neu.edu
 wrote:

  On 2013-05-20 14:42:15 -0600, Matthew Flatt wrote:

 Eventually, when the dust settles, I think we'll want to convert every
 directory to its own git repo, and then we can incorporate the
 individual repos as git submodules.


 One nice thing about the current repo organization is that push
 notifications for every part of the PLT codebase go to all of the
 developers.

 Will that still be available in this organization scheme? (I don't care
 if it's opt-in too much, but opt-out will hopefully mean more eyes see
 the code)

 Cheers,
 Asumu


 Overall, I'm really glad to see Racket moving into the package system.  I
 think it will be good for both (the Racket core and the package system).
 I'd like to mention, though, that git submodules can be a real pain for
 synchronizing development of multiple repositories.  They seem to have been
 designed primarily for importing upstream repositories, rather than for
 multiple peer repositories.  I'm not much more fond of the alternatives I
 have tried, either; if we're committing to splitting Racket into multiple
 repositories as well as multiple packages, we should be aware there may be
 another minor git learning curve ahead.

 Thanks to Jay and Matthew for working on all of this!


 I also think that git submodules are a bad idea for packages. One git repo
 per package is more simple and less problematic.

 Thanks for the hard work :)


Git submodules imply one repo per package.  A submodule is a mechanism that
imports external repos into a checkout of a client repo, and records the
specific commit of the checkout so that there is a correlation of the
commits in each repo stored with the client.  If we're going to use
multiple repositories, we definitely need something like submodules in
order to retain a shared commit history.
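
A small sketch of what "records the specific commit" looks like in practice, assuming a client repository that already exists. The function names `add_package_submodule` and `pinned_commit` are invented for this illustration.

```shell
# add_package_submodule CLIENT_DIR PACKAGE_URL PATH
# Import the package repository at PATH inside the client repository and
# commit the result; the commit stores the package's current commit id.
add_package_submodule() {
  client=$1 url=$2 path=$3
  (
    cd "$client" || exit 1
    git submodule --quiet add "$url" "$path" || exit 1
    git commit --quiet -m "Add $path as a submodule"
  )
}

# pinned_commit CLIENT_DIR PATH
# Print the package commit id that the client's HEAD records for PATH.
pinned_commit() {
  git -C "$1" rev-parse "HEAD:$2"
}

# Example (URL and paths are placeholders):
#   add_package_submodule /path/to/core <package-repo-url> pkgs/drracket
#   pinned_commit /path/to/core pkgs/drracket
```

The id printed by `pinned_commit` changes only when someone commits a new submodule pointer in the client repository, which is the correlation between repositories described above.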

--Carl


Re: [racket-dev] proposal for moving to packages

2013-05-21 Thread Jay McCarthy
On Tue, May 21, 2013 at 12:16 AM, Antonio Menezes Leitao
antonio.menezes.lei...@ist.utl.pt wrote:
 I've been using Racket (and DrRacket) to teach programming
 to architecture students. These are not sophisticated users, so any
 move that makes it more difficult for them to use Racket is not good
 news.

 What happened to the batteries included motto?

The new organization does not imply that you can't download one thing
and get the core plus many packages. In fact, we intend to make it
more flexible so that teachers could easily create a distribution for
their class with the material they need (and not the stuff they
don't... like textbooks in German.)

Jay


 Just my 0.1 cents.

 Best,
 António.





--
Jay McCarthy j...@cs.byu.edu
Assistant Professor / Brigham Young University
http://faculty.cs.byu.edu/~jay

The glory of God is Intelligence - DC 93



Re: [racket-dev] proposal for moving to packages

2013-05-21 Thread Jay McCarthy
On Mon, May 20, 2013 at 10:05 PM, Eric Dobson eric.n.dob...@gmail.com wrote:
 I'm not sure I follow on why binary packages make it easier to reduce
 dependencies between packages, or why binary packages offer faster
 installs.

 I'm guessing that binary packages prevent cyclic dependencies between
 packages, but it seems like there are many other options that still
 get this side effect. Such as explicit checks when building the
 package.

If you have the source, then you need all the phase = 1 dependencies,
but if you just have the binary then you only need the phase = 0 deps.
Similarly, for building the documentation.

 For faster installs, the only benefit I see of binary packages over
 precompiled source packages is a small savings in size which doesn't
 seem like it would amount to much of the install time.

In my tree, I have 20M of compiled code and 13M of source. I like the
idea of a reduction of about 50% in size of downloads.

However, the faster install point is really about the fact that users
won't need to run raco setup and do the compilation/documentation
build once they do the download of the source.

Jay

 Can someone explain the claims for binary packages?

 On Mon, May 20, 2013 at 8:57 PM, Jon Zeppieri zeppi...@gmail.com wrote:
 On Mon, May 20, 2013 at 10:04 PM, Neil Van Dyke n...@neilvandyke.org wrote:
 [snip]

 Example: Imagine I'm in the middle of writing a Racket program and am
 wondering about characteristics of some kind of I/O port in Racket.  With
 transparent source accessibility, I can just click on an identifier in my
 program in DrRacket to start browsing the implementation.  Maybe I see a
 possible improvement, or seeing the source pre-empts yet another email list
 question that otherwise only Matthew could answer, or I feel empowered to go
 add a new feature.  If the source is not as accessible, then I'm more likely
 to be a mere naive user of the tools, rather than to understand the tools
 and help improve them.


 +inf.0

 Though the easiest way to make the source available is just to keep it
 in the distribution. I'll be sad to see it go.

 -Jon



--
Jay McCarthy j...@cs.byu.edu
Assistant Professor / Brigham Young University
http://faculty.cs.byu.edu/~jay

The glory of God is Intelligence - DC 93


Re: [racket-dev] proposal for moving to packages

2013-05-21 Thread Robby Findler
On Tue, May 21, 2013 at 6:22 AM, Jay McCarthy jay.mccar...@gmail.com wrote:

 On Tue, May 21, 2013 at 12:16 AM, Antonio Menezes Leitao
 antonio.menezes.lei...@ist.utl.pt wrote:
  I've been using Racket (and DrRacket) to teach programming
  to architecture students. These are not sophisticated users, so any
  move that makes it more difficult for them to use Racket is not good
  news.
 
  What happened to the batteries included motto?

 The new organization does not imply that you can't download one thing
 and get the core plus many packages. In fact, we intend to make it
 more flexible so that teachers could easily create a distribution for
 their class with the material they need (and not the stuff they
 don't... like textbooks in German.)


I want to emphasize this point: there are no plans to change which
libraries are included when you download Racket. All of our crazy set of
batteries will still be included.

Robby


Re: [racket-dev] proposal for moving to packages: repository

2013-05-21 Thread Matthew Flatt
At Tue, 21 May 2013 00:09:49 -0700, Sam Tobin-Hochstadt wrote:
 On Mon, May 20, 2013 at 6:07 PM, Matthew Flatt mfl...@cs.utah.edu wrote:
 
  To put it another way and overstate a little: I'm trying to get buy-in
  from dev to make the switch to packages wholesale. The little bit of
  staging in the plan is to make the conversion itself easier, and not to
  simplify the switch for developers.
 
 Can you spell out how the directory movement you described will make
 the conversion easier?

I think we won't get an ideal package split on the first N tries, and
it will be easier to move files and directories around in one
repository (using `git mv') instead of among multiple repositories.
When we finally have mostly the right split, then we can use `git
filter-branch'.



Re: [racket-dev] proposal for moving to packages

2013-05-21 Thread Philippe Meunier
Jay McCarthy wrote:
If you have the source, then you need all the phase = 1 dependencies,
but if you just have the binary then you only need the phase = 0 deps.

That's assuming that you want to run the source, but I think that the
people who are arguing about still having the source available in the
distribution are mostly interested in reading the source, in which
case having only the source for the phase = 0 dependencies would
probably be a good enough approximation...

Philippe




Re: [racket-dev] proposal for moving to packages

2013-05-21 Thread David Van Horn

On 5/20/13 4:42 PM, Matthew Flatt wrote:

I used to think that we'd take advantage of the package manager by
gradually pulling parts out of the Racket git repo and making them
packages.

Now, I think we should just shift directly to a small-ish Racket core,
making everything else a package immediately. Core means enough to
run `raco pkg'.

A key point to remember is that package does not mean omitted from
the distribution. Instead, we'll construct a distribution by
combining the core with a selected set of packages. Initially the
selected set of packages will cover everything in the current
distribution.

Jay and I have been lining up the pieces for this change (it's
difficult to make a meaningful proposal without trying a lot of the
work, first), and I provide a sketch of the overall plan below.

This plan has two prominent implications:

  * The current git repo's directory structure will change.


Will this directory structure change have an impact on how modules are 
referenced?


My biggest concern is the Realm of Racket book, which is about to come 
out.  It sounds like this change could potentially cause a lot of 
confusion if it alters the collects organization.


Thanks,
David




Re: [racket-dev] proposal for moving to packages

2013-05-21 Thread Eric Dobson
On Tue, May 21, 2013 at 4:29 AM, Jay McCarthy jay.mccar...@gmail.com wrote:
 On Mon, May 20, 2013 at 10:05 PM, Eric Dobson eric.n.dob...@gmail.com wrote:
 I'm not sure I follow on why binary packages make it easier to reduce
 dependencies between packages, or why binary packages offer faster
 installs.

 I'm guessing that binary packages prevent cyclic dependencies between
 packages, but it seems like there are many other options that still
 get this side effect. Such as explicit checks when building the
 package.

 If you have the source, then you need all the phase = 1 dependencies,
 but if you just have the binary then you only need the phase = 0 deps.
 Similarly, for building the documentation.

As Philippe said, viewable source doesn't require this; only source
that can be compiled does. Whether or not we want to support that I don't
know, but it seems like it should be possible.



 For faster installs, the only benefit I see of binary packages over
 precompiled source packages is a small savings in size which doesn't
 seem like it would amount to much of the install time.

 In my tree, I have 20M of compiled code and 13M of source. I like the
 idea of a reduction of about 50% in size of downloads.

I'm not sure that something on the order of 10M is worth optimizing:
it takes about 5-6 seconds to download on a 15Mbit connection, and a
minute on a much slower connection.
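
The estimate can be checked with a one-line calculation. This sketch assumes "10M" means 10 megabytes, that a megabyte is 8 megabits, and that the link runs at its full nominal rate; the function name `download_seconds` is made up for the example.

```shell
# download_seconds SIZE_MB RATE_MBIT
# Print the transfer time in seconds for SIZE_MB megabytes over a link
# of RATE_MBIT megabits per second (size * 8 bits per byte / rate).
download_seconds() {
  awk -v mb="$1" -v rate="$2" 'BEGIN { printf "%.1f\n", mb * 8 / rate }'
}

download_seconds 10 15   # 10 MB at 15 Mbit/s: 80 / 15, a bit over 5 seconds
```

By the same formula, a one-minute download of 10 MB corresponds to a link of roughly 1.3 Mbit/s.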

 However, the faster install point is really about the fact that users
 won't need to run raco setup and do the compilation/documentation
 build once they do the download of the source.

Why would you need to run raco setup if the source was already
precompiled? Also how well does the source compress compared to
compiled code?


 Jay

 Can someone explain the claims for binary packages?

 On Mon, May 20, 2013 at 8:57 PM, Jon Zeppieri zeppi...@gmail.com wrote:
 On Mon, May 20, 2013 at 10:04 PM, Neil Van Dyke n...@neilvandyke.org wrote:
 [snip]

 Example: Imagine I'm in the middle of writing a Racket program and am
 wondering about characteristics of some kind of I/O port in Racket.  With
 transparent source accessibility, I can just click on an identifier in my
 program in DrRacket to start browsing the implementation.  Maybe I see a
 possible improvement, or seeing the source pre-empts yet another email list
 question that otherwise only Matthew could answer, or I feel empowered to go
 add a new feature.  If the source is not as accessible, then I'm more likely
 to be a mere naive user of the tools, rather than to understand the tools
 and help improve them.


 +inf.0

 Though the easiest way to make the source available is just to keep it
 in the distribution. I'll be sad to see it go.

 -Jon



 --
 Jay McCarthy j...@cs.byu.edu
 Assistant Professor / Brigham Young University
 http://faculty.cs.byu.edu/~jay

 The glory of God is Intelligence - DC 93



Re: [racket-dev] proposal for moving to packages

2013-05-21 Thread Juan Francisco Cantero Hurtado

On 05/21/13 12:21, Carl Eastlund wrote:

On Mon, May 20, 2013 at 11:20 PM, Juan Francisco Cantero Hurtado 
i...@juanfra.info wrote:


On 05/20/13 23:24, Carl Eastlund wrote:


On Mon, May 20, 2013 at 4:58 PM, Asumu Takikawa as...@ccs.neu.edu
wrote:

  On 2013-05-20 14:42:15 -0600, Matthew Flatt wrote:



Eventually, when the dust settles, I think we'll want to convert every
directory to its own git repo, and then we can incorporate the
individual repos as git submodules.



One nice thing about the current repo organization is that push
notifications for every part of the PLT codebase go to all of the
developers.

Will that still be available in this organization scheme? (I don't care
if it's opt-in too much, but opt-out will hopefully mean more eyes see
the code)

Cheers,
Asumu



Overall, I'm really glad to see Racket moving into the package system.  I
think it will be good for both (the Racket core and the package system).
I'd like to mention, though, that git submodules can be a real pain for
synchronizing development of multiple repositories.  They seem to have been
designed primarily for importing upstream repositories, rather than for
multiple peer repositories.  I'm not much more fond of the alternatives I
have tried, either; if we're committing to splitting Racket into multiple
repositories as well as multiple packages, we should be aware there may be
another minor git learning curve ahead.

Thanks to Jay and Matthew for working on all of this!



I also think that git submodules are a bad idea for packages. One git repo
per package is more simple and less problematic.

Thanks for the hard work :)



Git submodules imply one repo per package.  A submodule is a mechanism that
imports external repos into a checkout of a client repo, and records the
specific commit of the checkout so that there is a correlation of the
commits in each repo stored with the client.  If we're going to use
multiple repositories, we definitely need something like submodules in
order to retain a shared commit history.



You're right. I was thinking of git subtree. Sorry for the confusion.




Re: [racket-dev] proposal for moving to packages

2013-05-21 Thread Matthew Flatt
At Tue, 21 May 2013 10:46:29 -0400, David Van Horn wrote:
 On 5/20/13 4:42 PM, Matthew Flatt wrote:
  This plan has two prominent implications:
 
* The current git repo's directory structure will change.
 
 Will this directory structure change have an impact on how modules are 
 referenced?

The package system is designed to separate the way that modules are
referenced from the way that they are installed. Whether the module
`realm/chapter10/source' is part of the core, installed by the user as
a package, or included as a pre-installed package in a distribution, a
reference to the module within a program is always `(require
realm/chapter10/source)'.

A reference to the module of the form "look in the 'collects'
directory's 'realm' subdirectory", however, would be broken by the
directory-structure change, and we'd have to do extra work to manage
that (such as keeping a note in the core or special-cased distributions
to point to the new path).



Re: [racket-dev] proposal for moving to packages

2013-05-21 Thread Matthew Flatt
At Tue, 21 May 2013 05:29:19 -0600, Jay McCarthy wrote:
 If you have the source, then you need all the phase = 1 dependencies,
 but if you just have the binary then you only need the phase = 0 deps.

That's the right idea, but not precisely correct. If you `(require
(for-syntax ...))' a module, then the module is still needed at run time,
because it might have a `(require (for-template ...))', and so on.

A module referenced through `lazy-require' in a `for-syntax' import,
however, could conceivably be omitted. For example, a large part of the
Typed Racket compiler might be omitted as a run-time dependency for a
Typed Racket program. We're not quite to the place where that will work
out well, but I think we'll get there.

 Similarly, for building the documentation.

That's really the big one in the short run, I think. It's difficult to
have anything small and still have Racket-style documentation.


At Tue, 21 May 2013 08:10:02 -0700, Eric Dobson wrote:
 Why would you need to run raco setup if the source was already
 precompiled?

It's easy to underestimate the complexity of `raco setup'. Indeed, if
every `raco setup' started from scratch, it would be pretty easy.

Instead, `raco setup' has to perform an incremental computation based
on an inferred set of filesystem changes, where the computation to
incrementalize includes bytecode compilation, document rendering,
document database cross-referencing, path adjustments, and more --- and
it's all supposed to work in parallel, it's not supposed to leave
things in a bad state if it gets interrupted, it should recover from
most any state including bad states inadvertently created by novice
programmers, it's supposed to support shared non-writable parts and
user-specific writable parts, it's supposed to support PLTCOLLECTS and
PLTCOMPILEDROOTS, and it's supposed to have a dozen other properties
that I'm forgetting at the moment.

To answer the specific question, one reason you need to run `raco
setup' on a precompiled collection is to fix up the documentation
cross-reference database and references, get libraries and launchers in
place, and perform whatever install-time actions the package wants.

Yes, we can make `raco setup' work with packages that contain both
source and binaries, and I guess I'll go work on that instead of other
directions.



Re: [racket-dev] proposal for moving to packages: repository

2013-05-21 Thread Eli Barzilay
Yesterday, Matthew Flatt wrote:
 
 Concretely, new repositories that are just a subset of the current
 repo would be off-by-one in directory structure compared to a normal
 package. Each package should correspond to a subtree starting from
 the collects level, not the parent of collects. We could massage
 the two views into one, but I'd rather not.

That's really easy to deal with, and doesn't contradict what I
suggested, *but* given:

 To put it another way and overstate a little: I'm trying to get buy-in
 from dev to make the switch to packages wholesale. [...]

And even more, given:

5 hours ago, Matthew Flatt wrote:
 I think we won't get an ideal package split on the first N tries,
 and it will be easier to move files and directories around in one
 repository (using `git mv') instead of among multiple repositories.
 When we finally have mostly the right split, then we can use `git
 filter-branch'.

I think that there's a much easier and more elegant way to do this,
which is even easier for all developers.  Roughly speaking, it's
flipping what I suggested yesterday and doing it the other way:

  * Keep the repository as-is, no structural changes at all.

  * Keep working on things as usual, including work on the package
    system and everything that is related.

  * As it gets to a workable state, keep a script that will *split*
    the monolithic repo into separate packages.  This script can start
    very simple, for example, a naive thing would be:

      cd $MAINTREE
      mkdir $PACKAGES/drracket
      mv collects/drracket collects/drscheme $PACKAGES/drracket

    Everything that deals with packages would start from a fresh main
    repo and an empty package directory, and will construct the
    packages from it.  So, for example, the build will still make each
    package independently, and distribution is still done by
    assembling packages.

  * The main point is related to what you said above: the package
    splittage is determined by the script, so if you find out that
    some file belongs in a different package, or that packages need to
    be combined, or split differently, or whatever -- this is all done
    by just changing the script.

    So you get two birds with a single stone: it's easy to experiment
    freely in the early stages, and it's easy to adjust things when
    the split converges to something that works fine.

  * When everything is working smoothly -- with the main effect being
    a resolution of dependencies, both of existing code and in terms
    of people being aware of them -- at this point it will be a good
    time to switch to separate repos, and since all developers have
    already gotten used to the package, there is now just the repo
    change, and nothing else -- so it becomes a technical point like
    switching from svn to git, not piled up on the more substantial
    change.

    As a side-effect, the final directory-splitting script can be used
    with git's filter-branch to create the new repos.

I think that this offers the best in terms of being flexible as needed
while work is in progress, and separating the changes that people need
to adjust to, which should make the whole process more comfortable.
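
To make the "just change the script" point concrete, here is one possible shape for such a splitter, in which the package assignment lives in a small table instead of in the commands. This is only a sketch: the function name `split_packages` and the two-column table format are assumptions for the example, and it handles only paths without whitespace.

```shell
# split_packages MAINTREE PACKAGES < table
# Each table line is "<package> <path relative to MAINTREE>"; blank lines
# and lines starting with "#" are skipped.  Every listed path is moved
# into the directory of its package.
split_packages() {
  maintree=$1 packages=$2
  while read -r pkg path; do
    case $pkg in ''|'#'*) continue ;; esac
    mkdir -p "$packages/$pkg" || return 1
    mv "$maintree/$path" "$packages/$pkg/" || return 1
  done
}

# Example, equivalent to the naive commands above:
#   split_packages "$MAINTREE" "$PACKAGES" <<'EOF'
#   drracket collects/drracket
#   drracket collects/drscheme
#   EOF
```

Moving a collection to another package, or merging two packages, is then a one-line change to the table, and the same table could later drive the filter-branch step.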

-- 
  ((lambda (x) (x x)) (lambda (x) (x x)))  Eli Barzilay:
http://barzilay.org/   Maze is Life!


Re: [racket-dev] proposal for moving to packages: repository

2013-05-21 Thread Eli Barzilay
[keeping the different subject since this is still about the repo.]

Yesterday, Asumu Takikawa wrote:
 
 One nice thing about the current repo organization is that push
 notifications for every part of the PLT codebase go to all of the
 developers.
 
 Will that still be available in this organization scheme? (I don't
 care if it's opt-in too much, but opt-out will hopefully mean more
 eyes see the code)

This is easy both in our git server (it's easy to have a shared
configuration so all of them get the same notifications, and
bug-fix-messages are caught in all of them), and in github (where
you'll need to watch all of them).


Yesterday, Carl Eastlund wrote:
 
 I'd like to mention, though, that git submodules can be a real pain
 for synchronizing development of multiple repositories.  They seem
 to have been designed primarily for importing upstream repositories,
 rather than for multiple peer repositories.

Two points about submodules:

1. My impression is that they have improved a *lot* in the past ~2
   years or so.  Not only in terms of better functionality, but also
   in terms of convenience of using them.

2. If things go the way I suggested in the other email, then there's
   no real need to use submodules.  You need to have these
   repositories somewhere if you want to work on them (or a subset if
   you work on only some of them) -- and you should be able to get
   them any way you want.  There's no reason for the core repository
   to come with submodule points for all of the packages.  I think
   that it might make sense to keep some meta repository for people
   who want a convenient checkout of all packages -- but if you don't
   like submodules, you just don't use it.

-- 
  ((lambda (x) (x x)) (lambda (x) (x x)))  Eli Barzilay:
http://barzilay.org/   Maze is Life!



Re: [racket-dev] proposal for moving to packages: repository

2013-05-21 Thread Matthew Flatt
We already have a system for constructing a script that can move files
around and adjust content as needed: git.

As long as some of us are trying to write that script while others are
changing the existing directories and files, there will be collisions.
We won't come up with a scripting system that handles those collisions
better than git.

I want to minimize conflicts and maximize the number of people who can
help refine the package structure. We all know how to use git to script
changes to the repo, and we know how to work with a shared repo to make
conflicts manageable. That's why I'm asking that we all change together
to a new repo structure.

I think a lot of people on this list are eager to contribute to the
shift into packages. As someone close to the new structure, I'm telling
you my best guess at how you can help and be in a position to help
more: let us switch the repo sooner rather than later. Then, everyone
will be in a good position to script progress in various ways.


At Tue, 21 May 2013 14:20:33 -0400, Eli Barzilay wrote:
 Yesterday, Matthew Flatt wrote:
  
  Concretely, new repositories that are just a subset of the current
  repo would be off-by-one in directory structure compared to a normal
  package. Each package should correspond to a subtree starting from
  the collects level, not the parent of collects. We could massage
  the two views into one, but I'd rather not.
 
 That's really easy to deal with, and doesn't contradict what I
 suggested, *but* given:
 
  To put it another way and overstate a little: I'm trying to get buy-in
  from dev to make the switch to packages wholesale. [...]
 
 And even more, given:
 
 5 hours ago, Matthew Flatt wrote:
  I think we won't get an ideal package split on the first N tries,
  and it will be easier to move files and directories around in one
  repository (using `git mv') instead of among multiple repositories.
  When we finally have mostly the right split, then we can use `git
  filter-branch'.
 
 I think that there's a much easier and more elegant way to do this,
 which is even easier for all developers.  Roughly speaking, it's
 flipping what I suggested yesterday and doing it the other way:
 
   * Keep the repository as-is, no structural changes at all.
 
   * Keep working on things as usual, including work on the package
 system and everything that is related.
 
   * As it gets to a workable state, keep a script that will *split*
 the monolithic repo into separate packages.  This script can start
 very simple, for example, a naive thing would be:
 
   cd $MAINTREE
   mkdir $PACKAGES/drracket
   mv collects/drracket collects/drscheme $PACKAGES/drracket
 
 Everything that deals with packages would start from a fresh main
 repo and an empty package directory, and will construct the
 packages from it.  So, for example, the build will still make each
 package independently, and distribution is still done by
 assembling packages.
 
   * The main point is related to what you said above: the package
 splittage is determined by the script, so if you find out that
 some file belongs in a different package, or that packages need to
 be combined, or split differently, or whatever -- this is all done
 by just changing the script.
 
 So you get two birds with a single stone: it's easy to experiment
 freely in the early stages, and it's easy to adjust things when
 the split converges to something that works fine.
 
   * When everything is working smoothly -- with the main effect being
 a resolution of dependencies, both of existing code and in terms
 of people being aware of them -- at this point it will be a good
 time to switch to separate repos, and since all developers have
 already gotten used to the package, there is now just the repo
 change, and nothing else -- so it becomes a technical point like
 switching from svn to git, not piled up on the more substantial
 change.
 
 As a side-effect, the final directory-splitting script can be used
 with git's filter-branch to create the new repos.
 
 I think that this offers the best in terms of being flexible as needed
 while work is in progress, and separating the changes that people need
 to adjust to, which should make the whole process more comfortable.
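
 One way the naive three-line script above could grow into a
 table-driven one is sketched below.  This is only an illustration of
 the idea, not the script under discussion: the function name
 `split_packages', its two arguments, and the table format are
 assumptions made for the example; only the drracket/drscheme grouping
 comes from the text above.

```shell
#!/bin/sh
# Sketch of a table-driven split script (names and table are hypothetical).
# Usage: split_packages MAINTREE PACKAGES
split_packages() {
  maintree=$1
  packages=$2
  # Each table row: <directory under collects/> <target package>
  while read -r collection package; do
    # Skip blank rows and collections that are missing from this tree.
    [ -n "$collection" ] || continue
    [ -d "$maintree/collects/$collection" ] || continue
    mkdir -p "$packages/$package"
    mv "$maintree/collects/$collection" "$packages/$package/"
  done <<'EOF'
drracket drracket
drscheme drracket
EOF
}
```

 Moving a collection to another package, or merging two packages, is
 then a one-line change in the table.  A variant that moves directories
 inside the tree, rather than out of it, could presumably serve later as
 the command given to `git filter-branch --tree-filter'.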
 
 -- 
   ((lambda (x) (x x)) (lambda (x) (x x)))  Eli Barzilay:
 http://barzilay.org/   Maze is Life!
_
  Racket Developers list:
  http://lists.racket-lang.org/dev


Re: [racket-dev] proposal for moving to packages

2013-05-20 Thread Asumu Takikawa
On 2013-05-20 14:42:15 -0600, Matthew Flatt wrote:
 Eventually, when the dust settles, I think we'll want to convert every
 directory to its own git repo, and then we can incorporate the
 individual repos as git submodules.

One nice thing about the current repo organization is that push
notifications for every part of the PLT codebase go to all of the
developers.

Will that still be available in this organization scheme? (I don't care
if it's opt-in too much, but opt-out will hopefully mean more eyes see
the code)

Cheers,
Asumu


Re: [racket-dev] proposal for moving to packages

2013-05-20 Thread Carl Eastlund
On Mon, May 20, 2013 at 4:58 PM, Asumu Takikawa as...@ccs.neu.edu wrote:

 On 2013-05-20 14:42:15 -0600, Matthew Flatt wrote:
  Eventually, when the dust settles, I think we'll want to convert every
  directory to its own git repo, and then we can incorporate the
  individual repos as git submodules.

 One nice thing about the current repo organization is that push
 notifications for every part of the PLT codebase go to all of the
 developers.

 Will that still be available in this organization scheme? (I don't care
 if it's opt-in too much, but opt-out will hopefully mean more eyes see
 the code)

 Cheers,
 Asumu


Overall, I'm really glad to see Racket moving into the package system.  I
think it will be good for both (the Racket core and the package system).
I'd like to mention, though, that git submodules can be a real pain for
synchronizing development of multiple repositories.  They seem to have been
designed primarily for importing upstream repositories, rather than for
multiple peer repositories.  I'm not much more fond of the alternatives I
have tried, either; if we're committing to splitting Racket into multiple
repositories as well as multiple packages, we should be aware there may be
another minor git learning curve ahead.

Thanks to Jay and Matthew for working on all of this!

--Carl


Re: [racket-dev] proposal for moving to packages

2013-05-20 Thread Jose A. Ortega Ruiz
On Mon, May 20 2013, Matthew Flatt wrote:

[...]

 Some drawbacks to omitting source are immediately apparent:

  - Users will be less able to make source changes on their systems to
help us debug.

Having the binary form of a package installed does not preclude
upgrading to a source package. So, we could ask a user to use the
package manager to install the source form of, say, the drracket
package, and then try out a change. In that way, users can still
help, but it will be less convenient.

  - Users will be less able to read installed code as examples.

Our source code is now easily available via the web interfaces at
http://git.racket-lang.org/ and GitHub, so users can always look
there, instead.

FWIW (and i know it's not much, but anyway), this will be a big loss for
Geiser users, who right now can jump to any core function source with a
single keystroke and without leaving the editor.  IME, there's a huge
difference between that and having to switch to a web browser to find
it, both when learning or programming new applications.

Here's hope that down the line there'll be binary+source packages that
end users can install with the same ease as today.

Cheers,
jao
-- 
Nostalgia isn’t what it used to be.



Re: [racket-dev] proposal for moving to packages: repository

2013-05-20 Thread Eli Barzilay
An hour and a half ago, Matthew Flatt wrote:
 I used to think that we'd take advantage of the package manager by
 gradually pulling parts out of the Racket git repo and making them
 packages.

(Generally, +1.  I'll reply just on the repository point here.)


 This plan has two prominent implications:
 
  * The current git repo's directory structure will change. [...]

I very strongly object to this.  While in theory git will follow
everything, this requires doing some more work which most people won't
know about, so a result of all of this is going to be loss of
historical information.  So I think that it's much better to move
directly to several repositories (IIUC, one repository for each
suggested toplevel directory).

The only goal of the intermediate state seems to be providing some
gradual change before switching to submodules -- and on one hand, I
think that the new layout will force people to learn how to deal with
it, and on the other, it'll make people spend work twice, once on the
layout change and again on the switch to modules.

So assuming that a gradual change is the goal, I think that there are
better ways to do that.  Here's a suggestion:

  * The main repository is split into the different repositories.
Initially, this is done without any consideration for submodules,
with the idea of having advanced gitters come up with their own
solutions.

  * However, don't remove the main repository, just keep it as an
aggregate of the content that is found in the split repositories.
If the structure is going to be the same in all of them (ie, the
same directories and files are in all as they are now in the
single repository), then pulling changes from the new repos to the
main one is going to be trivial to the point of being automated.

  * The new repos will not get mirrored on github.  This is because
github repos come with a bunch of functionality that is best kept
in a single place -- like wiki pages and issues.  (But see below.)

  * So the only difference would be for people who commit work to the
main repo.  This can be done in various ways, depending on the
developers who do these commits:

- Advanced developers would have all of the repos and will push
  directly to them.  This group of people is likely to start
  small, and eventually have all of the core committers in it.
  (Core as in the people who push to the plt repo now.)  As I
  said above, this will likely involve some experimentation for
  these people, which will later get translated into easy setups
  that will allow more people to switch to it.

- Outsiders can continue to work as usual: fork the main plt
  repo (mostly on github) and send pull requests.  The pull
  request will then be pushed by a core committer as it is done
  now, where the core committer pushes to the actual relevant
  repo, and that eventually propagates back to the main repo so
  that the contributor sees that the work was merged.  The merging
  should usually be trivial, except in extremely rare cases where
  the push touches on files from different new repos.  In these
  cases it should be possible to either split the commit into
  different ones for the different repos, or ask the contributor
  to split the commit to different ones for the different files.

- The only people left are core committers who will work with the
  main repository.  I can see a bunch of ways to deal with this.
  First, the commit can be sent as a pull request to one of the
  advanced gitters who will then do it for the actual repository.
  This is easier than it sounds: git has a bunch of commands to do
  this, and for all practical purposes, you'd just replace the
  git push part of your workflow with git send-email.  I
  *think* (but I'm not 100% sure) that this work can be automated
  too, so it's fine if I (or some other excited soul) gets these
  emails and merges them.

  There is an inconvenience point here: once you send a pull
  request and it's merged, the actual commits that are merged (to
  the main repo, which you're using if you're in this group) are
  different objects.  This is nothing new -- it's something that
  people who do all contributions via pull requests deal with,
  since we have a policy of rebasing rather than merging.
  Usually, when you pull from the updated repo, git should notice
  that your changes are already there.  (At least I hope it does.)

  Things will be less convenient for people who use git more
  intensely: if you have lots of branches etc.  But I think that
  such people really should just move to the first group sooner...

  * This stage can go on for a while, as the code and machinery involved
evolves to a point of being smooth enough.  By smooth, I mean that
- it be easy enough to build the whole thing as you do now,
- 

Re: [racket-dev] proposal for moving to packages

2013-05-20 Thread Greg Hendershott
Well, ideally there would be some new module-name-source function
that could return URIs like http://path/to/file.rkt (or for that
matter, file:///path/to/file.rkt), based on info.rkt for packages?

Given that piece, a couple ways to do it -- favoring doing it more in
Emacs vs. more in Racket -- but both involve having a local cache, and
also using If-Modified-Since request headers? Maybe even the ability
to prefill the cache and never expire it ... which seems awfully
like source installation by other means?
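
A rough sketch of the cache half of this idea, using curl, follows.
Everything named here is an assumption for illustration: the functions
`cache_path_for' and `fetch_cached', the default cache directory, and
the flat file naming scheme are made up, and the step that maps a
module name to a URL is not shown since no such function exists yet.

```shell
#!/bin/sh
# Sketch of a local source cache keyed by URL (all names are hypothetical).

# cache_path_for URL: turn a URL into a flat, filesystem-safe file name.
cache_path_for() {
  printf '%s' "$1" | tr -c 'A-Za-z0-9._-' '_'
}

# fetch_cached URL [CACHE_DIR]: print the path of a cached copy of URL,
# refreshing it only if the server reports a newer version.
fetch_cached() {
  url=$1
  dir=${2:-$HOME/.cache/racket-src}   # assumed cache location
  file=$dir/$(cache_path_for "$url")
  mkdir -p "$dir"
  if [ -f "$file" ]; then
    # curl -z FILE sends If-Modified-Since with FILE's modification time.
    # Download to a side file so an unchanged copy is never clobbered.
    curl -fsS -z "$file" -o "$file.new" "$url" || return 1
    if [ -s "$file.new" ]; then
      mv "$file.new" "$file"
    else
      rm -f "$file.new"
    fi
  else
    curl -fsS -o "$file" "$url" || return 1
  fi
  printf '%s\n' "$file"
}
```

An editor command could then open whatever path `fetch_cached' prints.
Prefilling the cache directory and never refreshing it would indeed
amount to a source installation by other means.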


p.s. An approach favoring doing it more on the Racket side than on the
Emacs side, could also support FRs like one I saw on the main list
recently, which is that File | Open in DrRacket should be able to open
remote files. That was for a classroom setting IIRC.


Re: [racket-dev] proposal for moving to packages: repository

2013-05-20 Thread Matthew Flatt
At Mon, 20 May 2013 18:27:34 -0400, Eli Barzilay wrote:
 An hour and a half ago, Matthew Flatt wrote:
  This plan has two prominent implications:
  
   * The current git repo's directory structure will change. [...]
 
 I very strongly object to this.  While in theory git will follow
 everything, this requires doing some more work which most people won't
 know about, so a result of all of this is going to be loss of
 historical information.  So I think that it's much better to move
 directly to several repositories (IIUC, one repository for each
 suggested toplevel directory).
 
 The only goal of the intermediate state seems to be providing some
 gradual change before switching to submodules -- and on one hand, I
 think that the new layout will force people to learn how to deal with
 it, and on the other, it'll make people spend work twice, once on the
 layout change and again on the switch to modules.
 
 So assuming that a gradual change is the goal, I think that there are
 better ways to do that.

It's about a kind of gradual change, but not quite so gradual. I would
like to switch immediately to a package-oriented view of Racket,
instead of thinking about packages as something that you get by
squinting at our current tree.

Concretely, new repositories that are just a subset of the current repo
would be off-by-one in directory structure compared to a normal
package. Each package should correspond to a subtree starting from the
collects level, not the parent of collects. We could massage the
two views into one, but I'd rather not.

At the same time, I agree that it's tricky to properly extract history
for the new repositories, and there will be many issues in dealing with
multiple repositories (e.g., submodules may not be the way to go). So,
I'd like to delay that part until a second step.

To put it another way and overstate a little: I'm trying to get buy-in
from dev to make the switch to packages wholesale. The little bit of
staging in the plan is to make the conversion itself easier, and not to
simplify the switch for developers.



Re: [racket-dev] proposal for moving to packages

2013-05-20 Thread Neil Van Dyke
I'm calling for making Racket and package source transparently 
accessible, even though not actually bundled into distribution downloads...


Racket has a research and education bent, and also attracts some of the 
more sophisticated developers.  For all of these audiences, there's a 
tradition of accessibility of source, and arguably value in that.


I think transparent navigability to source would be appropriate for 
Racket.  Transparent navigability to source could mean that DrRacket 
will download source on-demand for any binary package that is installed, 
rather than source having to be bundled with the package, or requiring 
user to go get source separately.


Admittedly, I think source accessibility is not as important in Racket 
as in Emacs.  (Because, for general programming, the Racket 
documentation is sufficient and the source wouldn't help.  And for 
extension of the programming environment, which was one of Emacs's 
greatest achievements, extending DrRacket is much harder; plus, the 
DrRacket source is not much help if you didn't previously tackle the 
manuals on frameworks and such, which almost no one does.)  But there 
are uses for source accessibility, especially for independent add-on 
packages, and the principle of being able to easily pop the hood still 
has value.


Example: Imagine I'm in the middle of writing a Racket program and am 
wondering about characteristics of some kind of I/O port in Racket.  
With transparent source accessibility, I can just click on an identifier 
in my program in DrRacket to start browsing the implementation.  Maybe I 
see a possible improvement, or seeing the source pre-empts yet another 
email list question that otherwise only Matthew could answer, or I feel 
empowered to go add a new feature.  If the source is not as accessible, 
then I'm more likely to be a mere naive user of the tools, rather than 
to understand the tools and help improve them.


Side note: I'm also looking forward to seeing how this new packaging 
works out, especially if it leads to me being able to ship small binary 
packages for iPhone/Mac/Windows, implemented in Racket.  (I don't care 
about open source principles on those very closed platforms; I just want 
their money.  Which is totally different from what I want from an 
intellectually-inclined open source development platform.)


Neil V.



Re: [racket-dev] proposal for moving to packages

2013-05-20 Thread Juan Francisco Cantero Hurtado

On 05/20/13 23:24, Carl Eastlund wrote:

On Mon, May 20, 2013 at 4:58 PM, Asumu Takikawa as...@ccs.neu.edu wrote:


On 2013-05-20 14:42:15 -0600, Matthew Flatt wrote:

Eventually, when the dust settles, I think we'll want to convert every
directory to its own git repo, and then we can incorporate the
individual repos as git submodules.


One nice thing about the current repo organization is that push
notifications for every part of the PLT codebase go to all of the
developers.

Will that still be available in this organization scheme? (I don't care
if it's opt-in too much, but opt-out will hopefully mean more eyes see
the code)

Cheers,
Asumu



Overall, I'm really glad to see Racket moving into the package system.  I
think it will be good for both (the Racket core and the package system).
I'd like to mention, though, that git submodules can be a real pain for
synchronizing development of multiple repositories.  They seem to have been
designed primarily for importing upstream repositories, rather than for
multiple peer repositories.  I'm not much more fond of the alternatives I
have tried, either; if we're committing to splitting Racket into multiple
repositories as well as multiple packages, we should be aware there may be
another minor git learning curve ahead.

Thanks to Jay and Matthew for working on all of this!



I also think that git submodules are a bad idea for packages. One git 
repo per package is more simple and less problematic.


Thanks for the hard work :)




Re: [racket-dev] proposal for moving to packages

2013-05-20 Thread Neil Van Dyke

Juan Francisco Cantero Hurtado wrote at 05/20/2013 11:20 PM:


I also think that git submodules are a bad idea for packages. One git 
repo per package is more simple and less problematic.


Do people expect to often do commits involving changes across these 
package boundaries?  If so, would another option be to keep a single 
repo, not use these Git submodules, and just have Racket translate the 
Git paths behind-the-scenes for packages coming from this core Racket repo?


Neil V.



Re: [racket-dev] proposal for moving to packages

2013-05-20 Thread Jon Zeppieri
On Mon, May 20, 2013 at 10:04 PM, Neil Van Dyke n...@neilvandyke.org wrote:
 [snip]

 Example: Imagine I'm in the middle of writing a Racket program and am
 wondering about characteristics of some kind of I/O port in Racket.  With
 transparent source accessibility, I can just click on an identifier in my
 program in DrRacket to start browsing the implementation.  Maybe I see a
 possible improvement, or seeing the source pre-empts yet another email list
 question that otherwise only Matthew could answer, or I feel empowered to go
 add a new feature.  If the source is not as accessible, then I'm more likely
 to be a mere naive user of the tools, rather than to understand the tools
 and help improve them.


+inf.0

Though the easiest way to make the source available is just to keep it
in the distribution. I'll be sad to see it go.

-Jon


Re: [racket-dev] proposal for moving to packages

2013-05-20 Thread Eric Dobson
I'm not sure I follow on why binary packages make it easier to reduce
dependencies between packages, or why binary packages offer faster
installs.

I'm guessing that binary packages prevent cyclic dependencies between
packages, but it seems like there are many other options that still
get this side effect. Such as explicit checks when building the
package.
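
As a minimal sketch of such an explicit check: if each package's
declared dependencies can be listed as "package dependency" pairs (an
assumption; the proposal does not fix a format), a topological sort is
enough to reject cycles at build time.  The helper name `has_cycle' is
hypothetical, and the check relies on tsort printing a diagnostic on
stderr when its input contains a loop.

```shell
#!/bin/sh
# has_cycle: read "package dependency" pairs on stdin and succeed (exit 0)
# if tsort complains about them, which it does when the graph has a loop.
# Any other tsort diagnostic (e.g. malformed input) is also treated as failure
# of the dependency check.
has_cycle() {
  # Keep only stderr: 2>&1 captures diagnostics, >/dev/null drops the ordering.
  [ -n "$(tsort 2>&1 >/dev/null)" ]
}
```

A build step could run, say, `has_cycle < deps.txt' and abort with an
error when it succeeds, regardless of whether the packages are shipped
as source or as binaries.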

For faster installs, the only benefit I see of binary packages over
precompiled source packages is a small savings in size which doesn't
seem like it would amount to much of the install time.

Can someone explain the claims for binary packages?

On Mon, May 20, 2013 at 8:57 PM, Jon Zeppieri zeppi...@gmail.com wrote:
 On Mon, May 20, 2013 at 10:04 PM, Neil Van Dyke n...@neilvandyke.org wrote:
 [snip]

 Example: Imagine I'm in the middle of writing a Racket program and am
 wondering about characteristics of some kind of I/O port in Racket.  With
 transparent source accessibility, I can just click on an identifier in my
 program in DrRacket to start browsing the implementation.  Maybe I see a
 possible improvement, or seeing the source pre-empts yet another email list
 question that otherwise only Matthew could answer, or I feel empowered to go
 add a new feature.  If the source is not as accessible, then I'm more likely
 to be a mere naive user of the tools, rather than to understand the tools
 and help improve them.


 +inf.0

 Though the easiest way to make the source available is just to keep it
 in the distribution. I'll be sad to see it go.

 -Jon