On Thu, Aug 12, 2010 at 6:46 PM, Eli Barzilay <e...@barzilay.org> wrote: > On Aug 12, Sam Tobin-Hochstadt wrote: >> On Thu, Aug 12, 2010 at 7:28 AM, Eli Barzilay <e...@barzilay.org> wrote: >> > >> > One thing about stability that bugs me is pushing changes and >> > extensions that are likely to change. For example, I'm worried >> > about Jay's push for a number of new features as a response to a >> > blog post, it looks like coding in haste that is likely to change. >> >> This is something that the `unstable' collection can be useful for. > > +1, in general.
Since some of these features were already available in the unstable collection, it's clear that useful functionality there is not visible enough for beginners. > On Aug 12, Matthew Flatt wrote: >> At Thu, 12 Aug 2010 07:28:34 -0400, Eli Barzilay wrote: >> > One thing about stability that bugs me is pushing changes and >> > extensions that are likely to change. For example, I'm worried >> > about Jay's push for a number of new features as a response to a >> > blog post, it looks like coding in haste that is likely to change. >> >> I strongly support Jay's current push. We don't need to wait for a >> new package system to get our data-structure libraries in better >> shape. > > My point was completely unrelated to packaging. (Jay added it to the > core racket language anyway.) Perhaps his point was that the "ideal" to handle these additions would be a well publicized and easy to install package. > On Aug 12, Jay McCarthy wrote: >> >> As a final note, I don't intend this to be a mere pandering to this >> blog post. When I was reading the blog post I found myself saying, >> "Ya but you could implement that so easily" or "Ya, it is annoying >> to use that pattern", etc. Hearing myself say that told me that I've >> always wanted many of these functions but I always implemented them >> in a one-off fashion or used a boilerplate pattern. > > I should explain why I said the above. (Yes, this will be verbose. > Why? Because my short one-sentence version was taken as a baseless > whine, so I'll go through the details that I'd think through before > doing this. Perhaps this is too much, but I'd like to think it isn't, > especially for code in racket/private.) > > First, I completely agree with this specific mode of operation (and > have added several things in a similar way) -- but the thing that I'm > careful with is adding things that might be regretted later on. One > place that I'm not too happy about is `fold-files', which I'll use as > an example. IMO, such regretting can lead to one of these: > * Coming up with a better interface, but keeping the old one for > compatibility (keeping `fold-files' as is, and keeping it working), > * Same, but removing the old one (either a new `fold-files' with a > different interface, or a new name instead), > * Improving the interface in a desired way, while keeping things > backward compatible. > > The last one is almost always the best way to go, and I view keyword > arguments as an essential feature because it makes this option > practical. In the `fold-files' case it is likely that problems could > be fixed by adding more keywords, and possible changing the default > behavior (which is more work but even that is reduced with keywords, > since existing call sites only need to be updated with a new keyword). > Without keywords, this would not be possible. > > But dealwing with sequences is not something that can be deferred for > later with some keyword -- at least not as easily as with a function. > And in light of this I'd be much more careful about adding things that > might turn out to be more regrets. > > As for the actual thing that I'm talking about -- sequences and making > them more list like -- I should have explained why I think this > addition is not something that is clearly desired in my view. > > Yes, there is definitely a problem with sequences and lists -- the > first is way more complicated to deal with than the second, and Mark's > post did highlight this issue. One solution is what Jay did -- do the > extra work of building the list-like sequence functions, and given > enough primitives, the rest would be easy to add if/when needed. But > what about these issues: > > ** The addition is *slow*. Very slow. I wrote a test program to sum > up the integers from 0 to 2000000 -- and I get these numbers (test > code attached below): > > - Plain `for' loop: cpu time: 9 real time: 10 gc time: 0 > - Using a sequence value: cpu time: 222 real time: 222 gc time: 0 > - Using `seqn-cdr' on range: cpu time: 1142 real time: 1142 gc time: 211 > - `seqn-cdr' + value: cpu time: 1141 real time: 1141 gc time: 210 > > I view this as a real problem -- performance issues at the 100x > slowdown are bugs at the library level, but at the core level > they're *bad* bugs. > > ** At the conceptual level, there is a major problem with for > sequences -- if you care about performance, they're second-class > values. As seen above, using a sequence value instead of the form > directly in the for loop makes it 20x slower. This is kind of ok > as things currently stand (pre Jay's afdditions), since most people > who deal with sequences are aware of the issue and propagate the > same cost tradeoff (providing both a syntactic and a value > variant). Making sequence values be more list-like leads to > emphasizing this problem, and this can lead to problems that end up > in regretting something => overall reduced stability when there's > some new interface. And specifically in this case, a 100x slowdown > is easily noticeable. > > For the first of these (related) problems, you can say that it's a > performance problem that can be improved out later, with no interface > changes. The question is how much it can improve. (One way to make > it faster would be to provide macro versions of the whole set of > sequence operations, and this ends up in double the amount of code.) > > Regardless, the second problem cannot be improved -- at least not > without some major redesign. But obviously this means that the > problem is still there regardless of extensions -- and making sequence > values easier to handle will eventually happen anyway. The question > is whether there is some other way to provide this functionality > (list-like sequences) -- and in this case Mark talks explicitly about > the Clojure feature that makes sequences very easy: lazy lists. So, > I've tried to get a rough idea of the costs for this simple example of > a range, and I got: > cpu time: 1581 real time: 1581 gc time: 868 > This is still bad, but at the same neighborhood -- so maybe a better > direction for making sequences nicer is to try making them (the value > versions) into lazy lists? It's true that this is close to the > 100x-worse version of `seqn-rest', but if they're both close, then why > not explore making the lazy list version run faster? > > But if I switch from `lazy' to plain thunks for the lazy list (which > is probably more appropriate here anyway), I get > cpu time: 113 real time: 113 gc time: 19 > and this is even better than the current (in-range ...) -- and this is > not even including tricks like Shriram mentioned of mixing laziness > and strictness (eg, a range could have a delayed thunk at only every > 10th cdr). So this *does* look like a better way to get this done: > speed up sequence values and make them easy to manage. Why not try > that or at least *talk* about trying that before adding a chunk of > code that fights with the current windmills, making future > improvements even harder to do? > > (And at this point I can see how people can think that such an > improvement can be done regardless of the current extensions -- but my > guess is that nobody will touch it, and things will stay as they are > now. For now.) > > (I'm obviously hoping that this guess will turn out wrong as a result > of writing about it. But probably not. I'll shut up now.) I parse your comments like this: - We can't do these sequence functions fast. - When we didn't provide them, people complained that they were missing. - When we provide them slowly, people will complain that Racket is slow. - It is worse to be slow than featureless. I think it is worse to be featureless. Of the additions I made, I believe that only seqn-cons, seqn-rest, seqn-tail, seqn-append, seqn-map, seqn-filter, and seqn-add-between will have the speed problem. The other functions that produce values will be a little slower than their equivalents rewritten directly in 'for's, but sometimes it is nice to write: (seqn-andmap even? s) rather than: (for/and ([e s]) (even? e)). I wager that seqn-cons, seqn-rest, and seqn-tail could be made very faster if it were possible for a do-sequence to turn over control to another sequence that had already started. seqn-append would be helped a little. I think it's worth it for them to be there despite the problems you've mentioned. I don't think they're like fold-files (which I agree has a bad interface); they mimic the time honored list API. Jay -- Jay McCarthy <j...@cs.byu.edu> Assistant Professor / Brigham Young University http://teammccarthy.org/jay "The glory of God is Intelligence" - D&C 93 _________________________________________________ For list-related administrative tasks: http://lists.racket-lang.org/listinfo/dev