Re: [GHC] #3271: New methods for Data.Sequence

GHC Thu, 16 Jul 2009 16:54:14 -0700

#3271: New methods for Data.Sequence
----------------------------------+-----------------------------------------
    Reporter:  LouisWasserman     |        Owner:  LouisWasserman  
        Type:  proposal           |       Status:  new             
    Priority:  normal             |    Milestone:  Not GHC         
   Component:  libraries (other)  |      Version:  6.10.2          
    Severity:  normal             |   Resolution:                  
    Keywords:                     |   Difficulty:  Unknown         
    Testcase:                     |           Os:  Unknown/Multiple
Architecture:  Unknown/Multiple   |  
----------------------------------+-----------------------------------------
Comment (by LouisWasserman):


 In response to Ross's useful suggestions (which I had not actually noticed
 until recently), I have made considerable changes to my Data.Sequence
 patch here.

 Here are the relevant points:

     * I completely rewrote the sorting method.  The sort is unstable, but
 it is 40 lines (much more maintainable than the several-hundred line
 implementation from earlier) and runs *twice as fast as* (fromList .
 Data.List.sort . toList) for n=50000.  (For sorted lists, it clocks in at
 about 4x faster.)
           o This is with no RTS options.  With -H128m, the advantage is
 considerably thinner -- 39ms vs. 43ms (about one standard deviation) --
 but the implementation is sufficiently compact that the moderate benefit
 is satisfactory.
     * tails and inits are considerably more sophisticated, running to
 about 50 lines apiece (although some of that code is shared between them),
 but
           o This implementation is genuinely asymptotically faster than
 the previous implementation: evaluating any tail from the sequence
 returned by tails takes O(log (min (i, n-i))), as opposed to O(n) in scanr
 (<|) empty xs, but evaluating every tail from the sequence takes O(n)
 overall, as opposed to O(n log n) in fromList [drop n xs | n <- [0..length
 xs]].
           o Without direct access to the internals of the sequence it is
 impossible to match the asymptotic performance of this implementation.
           o This implementation is also a hair faster in practice.
     * partition is now in fact implemented via a simple foldl', which is
 actually faster than my old, several-dozen-line implementation as well as
 obviously simpler.
     * filter has been added to the list of methods available in
 Data.Sequence.
     * iterate has been renamed to iterateN to reinforce the different
 type, which requires a finite bound on the sequence length.
     * On the back end, replicate, iterateN, and sortBy do not use
 fromList, but instead use a common framework that wraps construction of a
 sequence in an Applicative functor.  This automatically induces the node
 sharing that gives replicate performance O(log n), but behaves exactly
 correctly for all its other requirements.
     * As a result, there is a faster alternative to fromList if the length
 of the list is known.  The name and type of this method seems like it
 might become genuinely contentious, so I'm not inclined to expose that
 faster method at the moment.

-- 
Ticket URL: <http://hackage.haskell.org/trac/ghc/ticket/3271#comment:6>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler

_______________________________________________
Glasgow-haskell-bugs mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-bugs

Re: [GHC] #3271: New methods for Data.Sequence

Reply via email to