#3271: New methods for Data.Sequence
----------------------------------+-----------------------------------------
Reporter: LouisWasserman | Owner: LouisWasserman
Type: proposal | Status: new
Priority: normal | Milestone: Not GHC
Component: libraries (other) | Version: 6.10.2
Severity: normal | Resolution:
Keywords: | Difficulty: Unknown
Testcase: | Os: Unknown/Multiple
Architecture: Unknown/Multiple |
----------------------------------+-----------------------------------------
Comment (by LouisWasserman):
In response to Ross's useful suggestions (which I had not actually noticed
until recently), I have made considerable changes to my Data.Sequence
patch here.
Here are the relevant points:
* I completely rewrote the sorting method. The sort is unstable, but
it is 40 lines (much more maintainable than the several-hundred line
implementation from earlier) and runs *twice as fast as* (fromList .
Data.List.sort . toList) for n=50000. (For sorted lists, it clocks in at
about 4x faster.)
o This is with no RTS options. With -H128m, the advantage is
considerably thinner -- 39ms vs. 43ms (about one standard deviation) --
but the implementation is sufficiently compact that the moderate benefit
is satisfactory.
* tails and inits are considerably more sophisticated, running to
about 50 lines apiece (although some of that code is shared between them),
but
o This implementation is genuinely asymptotically faster than
the previous implementation: evaluating any tail from the sequence
returned by tails takes O(log (min (i, n-i))), as opposed to O(n) in scanr
(<|) empty xs, but evaluating every tail from the sequence takes O(n)
overall, as opposed to O(n log n) in fromList [drop n xs | n <- [0..length
xs]].
o Without direct access to the internals of the sequence it is
impossible to match the asymptotic performance of this implementation.
o This implementation is also a hair faster in practice.
* partition is now in fact implemented via a simple foldl', which is
actually faster than my old, several-dozen-line implementation as well as
obviously simpler.
* filter has been added to the list of methods available in
Data.Sequence.
* iterate has been renamed to iterateN to reinforce the different
type, which requires a finite bound on the sequence length.
* On the back end, replicate, iterateN, and sortBy do not use
fromList, but instead use a common framework that wraps construction of a
sequence in an Applicative functor. This automatically induces the node
sharing that gives replicate performance O(log n), but behaves exactly
correctly for all its other requirements.
* As a result, there is a faster alternative to fromList if the length
of the list is known. The name and type of this method seems like it
might become genuinely contentious, so I'm not inclined to expose that
faster method at the moment.
--
Ticket URL: <http://hackage.haskell.org/trac/ghc/ticket/3271#comment:6>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler_______________________________________________
Glasgow-haskell-bugs mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-bugs