Am Sonntag, 11. Januar 2004 22:15 schrieb Tomasz Zielonka: > On Wed, Jan 07, 2004 at 07:30:35PM +0100, Tomasz Zielonka wrote: > > On Wed, Jan 07, 2004 at 05:41:04PM +0100, Wolfgang Jeltsch wrote: > > > Hello, > > > > > > does Data.Set.mkSet run in linear time when applied to a sorted list? > > > > No, at least not in the version from GHC 6.0. This implementation is > > based on FiniteMap. mkSet translates to listToFM, which incrementally > > constructs the result map using foldl on the input list. > > > > Hmmm, I may try to fix it in a day or two... it shouldn't be too > > difficult. > > Well, implementing the algorithm wasn't difficult, but keeping memory > usage low was quite tricky. In the first version the 'build' function > returned a pair with a tree and the rest of the input list. The cost > of this was so high that the linear version was still slower than > traditional listToFM for lists with 100000 elements. > > I checked that using unboxed tuples helps, but I wasn't satisfied with > such solution. In the end I found that transforming the function to use > continuation passing style gives similar performance. Now the speed gain > is easily noticable (3x on a 100000 element list and a couple of > lookups). > > The code given below won't immediately affect mkSet. This is only a > prototype, so the name and signature are probably not the best. > > The function could also cope with non-decreasing lists by omitting > duplicated keys, but I didn't wanted to make it too complex yet. > > listToFM could work as follows - take the sorted prefix of the list, > build a tree for this prefix in linear time and then fall back to > inserting elements one by one. > > It would be nicer to extract the longest sorted sublist from the list, > but I am afraid that this has Theta(N log N) complexity already. > > Well, what do you think about it? > Would you like such improvements to FiniteMap and Set? > > Best regards, > Tom > > --------------------------------------------------------------------------- > -- Building tree from a sorted list > > trySortedListToFM :: (Ord key OUTPUTABLE_key) > => [(key, elt)] -> Maybe (FiniteMap key elt) > trySortedListToFM list > > | sorted = Just (build (length list) list const) > | otherwise = Nothing > > where > sorted = and (zipWith (\(l,_) (r,_) -> l < r) list (tail list)) > > build 0 xs cont = > cont EmptyFM xs > build 1 ((key, elt) : xs) cont = > cont (unitFM key elt) xs > build n xs cont = > build leftSize xs (\left ((key, elt) : ys) -> > build rightSize ys (\right zs -> > cont (mkBranch 14{-which-} key elt left right) zs)) > where > leftSize = n `div` 2 > rightSize = n - leftSize - 1
Hello Tom, from which cardinality on one can notice a speed gain with the linear time algorithm? If this cardinality is larger than 1000, I probably don't need a linear time algorithm in my current project. However, it would be nice (at least from a theoretical point of view) to have a listToFM algorithm which naturally has O(n) time for sorted lists by which I mean that the algorithm doesn't include a test for sorted lists (or sublists) but is just constructed in such a way that it automatically has O(n) time for sorted lists. Wolfgang _______________________________________________ Haskell mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell