Send Beginners mailing list submissions to
[email protected]
To subscribe or unsubscribe via the World Wide Web, visit
http://www.haskell.org/mailman/listinfo/beginners
or, via email, send a message with subject or body 'help' to
[email protected]
You can reach the person managing the list at
[email protected]
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Beginners digest..."
Today's Topics:
1. Re: Data.Binary.Get for large files (Philip Scott)
2. Re: Data.Binary.Get for large files (Kyle Murphy)
3. Re: Data.Binary.Get for large files (Daniel Fischer)
4. Code works in a file but not the interpreter (Oge)
5. Re: Code works in a file but not the interpreter (Daniel Fischer)
6. Re: Code works in a file but not the interpreter (Oge)
----------------------------------------------------------------------
Message: 1
Date: Fri, 30 Apr 2010 22:06:07 +0100
From: Philip Scott <[email protected]>
Subject: Re: [Haskell-beginners] Data.Binary.Get for large files
To: [email protected]
Message-ID: <[email protected]>
Content-Type: text/plain; charset=UTF-8; format=flowed
Hi Daniel
> Replace getFloat64le with e.g. getWord64le to confirm.
> The reading of IEEE754 floating point numbers seems rather complicated.
> Maybe doing it differently could speed it up, maybe not.
>
>
That speeds things up by a factor of about 100 :)
I think there must be some efficiency to be extracted from there
somewhere.. Either the IEEE module or the Data.Binary.Get.
Is it possible to get the profiler to look deeper than the top level
module? With all the options I could find, it only ever tells me about
things in the file I am dealing with.
> Hm, 200MB file => ~25 million Doubles, such a list needs at least 400MB.
> Still a long way to 2GB. I suspect you construct a list of thunks, not
> Doubles.
>
I think you are almost certainly right. Is there an easy way to see
if/how/where this is happening?
Thanks once again,
Philip
------------------------------
Message: 2
Date: Fri, 30 Apr 2010 17:57:48 -0400
From: Kyle Murphy <[email protected]>
Subject: Re: [Haskell-beginners] Data.Binary.Get for large files
To: [email protected]
Cc: [email protected]
Message-ID:
<[email protected]>
Content-Type: text/plain; charset="utf-8"
Check out the Real World Haskell chapter on profiling, it should have
everything you need to track down where the thunks are sneaking in:
http://book.realworldhaskell.org/read/profiling-and-optimization.html
It's particularly great in this case because the problem being diagnosed in
that chapter is most likely the same sort of problem you're seeing.
-R. Kyle Murphy
--
Curiosity was framed, Ignorance killed the cat.
------------------------------
Message: 3
Date: Sat, 1 May 2010 00:10:48 +0200
From: Daniel Fischer <[email protected]>
Subject: Re: [Haskell-beginners] Data.Binary.Get for large files
To: [email protected], [email protected]
Message-ID: <[email protected]>
Content-Type: text/plain; charset="utf-8"
On Friday 30 April 2010 23:06:07, Philip Scott wrote:
> Hi Daniel
>
> > Replace getFloat64le with e.g. getWord64le to confirm.
> > The reading of IEEE754 floating point numbers seems rather
> > complicated. Maybe doing it differently could speed it up, maybe not.
>
> That speeds things up by a factor of about 100 :)
Yes, for me too.
>
> I think there must be some efficiency to be extracted from there
> somewhere.. Either the IEEE module
Look at the code. It does a lot of hard work. That is probably necessary to
ensure correctness, but it's sloow.
If you feel like playing with fire,
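{-# LANGUAGE BangPatterns, MagicHash #-}
import qualified Data.ByteString.Lazy as BL
import Data.Binary
import Data.Binary.Get
import GHC.Prim    -- unsafeCoerce#; newer GHCs export it from GHC.Exts
import Data.Word
-- Reinterpret the 64 bits as a Double, skipping the IEEE754 decoding.
getFloat64le :: Get Double
getFloat64le = fmap unsafeCoerce# getWord64le
-- Fold every remaining Double into the accumulator, strictly.
myGetter :: Double -> Get Double
myGetter !acc = do
   e <- isEmpty
   if e then
       return acc
     else do
       !t <- getFloat64le
       myGetter ((t+acc)/2)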
may work on your system (no warranties, you know what 'unsafe' means, don't
you?).
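For readers on a newer toolchain, the same idea can probably be had without unsafeCoerce#: GHC.Float exports castWord64ToDouble (in base since 4.10), which reinterprets the bits of a Word64 as a Double. The sketch below assumes the input is a plain sequence of little-endian IEEE754 doubles. The names getFloat64leSafe, foldAverage and encodeDoubles are made up for this illustration. Recent versions of binary also export getDoublele from Data.Binary.Get, which may make the first helper unnecessary.

```haskell
{-# LANGUAGE BangPatterns #-}
import qualified Data.ByteString.Lazy as BL
import Data.Binary.Get (Get, getWord64le, isEmpty, runGet)
import Data.Binary.Put (putWord64le, runPut)
import GHC.Float (castDoubleToWord64, castWord64ToDouble)

-- Read 8 little-endian bytes and reinterpret them as a Double.
getFloat64leSafe :: Get Double
getFloat64leSafe = fmap castWord64ToDouble getWord64le

-- Same fold as myGetter above, strict in the accumulator.
foldAverage :: Double -> Get Double
foldAverage !acc = do
  e <- isEmpty
  if e
    then return acc
    else do
      !t <- getFloat64leSafe
      foldAverage ((t + acc) / 2)

-- Hypothetical helper for trying it out without a file:
-- encodes Doubles in the layout the getter expects.
encodeDoubles :: [Double] -> BL.ByteString
encodeDoubles = runPut . mapM_ (putWord64le . castDoubleToWord64)
```

For example, runGet (foldAverage 0) (encodeDoubles [2, 4]) first computes (2+0)/2 = 1 and then (4+1)/2 = 2.5.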
> or the Data.Binary.Get.
Considering that it's quick enough getting Word64, you won't get much
improvement there.
>
> Is it possible to get the profiler to look deeper than the top level
> module?
Lots of {-# SCC "foo" #-} pragmas. Or create a local copy of the module and
import that, then -prof -auto-all should give more info.
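As a rough illustration of the SCC route: a pragma placed in front of an expression gives that expression its own cost centre in the profile. The function and the cost-centre names below are invented for the example; only the pragma syntax comes from GHC. Without -prof the pragmas are ignored, so the code behaves the same either way.

```haskell
-- Hypothetical example: two manually named cost centres inside one function.
mean :: [Double] -> Double
mean xs = total / count
  where
    total = {-# SCC "mean_total" #-} sum xs
    count = {-# SCC "mean_count" #-} fromIntegral (length xs)
```

Built with -prof -auto-all (spelled -fprof-auto on current GHCs) and run with +RTS -p, both names should show up as separate entries in the .prof report.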
> With all the options I could find, it only ever tells me about
> things in the file I am dealing with.
>
> > Hm, 200MB file => ~25 million Doubles, such a list needs at least
> > 400MB. Still a long way to 2GB. I suspect you construct a list of
> > thunks, not Doubles.
>
> I think you are almost certainly right. Is there an easy way to see
> if/how/where this is happening?
Read the core, profile with the various heap profiling flags (-hc, -hd,
-hy, ...) and look at the profiles, or show the code to more experienced
Haskellers.
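To make the "list of thunks" suspicion concrete, here is a small sketch that has nothing to do with Data.Binary itself. The lazy fold builds a chain of unevaluated (+) applications, while the strict one keeps a single evaluated Double. sumLazy and sumStrict are illustrative names only. On a large input, a heap profile of a -prof build would be expected to show the difference, though the optimiser may erase it at -O.

```haskell
import Data.List (foldl')

-- Lazy left fold: the accumulator grows as a chain of thunks
-- that is only collapsed when the final result is demanded.
sumLazy :: [Double] -> Double
sumLazy = foldl (+) 0

-- Strict left fold: the accumulator is forced at every step.
sumStrict :: [Double] -> Double
sumStrict = foldl' (+) 0
```

Both return the same value; they differ only in how much memory they hold on the way there.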
>
> Thanks once again,
>
> Philip
------------------------------
Message: 4
Date: Sat, 1 May 2010 13:39:47 +0900
From: Oge <[email protected]>
Subject: [Haskell-beginners] Code works in a file but not the
interpreter
To: beginners <[email protected]>
Message-ID:
<[email protected]>
Content-Type: text/plain; charset=UTF-8
Hi all,
This is my first post to haskell-beginners. I am trying to use the
SegmentTree-0.2 library from Hackage but I get an error when I try to
use some of its functions. For instance when I load a file containing
the following
module Stabbing.SegmentTree (counts) where
import Data.SegmentTree
counts :: [(Rational, Rational)] -> [Rational] -> [Integer]
counts intervals points = map (countingQuery segmentTree) points
  where
    segmentTree = fromList intervals
in the GHC interpreter (6.10.3), my interactions look as such
*Stabbing.SegmentTree> counts [(4, 5)] [3]
[0]
*Stabbing.SegmentTree> fromList [(4, 5)]
<interactive>:1:0:
No instance for (SegmentTree-0.2:Data.SegmentTree.Measured.Measured
(SegmentTree-0.2:Data.SegmentTree.Interval.Interval t) t1)
arising from a use of `fromList' at <interactive>:1:0-16
Possible fix:
add an instance declaration for
(SegmentTree-0.2:Data.SegmentTree.Measured.Measured
(SegmentTree-0.2:Data.SegmentTree.Interval.Interval t) t1)
In the expression: fromList [(4, 5)]
In the definition of `it': it = fromList [(4, 5)]
As you can see, "fromList intervals" is a sub-expression of counts,
but when I try to evaluate a similar expression at the interpreter, I
get an error. Does anyone know what the problem is? This is not the
"No instance for (Show ..." error that I'm used to.
Ogechi Nnadi.
------------------------------
Message: 5
Date: Sat, 1 May 2010 12:15:29 +0200
From: Daniel Fischer <[email protected]>
Subject: Re: [Haskell-beginners] Code works in a file but not the
interpreter
To: [email protected]
Message-ID: <[email protected]>
Content-Type: text/plain; charset="utf-8"
On Saturday 01 May 2010 06:39:47, Oge wrote:
> Hi all,
>
> This is my first post to haskell-beginners. I am trying to use the
> SegmentTree-0.2 library from Hackage but I get an error when I try to
> use some of its functions. For instance when I load a file containing
> the following
>
> module Stabbing.SegmentTree (counts) where
> import Data.SegmentTree
>
> counts :: [(Rational, Rational)] -> [Rational] -> [Integer]
> counts intervals points = map (countingQuery segmentTree) points
>   where
>     segmentTree = fromList intervals
>
> in the GHC interpreter (6.10.3), my interactions look as such
>
> *Stabbing.SegmentTree> counts [(4, 5)] [3]
> [0]
> *Stabbing.SegmentTree> fromList [(4, 5)]
>
> <interactive>:1:0:
> No instance for (SegmentTree-0.2:Data.SegmentTree.Measured.Measured
>
> (SegmentTree-0.2:Data.SegmentTree.Interval.Interval t) t1)
> arising from a use of `fromList' at <interactive>:1:0-16
> Possible fix:
> add an instance declaration for
> (SegmentTree-0.2:Data.SegmentTree.Measured.Measured
> (SegmentTree-0.2:Data.SegmentTree.Interval.Interval t) t1)
> In the expression: fromList [(4, 5)]
> In the definition of `it': it = fromList [(4, 5)]
>
> As you can see, "fromList intervals" is a sub-expression of counts,
> but when I try to evaluate a similar expression at the interpreter, I
> get an error. Does anyone know what the problem is? This is not the
> "No instance for (Show ..." error that I'm used to.
>
> Ogechi Nnadi.
Well,
fromList :: (Monoid t, Measured (Interval a) t, Ord a)
         => [(a, a)] -> STree t a
, so in fromList [(4,5)], it doesn't know what types to use for t and a,
hence the error message that says it doesn't have an instance of Measured
for two arbitrary types.
In counts, the result of fromList, segmentTree, is used as the first
argument of
countingQuery :: (Measured (Interval a) (Sum b), Ord a)
              => STree (Sum b) a -> a -> b
which resolves the 't' in fromList's type as 'Sum b' - not yet much
progress, but the type signature
counts :: [(Rational, Rational)] -> [Rational] -> [Integer]
fixes the types: a is Rational and b is Integer, so t is (Sum Integer).
Now the compiler knows exactly what types to use, and since the
constraints of fromList -
(Monoid (Sum Integer), Measured (Interval Rational) (Sum Integer), Ord
Rational) - and countingQuery -
(Measured (Interval Rational) (Sum Integer), Ord Rational) - are fulfilled,
it works.
If you tell ghci which types to use in
fromList [(4,5)]
, say
fromList [(4,5)] :: STree (Sum Integer) Rational
(you probably need to bring Data.Monoid into scope - import into your
module or ":m +Data.Monoid" in ghci - to have 'Sum' available), that will
work too.
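The same effect can be reproduced without the SegmentTree package. The sketch below is a toy analogue, not the library's code: Measured, Summary, summarize and countAll are stand-ins defined here, chosen so that the result type of summarize, like that of fromList, is not determined by its argument alone.

```haskell
{-# LANGUAGE MultiParamTypeClasses, FlexibleInstances, FlexibleContexts #-}
import Data.Monoid (Sum(..))

-- Toy stand-in for the library's two-parameter class (not the real one).
class Measured a t where
  measure :: a -> t

-- One instance only: pairs of Rationals measured as a count.
instance Measured (Rational, Rational) (Sum Integer) where
  measure _ = Sum 1

-- Toy stand-in for STree: carries both t and a in its type.
newtype Summary t a = Summary t deriving Show

-- Like fromList, the result type t is not fixed by the argument alone.
summarize :: (Monoid t, Measured (a, a) t) => [(a, a)] -> Summary t a
summarize = Summary . mconcat . map measure

-- The signature pins a to Rational and, through getSum, t to Sum Integer,
-- so the instance above can be selected.
countAll :: [(Rational, Rational)] -> Integer
countAll xs = case summarize xs of
  Summary s -> getSum s
```

In ghci, summarize [(4,5)] on its own should be rejected for the same reason as fromList [(4,5)], while summarize [(4,5)] :: Summary (Sum Integer) Rational gives the type checker enough to pick the instance.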
------------------------------
Message: 6
Date: Sat, 1 May 2010 19:55:52 +0900
From: Oge <[email protected]>
Subject: Re: [Haskell-beginners] Code works in a file but not the
interpreter
To: beginners <[email protected]>
Message-ID:
<[email protected]>
Content-Type: text/plain; charset=UTF-8
It worked! Thanks a lot.
------------------------------
End of Beginners Digest, Vol 23, Issue 1
****************************************