Beginners Digest, Vol 28, Issue 23

beginners-request Wed, 13 Oct 2010 10:11:51 -0700

Send Beginners mailing list submissions to
        [email protected]

To subscribe or unsubscribe via the World Wide Web, visit
        http://www.haskell.org/mailman/listinfo/beginners
or, via email, send a message with subject or body 'help' to
        [email protected]


You can reach the person managing the list at
        [email protected]

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Beginners digest..."


Today's Topics:

   1. Re:  Word8, Word32, ByteString, Int,      etc. conversions.
      (Daniel Fischer)
   2. Re:  Flying Dutchman sailing blind (Jeroen van Maanen)
   3. Re:  Flying Dutchman sailing blind (Daniel Fischer)


----------------------------------------------------------------------

Message: 1
Date: Tue, 12 Oct 2010 20:18:53 +0200
From: Daniel Fischer <[email protected]>
Subject: Re: [Haskell-beginners] Word8, Word32, ByteString, Int,        etc.
        conversions.
To: [email protected]
Cc: David McBride <[email protected]>
Message-ID: <[email protected]>
Content-Type: text/plain;  charset="utf-8"

On Tuesday 12 October 2010 18:38:31, David McBride wrote:
> I'm writing a pcap sniffer, and I have IP addresses from two different
> libraries that are equivalent, but typed differently.
>
> Data.IP.IPv4 Data.ByteString.Internal.ByteString
>
> and
>
> Network.Info.IPv4 = Network.Info.IPv4 !GHC.Word.Word32
>
> Both are probably identical, but I cannot for the life of me figure out
> how to get from one to the other.  Bytestring has the ability to take
> arrays of Word8, how do I split Word32 into Word8's?

If you have the binary package and know the endianness of the IP addresses, 
the simplest way to convert between those is

conv1 :: Network.Info.IPv4 -> Data.IP.IPv4
conv1 (Network.Info.IPv4 w32) =
    Data.IP.IPv4 (runPut $ putWord32be w32)

-- or putWord32le

conv2 :: Data.IP.IPv4 -> Network.Info.IPv4
conv2 (Data.IP.IPv4 bs) = Network.Ifo.IPv4 (runGet $ getWord32be)

-- or getWord32le

Alternatively, you can convert a Word32 to [Word8] per

import Data.Word
import Data.Bits

octets :: Word32 -> [Word8]
octets w = 
    [ fromIntegral (w `shiftR` 24)
    , fromIntegral (w `shiftR` 16)
    , fromIntegral (w `shiftR` 8)
    , fromIntegral w
    ]

for big-endian conversion (for little-endian, reverse the list ;)
And a list of Word8 to a Word32 per

fromOctets :: [Word8] -> Word32
fromOctets = foldl' accum 0
  where
    accum a o = (a `shiftL` 8) .|. fromIntegral o

(reverse or use foldr for little-endian conversion).

>
> I have also had this problem in the past with Word8's, Octets and Chars.
> Beyond this specific problem, what is a good strategy for figuring out
> how to do these conversions in the future?  Sometimes I find a function
> somewhere, but I have never found a general way to deal with it.



------------------------------

Message: 2
Date: Wed, 13 Oct 2010 14:52:58 +0200
From: Jeroen van Maanen <[email protected]>
Subject: Re: [Haskell-beginners] Flying Dutchman sailing blind
To: Daniel Fischer <[email protected]>
Cc: [email protected]
Message-ID: <[email protected]>
Content-Type: text/plain; charset=ISO-8859-1

Thanks again. Problem solved,... well, under control. ;-)

Op 2010-10-05 12:07, Daniel Fischer wrote:
> On Tuesday 05 October 2010 10:06:03, Jeroen van Maanen wrote:
>> A few weeks ago I was greatly helped by members of this list to expose
>> an error that was hiding in my, by now quite extensive, Haskell program.
>> I should still thank Daniel Fisher for his lucid explanation of how an
>> overdose of strictness can prevent a try-expression from wrapping an
>> exception.
>>
>> Now that the error is gone, I face a new challenge: when I let my
>> program run for a while (it is a computationally intensive task that
>> never stops by design), after fifteen minutes or so the CPU usage drops
>> from nearly 100% to around 15% and a few minutes later the process dies
>> with the message:
>>
>>   "lexau: memory allocation failed (requested 2097152 bytes)"
>>
>> The whole thing is a tail biting snail pit of threads that are
>> communicating through MVars. Every thread runs a tail recursive
>> function. (I read in another post that it is not a good idea to use
>> explicit recursion, but when I compared alternatives the fold variants
>> produced even worse stack/heap problems.)
> 
> That depends. Using a combinator (folds etc.) gives you more elegant, more 
> general and usually more readable code.
> Using a low-level direct recursion however gives you more control over 
> strictness, speed and allocation behaviour.
> Sometimes you have to stick your hands in the low-level details to get 
> adequate performance.
> 
>> The thread that is likely to
>> cause the problem is an optimizer that tries many possible improvements
>> of a complex data structure and incrementally applies the successful
>> ones. I use a strict foldl style in an attempt to limit the memory used
>> by the optimizer.
> 
> May be not strict enough (or too strict). What's the suspect function?

I spent most of the time looking in the wrong module. It turned out to be a 
function that prepares an 'oracle' for my optimizer that was not strict enough.

>> Frankly, however, I have no idea what eats up so much heap space.
>>
>> Now I have always proudly labeled myself a 'printf programmer', but I am
>> afraid that I am going to need some profiling tool to determine where
>> the problem is. Any suggestions where I should start?
> 
> As a very first measure, run your programme with the "-hT" RTS-option (
> $ ./lexau +RTS -hT -RTS args
> -- wait until it dies or ctrl-C it when CPU usage has dropped
> $ hp2ps -c lexau.hp
> -- look at the graph in lexau.ps
> ).

It was kind of hard to interpret. In the end it was the huge amount of 4-tuples 
that got me on track. I converted the 4-tuples in two modules to data types 
(with different constructors). That allowed me to see where the memory went.

> If that doesn't reveal the source of the problem, compile for profiling,
> 
> $ ghc -O2 -prof -auto-all -osuf p_o -hisuf p_hi --make lexau -o profLexau
> 
> and run with some heap profiling options,
> http://darcs.haskell.org/download/docs/6.12.3/html/users_guide/profiling.html
> explains how.
> 
>>
>> Cheers,  Jeroen
>>
>> P.S. For an idea of what is living in the snail pit, have a look at:

I had meant to write "snake pit", but given the lack of speed and the amount of 
trailing goo I guess "snail pit" is even more accurate. 

>> http://lexau.svn.sourceforge.net/viewvc/lexau/branches/totem/src/LExAu/Pipeline/Concurrent.hs?view=markup
>> http://lexau.svn.sourceforge.net/viewvc/lexau/branches/totem/src/LExAu/Model/HistoryTree.hs?view=markup
> 
> Ewww.

So in fact the culprit turned out to be the function updateRationals in the 
module

  
http://lexau.svn.sourceforge.net/viewvc/lexau/branches/totem/src/LExAu/Distribution/MDL.hs?view=markup

It is still eating more time than the actual optimizer, so suggestions for 
improvement are still welcome.

>> (I know, HistoryTree.hs badly needs to be split up into smaller
>> modules.)
> 
> + 5

That is going to be my next focus.


------------------------------

Message: 3
Date: Wed, 13 Oct 2010 15:53:27 +0200
From: Daniel Fischer <[email protected]>
Subject: Re: [Haskell-beginners] Flying Dutchman sailing blind
To: Jeroen van Maanen <[email protected]>
Cc: [email protected]
Message-ID: <[email protected]>
Content-Type: text/plain;  charset="iso-8859-1"

On Wednesday 13 October 2010 14:52:58, Jeroen van Maanen wrote:
> So in fact the culprit turned out to be the function updateRationals in
> the module
>
>  
> http://lexau.svn.sourceforge.net/viewvc/lexau/branches/totem/src/LExAu/D
>istribution/MDL.hs?view=markup
>
> It is still eating more time than the actual optimizer, so suggestions
> for improvement are still welcome.

First,

approximateRational :: Double -> Rational
approximateRational x =
    let (m, e) = decodeFloat x
    in if e >= 0
        then (m * (2 ^ e)) % 1
        else m % (2 ^ (-e))

is exactly toRational, so there's no need for that function. Also, there's 
a much faster implementation of toRational (for Double and Float) underway, 
I think it will be in the first GHC 7 release due in a couple of weeks.
Anyway, using toRational will let you profit from that with no additional 
effort when you get a compiler with that patch.

Second,

data Threshold =
  Threshold
    { theBoundA :: Rational
    , theBoundB :: Rational
    , theCountA :: Integer
    , theCountB :: Integer
    }
  deriving Show

I have not looked much at the code, but it seems likely that you will want 
strict fields there,

data Threshold =
  Threshold
    { theBoundA :: !Rational
    , ...
    }

but that's to be tested later.

Third,

mapToThresholds :: [Threshold] -> [Rational] -> [(Rational, Rational)]
mapToThresholds _ [] = []
mapToThresholds thresholds@((Threshold boundA boundB intA intB) : 
moreThresholds) rationals@(x : moreRationals)
  | x > boundB = mapToThresholds moreThresholds rationals
  | x > boundA =
     let width = boundB - boundA
         count = fromInteger (intB - intA)
         mapped = (((x - boundA) * count) / width) + (fromInteger intA)
     in (mapped, x) : mapToThresholds thresholds moreRationals
  | True = error $ "Rational is too small: " ++ (show x) ++ " < " ++ (show 
boundA)
mapToThresholds [] (x : _) = error $ "Rational is too big: " ++ (show x)

will probably profit from making mapped strict,

    let ...
    in mapped `seq` (mapped, x) : mapToThreholds ... 

Now updateRationals:

updateRationals :: Integer -> [(Integer, Rational)] -> Integer -> 
[(Integer, Rational)]
updateRationals previousWeight previousRationals w

             else let mapped = mapToThresholds thresholds boundaries
                      mappedIntervals = zip ((0, 0) : mapped) mapped
                      ((_, a), (_, b)) = foldl1' maxMappedInterval 
mappedIntervals


That's no good, unfortunately.

maxMappedInterval :: ((Rational, Rational), (Rational, Rational)) -> 
((Rational, Rational), (Rational, Rational)) -> ((Rational, Rational), 
(Rational, Rational))
maxMappedInterval ((ma, a), (mb, b)) ((mc, c), (md, d)) =
  if md - mc > mb - ma
    then ((mc, c), (md, d))
    else ((ma, a), (mb, b))


foldl1' evaluates the result of maxMappedInterval to weak head normal form, 
that is to the outermost constructor.
Depending on what the optimiser does, that may or may not evaluate the 
condition md - mc > mb - ma, but it will *not* look at a, b, c, d, and at 
least the second components of the inner pairs happily build thunks, 
possibly keeping references to the elements of the list already processed, 
so keeping stuff from being garbage collected.

What you need is a strict type to contain your Rationals,

data SRQ = SRQ !Rational !Rational !Rational !Rational

maxMappedInterval :: SRQ -> ((Rational,Rational)) -> SRQ
maxMappedInterval s@(SRQ ma a mb b) ((mc,c),(md,d))
  | mb - ma < md - mc = SRQ mc c md d
  | otherwise = s

Then the foldl1' will evaluate all components and you don't get thunks or 
space leaks.



------------------------------

_______________________________________________
Beginners mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/beginners


End of Beginners Digest, Vol 28, Issue 23
*****************************************

Beginners Digest, Vol 28, Issue 23

Reply via email to