Beginners Digest, Vol 22, Issue 49

beginners-request Thu, 29 Apr 2010 16:28:07 -0700

Send Beginners mailing list submissions to
        [email protected]

To subscribe or unsubscribe via the World Wide Web, visit
        http://www.haskell.org/mailman/listinfo/beginners
or, via email, send a message with subject or body 'help' to
        [email protected]


You can reach the person managing the list at
        [email protected]

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Beginners digest..."


Today's Topics:

   1. Re:  Re: Iterating through a list of char... (Stephen Tetley)
   2. Re:  Re: Iterating through a list of char... (Daniel Fischer)
   3. Re:  Re: Converting an imperative program to      haskell (Hein Hundal)
   4. Re:  Re: Converting an imperative program to      haskell
      (Daniel Fischer)
   5. Re:  [Int] oddness with summing large lists (Philip Scott)
   6.  Data.Binary.Get for large files (Philip Scott)
   7. Re:  Re: Iterating through a list of char... (Daniel Fischer)


----------------------------------------------------------------------

Message: 1
Date: Thu, 29 Apr 2010 22:16:47 +0100
From: Stephen Tetley <[email protected]>
Subject: Re: [Haskell-beginners] Re: Iterating through a list of
        char...
Cc: [email protected]
Message-ID:
        <[email protected]>
Content-Type: text/plain; charset=ISO-8859-1

On 29 April 2010 21:39, Jean-Nicolas Jolivet <[email protected]> wrote:

[Snip]

> Every solution I found was iterating to a list of character one character at
> a time, and outputting a character as a result (whether it is by recursion,
> mapping, etc..), however, what I was trying to explain in my previous post
> was that; when I am processing the escape character, this character should
> not be added to the resulting string... This is where I decided to use the
> Maybe monad..I'm just wondering if there is a better way...!

Hi Jean-Nicolas

The 'hand crafted lexer' style is quite common for this, with
different functions for different states (here body and escape).

I use a Hughes list as an explicit accumulator (a Hughes list is
sometimes called a difference list and familiar for strings as the
ShowS type synonym). This has more efficient right-addition (snoc)
than normal lists [a].


> toString :: ShowS -> String
> toString = ($ [])

> snoc :: ShowS -> Char -> ShowS
> snoc f a = f . (a:)

> emptyH :: ShowS
> emptyH = id

> escape :: String -> ShowS -> ShowS
> escape (_:xs) acc = body xs acc   -- drop1, change state
> escape _      acc = acc           -- oh dear escape to the end-of-string

> body :: String -> ShowS -> ShowS
> body ('\\':xs) acc = escape xs acc  -- change state
> body (x:xs)    acc = body xs (acc `snoc` x)
> body _         acc = acc


> run :: String -> String
> run xs = toString (body xs emptyH)

> demo1 = run "\\s1234"   -- "1234"

Best wishes

Stephen


------------------------------

Message: 2
Date: Thu, 29 Apr 2010 23:23:52 +0200
From: Daniel Fischer <[email protected]>
Subject: Re: [Haskell-beginners] Re: Iterating through a list of
        char...
To: [email protected]
Message-ID: <[email protected]>
Content-Type: text/plain;  charset="utf-8"

Am Donnerstag 29 April 2010 22:39:05 schrieb Jean-Nicolas Jolivet:
> Every solution I found was iterating to a list of character one
> character at a time, and outputting a character as a result (whether it
> is by recursion, mapping, etc..),

Well, that's necessary for what you want to do, isn't it?

> however, what I was trying to explain
> in my previous post was that; when I am processing the escape character,
> this character should not be added to the resulting string... This is
> where I decided to use the Maybe monad..I'm just wondering if there is a
> better way...!

Depends on your criteria.
For raw performance, I think the directly coded recursion is a bit better

(btw,

fooGen e esc norm = go False
    where
        go b (x:xs)
          | x == e    = go True xs
          | b         = esc x : go False xs
          | otherwise = norm x : go False xs
        go _ [] = []
).

For code cleanliness, catMaybes&zipWith and a good directly coded recursion 
are about equal.

For leetness, the foldr is beats them, but surely one could do much 
'better'.


------------------------------

Message: 3
Date: Thu, 29 Apr 2010 14:49:16 -0700 (PDT)
From: Hein Hundal <[email protected]>
Subject: Re: [Haskell-beginners] Re: Converting an imperative program
        to      haskell
To: [email protected]
Message-ID: <[email protected]>
Content-Type: text/plain; charset=us-ascii

--- On Thu, 4/29/10, Maciej Piechotka <[email protected]> wrote:
> Hein Hundal wrote:
> >
> >    I figured I should try a larger program
> in Haskell, so I am
> > converting one of my Mathematica programs, a simulator
> for the
> > card game Dominion, over to Haskell.  Most of
> that is going well
> > except for one patch of imperative code.  My
> Haskell version of
> > this code is ugly.  I was hoping someone could
> recommend a better
> > way to do it.  I will paste a simplified version
> of the code
> > below.  If necessary, I can provide all the other
> code used for
>
> 1. Use strong typing. Or any typing

I simplified the code for the post.  In the real version, I use strong typing.  
The Card type is enumerated.  I have been using [Card] instead of calling it a 
Deck.  I could change that.

> 2. User guards instead of nested if's:

In the "procloop" function, I have if's nested quite deeply.  I could easily 
use guards.  I will try that.

> 3. Don't use too much variables. 6-8 is probably the
> maximum you should
> deal with (human short-term memory holds about 5-10 points
> of entry.
> Split functions into smaller functions (even in where).

I do have to get the information into the functions, so the only way I can 
avoid having lots of variables is by introducing new structures.  I can do that.

> 4. Discard result you don't need:
>
> isDone  (LSt gs strat iA iTh i aA iB iC bDone) =
> bDone
> isDone  (LSt _ _ _ _ _ _ _ _ _ bDone) = bDone

Yes, that's much better. 

proc gs' strat = let
      iA     = 1
      iTh    = 0
      i      = 0
      aA     = replicate 20 0
      iB     = 1
      iC     = 0
      bDone  = False
      gs     = apps ("<<<"++show(gs')++">>>") gs'
      lst    = LSt gs strat iA iTh i aA iB iC bDone
      lst2   = until isDone procloop lst 
      isDone  (LSt _  _ _ _ _ _ _  _ bDone) = bDone
      output  (LSt gs _ _ _ _ aA iB iC _) = (iB, iC, appd aA gs)
  in output lst2


> 5. Don't reuse the same name in the same scope (like iA as
> 1 and iA as parameter to isDone).

I did hesitate to use the same parameter name twice. 

> 6. While Haskell have long tradition of having short
> namesit is not
> always good (see 3). Use them only if you are sure it is
> clear what they
> mean:

In the original version, I had longer variable names where they seemed 
necessary. 


The main sources of ugliness are the long lists of variables.  Every time I 
call doAct or construct a LoopState variable, I am repeating all those 
variables.  I will try changing the type of doAct to

doAct :: LoopState -> LoopState

Cheers,
Hein







      


------------------------------

Message: 4
Date: Fri, 30 Apr 2010 00:16:20 +0200
From: Daniel Fischer <[email protected]>
Subject: Re: [Haskell-beginners] Re: Converting an imperative program
        to      haskell
To: [email protected]
Message-ID: <[email protected]>
Content-Type: text/plain;  charset="iso-8859-1"

Am Donnerstag 29 April 2010 23:49:16 schrieb Hein Hundal:
> --- On Thu, 4/29/10, Maciej Piechotka <[email protected]> wrote:
> > Hein Hundal wrote:
> > >    I figured I should try a larger program
> >
> > in Haskell, so I am
> >
> > > converting one of my Mathematica programs, a simulator
> >
> > for the
> >
> > > card game Dominion, over to Haskell.  Most of
> >
> > that is going well
> >
> > > except for one patch of imperative code.  My
> >
> > Haskell version of
> >
> > > this code is ugly.  I was hoping someone could
> >
> > recommend a better
> >
> > > way to do it.  I will paste a simplified version
> >
> > of the code
> >
> > > below.  If necessary, I can provide all the other
> >
> > code used for
> >
> > 1. Use strong typing. Or any typing
>
> I simplified the code for the post.  In the real version, I use strong
> typing.  The Card type is enumerated.  I have been using [Card] instead
> of calling it a Deck.  I could change that.
>

Whether you use [Card],

type Deck = [Card]

or

newtype Deck = D [Card]

is not important. Using enumerations for suite and value instead of Ints 
is.

> > 2. User guards instead of nested if's:
>
> In the "procloop" function, I have if's nested quite deeply.

Nested ifs (beyond, say, three levels) are a pain to decipher. Using guards 
(and maybe 'case's) will probably be a great improvement.

> I could easily use guards.  I will try that.
>
> > 3. Don't use too much variables. 6-8 is probably the
> > maximum you should
> > deal with (human short-term memory holds about 5-10 points
> > of entry.
> > Split functions into smaller functions (even in where).
>
> I do have to get the information into the functions, so the only way I
> can avoid having lots of variables is by introducing new structures.
> I can do that.
>

As long as they're meaningful. Just smashing a couple of values together 
into a structure to reduce the variable count isn't good.

> > 4. Discard result you don't need:
> >
> > isDone  (LSt gs strat iA iTh i aA iB iC bDone) =
> > bDone
> > isDone  (LSt _ _ _ _ _ _ _ _ _ bDone) = bDone
>
> Yes, that's much better.

Perhaps use named field syntax,

data LoopState
    = LSt
    { gameState :: GameState
    , strat :: Strategy
    , ...
    , isDone :: Bool
    }

Then in procloop, where you just update one or two fields,

procloop lst@(LSt gs' strat iA iTh i aA iB iC bDone)
    | iA<=0 || i>=20 || actcd gs <=0 || iCd == -1
        = lst{ idx = idx lst + 1, isDone = True }
    | iCd == 1
        = lst{ idx = idx lst + 1, iThFld = iThFld lst + 1, isDone = False }
    | iThFld lst > 0
        = lst{ ... }
    | otherwise
       = lst{ ... }
      where
        iCd = stratchoose ...
        -- other let-bindings, only those needed will be evaluated

>
> proc gs' strat = let
>       iA     = 1
>       iTh    = 0
>       i      = 0
>       aA     = replicate 20 0
>       iB     = 1
>       iC     = 0
>       bDone  = False
>       gs     = apps ("<<<"++show(gs')++">>>") gs'
>       lst    = LSt gs strat iA iTh i aA iB iC bDone
>       lst2   = until isDone procloop lst
>       isDone  (LSt _  _ _ _ _ _ _  _ bDone) = bDone
>       output  (LSt gs _ _ _ _ aA iB iC _) = (iB, iC, appd aA gs)
>   in output lst2
>
> > 5. Don't reuse the same name in the same scope (like iA as
> > 1 and iA as parameter to isDone).
>
> I did hesitate to use the same parameter name twice.
>
> > 6. While Haskell have long tradition of having short
> > namesit is not
> > always good (see 3). Use them only if you are sure it is
> > clear what they
> > mean:
>
> In the original version, I had longer variable names where they seemed
> necessary.
>
>
> The main sources of ugliness are the long lists of variables.  Every
> time I call doAct or construct a LoopState variable, I am repeating all
> those variables.  I will try changing the type of doAct to
>
> doAct :: LoopState -> LoopState

That looks promising.

>
> Cheers,
> Hein



------------------------------

Message: 5
Date: Thu, 29 Apr 2010 23:17:05 +0100
From: Philip Scott <[email protected]>
Subject: Re: [Haskell-beginners] [Int] oddness with summing large
        lists
To: [email protected]
Message-ID: <[email protected]>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Hi ho,
> I can deduce that you have a 32-bit system (as I do).
> Because:
>
> Prelude>  1000000000000 :: Int
> -727379968
>
> So ([1 .. 1000000000000] :: [Int]) == [] and sum [] == 0 isn't surprising
> at all.
>
>    

Once again, your explanation makes perfect sense :)

>> And now the really odd part... take away one zero
>>
>> Prelude>  let d = [1..100000000000] :: [Int]
>>      
> Prelude>  100000000000 :: Int
> 1215752192
>
> , which is a pretty large number. Since you gave the list a name, it stays
> in memory. For each Int in the list, at least 3 machine words are needed
> (one for the Int, pointer from cons-cell to value, pointer to next cons-
> cell; actually, I think the overhead is larger), so if you have less than
> 14GB of RAM, your ghci can't do it.
>
>    

Fair enough, I don't think my laptop is quite up to that :)

Thanks for your help,

- Philip


------------------------------

Message: 6
Date: Thu, 29 Apr 2010 23:37:59 +0100
From: Philip Scott <[email protected]>
Subject: [Haskell-beginners] Data.Binary.Get for large files
To: [email protected]
Message-ID: <[email protected]>
Content-Type: text/plain; charset="iso-8859-1"

Hello again folks,

Sorry to keep troubling you - I'm very appreciative of the help you've 
given so far. I've got one more for you that has got me totally stumped. 
I'm writing a program which deals with largish-files, the one I am using 
as a test case is not stupidly large at about 200mb. After three 
evenings, I have finally gotten rid of all the stack overflows, but I am 
unfortunately left with something that is rather unfeasably slow. I was 
hoping someone with some keener skills than I could take a look, I've 
tried to distill it to the simplest case.

This program just reads in a file, interpreting each value as a double, 
and does a sort of running average on them. The actual function doesn't 
matter too much, I think it is the reading it in that is the problem. 
Here's the code:

import Control.Exception
import qualified Data.ByteString.Lazy as BL
import Data.Binary.Get
import System.IO
import Data.Binary.IEEE754

myGetter acc = do
     e <- isEmpty
     if e == True
         then
             return acc
         else do
             t <- getFloat64le
             myGetter $! ((t+acc)/2)

myReader file = do
     h <- openBinaryFile file ReadMode
     bs <- BL.hGetContents h
     return $ runGet (myGetter 0)  bs

main = do
     d <- myReader "data.bin"
     evaluate d

This takes about three minutes to run on my (fairly modern) laptop.. The 
equivilant C program takes about 5 seconds.

I'm sure I am doing something daft, but I can't for the life of me see 
what. Any hints about how to get the profiler to show me useful stuff 
would be much appreciated!

All the best,

Philip

PS: If, instead of computing a single value I try and build a list of 
the values, the program ends up using over 2gb of memory to read a 200mb 
file.. any ideas on that one?

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
http://www.haskell.org/pipermail/beginners/attachments/20100429/5862ff1a/attachment-0001.html

------------------------------

Message: 7
Date: Fri, 30 Apr 2010 01:27:54 +0200
From: Daniel Fischer <[email protected]>
Subject: Re: [Haskell-beginners] Re: Iterating through a list of
        char...
To: "Jean-Nicolas Jolivet" <[email protected]>
Cc: [email protected]
Message-ID: <[email protected]>
Content-Type: text/plain;  charset="iso-8859-1"

Am Donnerstag 29 April 2010 23:13:21 schrieb Jean-Nicolas Jolivet:
> > -- assuming that only characters > 'i' (chr 105) are escaped (or the
> > escape character itself, but that should be dropped regardless).
>
> Sorry, this is my fault for not being clearer about it: I am decoding
> text that is already encoded with a Binary to ASCII encoding... the
> exact encoding process is this:
>
>  1. Fetch a character from the input stream.
>  2. Increment the character's ASCII value by 42, modulo 256
>  3. If the result is a critical character ('\NUL', '\r', '\n' or '='),
> write the escape character ('=') to the output stream and increment
> character's ASCII value by 64, modulo 256. 4. Output the character to
> the output stream.

Okay, so there are no two escape characters in succession.

>
> I am writing a decoder here so obviously I am reversing the process....
> (I remove 42 for regular characters, 106 for special characters...and if
> the result is < 0, I add 256 to it...)
>
> Just adding (yet again) more context information here ;) Still trying to
> fully understand your last suggestions Daniel! :) Thanks again!
>
> Also, I wouldn't want anyone to think I just want someone to write the
> algorithm for me! :)  Like I said, I already have a fully working
> algorithm, but being new to Haskell, I'm looking for ways to optimize it
> (or, really just different ways that I could do it!)
>
> Here's how I am doing it right now (I am using ByteString here,

Then the foldr is not the best option.

> but it should be pretty straightforward anyway)...
> I'm warning you, this is pretty ugly :)
>
> -- Decode an encoded ByteString
> -- the zip + tail method removes the first character so I am adding it
> afterward (ugly)
> decodeByteString :: L.ByteString -> L.ByteString
> decodeByteString str = do
>   let str1 = mapMaybe decodepair (L.zip str(L.tail str))
>   let firstChar = decodechar (ord(L.head str) - 42)
>   L.pack (firstChar:str1)

Is it really necessary to pack it? That's a relatively expensive operation, 
it may be better to have it return a String if you're not doing anything 
with it after decoding except writing it to a file or so.

The zipping is also not the optimal choice, it takes a lot of checking for 
the chunk-boundaries (I assume you're using lazy ByteStrings, since you 
chose the prefix L), and you construct pairs only to deconstruct them 
immediately (the compiler *may* optimise them away, but I'm skeptical).

>
> -- Decode a pair of character, returning either the
> -- decoded character or Nothing
> decodepair :: (Char, Char) -> Maybe Char
> decodepair cs
>
>   | snd(cs) == '='  = Nothing
>   | fst(cs) == '='  = Just (decodechar(ord(snd cs) - 106))
>   | otherwise       = Just (decodechar(ord(snd cs) - 42))
>
> -- Reverse the modulo 256...
> decodechar :: Int -> Char
> decodechar i
>
>   | i < 0 = chr (i + 256)
>   | otherwise = chr i

Since you're doing arithmetic modulo 256, that stuff can be done faster and 
simpler with Word8.

----------------------------------------------------------------------
{-# LANGUAGE BangPatterns #-}

import qualified Data.ByteString.Lazy as L
import qualified Data.ByteString as S
import Data.ByteString.Unsafe (unsafeAt)

escape :: Word8 -> Word8
escape = (+150)

normal :: Word8 -> Word8
normal = (+214)

decodeW :: L.ByteString -> [Word8]
decodeW = dec False . L.toChunks
    where
      dec _ [] = []
      dec esc (str:more) = go esc 0
        where
          !len = S.length str
          {-# INLINE charAt #-}
          charAt :: Int -> Word8
          charAt i = unsafeAt str i
          go !b !i
            | i == len  = dec b more
            | b         = escape (charAt i) : go False (i+1)
            | otherwise = case charAt i of
                            61 -> go True (i+1)
                            c  -> normal c : go False (i+1)

word8ToChar :: Word8 -> Char
word8ToChar = toEnum . fromIntegral

decodeC :: L.ByteString -> String
decodeC = map word8ToChar . decodeW

decodeBS :: L.ByteString -> L.ByteString
decodeBS = L.pack . decodeW
----------------------------------------------------------------------
>
> Jean-Nicolas Jolivet
>
> On 2010-04-29, at 4:50 PM, Daniel Fischer wrote:
> > Am Donnerstag 29 April 2010 21:37:15 schrieb Jean-Nicolas Jolivet:
> >> First I would like to thank everyone for the very interesting replies
> >> and suggestions I got so far!...
> >>
> >> I tried to implement (and at the very least understand) most of
> >> them!...
> >>
> >> To add to the context here, what I am trying to do is:
> >>
> >> -apply a "transformation" to a character (in my case, subtracting 42
> >> to its ASCII value, which I obtain with chr(ord(c) - 42) -if the
> >> character is preceded by a specific character (that would be, an
> >> escape character, in this case '=') then subtract 106 to its value
> >> instead of 42... -if the character is the escape character itself,
> >> '=',  then skip it altogether
> >
> > Ah, that complicates matters a little.
> > - What happens if ord c < 42 (ord c < 106, if c is preceded by the
> > escape character?)
> > - What about escaped escape characters?
> >
> > However,
> >
> > foo xs = catMaybes $ zipWith f (' ':xs) xs
> >    where
> >        f _ '=' = Nothing
> >        f '=' c = Just (chr $ ord c - 106)
> >        f _ c   = Just (chr $ ord c - 42)
> >
> > is still pretty simple, as is the direct recursion
> >
> > foo = go ' '
> >    where
> >        go _ ('=' :cs) = go '=' cs
> >        go '=' (c:cs)  = chr (ord c - 106) : go c cs
> >        go _ (c:cs)    = chr (ord c - 42) : go c cs
> >        go _ _         = []
> >
> > -- assuming that only characters > 'i' (chr 105) are escaped (or the
> > escape character itself, but that should be dropped regardless).
> >
> > fooGen :: Char -> (Char -> Char) -> (Char -> Char) -> String -> String
> > fooGen e esc norm str = catMaybes $ zipWith f (d:str) str
> >    where
> >        d = if e == maxBound then pred e else succ e
> >        f x y
> >
> >          | y == e    = Nothing
> >          | x == e    = Just (esc y)
> >          | otherwise = Just (norm y)
> >
> > is an easy generalisation.
> >
> >> (keeping in mind that the next character needs to be
> >> escaped)...
> >>
> >> I managed to do it, however I'm not totally satisfied in the way I
> >> did it... the problem was that... as I just explained, in some cases,
> >> the character that is being processed has to be "skipped" (and by
> >> that I mean, not added to the resulting string). This happens when
> >> the processed character IS the escape character...
> >>
> >> What I did was to build a List of Maybe Char.... my function does the
> >> proper operation on the character and returns a "Just Char" when the
> >> character is processed, or Nothing when it is the escaped
> >> character... so basically I would end up with something like:  [Just
> >> 'f', Just 'o', Just 'o', Nothing]... I am mapping this using mapMaybe
> >> to end up with a proper String...
> >>
> >> Would there be any more efficient way of doing this?
> >
> > That is already pretty efficient. The direct recursion is probably a
> > bit more efficient, but I don't think the difference will be large.
> >
> >> Considering that
> >> the escape character should NOT be added to the resulting string, is
> >> there any way I can avoid using the Maybe monad?
> >
> > Sure, apart from the direct recursion,
> >
> > fooGen e esc norm str = tail $ foldr f [] (d:str)
> >    where
> >        d = if e == maxBound then pred e else succ e
> >        f x (y:zs)
> >
> >          | y == e    = x:zs
> >          | x == e    = x:esc y:zs
> >          | otherwise = x:norm y:zs
> >
> >          f x [] = [x]
> >
> > catMaybes and zipWith is clearer, though, and I don't think the foldr
> > will perform better.
> >
> >> Once again, thanks everyone for all the suggestions!
> >>
> >> Jean-Nicolas Jolivet



------------------------------

_______________________________________________
Beginners mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/beginners


End of Beginners Digest, Vol 22, Issue 49
*****************************************

Beginners Digest, Vol 22, Issue 49

Reply via email to