Beginners Digest, Vol 9, Issue 11

beginners-request Tue, 10 Mar 2009 09:46:50 -0700

Send Beginners mailing list submissions to
        beginners@haskell.org

To subscribe or unsubscribe via the World Wide Web, visit
        http://www.haskell.org/mailman/listinfo/beginners
or, via email, send a message with subject or body 'help' to
        beginners-requ...@haskell.org

You can reach the person managing the list at
        beginners-ow...@haskell.org

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Beginners digest..."

Today's Topics:

   1.  Re: folds -- help! (7stud)
   2.  Simple data summarization (Andy Elvey)
   3.  Re: installing cabal with 6.10 - ubuntu (Christian Maeder)
   4. Re:  Re: folds -- help! (Kurt Hutchinson)
   5.  Re: installing cabal with 6.10 - ubuntu (B)
   6. Re:  Simple data summarization (Roland Zumkeller)
   7. Re:  Simple data summarization (Thomas Davie)
   8. Re:  Simple data summarization (Patrick LeBoutillier)

----------------------------------------------------------------------

Message: 1
Date: Tue, 10 Mar 2009 08:01:19 +0000 (UTC)
From: 7stud <bbxx789_0...@yahoo.com>
Subject: [Haskell-beginners] Re: folds -- help!
To: beginners@haskell.org
Message-ID: <loom.20090310t075932-...@post.gmane.org>
Content-Type: text/plain; charset=us-ascii

Peter Verswyvelen <bugfact <at> gmail.com> writes:
>
> Does this help?
> 

Sorry, I'm only on chapter 4 of RWH, and I don't understand that notation.

------------------------------

Message: 2
Date: Tue, 10 Mar 2009 21:33:00 +1300
From: Andy Elvey <andy.el...@paradise.net.nz>
Subject: [Haskell-beginners] Simple data summarization
To: beginners@haskell.org
Message-ID: <49b625bc.7060...@paradise.net.nz>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Hi all - 

In the process of learning Haskell I'm wanting to do some simple data 
summarization.
( Btw, I'm looking at putting any submitted code for this in the 
"cookbook" section of
the Haskell wiki.  Imo it would be very useful there as a "next step" up 
from just reading
in a file and printing it out.  ) 

This would involve reading in a delimited file like this - ( just a 
contrived example of how many books
some people own ) -

Name,Gender,Age,Ethnicity,Books
Mary,F,14,NZ European, 11
Brian,M,13,NZ European, 6
Josh,M,12,NZ European, 14
Regan,M,14,NZ Maori, 9
Helen,F,15,NZ Maori, 17
Anna,F,14,NZ European, 16
Jess,F,14,NZ Maori, 21

.... and doing some operations on it. 
As you can see, the file has column headings - I prefer to be able to 
manipulate data with
headings (as it is what I do a lot of at work, using another programming 
language).

I've tried to break the problem down into small parts as follows. 
a) Read the file into a list of pairs.
The first element of the pair would be the column heading.
The second will be a list containing the data.
For example, ("Name",  [Mary,  Brian,  Josh,  Regan, ..... ]  )   

b) Select a numeric variable to summarise ( "Books" in this example) 
c) Do a fold to summarize the variable. I think a left-fold would be the 
one to use here, but I may
be wrong....

After looking through previous postings on this list, I found some code 
which is somewhat similar to what I'm after (although the data it was 
crunching is very different).  This is what I've come up with so far -

summarize [] = []
summarize ls = let
        byvariable = head ls
        numeric_variable = last ls
        sum = foldl (+) 0 $ numeric_variable

    in (byvariable, sum) : sum ls

main = interact (unlines . map show . summarize . lines) 

I think this might be a useful start, but I still need to read the data 
into a list of pairs as mentioned, and I'm unsure as to how to
do that. 

Many thanks in advance for any help received.  As mentioned, I'm sure 
that examples like this could be very useful to other beginners, so I'm 
keen to make sure that any help given is made maximum use of (by putting 
any code on the Haskell wiki). 
- Andy

------------------------------

Message: 3
Date: Tue, 10 Mar 2009 09:49:39 +0100
From: Christian Maeder <christian.mae...@dfki.de>
Subject: [Haskell-beginners] Re: installing cabal with 6.10 - ubuntu
To: B <bburde...@comcast.net>
Cc: beginners@haskell.org
Message-ID: <49b629a3.5070...@dfki.de>
Content-Type: text/plain; charset=ISO-8859-1

B wrote:
> I found some directions online that said that compiling from source
> might be a good way to go.  I downloaded and compiled ghc 6.10.1,
> installing it locally.

Next time include the extra-libraries in your sources, to avoid
installing so many packages afterwards.

> Ok.  Now when I go to build network I get this:
> Could not find module `Data.Generics':
>       it is a member of package base-3.0.3.0, which is hidden
> 
> According to google, the above message actually indicates that 'syb' is
> needed, the bug is here:
> http://hackage.haskell.org/trac/ghc/ticket/2980
> 
> - installed syb-0.1.0.0.
> - still get the same message.
> 
> At this point I'm stuck!

Are you sure you have version 2.2.0.1
http://hackage.haskell.org/packages/archive/network/2.2.0.1/network-2.2.0.1.tar.gz?
This package does not need syb as dependency.

The file Network/URI.hs contains:
#ifdef BASE4
import Data.Data      ( Data )
#else
import Data.Generics  ( Data )
#endif

and network.cabal sets the appropriate flag:
  if flag(base4)
      build-depends:    base>=4
      cpp-options:      -DBASE4

> I should say that I'm using this to install all these things:
> 
> runhaskell Setup configure --user
> runhaskell Setup build
> runhaskell Setup install
> 
> So everything is installed 'locally' and not globally.

This should be fine.

HTH Christian

------------------------------

Message: 4
Date: Tue, 10 Mar 2009 08:50:49 -0400
From: Kurt Hutchinson <kelansli...@gmail.com>
Subject: Re: [Haskell-beginners] Re: folds -- help!
To: 7stud <bbxx789_0...@yahoo.com>
Cc: beginners@haskell.org
Message-ID:
        <ad4a86720903100550r8edf090xcee3ffc7bb14...@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

On Tue, Mar 10, 2009 at 3:59 AM, 7stud <bbxx789_0...@yahoo.com> wrote:
> -------------
> Like foldl, foldr takes a function and a base case(what to do when the input
> list is empty) as arguments.
> -------------
>
> That also does not seem correct.  For example:
>
> foldrSum xs =  foldr accFunc 0 xs
>    where accFunc x acc = acc + x
>
> *Main> foldrSum [1, 2, 3]
> 6
>
> In that example, the first two arguments to foldr are the function accFunc
> and 0.  It does not seem accurate to say that  "0 is what to do when the
> input list is empty".   What foldr  does when the input list is empty
> is return the value of the acc parameter variable:
>

I'm not sure why they explain the base case for fold in that way. At
least to me, that is only a trivial result of the 'zero' value's main
purpose, which is to be the initial value used in the accumulating
fold function. When starting the fold, we have to start somewhere, so
the accFunc needs a seed value. That value, and the first element of
the list, are fed into accFunc to start things off. Then it just so
happens that if the list is empty, the seed value is the result, since
no folding can happen.

> foldr _ acc [] = acc
>
> In my example, the value of the acc parameter is 6 "when the input list is
> empty"--not the value 0, which is the argument to foldr.

You're thinking of a slightly different 'empty' here. You're thinking
of what happens when you reach the end of the list, after folding it
all, and there are no more elements to fold. In this example, you're
right that the acc parameter is 6. But what if the list you *first
gave to foldr* was empty? Then it would evaluate to 0, the initial
seed value.

Now you may be thinking, "Why would I ever apply foldr to an empty
list? Obviously that would do nothing." Well you may not know whether
a list is empty, if it's the result of other calculations.

You also may be thinking, "Why do I need to provide a seed value, why
can't foldr just start with the first two elements of the list?"
That's because the accFunc does not always evaluate to the same type
as the elements in the list. For example, you could use a fold to
count the number of 'a's in a list of characters. Then the type of
accFunc would be "accFunc :: Int -> Char -> Int" for foldl or "accFunc
:: Char -> Int -> Int" for foldr. It takes one element from the list,
the previous result, and evaluates to a new result. But in the very
first fold step, there's no previous result, so you have to provide
one.

It's called zero as a convention, but it doesn't actually have to *be*
zero. It can be any initial value you want. Make your foldSum function
only evaluate to results of 10 or larger by doing this:

foldSum xs = foldr  accFunc 10 xs
  where accFunc x acc = acc + x

[---snip---]

>> if f can start delivering the result without looking at its second argument,
>> you can start consuming the result before the fold has traversed the whole
>> list.
>>
>
> Ok, that isn't clearly illustrated by the example in the book:
>
>> > foldl (+) 0 (1:2:3:[])
>> >    == foldl (+) (0 + 1)             (2:3:[])
>> >    == foldl (+) ((0 + 1) + 2)       (3:[])
>> >    == foldl (+) (((0 + 1) + 2) + 3) []
>> >    ==           (((0 + 1) + 2) + 3)
>> >
>> >
>> > foldr (+) 0 (1:2:3:[])
>> >    ==  1 +           foldr (+) 0 (2:3:[])
>> >    ==  1 + (2 +      foldr (+) 0 (3:[])
>> >    ==  1 + (2 + (3 + foldr (+) 0 []))
>> >    ==  1 + (2 + (3 + 0))
>> >
>
> In that example, it doesn't look like anything in foldr can be evaluated
> until the whole fold has been completed.

You're right, that example doesn't show how you could start using the
result without fully evaluating the fold, since addition doesn't give
partial results. The concat example is better in that regard.

>> Common examples are things like
>>
>> concat = foldr (++) [],
>> so
>> concat [l1,l2,l3,l4,l5] = l1 ++ (foldr (++) [] [l2,l3,l4,l5])
>> and the start (l1) can be used before further reducing the fold,
>>
>
> So does haskell store a thunk for everything to the right of l1?
> You said that when using foldr you can start "consuming" the beginning of the
> result before the whole result is reduced.  I don't quite get that.

A thunk is used as a stand-in for most calculations before the result
is actually calculated. That way, if you never try to use the result,
the calculation never needs to be done, and that means less work. As
an example that ties to the concat example above, say your program
only wanted to test if the result of the concat fold was an empty
list. The function 'null' takes a list and evaluates to True or False,
based on whether the list is empty or not. So:

someFunc xs = null ( concat xs )
  where
  concat ys = foldr (++) [] ys

The 'null' function only needs to test whether the list that is the
result of foldConcat has at least one element. Let's say l1 has an
element. So it's kind of evaluated like this:
  someFunc [ l1, l2, l3, l4, l5 ]
  null (concat [ l1, l2, l3, l4, l5 ] )
  null ( l1 ++ ( thunk with rest of fold ))
  False

The rest of the fold doesn't need to be evaluated, since the beginning
part is enough for 'null' to tell that the result would have at least
one element (because l1 does). That's one way foldr can be used to
start consuming the result before the entire fold is done. It depends
completely on the accFunc: if it can return partial results, like
concat, then you can start consuming the result before a full
evaluation. Some accFunc's can't return partial results, like regular
addition. In that case, it's probably better to use foldl' (note the
apostrophe), which will force the thunks to be evaluated as they are
generated and so use less memory. foldl is the same as foldl' except
it does generate thunks, and then evaluates them all at the end of the
fold, so it uses a bunch of memory to store the thunks in the
meantime, which usually isn't useful.

Kurt

------------------------------

Message: 5
Date: Tue, 10 Mar 2009 07:59:36 -0600
From: B <bburde...@comcast.net>
Subject: [Haskell-beginners] Re: installing cabal with 6.10 - ubuntu
To: beginners@haskell.org
Message-ID: <gp5ro8$rj...@ger.gmane.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Christian Maeder wrote:
> B wrote:
>> I found some directions online that said that compiling from source
>> might be a good way to go.  I downloaded and compiled ghc 6.10.1,
>> installing it locally.
> 
> Next time include the extra-libraries in your sources, to avoid
> installing so many packages afterwards.
> 
>> Ok.  Now when I go to build network I get this:
>> Could not find module `Data.Generics':
>>       it is a member of package base-3.0.3.0, which is hidden
>>
>> According to google, the above message actually indicates that 'syb' is
>> needed, the bug is here:
>> http://hackage.haskell.org/trac/ghc/ticket/2980
>>
>> - installed syb-0.1.0.0.
>> - still get the same message.
>>
>> At this point I'm stuck!
> 
> Are you sure you have version 2.2.0.1
> http://hackage.haskell.org/packages/archive/network/2.2.0.1/network-2.2.0.1.tar.gz?
> This package does not need syb as dependency.
> 
> The file Network/URI.hs contains:
> #ifdef BASE4
> import Data.Data      ( Data )
> #else
> import Data.Generics  ( Data )
> #endif
> 
> and network.cabal sets the appropriate flag:
>   if flag(base4)
>       build-depends:    base>=4
>       cpp-options:      -DBASE4
> 
>> I should say that I'm using this to install all these things:
>>
>> runhaskell Setup configure --user
>> runhaskell Setup build
>> runhaskell Setup install
>>
>> So everything is installed 'locally' and not globally.
> 
> This should be fine.
> 
> HTH Christian

Thanks for the reply Christian (and the others who replied off-list).

The wrong network package was exactly the problem, for some reason I had 
  2.2.0.0 instead of 2.2.0.1.  Also, I believe I may have had some 
things left over from a previous install in ~/.ghc.

Anyway, my resolution was to delete the ghc folder from /usr/local, 
delete .cabal and .ghc from ~, and start over from scratch.

Too bad this process isn't more automated!  I did learn a lot along the 
way, but it was a bit frustrating.

My only real question now is how one might install all this globally 
rather than locally.  In ubuntu I couldn't figure out how to add ghc to 
my path so that sudo would pick it up.  That's why I ended up with the 
local install approach.

------------------------------

Message: 6
Date: Tue, 10 Mar 2009 11:26:32 -0400
From: Roland Zumkeller <roland.zumkel...@gmail.com>
Subject: Re: [Haskell-beginners] Simple data summarization
To: Andy Elvey <andy.el...@paradise.net.nz>
Cc: beginners@haskell.org
Message-ID:
        <d02dcb040903100826g59d66c56vdbffb70c2151d...@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

Hi Andy,

Here is a function that parses a comma-separated list of strings:

> uncommas :: String -> [String]
> uncommas s = case break (==',') s of
>               (w,[]) -> [w]
>               (w,_:s') -> w : uncommas s'

We can then sum over the 4th column like this:

> main = putStrLn . show . sum . map (read . (!!4) . uncommas)
>        . tail . lines =<< getContents

This program is best read backwards: "getContents" gives stdin as a
string and "lines" breaks it into lines. The (!!) function yields the
nth element of a list. "read" and "show" convert between strings and
integers.

Best,

Roland

------------------------------

Message: 7
Date: Tue, 10 Mar 2009 16:49:15 +0100
From: Thomas Davie <tom.da...@gmail.com>
Subject: Re: [Haskell-beginners] Simple data summarization
To: Roland Zumkeller <roland.zumkel...@gmail.com>
Cc: beginners@haskell.org
Message-ID: <8abbcdb1-9784-4203-8dba-762ce65e9...@gmail.com>
Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes

On 10 Mar 2009, at 16:26, Roland Zumkeller wrote:

> Hi Andy,
>
> Here is a function that parses a comma-separated list of strings:
>
>> uncommas :: String -> [String]
>> uncommas s = case break (==',') s of
>>              (w,[]) -> [w]
>>              (w,_:s') -> w : uncommas s'
>
> We can then sum over the 4th column like this:
>
>> main = putStrLn . show . sum . map (read . (!!4) . uncommas)
>>       . tail . lines =<< getContents
>
> This program is best read backwards: "getContents" gives stdin as a
> string and "lines" breaks it into lines. The (!!) function yields the
> nth element of a list. "read" and "show" convert between strings and
> integers.

An alternative solution, though similar is to implement a data type  
for each record, and implement read for it:

data Gender = Male | Female

data Ethnicity = European | Maori | ..........

data Record = R {name :: String, gender :: Gender, age :: Int,  
ethnicity :: Ethnicity, books :: Int}

instance Read Gender where
   readsPrec _ s = case toLower $ read s of {'m' -> [(Male,"")]; 'f' - 
 > [(Female,"")]; _ -> []}

instance Read Ethnicity where
   ...

instance Read Record where
   readsPrec _ = buildRec . uncommas
     where
       buildRec [n,g,a,e,b] =
         fromMaybe []
           do (n',_) <- listToMaybe $ reads n
              (g',_) <- listToMaybe $ reads g
              (a',_) <- listToMaybe $ reads a
              (e',_) <- listToMaybe $ reads e
              (b',_) <- listToMaybe $ reads b
              return [(R n' g' a' e' b', "")]

Now you can get at just the names for example by mapping the getter  
over the list:

main = putStrLn . ("Names: " ++) . concat . intersperse " " . map  
(name . read) . lines =<< getContents

Bob

------------------------------

Message: 8
Date: Tue, 10 Mar 2009 12:46:43 -0400
From: Patrick LeBoutillier <patrick.leboutill...@gmail.com>
Subject: Re: [Haskell-beginners] Simple data summarization
To: Andy Elvey <andy.el...@paradise.net.nz>
Cc: beginners@haskell.org
Message-ID:
        <b217a64f0903100946m6a8515dq32c7484e75f90...@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

Andy,

I came up with this solution that works like you described:

import Data.List.Split

mysplit = wordsBy (==',')

toPairs :: [String] -> [(String, [String])]
toPairs (header:rows) = foldr f (initPairs header) $ splitRows rows
    where f row acc = zipWith (\f (h,r) -> (h,f:r)) row acc
          initPairs header = map (\h -> (h, [])) $ mysplit header
          splitRows rows = map (mysplit) rows

summarizeByWith :: String -> (Int -> Int -> Int) -> [(String,
[String])] -> (String, Int)
summarizeByWith var agg pairs = case (lookup var pairs) of
    Just vals -> (var, foldl agg 0 $ map (read) vals)
    otherwise -> ("", 0)

main = interact (show . summarizeByWith "Books" (+) . toPairs . lines)

However in my opinion a solution like that proposed by Roland is
preferable since it can process the input line by line instead of
storing it all in memory. It seems also simpler and propably more
efficient.

However it was interesting hacking at your algorithm because it made
me realize how you can use lists of pairs (association lists) in
haskell where you might have used hash tables in another language.

Cheers,

Patrick

On Tue, Mar 10, 2009 at 4:33 AM, Andy Elvey <andy.el...@paradise.net.nz> wrote:
> Hi all -
> In the process of learning Haskell I'm wanting to do some simple data
> summarization.
> ( Btw, I'm looking at putting any submitted code for this in the "cookbook"
> section of
> the Haskell wiki.  Imo it would be very useful there as a "next step" up
> from just reading
> in a file and printing it out.  )
> This would involve reading in a delimited file like this - ( just a
> contrived example of how many books
> some people own ) -
>
> Name,Gender,Age,Ethnicity,Books
> Mary,F,14,NZ European, 11
> Brian,M,13,NZ European, 6
> Josh,M,12,NZ European, 14
> Regan,M,14,NZ Maori, 9
> Helen,F,15,NZ Maori, 17
> Anna,F,14,NZ European, 16
> Jess,F,14,NZ Maori, 21
>
> .... and doing some operations on it. As you can see, the file has column
> headings - I prefer to be able to manipulate data with
> headings (as it is what I do a lot of at work, using another programming
> language).
>
> I've tried to break the problem down into small parts as follows. a) Read
> the file into a list of pairs.
> The first element of the pair would be the column heading.
> The second will be a list containing the data.
> For example, ("Name",  [Mary,  Brian,  Josh,  Regan, ..... ]  )
> b) Select a numeric variable to summarise ( "Books" in this example) c) Do a
> fold to summarize the variable. I think a left-fold would be the one to use
> here, but I may
> be wrong....
>
> After looking through previous postings on this list, I found some code
> which is somewhat similar to what I'm after (although the data it was
> crunching is very different).  This is what I've come up with so far -
>
> summarize [] = []
> summarize ls = let
>       byvariable = head ls
>       numeric_variable = last ls
>       sum = foldl (+) 0 $ numeric_variable
>
>   in (byvariable, sum) : sum ls
>
> main = interact (unlines . map show . summarize . lines)
> I think this might be a useful start, but I still need to read the data into
> a list of pairs as mentioned, and I'm unsure as to how to
> do that.
> Many thanks in advance for any help received.  As mentioned, I'm sure that
> examples like this could be very useful to other beginners, so I'm keen to
> make sure that any help given is made maximum use of (by putting any code on
> the Haskell wiki). - Andy
>
> _______________________________________________
> Beginners mailing list
> Beginners@haskell.org
> http://www.haskell.org/mailman/listinfo/beginners
>

-- 
=====================
Patrick LeBoutillier
Rosemère, Québec, Canada

------------------------------

_______________________________________________
Beginners mailing list
Beginners@haskell.org
http://www.haskell.org/mailman/listinfo/beginners

End of Beginners Digest, Vol 9, Issue 11
****************************************

Beginners Digest, Vol 9, Issue 11

Reply via email to