Beginners Digest, Vol 40, Issue 18

beginners-request Thu, 13 Oct 2011 11:50:05 -0700

Send Beginners mailing list submissions to
        [email protected]

To subscribe or unsubscribe via the World Wide Web, visit
        http://www.haskell.org/mailman/listinfo/beginners
or, via email, send a message with subject or body 'help' to
        [email protected]


You can reach the person managing the list at
        [email protected]

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Beginners digest..."


Today's Topics:

   1. Re:  Data structure for Propositional Logic       formulas
      (Benedict Eastaugh)
   2. Re:  Data structure for Propositional Logic       formulas
      (Christian Maeder)
   3. Re:  first open source haskell project and a mystery to boot
      (Brent Yorgey)
   4. Re:  Data structure for Propositional Logic       formulas (Rustom Mody)
   5. Re:  first open source haskell project and a      mystery to boot
      (Alia)


----------------------------------------------------------------------

Message: 1
Date: Thu, 13 Oct 2011 12:24:08 +0100
From: Benedict Eastaugh <[email protected]>
Subject: Re: [Haskell-beginners] Data structure for Propositional
        Logic   formulas
To: Christian Maeder <[email protected]>
Cc: [email protected]
Message-ID:
        <caoco6ul5rumbu4zdhr0o3dqg+u_axykc49t6g4axo3zxdsu...@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

On 13 October 2011 10:12, Christian Maeder <[email protected]> wrote:

> Why do most people like duplicate or repeated code?

Hi Christian,

I can't speak for most people, but I like my data types to encode the
semantics of the domain. In this case, the connectives are logical
devices that allow the formation of new well-formed formulae from old
ones. I also think it's a mistake to consider them, in their guise as
operators, to be any different from negation save in their arity. In
my code they are on a par, and any extension to include new n-ary
connectives would be straightforward and natural.

(There are other unary and binary logical operators which happen not
to be listed; I simply included the most commonly used ones, rather
than add NAND, XOR, etc. As Daniel pointed out, merely including
negation and disjunction allows the formation of a truth-functionally
complete system. One can also do this with just NAND or NOR.)

The duplication is, ultimately, fairly minor, so doing what you
suggest also seems like overkill: it's not as though I'm going to be
adding another five or ten connectives any time soon. Your version,
while admittedly shorter, makes the code more complex and less
readable, for no gain in expressiveness and a very minor improvement
in brevity.

Benedict



------------------------------

Message: 2
Date: Thu, 13 Oct 2011 14:09:30 +0200
From: Christian Maeder <[email protected]>
Subject: Re: [Haskell-beginners] Data structure for Propositional
        Logic   formulas
To: Benedict Eastaugh <[email protected]>
Cc: [email protected]
Message-ID: <[email protected]>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Am 13.10.2011 13:24, schrieb Benedict Eastaugh:
> On 13 October 2011 10:12, Christian Maeder<[email protected]>  wrote:
>
>> Why do most people like duplicate or repeated code?
>
> Hi Christian,
>
> I can't speak for most people, but I like my data types to encode the
> semantics of the domain. In this case, the connectives are logical
> devices that allow the formation of new well-formed formulae from old
> ones. I also think it's a mistake to consider them, in their guise as
> operators, to be any different from negation save in their arity. In
> my code they are on a par, and any extension to include new n-ary
> connectives would be straightforward and natural.
>
> (There are other unary and binary logical operators which happen not
> to be listed; I simply included the most commonly used ones, rather
> than add NAND, XOR, etc. As Daniel pointed out, merely including
> negation and disjunction allows the formation of a truth-functionally
> complete system. One can also do this with just NAND or NOR.)
>
> The duplication is, ultimately, fairly minor, so doing what you
> suggest also seems like overkill: it's not as though I'm going to be
> adding another five or ten connectives any time soon. Your version,
> while admittedly shorter, makes the code more complex and less
> readable, for no gain in expressiveness and a very minor improvement
> in brevity.

I did (and do) not mean to offend you, but reading duplicated code is 
rather hard, because you'll have to take care if it is really the same 
code or if there are minor differences introduced on purpose or by 
(copy-paste) accident.

Cheers Christian

>
> Benedict
>



------------------------------

Message: 3
Date: Thu, 13 Oct 2011 12:18:40 -0400
From: Brent Yorgey <[email protected]>
Subject: Re: [Haskell-beginners] first open source haskell project and
        a mystery to boot
To: [email protected]
Message-ID: <[email protected]>
Content-Type: text/plain; charset=iso-8859-1

On Wed, Oct 12, 2011 at 11:59:30AM -0700, Alia wrote:

> --------------------------------------------------------------------
> -- Testing Area
> --------------------------------------------------------------------
> outlook s
> ??? | s == "sunny"??? = 1
> ??? | s == "overcast" = 2
> ??? | s == "rain"???? = 3
> 
> temp :: (Real a, Fractional n) => a -> n
> temp i = (realToFrac i) / (realToFrac 100)
> 
> humidity :: (Real a, Fractional n) => a -> n
> humidity i = (realToFrac i) / (realToFrac 100)
> 
> 
> windy x
> ??? | x == False = 0
> ??? | x == True? = 1
> 
> -- attributes
> a1 = Discrete outlook
> a2 = Continuous temp
> a3 = Continuous humidity
> a4 = Discrete windy
> 
> outlookData? = 
> ["sunny","sunny","overcast","rain","rain","rain","overcast","sunny","sunny","rain","sunny","overcast","overcast","rain"]
> tempData???? = [85, 80, 83, 70, 68, 65, 64, 72, 69, 75, 75, 72, 81, 71]
> humidityData = [85, 90, 78, 96, 80, 70, 65, 95, 70, 80, 70, 90, 75, 80]
> windyData??? = [False, True, False, False, False, True, True, False, False, 
> False, True, True, False, True]
> outcomes???? = [0,0,1,1,1,0,1,0,1,1,1,1,1,0]
> 
> d1 = zip outlookData outcomes
> d2 = zip tempData outcomes
> d3 = zip humidityData outcomes
> d4 = zip windyData outcomes
> 
> t1 = id3 [a1] d1
> t2 = id3 [a2] d2
> t3 = id3 [a3] d3
> t4 = id3 [a4] d4
> 
> --t5 = id3 [a1,a2,a3,a4] [d1,d2,d3,d4] 
> -- doesn't work because you can't mix strings and numbers in a list
> -- 

This also doesn't work because [d1,d2,d3,d4] isn't the right type,
even if you could mix strings and numbers in a list: d1, d2, etc. are
each lists of pairs, so [d1,d2,d3,d4] is a list of lists of pairs.

I think what you really want is to combine all the data for each
observation into a single structure.  Something like this:



data Item = Item String Double Double Bool

outlook (Item "sunny" _ _ _) = 1
outlook (Item "overcast" _ _ _) = 2
outlook (Item "rain" _ _ _) = 3

temp (Item _ i _ _) = (realToFrac i) / (realToFrac 100)

humidity (Item _ _ i _) = (realToFrac i) / (realToFrac 100)

windy (Item _ _ _ False) = 0
windy (Item _ _ _ True)  = 1

-- attributes
a1 = Discrete outlook
a2 = Continuous temp
a3 = Continuous humidity
a4 = Discrete windy

outlookData  =
["sunny","sunny","overcast","rain","rain","rain","overcast","sunny","sunny","rain","sunny","overcast","overcast","rain"]
tempData     = [85, 80, 83, 70, 68, 65, 64, 72, 69, 75, 75, 72, 81,
71]
humidityData = [85, 90, 78, 96, 80, 70, 65, 95, 70, 80, 70, 90, 75,
80]
windyData    = [False, True, False, False, False, True, True, False,
False, False, True, True, False, True]
outcomes     = [0,0,1,1,1,0,1,0,1,1,1,1,1,0]

d = zip (zipWith4 Item outlookData tempData humidityData windyData)
outcomes

t1 = id3 [a1] d
t2 = id3 [a2] d
t3 = id3 [a3] d
t4 = id3 [a4] d

t5 = id3 [a1,a2,a3,a4] d


Now t5 works just fine.

-Brent



------------------------------

Message: 4
Date: Thu, 13 Oct 2011 22:48:50 +0530
From: Rustom Mody <[email protected]>
Subject: Re: [Haskell-beginners] Data structure for Propositional
        Logic   formulas
To: [email protected]
Message-ID:
        <CAJ+TeofyqH2hFtnHKpNfvN6FCmcvyXxTJvq9ed88=p8ewm8...@mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

On Thu, Oct 13, 2011 at 5:39 PM, Christian Maeder
<[email protected]>wrote:

> Am 13.10.2011 13:24, schrieb Benedict Eastaugh:
>
>  On 13 October 2011 10:12, Christian 
> Maeder<Christian.Maeder@dfki.**de<[email protected]>>
>>  wrote:
>>
>>  Why do most people like duplicate or repeated code?
>>>
>>
>> Hi Christian,
>>
>> I can't speak for most people, but I like my data types to encode the
>> semantics of the domain. In this case, the connectives are logical
>> devices that allow the formation of new well-formed formulae from old
>> ones. I also think it's a mistake to consider them, in their guise as
>> operators, to be any different from negation save in their arity. In
>> my code they are on a par, and any extension to include new n-ary
>> connectives would be straightforward and natural.
>>
>> (There are other unary and binary logical operators which happen not
>> to be listed; I simply included the most commonly used ones, rather
>> than add NAND, XOR, etc. As Daniel pointed out, merely including
>> negation and disjunction allows the formation of a truth-functionally
>> complete system. One can also do this with just NAND or NOR.)
>>
>> The duplication is, ultimately, fairly minor, so doing what you
>> suggest also seems like overkill: it's not as though I'm going to be
>> adding another five or ten connectives any time soon. Your version,
>> while admittedly shorter, makes the code more complex and less
>> readable, for no gain in expressiveness and a very minor improvement
>> in brevity.
>>
>
> I did (and do) not mean to offend you, but reading duplicated code is
> rather hard, because you'll have to take care if it is really the same code
> or if there are minor differences introduced on purpose or by (copy-paste)
> accident.
>
> Cheers Christian
>
>
I was going to write saying that the problem is really with ghc not allowing
'bunched' GADT style defs like so:

data Expr where
     Var  :: String -> Expr
     Neg  :: Expr -> Expr
     Conj, Disj, Cond, BiCond :: Expr -> Expr -> Expr


But now I find it works.  Is this something new in haskell 7??

Of course that does not change the fact that the OP's Expr will need 6
patterns for (total) functions of type Expr -> a.
Christian's will need only 3

The 6 will of course not reduce for the 'bunched' newly supported type
syntax [AFAIK]
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
<http://www.haskell.org/pipermail/beginners/attachments/20111013/edfaf49a/attachment-0001.htm>

------------------------------

Message: 5
Date: Thu, 13 Oct 2011 11:49:29 -0700 (PDT)
From: Alia <[email protected]>
Subject: Re: [Haskell-beginners] first open source haskell project and
        a       mystery to boot
To: "[email protected]" <[email protected]>
Message-ID:
        <[email protected]>
Content-Type: text/plain; charset=iso-8859-1

<snip Brent Yorgey's very helpful reply>

Brent, many thanks for taking the time to think about this. I think your?
solution totally makes sense, and compiles well.

The only thing is that when one tries to run t5 through?
the runDecisionTree function things get a little weird:

*ID3> runDecisionTree t5 (head items)
*** Exception: Prelude.head: empty list

*ID3> map accuracy [t1, t2, t3, t4, t5]
[0.35714285714285715,
0.6428571428571429,
0.7142857142857143,
0.35714285714285715,
*** Exception: Prelude.head: empty list

Other than that it works perfectly (-:

In any case, this may be a problem with the ID3 implementation itself,?
and I was really more befuddled with the issue of how to tackle the kind
of problem that you just solved than in the specifics of the algorithm.

If anyone is interested in actually demonstrating that this code is not?
buggy, here's the complete module below for reference:

Best,

AK


<ID3.hs>

-- | This module is a generic implementation
-- ? of the ID3 decision tree algorithm.
--
-- A choice node on a ``continuous'' attribute is
-- handled by splitting the population in two via the mean attribute value.

module ID3 where

import Data.Ord
import Data.List
import qualified Data.Map as Map

data DecisionTree item outcome = Choice (item -> DecisionTree item outcome)
? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?| Leaf outcome

data Attribute item = Discrete (item -> Integer)
? ? ? ? ? ? ? ? ? ? | Continuous (item -> Double)


runDecisionTree :: DecisionTree item outcome -> item -> outcome
runDecisionTree (Leaf outcome) _ = outcome
runDecisionTree (Choice f) item = runDecisionTree (f item) item

id3 :: (Ord outcome) => [Attribute item] -> [(item, outcome)] -> DecisionTree 
item outcome
-- When there are no unused attributes left, select the most common outcome.
id3 [] xs = Leaf $ fst $ head $ sortBy (comparing (negate.snd)) $ histogram 
(map snd xs)
-- When all the items have the same outcome, pick that outcome
id3 attrs xs | allEqual (map snd xs) = Leaf $ snd $ head xs
-- Otherwise pick the attribute with minimum entropy
? ? ? ? ? ? ?| otherwise =
? ? let (bestAttr:moreAttrs) = sortBy (comparing (informationGain xs)) attrs in
? ? case bestAttr of
? ? ? ? ?Discrete attr ->
? ? ? ? ? ? ?let attrTreeMap = Map.fromList attrTrees
? ? ? ? ? ? ? ? ?allAttrValues = nub $ map (attr . fst) xs
? ? ? ? ? ? ? ? ?subtree v = id3 moreAttrs (filter (\(x,_) -> v /= attr x) xs)
? ? ? ? ? ? ? ? ?attrTrees = [(v, subtree v) | v <- allAttrValues]
? ? ? ? ? ? ?in Choice $ \item -> case Map.lookup (attr item) attrTreeMap of
? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?Just subtree -> subtree
? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?Nothing -> error "id3: encountered a 
discrete attribute value that wasn't in the training set"
? ? ? ? ?Continuous attr ->
? ? ? ? ? ? ?let meanv = mean (map (attr.fst) xs)
? ? ? ? ? ? ? ? ?ltTree = id3 moreAttrs (filter (\(x,_) -> attr x < ?meanv) xs)
? ? ? ? ? ? ? ? ?gtTree = id3 moreAttrs (filter (\(x,_) -> attr x >= meanv) xs)
? ? ? ? ? ? ?in Choice $ \item -> if attr item < meanv
? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?then ltTree
? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?else gtTree

informationGain :: Ord outcome => [(item, outcome)] -> Attribute item -> Double
informationGain xs (Discrete attr) =
? ? currentEntropy - sum (map term allAttributeValues)
? ? where
? ? currentEntropy = entropy (map snd xs)
? ? term a = probabilityOf (==a) * entropy (outcomesFor (==a))
? ? probabilityOf f = fromIntegral (length (outcomesFor f)) / fromIntegral 
(length xs)
? ? outcomesFor f = map snd $ filter (f . attr . fst) xs
? ? allAttributeValues = nub $ map (attr . fst) xs
informationGain xs (Continuous attr) =
? ? currentEntropy - term (< meanv) - term (>= meanv)
? ? where
? ? currentEntropy = entropy (map snd xs)
? ? term f = probabilityOf f * entropy (outcomesFor f)
? ? probabilityOf f = fromIntegral (length (outcomesFor f)) / fromIntegral 
(length xs)
? ? outcomesFor f = map snd $ filter (f . attr . fst) xs
? ? meanv = mean (map (attr.fst) xs)

entropy :: Ord a => [a] -> Double
entropy xs = sum $ map (\(_,n) -> term (fromIntegral n)) $ histogram xs
? ? where term 0 = 0
? ? ? ? ? term n = - (n / num) * log (n / num) / log 2
? ? ? ? ? num = fromIntegral (length xs)

histogram :: Ord a => [a] -> [(a, Int)]
histogram = buildHistogram Map.empty
? ? where buildHistogram map [] = Map.assocs map
? ? ? ? ? buildHistogram map (x:xs) = buildHistogram (Map.insertWith (+) x 1 
map) xs

-- Simple "utility" functions
allEqual :: Eq a => [a] -> Bool
allEqual = and . mapAdjacent (==)

mapAdjacent :: (a -> a -> b) -> [a] -> [b]
mapAdjacent f xs = zipWith f xs (tail xs)

mean :: (Real a, Fractional n) => [a] -> n
mean xs = realToFrac (sum xs) / realToFrac (length xs)


--------------------------------------------------------------------
-- Testing Area
--------------------------------------------------------------------
data Item = Item String Double Double Bool deriving Show

outlook (Item "sunny" _ _ _) = 1
outlook (Item "overcast" _ _ _) = 2
outlook (Item "rain" _ _ _) = 3

temp (Item _ i _ _) = (realToFrac i) / (realToFrac 100)

humidity (Item _ _ i _) = (realToFrac i) / (realToFrac 100)

windy (Item _ _ _ False) = 0
windy (Item _ _ _ True) ?= 1

-- attributes
a1 = Discrete outlook
a2 = Continuous temp
a3 = Continuous humidity
a4 = Discrete windy

outlookData ?= 
["sunny","sunny","overcast","rain","rain","rain","overcast","sunny","sunny","rain","sunny","overcast","overcast","rain"]
tempData ? ? = [85, 80, 83, 70, 68, 65, 64, 72, 69, 75, 75, 72, 81, 71]
humidityData = [85, 90, 78, 96, 80, 70, 65, 95, 70, 80, 70, 90, 75, 80]
windyData ? ?= [False, True, False, False, False, True, True, False, False, 
False, True, True, False, True]
outcomes ? ? = [0,0,1,1,1,0,1,0,1,1,1,1,1,0]

accuracy t = naccurate / ntotal
? ? where
? ? naccurate = fromIntegral $ length $ filter (== True) check
? ? ntotal = fromIntegral $ length check
? ? check = map (\x -> fst x == snd x) comparison
? ? comparison = zip outcomes $ map (runDecisionTree t) items

items = zipWith4 Item outlookData tempData humidityData windyData
d = zip items outcomes

t1 = id3 [a1] d
t2 = id3 [a2] d
t3 = id3 [a3] d
t4 = id3 [a4] d

t5 = id3 [a1,a2,a3,a4] d

results = map accuracy [t1, t2, t3, t4, t5]

</ID3.hs>




------------------------------

_______________________________________________
Beginners mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/beginners


End of Beginners Digest, Vol 40, Issue 18
*****************************************

Beginners Digest, Vol 40, Issue 18

Reply via email to