Dear Lotfi,

You say: "...'approximately a' and 'approximately b' are defined by their membership functions."
The problem I have with a statement such as this is how to attach semantic meaning to the phrase "defined by their membership functions." When I have sought an answer to that question, the best people seem able to manage is a rule for computing membership functions and a mathematical proof that this rule generalizes the usual set membership function, in the sense of reducing to the usual function in some appropriate limit as the "amount of fuzziness" tends to zero. I don't find this very helpful.

I remember back in the late 80's or early 90's asking Gene Charniak, at a UAI poster session, why he used probabilities rather than some other calculus such as fuzzy sets, belief functions, or certainty values (all of which were quite popular at the time). He responded, "When I set out to build a system, I know it is going to give me wrong answers. I am going to have to diagnose the causes of those wrong answers and fix the problem. When it gives me wrong answers, I want it to be because I gave it incorrect knowledge. I don't want it to be because I used the wrong calculus." We have hundreds of years of experience and lots of theory demonstrating that when you put the right knowledge into a probabilistic reasoning system, it gives sensible answers. We don't have that level of experience with fuzzy membership functions.

My first instinct on this problem would be to model P with a higher-order distribution. For example, we might use a Beta distribution with parameters alpha and beta, where we are uncertain about the values of alpha and beta. The statement that the mean is "approximately a" would provide evidence that alpha/(alpha+beta) is near a. In other words, I would use the Bayesian network

          S
         ^ ^
        /   \
    alpha   beta
        \   /
         v v
          p

where S denotes an assertion, made in a given context, that the mean of the distribution is approximately a.
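A minimal numerical sketch of this hierarchical model (my own illustration, not part of the original message): put a grid prior over the hyperparameters (alpha, beta) and treat the assertion S as soft evidence, with a hypothetical Gaussian-shaped likelihood in the implied mean alpha/(alpha+beta). The names `a`, `sigma_S`, the grid ranges, and the Gaussian likelihood shape are all assumptions made for illustration.

```python
import numpy as np

# Assumed inputs (hypothetical, for illustration only).
a = 0.3          # the asserted approximate mean
sigma_S = 0.05   # assumed "vagueness" of the assertion S

# Uniform grid prior over plausible Beta hyperparameters.
alphas = np.linspace(0.5, 20.0, 80)
betas = np.linspace(0.5, 20.0, 80)
A, B = np.meshgrid(alphas, betas, indexing="ij")

# Mean of Beta(alpha, beta), i.e. E[p | alpha, beta].
mean = A / (A + B)

# P(S | alpha, beta): assertion more probable when the implied
# mean is close to a (one possible assessment, not the only one).
lik = np.exp(-0.5 * ((mean - a) / sigma_S) ** 2)

# Posterior over (alpha, beta), and the posterior expectation of
# the mean of P, which the evidence S pulls toward a.
post = lik / lik.sum()
p_hat = (post * mean).sum()
print(p_hat)
```

The point of the exercise is not this particular likelihood, but that every ingredient (prior, likelihood, posterior) has a standard probabilistic semantics that can be questioned and revised.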
Then we would have to assess, in the context in which the assertion was made, the relative probabilities that such an assertion would be made given different values of alpha and beta. I grant you that this seems on the surface more cumbersome than just writing down a fuzzy membership function, but it has a clearly defined semantics grounded in probability theory and rational evidential reasoning.

Many years ago, a then-junior engineer of my acquaintance was asked by a senior engineer why he used Method A rather than Method B. He said it was because he had to make fewer assumptions when he used Method A. The senior engineer responded, "Ah, but that isn't true. In Method B, you have to SPECIFY the assumptions you are making. With Method A, you are making just as many assumptions, but they are buried implicitly within the method itself. You should always be aware of the assumptions you are making and whether they are appropriate to the problem." With fuzzy memberships, I don't know how to ascertain whether the assumptions I am making are appropriate to the problem.

I will note, though, that I am happy to apply fuzzy logic when I am confident it will give good results. A number of years ago, a student compared an adaptive control system based on fuzzy logic with an adaptive control system that was optimal for a linearized model of the nonlinear system being controlled. The fuzzy system did much better than the linearized system because it degraded gracefully outside the range where the linearity assumptions gave acceptable performance. If I had to choose between the two systems my student compared, I'd use the fuzzy one, but of course it would have to be re-tuned every time the conditions changed.
If we had a well-engineered approximation to a decision-theoretically optimal nonlinear controller, it could be tuned to the problem in a theoretically grounded manner, which would allow it to be retuned with much less data. If we had a computationally efficient fuzzy controller and a strong theoretical result on how it was related to the optimal decision-theoretic controller, we could feel confident in the semantics, retune it with less data, and thus both have and eat our fuzzy cake. But the semantics I would give it would be based on approximate decision-theoretic optimality.

Cheers,
Kathy

At 10:17 AM -0700 7/15/03, Lotfi A. Zadeh wrote:
>    In a recent message, I stated that the maximum entropy
> principle is not applicable when the side-conditions are imprecise.
> Here is a concrete example. Let X be a real-valued random variable.
> What we know about the probability distribution, P, is that its mean is
> approximately a and its variance is approximately b, where
> "approximately a" and "approximately b" are fuzzy numbers defined by
> their membership functions. The question is: What is the
> entropy-maximizing P? In a more general version, what we know are
> approximate values of the first n moments of P. Can anyone point to a
> discussion of this issue in the literature?
>
> --
> Lotfi A. Zadeh
> Professor in the Graduate School, Computer Science Division
> Department of Electrical Engineering and Computer Sciences
> University of California
> Berkeley, CA 94720-1776
> Director, Berkeley Initiative in Soft Computing (BISC)
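[Editor's note, not part of the exchange: as background to the quoted question, when the side-conditions are taken as exact (mean exactly a, variance exactly b), the entropy-maximizing P on the real line is the well-known Gaussian N(a, b), with differential entropy (1/2) ln(2*pi*e*b). A quick numerical illustration, comparing against a Laplace distribution matched to the same mean and variance; the choice of Laplace as the comparison is arbitrary.]

```python
import numpy as np

b = 2.0  # the (exact) variance; value chosen arbitrarily

# Differential entropy of the max-entropy solution N(a, b);
# it does not depend on the mean a.
h_gauss = 0.5 * np.log(2 * np.pi * np.e * b)

# Laplace(mu, s) has variance 2*s**2 and entropy 1 + ln(2*s);
# match its variance to b.
s = np.sqrt(b / 2.0)
h_laplace = 1.0 + np.log(2 * s)

# The Gaussian's entropy must exceed that of any other
# distribution with the same mean and variance.
print(h_gauss, h_laplace)
```

How to extend this to fuzzy (imprecise) moment constraints is exactly the open question posed above; one probabilistic route would be to treat the imprecise constraints as evidence in a hierarchical model, as in the body of the reply.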