Dear Lotfi,

You say: "...'approximately a' and 'approximately b' are defined by their membership functions."
The problem I have with a statement such as this is how to attach semantic meaning to the phrase "defined by their membership functions." When I have sought an answer to that question, the best people seem able to manage is a rule for computing membership functions and a mathematical proof that this rule generalizes the usual set membership function, in the sense of reducing to the usual function in some appropriate limit as the "amount of fuzziness" tends to zero. I don't find this very helpful.

I remember back in the late 80's or early 90's asking Gene Charniak, at a UAI poster session, why he used probabilities rather than some other calculus such as fuzzy sets, belief functions, or certainty values (all of which were quite popular at the time). He responded, "When I set out to build a system, I know it is going to give me wrong answers. I am going to have to diagnose the causes of those wrong answers and fix the problem. When it gives me wrong answers, I want it to be because I gave it incorrect knowledge. I don't want it to be because I used the wrong calculus." We have hundreds of years of experience and lots of theory demonstrating that when you put the right knowledge into a probabilistic reasoning system, it gives sensible answers. We don't have that level of experience with fuzzy membership functions.

My first instinct on this problem would be to model P with a higher-order distribution. For example, we might use a Beta distribution with parameters alpha and beta, where we are uncertain about the values of alpha and beta. The statement that the mean is "approximately a" would provide evidence that alpha/(alpha+beta) is near a. In other words, I would use the Bayesian network

          S
         ^ ^
        /   \
    alpha   beta
        \   /
         v v
          p

where S denotes an assertion, made in a given context, that the mean of the distribution is approximately a.
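A minimal numerical sketch of this hierarchical model (my own illustration, not part of the original message): put a grid prior over the hyperparameters (alpha, beta) and treat the assertion S as soft evidence, with a hypothetical Gaussian-shaped likelihood in the implied mean alpha/(alpha+beta). The names `a`, `sigma_S`, the grid ranges, and the Gaussian likelihood shape are all assumptions made for illustration.

```python
import numpy as np

# Assumed inputs (hypothetical, for illustration only).
a = 0.3          # the asserted approximate mean
sigma_S = 0.05   # assumed "vagueness" of the assertion S

# Uniform grid prior over plausible Beta hyperparameters.
alphas = np.linspace(0.5, 20.0, 80)
betas = np.linspace(0.5, 20.0, 80)
A, B = np.meshgrid(alphas, betas, indexing="ij")

# Mean of Beta(alpha, beta), i.e. E[p | alpha, beta].
mean = A / (A + B)

# P(S | alpha, beta): assertion more probable when the implied
# mean is close to a (one possible assessment, not the only one).
lik = np.exp(-0.5 * ((mean - a) / sigma_S) ** 2)

# Posterior over (alpha, beta), and the posterior expectation of
# the mean of P, which the evidence S pulls toward a.
post = lik / lik.sum()
p_hat = (post * mean).sum()
print(p_hat)
```

The point of the exercise is not this particular likelihood, but that every ingredient (prior, likelihood, posterior) has a standard probabilistic semantics that can be questioned and revised.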
Then we would have to assess, in the context in which the assertion was made, the relative probabilities that such an assertion would be made given different values of alpha and beta. I grant you that this seems on the surface more cumbersome than just writing down a fuzzy membership function, but it has a clearly defined semantics grounded in probability theory and rational evidential reasoning.

Many years ago, a then-junior engineer of my acquaintance was asked by a senior engineer why he used Method A rather than Method B. He said it was because he had to make fewer assumptions when he used Method A. The senior engineer responded, "Ah, but that isn't true. In Method B, you have to SPECIFY the assumptions you are making. With Method A, you are making just as many assumptions, but they are buried implicitly within the method itself. You should always be aware of the assumptions you are making and whether they are appropriate to the problem." With fuzzy memberships, I don't know how to ascertain whether the assumptions I am making are appropriate to the problem.

I will note, though, that I am happy to apply fuzzy logic when I am confident it will give good results. A number of years ago, a student compared an adaptive control system based on fuzzy logic with an adaptive control system that was optimal for a linearized model of the nonlinear system being controlled. The fuzzy system did much better than the linearized system because it degraded gracefully outside the range where the linearity assumptions gave acceptable performance. If I had to choose between the two systems my student compared, I'd use the fuzzy one, but of course it would have to be re-tuned every time the conditions changed.
If we had a well-engineered approximation to a decision-theoretically optimal nonlinear controller, it could be tuned to the problem in a theoretically grounded manner, which would allow it to be retuned with much less data. If we had a computationally efficient fuzzy controller and a strong theoretical result on how it was related to the optimal decision-theoretic controller, we could feel confident in the semantics, retune it with less data, and thus both have and eat our fuzzy cake. But the semantics I would give it would be based on approximate decision-theoretic optimality.

Cheers,
Kathy

At 10:17 AM -0700 7/15/03, Lotfi A. Zadeh wrote:
>    In a recent message, I stated that the maximum entropy
> principle is not applicable when the side-conditions are imprecise.
> Here is a concrete example. Let X be a real-valued random variable.
> What we know about the probability distribution, P, is that its mean is
> approximately a and its variance is approximately b, where
> "approximately a" and "approximately b" are fuzzy numbers defined by
> their membership functions. The question is: What is the
> entropy-maximizing P? In a more general version, what we know are
> approximate values of the first n moments of P. Can anyone point to a
> discussion of this issue in the literature?
>
> --
> Lotfi A. Zadeh
> Professor in the Graduate School, Computer Science Division
> Department of Electrical Engineering and Computer Sciences
> University of California
> Berkeley, CA 94720-1776
> Director, Berkeley Initiative in Soft Computing (BISC)
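[Editor's note, not part of the exchange: as background to the quoted question, when the side-conditions are taken as exact (mean exactly a, variance exactly b), the entropy-maximizing P on the real line is the well-known Gaussian N(a, b), with differential entropy (1/2) ln(2*pi*e*b). A quick numerical illustration, comparing against a Laplace distribution matched to the same mean and variance; the choice of Laplace as the comparison is arbitrary.]

```python
import numpy as np

b = 2.0  # the (exact) variance; value chosen arbitrarily

# Differential entropy of the max-entropy solution N(a, b);
# it does not depend on the mean a.
h_gauss = 0.5 * np.log(2 * np.pi * np.e * b)

# Laplace(mu, s) has variance 2*s**2 and entropy 1 + ln(2*s);
# match its variance to b.
s = np.sqrt(b / 2.0)
h_laplace = 1.0 + np.log(2 * s)

# The Gaussian's entropy must exceed that of any other
# distribution with the same mean and variance.
print(h_gauss, h_laplace)
```

How to extend this to fuzzy (imprecise) moment constraints is exactly the open question posed above; one probabilistic route would be to treat the imprecise constraints as evidence in a hierarchical model, as in the body of the reply.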