Dave Nulton wrote:

> Quite frankly Robert the details are proprietary.  I suppose I could have
> been more descriptive, but I don't see what the shape of my distribution
> have to do with what it represents

    To take the second point first, the origin of a dataset often contains
valuable information relating to the plausibility of various models. For
instance, it is a truism that "it takes money to make money". If I buy 100
shares of Wombat.Com and you buy 1000 shares, and the price goes up by $5
per share, I make $500 and you make $5000. Because of this inherently
multiplicative structure, it is *very* common for financial data to respond
well to a logarithmic transformation.
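
    As a minimal sketch of that multiplicative structure, the Python below
works through the Wombat.Com example. The $50 starting price is a made-up
assumption (only the $5 rise is given above), and the variable names are
purely illustrative.

```python
import math

# Hypothetical starting price; only the $5 rise comes from the example.
start_price = 50.0
end_price = start_price + 5.0

holdings = {"me": 100, "you": 1000}  # shares of Wombat.Com held by each of us

raw_gain = {}
log_gain = {}
for who, shares in holdings.items():
    before = shares * start_price
    after = shares * end_price
    raw_gain[who] = after - before                      # scales with the stake
    log_gain[who] = math.log(after) - math.log(before)  # same for both holders

print(raw_gain)
print(log_gain)
```

    The dollar gains differ by a factor of ten, while the changes on the log
scale agree, which is one way to see why a logarithm can put such data on a
common footing.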

    On the other hand, "count" data may - depending on what's being counted
and how - follow a "Poisson" model. In such a model, the events being
counted happen independently and at random in a "window" of fixed size -
calls per day to a help line, flaws per 1000 meters in recording tape,
snowflakes landing on your tongue per minute...  Such data, if the numbers
are small, may require specialized regression techniques; with more data, a
square root transformation often helps.
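
    A rough simulation, using only the Python standard library, may help show
what the square root buys you. The means (5, 20, 80), the sample size and
the seed are arbitrary choices made for illustration, and the sampler is a
textbook method (Knuth's) rather than anything specific to this discussion.

```python
import math
import random
import statistics

def poisson_sample(mean, rng):
    """Draw one Poisson variate by Knuth's multiplication method."""
    limit = math.exp(-mean)
    count = 0
    product = rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count

rng = random.Random(0)  # fixed seed so the run is repeatable
raw_var = {}
sqrt_var = {}
for mean in (5, 20, 80):  # illustrative means, not taken from any real data
    counts = [poisson_sample(mean, rng) for _ in range(5000)]
    raw_var[mean] = statistics.variance(counts)
    sqrt_var[mean] = statistics.variance([math.sqrt(c) for c in counts])
    print(mean, round(raw_var[mean], 2), round(sqrt_var[mean], 3))
```

    The variance of the raw counts grows roughly in step with the mean,
whereas the variance of the square roots should stay in the neighbourhood of
1/4, which is the sense in which the transformation stabilizes the spread.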

    If the data set is small or has any unusual features, it may be
difficult to tell which transformation is appropriate just by looking at the
data.  The "story" of the data is important.

    There are many other examples. For instance, even with a simple 2x2
table in which the frequencies of two outcomes are compared under two
situations, you need to know whether the trials are independent, in which
case a two-sample z test for proportions would typically be used, or paired
across treatments, in which case McNemar's test would be more appropriate.
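
    To make the difference concrete, here is a small sketch with invented
counts: 100 hypothetical subjects each tried under two treatments. It runs
McNemar's test (without continuity correction) on the paired table, and
then, for contrast only, a pooled two-sample z test on the same margins as
though the two sets of trials were independent. The function names are
illustrative, not from any library.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-sample z test for proportions; returns (z, two-sided p)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return z, math.erfc(abs(z) / math.sqrt(2))

def mcnemar(b, c):
    """McNemar's chi-square (1 df, no continuity correction) from the
    two discordant counts; returns (statistic, p)."""
    stat = (b - c) ** 2 / (b + c)
    return stat, math.erfc(math.sqrt(stat / 2))

# Invented paired outcomes for 100 subjects (not real data).
both = 40        # success under both treatments
only_a = 15      # success under A only
only_b = 5       # success under B only
neither = 40     # failure under both
n = both + only_a + only_b + neither

# Paired analysis: only the discordant pairs carry information.
m_stat, m_p = mcnemar(only_a, only_b)

# Same margins, wrongly treated as two independent samples of size n.
z, z_p = two_proportion_z(both + only_a, n, both + only_b, n)

print("McNemar:", m_stat, m_p)
print("z test: ", z, z_p)
```

    With these particular numbers the two analyses land on opposite sides of
the conventional 0.05 level, so the same four counts support different
conclusions depending on how the data were collected.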

    For such reasons, it is often impossible to give reliable statistical
advice based on numbers _in_vacuo_. I cannot imagine members of many other
professions attempting to do the equivalent - indeed, I would hazard a guess
that in many cases professional associations would take a dim view of giving
a professional opinion to a client/patient/whatever who insisted on
withholding relevant information.

    I would suggest that if this dataset is important enough to warrant this
level of secrecy, you find a statistician who is willing to sign an NDA, and
that you pay the going rate for the consultation. (Don't ask me; I'm neither
a professional statistician nor interested.) Trying to get advice, free or
not, from people whom you do not trust enough to give even a basic
explanation seems to me like a waste of your time and ours.

    -Robert Dawson
