Re: statistics question

Robert J. MacG. Dawson Tue, 16 Dec 2003 07:18:30 -0800


Allan Adler wrote:
> 
> I posted this on sci.math and didn't get an answer, so I'll try here,
> where I probably should have asked in the first place.
> 
> Suppose you have two dictionaries D1,D2. Suppose that D1 is much smaller
> than D2, in the sense that it has fewer entries, but has a reputation for
> being more accurate than D2, in the sense that the probability of an entry
> of D2 being incorrect is much greater than that of an entry of D1 being
> incorrect. Suppose you want to know whether D2, whatever its faults, can
> usually be used for whatever one would want to use D1 for.
> 
> One way to find out would be to go through D1, look up all the entries and
> compare them with the corresponding entries of D2 (when D2 has them). One
> can give credence to D1's reputation by assuming that when D1,D2 disagree,
> D1 is correct. However, if D1 has too many entries, this could be impractical.
> So, instead, I would like to know how to design an experiment in which one
> samples the entries of D1 and compares the entries in the sample with their
> counterparts in D2, and arrives at an estimate for the probability that
> entries in D1 are correctly treated in D2.


<flame level="mild">
        Firstly, is this question really about dictionaries? The assumption
that the larger dictionary is less exact seems false-to-fact; the
largest dictionaries are usually the most closely researched, and the
better smaller dictionaries are condensed from them.  

        I ask this because (a) the assumptions in a question like this are
important, and (b) a surprising number of researchers bring questions
here whose details they have carefully changed.  This makes some
of us feel like doctors being told for the Nth time about a "friend who
thinks he might have VD".  
</flame>

        So, the question is - can we assume that the probability of an "error"
in the larger "dictionary" is independent of whether the "word" is
included in the smaller "dictionary"?  If so, this becomes a fairly
trivial exercise in binomial sampling. If not, you have a hard row to
hoe.
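
        For the independent case, the binomial-sampling exercise might look
something like the sketch below. It is only an illustration under stated
assumptions: each dictionary is modelled as a Python dict mapping a word to
its entry, D1 is taken as correct wherever the two disagree, a word missing
from D2 is counted as "not correctly treated", and the sample is a simple
random sample of D1's words. The names estimate_agreement and
wilson_interval are made up for the example, and the Wilson score interval
is just one reasonable choice of confidence interval for a proportion.

```python
import math
import random


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1 + z ** 2 / n
    centre = (p_hat + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def estimate_agreement(d1, d2, sample_size, seed=None):
    """Estimate the proportion of D1 entries that D2 treats correctly.

    Assumptions (not from the original question): d1 and d2 are dicts,
    D1 is right whenever they differ, and a word absent from D2 is a miss.
    """
    rng = random.Random(seed)
    # Simple random sample of D1's words, drawn without replacement.
    words = rng.sample(sorted(d1), sample_size)
    agree = sum(1 for w in words if w in d2 and d2[w] == d1[w])
    return agree / sample_size, wilson_interval(agree, sample_size)
```

        Calling estimate_agreement(d1, d2, 400) would return the sample
proportion together with an approximate 95% interval for it. The interval
ignores the finite-population correction, so it is slightly conservative
when the sample is a large fraction of D1.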

        -Robert Dawson
=================================================================
Instructions for joining and leaving this list, remarks about the
problem of INAPPROPRIATE MESSAGES, and archives are available at:
.                  http://jse.stat.ncsu.edu/                    .
=================================================================
