Milton has reminded me that I neglected to send this...most people 
will probably wish to ignore this message.

On Sat, Dec 19, 1998 at 01:03:25PM -0500, Milton Mueller wrote:
> Kent's posts on this topic are becoming increasingly shrill and
> personal. Let see if we can keep this rational and focused.

I'm sorry if this comes across as a personal attack.  The unavoidable
fact is, I really don't think it's a very good paper, and even under
the best of circumstances it's likely that someone would take such a
criticism personally.  Given that in this case the criticism deals
with the possibility of a lack of basic competence in an area
important to your career, it is completely understandable that you
would take it very personally.  

But the fact remains -- it isn't a very good paper, and, independent
of whether clever statistical techniques could have helped, the paper
draws conclusions it cannot substantiate, and, more to the point,
exhibits no awareness or understanding of its flaws.  Your replies on
this topic continue to exhibit that same lack of awareness.

> Kent wants to argue that the utilization of straightforward
> statistical sampling techniques could have provided a study of
> domain name trademark conflicts with a rigorously representative
> sample of the entire population of all such disputes.

Not quite what I was arguing.  See below.

> He bases
> this claim on distant memories of (presumably) old college
> statistics courses. He is not a social scientist and has no
> claim to expertise in this matter.

Interestingly enough, my first degree was in Psychology, from
Stanford; and good experimental design and how to get the most
information you can from your data (analysis of variance, factor
analysis, etc) were required areas of study (at least back then). 
Later I got degrees in math and computer science, which included
further exposure to probability and statistics, from a more
mathematically rigorous point of view.  

It is true, however, that currently I am more likely to be reminded
of the subject in the context of discussions of Monte Carlo
hydrodynamics simulations than I am in any social science context. 

> The premise is false. It is neither simple nor straightforward
> to get an unbiased sample of all domain name trademark cases on
> a global basis. The problem has nothing to do with the
> availability or non-availability of mathematical techniques, but
> with the availability of data and resources.
>
> Begin at the beginning. If one wants to do a statistical study,
> what population does one sample? There are basically two
> choices.
>
> 1. One can sample the population of *known* trademark domain
> name cases, cases about which there are court records and press
> accounts. If one takes this tack, one quickly discovers that
> there is no need to "sample" these cases. One can discover and
> count nearly all of them. And that in fact is what the Syracuse
> study did. As of May 1998, we turned up about 135 cases, 121 of
> which we had enough information to classify properly.
> 
> With his limited knowledge of statistics, Kent wants to imply

Sigh.  Bear in mind that there are very few who would be so bold as
to claim that their knowledge of *anything* was unlimited. 

> that we could have used this information and some fancy
> calculations to estimate the statistical characteristics of the
> *entire population* of domain name trademark cases. But this is
> simply confusion on his part. No statistical calculation, no
> matter how good, can provide more information or certainty than
> a direct count of a population. Indeed, one only needs to employ
> sampling techniques when it is not practical to count the entire
> population.

The confusion here is yours, I'm afraid.  The problem is that you
did indeed use sampling -- you just used it in a statistically
unconscious way:

The real population of interest, as you note, is the "entire
population of domain name trademark cases" -- that is the population
about which we wish to be able to make meaningful statements, and
about which we want to be able to draw conclusions.  

Your set of "known domain name trademark cases" *is a sample* of that
"entire population".  If you make an inference from your *sample* to
the entire population of interest (and you do make such inferences,
see below), then you are using a sampling technique, whether you are
aware of it or not -- a statistically undisciplined sampling
technique, but a sampling technique, nonetheless. 

As you are well aware (but do not mention in the study), this sample
has obvious systematic bias -- for example, it systematically ignores
all cases where the parties agree not to talk about the results.  It
most certainly has other biases as well. 
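To make the effect of such a bias concrete, here is a small simulation
(every number in it is invented for illustration -- none of it is data
from the study) showing how silently dropping the quietly settled cases
shifts the outcome rate one observes in the "known cases" sample:

```python
import random

random.seed(42)

# Hypothetical illustration (invented numbers, not data from the study):
# simulate a full population of disputes, where cases that settle
# quietly are assumed to be systematically more favorable to domain
# holders, and never appear in court records or press accounts.
population = []
for _ in range(10_000):
    settled_quietly = random.random() < 0.5
    # Assumption: quietly settled cases favor the TM holder only 30%
    # of the time, while publicly known cases favor them 70% of the time.
    tm_holder_wins = random.random() < (0.3 if settled_quietly else 0.7)
    population.append((settled_quietly, tm_holder_wins))

true_rate = sum(w for _, w in population) / len(population)

# The "known cases" sample silently drops every quiet settlement.
known = [w for settled, w in population if not settled]
observed_rate = sum(known) / len(known)

print(f"true TM-win rate in population: {true_rate:.2f}")
print(f"rate seen in known cases only:  {observed_rate:.2f}")
```

The observed rate lands well above the true population rate -- not
because of small-sample noise, but because the selection mechanism
itself is correlated with the outcome.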

> By the same token, a census of all the *known* cases cannot be
> mathematically massaged into rigorous knowledge of the *unknown*
> cases.

This is a rather wild generalization.  1) Statistical inference is
rigorous, if one follows the rules -- the conclusions may come with
annoying addenda like confidence levels, but there is no question
about the rigor of statistical inference.  2) One most certainly can
develop knowledge of unknown cases from knowledge of known cases,
both statistically and in general.  It borders on nonsense to say
one can't -- the scientific enterprise in its entirety depends fairly
heavily on this assumption.  In particular, your study would be
completely pointless without it. 
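As an illustration of what "rigorous, with annoying addenda" looks
like, here is a sketch of a 95% confidence interval for a proportion,
using the normal approximation.  Note the catch: it is valid only
under the assumption of an unbiased random sample, which is precisely
the assumption your sample violates.  The 85-of-121 split is invented
for illustration, not taken from the study:

```python
import math

# Normal-approximation 95% confidence interval for a proportion.
# Valid ONLY for an unbiased random sample -- hypothetical counts.
def proportion_ci(successes, n, z=1.96):
    """Return (point estimate, lower bound, upper bound)."""
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)

# E.g. if 85 of 121 randomly sampled cases favored the TM holder
# (an invented split, purely for illustration):
p, lo, hi = proportion_ci(85, 121)
print(f"estimate {p:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

The "addendum" is the interval itself: instead of a bare number, one
gets a range and a stated level of confidence -- which is exactly the
kind of qualification the paper never offers.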

[description of methodological problems deleted]

> so one would spend
> millions of dollars, and still have a study open to charges of
> statistical bias.

Implying, of course, that the study is open to such charges... 

> We think we did it the best way feasible. Count the facts. With
> more resources and time, we would be happy to perform more
> extensive studies.

My sympathies for the methodological problems (essentially
unmentioned in the paper, however).  Whether more sophisticated
techniques could extract more information without a substantial
increase in cost is an open question -- just looking at the paper, my
opinion is that a more sophisticated yet still cost-effective
methodology could probably be found. 

But fundamentally that is not the issue... 

> But that should not let the TM lobby off the
> hook. The facts that we know--121 cases, a sizable
> number--suggest that the domain name trademark interaction is a
> lot more favorable to TM holders, and a lot less threatening to
> TM interests, that INTA and others would claim.

...the real issue is exemplified in the above paragraph.  It makes
*inferences* from an obviously biased sample to the total population
of interest.  This suggests either 1) a lack of understanding that
making such inferences is not justified, or 2) a lack of concern
about whether they are justified. 

In my opinion, an approach with a higher level of intellectual and
academic rigor would be to eschew the obvious advocacy, and perhaps
say something like this:

"While the data would seem to suggest that the domain name trademark
interaction is favorable to TM holders, it remains only a suggestion. 
For the reasons we described above concerning our methodological
problems, no such conclusion can be rigorously substantiated with our
data."

---

On rereading, the paper is worse than I remembered -- though on my
first read I wasn't explicitly looking for statistical errors. 
Contrary to what you said in previous messages in this thread, there
is essentially no mention of the biases inherent in the sample, and
there is no discussion of sampling difficulties. 

We find statements like:

  Real trademark infringement using domain names is a rare and not
  very significant problem: 0.0128%, or a total of no more than 257
  cases in the generic TLDs administered by NSI.  Nearly all
  infringement activity has been quickly stopped by lawsuits. 

This extrapolation from the sample to the total population is
unjustified.  Moreover, it is meaningless, because it uses data
from two differently defined samples -- the sample of known domain
name disputes developed in the paper, and the reported 3903 disputes
that NSI received complaints about.  There are obvious techniques by 
which these samples could be related, but I suspect that the only 
data really available from NSI was "3903", and thus the samples 
cannot really be compared.
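For what it's worth, one of the "obvious techniques" for relating two
differently drawn samples of the same population is Lincoln-Petersen
capture-recapture: estimate the total number of disputes from the
overlap between the study's known cases and NSI's complaint list. 
The overlap count below is invented -- as I said, NSI apparently
released only the bare total "3903", so this cannot actually be
computed from what is in the paper:

```python
# Lincoln-Petersen capture-recapture estimate: given two samples of
# sizes n1 and n2 drawn from the same population, and the number of
# individuals appearing in both, estimate the total population size.
def lincoln_petersen(n1, n2, overlap):
    """Estimate total population size from two overlapping samples."""
    if overlap == 0:
        raise ValueError("need at least one case seen in both samples")
    return (n1 * n2) / overlap

# 135 cases turned up by the study, 3903 NSI complaints, and a
# HYPOTHETICAL overlap of 60 cases appearing on both lists:
estimate = lincoln_petersen(135, 3903, 60)
print(f"estimated total disputes: {estimate:.0f}")
```

The point is not the particular number -- it is that the technique
requires knowing the overlap between the two lists, data the paper
gives no sign of having.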

Another serious omission is any attempt to deal with the unreported
cases.  Carl Oppedahl has made some wild estimates -- you cite him,
but the fact that you don't reference his estimates (which, as I
recall, indicate that the number of unreported cases was fairly
large, and thus do not support your conclusions) makes one wonder
why... 
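A simple sensitivity sketch shows why the unreported cases matter so
much.  Take the study's 121 known cases (the 85 TM-favorable split is
again invented for illustration) and ask: if we know nothing about
the outcomes of the unreported cases, how wide is the range of
possible true rates as the assumed number of unreported cases grows?

```python
# Sensitivity sketch: all numbers except the study's 121 known cases
# are hypothetical.  With nothing known about the unreported cases'
# outcomes, the true TM-favorable rate is only bounded, not estimated.
known_cases = 121
known_tm_favorable = 85  # invented split, for illustration only

for unreported in (0, 100, 500, 1000):
    total = known_cases + unreported
    # Worst-case bounds: all unreported cases went one way or the other.
    low = known_tm_favorable / total
    high = (known_tm_favorable + unreported) / total
    print(f"unreported={unreported:5d}: true rate could be anywhere "
          f"in [{low:.2f}, {high:.2f}]")
```

With even a few hundred unreported cases, the bounds are so wide as
to be useless -- which is precisely why an estimate of the unreported
count belongs in any paper drawing conclusions from the known ones.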

There are other examples, but this is too long already.  The
overwhelming impression is that statistical rigor was simply not a
consideration in the production of this paper.  Noting also that it
was announced with a "press release" to various mailing lists, and
that the last part is a collection of policy recommendations (i.e.,
advocacy), it is hard to consider it serious scholarship -- the
data and the methodology don't really support the conclusions, 
it is obviously biased, and it was obviously produced in support of 
that pre-existing bias.  

-- 
Kent Crispin, PAB Chair                         "Do good, and you'll be
[EMAIL PROTECTED]                               lonesome." -- Mark Twain
