Re: looking for a distance measure

Christian Hennig Thu, 24 Sep 2009 10:08:38 -0700

Hi there,

I would probably approach it like this.

Nomenclature: "ACB1" means that A and C were tried and didn't work, and B then worked. So "CABD" means "all tried in the given order, none worked".
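A minimal Python sketch of this notation, assuming each history is written as a string of drug letters with a trailing "1" when the last drug listed worked. `History` and `parse_history` are hypothetical names. Storing the failed drugs as a set already discards the order in which they were tried.

```python
from typing import FrozenSet, NamedTuple, Optional


class History(NamedTuple):
    """One patient's treatment history in the "ACB1" notation."""
    failed: FrozenSet[str]  # drugs that were tried and did not work
    worked: Optional[str]   # the drug that worked, or None if none did


def parse_history(code: str) -> History:
    """Parse "ACB1" (A and C failed, B worked) or "CABD" (all failed).

    The order of the failed drugs is discarded by storing them as a set.
    """
    if code.endswith("1"):
        *tried, last = code[:-1]  # all letters before the "1"; the last one worked
        return History(frozenset(tried), last)
    return History(frozenset(code), None)
```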

Zero distance: two identical situations (the same drugs were tried, the same one worked), e.g. d(AB1,AB1), but also d(CBDA,CDAB); the order in which drugs were tried and failed should generally be ignored... is this appropriate?
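To check the zero-distance case, the order can be removed by putting each history into a canonical form. This sketch (hypothetical `canonical` and `is_zero_distance`, same string notation assumed) sorts the failed drugs, so d(CBDA,CDAB) comes out as zero:

```python
def canonical(code: str) -> tuple:
    """Order-free form of a history such as "ACB1" or "CABD".

    Returns (failed drugs sorted alphabetically, drug that worked or None),
    so the order in which the failed drugs were tried no longer matters.
    """
    if code.endswith("1"):
        return ("".join(sorted(code[:-2])), code[-2])
    return ("".join(sorted(code)), None)


def is_zero_distance(a: str, b: str) -> bool:
    """True when the same drugs failed and the same drug (or none) worked."""
    return canonical(a) == canonical(b)
```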

Small distance: the same one worked. Among these, the distance should be smaller if there is also an intersection in terms of "drugs tried that did not work": d(ABC1,AC1) should be smaller than d(ABC1,C1) or d(ABC1,DC1), but probably not much smaller.
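One hypothetical way to make "smaller, but probably not much smaller" concrete is to start from a fixed value for pairs where the same drug worked and subtract a small amount proportional to the Jaccard overlap of the failed-drug sets. The base value 1.0 and the weight `shrink=0.2` are arbitrary assumptions, not part of the proposal:

```python
def jaccard(x: set, y: set) -> float:
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def small_distance(failed_a: set, failed_b: set, shrink: float = 0.2) -> float:
    """Distance between two histories in which the SAME drug worked.

    Starts from a hypothetical base value of 1.0 and subtracts up to
    `shrink` (also hypothetical) in proportion to the overlap of the
    failed-drug sets, so overlap lowers the distance, but only a little.
    """
    if failed_a == failed_b:
        return 0.0  # identical situations: zero distance
    return 1.0 - shrink * jaccard(failed_a, failed_b)
```

With these values, d(ABC1,AC1) = 0.9 while d(ABC1,C1) = d(ABC1,DC1) = 1.0.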

Intermediate distance: not the same drug worked (or in one case one worked and in the other none of them did), but there is an intersection among the drugs that were tried and did not work. The distance should be increased if there is a drug that worked in one case but was tried and didn't work in the other (creating an incompatibility of sequences in the same patient).

(d(ABD1,ABC1)<d(AD1,ABC1)<d(AD1,ADC1))
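The intermediate case can be sketched the same way: a base value reduced by the overlap of the failed drugs, plus a penalty when the two sequences are incompatible (a drug that worked in one case was tried and failed in the other). The values `base=2.0`, `shrink=0.5` and `penalty=1.0` and all function names are hypothetical, chosen only so that the ordering in the example holds:

```python
def split(code: str):
    """Hypothetical parser: "ABD1" -> ({"A", "B"}, "D"); "CABD" -> (all four, None)."""
    if code.endswith("1"):
        return set(code[:-2]), code[-2]
    return set(code), None


def jaccard(x: set, y: set) -> float:
    """Jaccard similarity; 0.0 when both sets are empty."""
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def intermediate_distance(a: str, b: str, base: float = 2.0,
                          shrink: float = 0.5, penalty: float = 1.0) -> float:
    """Distance for pairs where different drugs worked but failures overlap."""
    failed_a, worked_a = split(a)
    failed_b, worked_b = split(b)
    d = base - shrink * jaccard(failed_a, failed_b)  # more shared failures -> closer
    if worked_a in failed_b or worked_b in failed_a:  # incompatible sequences
        d += penalty
    return d
```

This gives 1.5, 1.75 and 2.75 for the three pairs in the example. The function assumes the caller has already established that different drugs worked and that the failed sets intersect.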

Large distance: different drugs worked (or none at all in one of the two cases), and there is no intersection, but it is still possible to put all of them together in a compatible way in a single patient: d(BA1,CD1). (Actually, this may be assessed to be smaller than the incompatible d(AD1,ADC1) from above, which has an intersection.)


Maximum distance: no intersection and incompatible.
(d(ABCD,B1), d(AB1,BD1))
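The categories above can be collected into a single ordinal classification of pairs. A minimal sketch, assuming the same string notation; the level constants and function names are hypothetical, and a pair in which no drug worked in either history is treated like "the same one worked", which is an assumption. As noted above, LARGE might reasonably be ranked below INTERMEDIATE_INCOMPATIBLE; swapping those two names in the constant list would express that.

```python
from typing import FrozenSet, Optional, Tuple

History = Tuple[FrozenSet[str], Optional[str]]  # (failed drugs, drug that worked)


def parse(code: str) -> History:
    """Parse "ACB1" (A, C failed; B worked) or "CABD" (none worked)."""
    if code.endswith("1"):
        return frozenset(code[:-2]), code[-2]
    return frozenset(code), None


# Hypothetical ordinal levels, following the categories in the post.
ZERO, SMALL, INTERMEDIATE, INTERMEDIATE_INCOMPATIBLE, LARGE, MAXIMUM = range(6)


def level(a: str, b: str) -> int:
    """Ordinal dissimilarity level between two treatment histories."""
    failed_a, worked_a = parse(a)
    failed_b, worked_b = parse(b)
    if (failed_a, worked_a) == (failed_b, worked_b):
        return ZERO
    if worked_a == worked_b:
        return SMALL  # same drug worked (or, by assumption, none worked in both)
    overlap = bool(failed_a & failed_b)
    # Incompatible: a drug that worked in one history failed in the other.
    clash = worked_a in failed_b or worked_b in failed_a
    if overlap:
        return INTERMEDIATE_INCOMPATIBLE if clash else INTERMEDIATE
    return MAXIMUM if clash else LARGE
```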

Of course, if one accepts this, a precise scaling is still needed (though if methods are then used that are invariant against monotone transformations, this probably doesn't matter too much). I think that this summarises pretty much all the decisions that have to be made, and if possible, subject-matter knowledge and expert assessment should be used to make them.
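Methods that use only the order of the dissimilarities (for example single- or complete-linkage hierarchical clustering, or non-metric multidimensional scaling) receive the same information under any strictly increasing rescaling of the levels. A small illustration with hypothetical level values, comparing dense ranks under three increasing scalings:

```python
import math


def ranks(values):
    """Dense ranks: equal values share a rank, the order of distinct values is kept."""
    distinct = sorted(set(values))
    position = {v: i for i, v in enumerate(distinct)}
    return [position[v] for v in values]


# Hypothetical ordinal levels for eight patient pairs.
levels = [0, 1, 2, 3, 4, 5, 2, 1]

linear = [float(x) for x in levels]
convex = [x ** 2 for x in levels]
concave = [math.log1p(x) for x in levels]

# All three strictly increasing scalings induce the same ranking of pairs.
assert ranks(linear) == ranks(convex) == ranks(concave)
```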


Just my two cents,
Christian





On Wed, 23 Sep 2009, Shannon, William wrote:

A follow-up based on some questions I got from members of the list.

The data will be a list of distinct 0's and 1's and missing values.  Suppose 
patient 1 received drug A with no effect and then drug B which was effective -- 
their data would be (0 1 Missing Missing).  Patient 2 receives drugs C and D 
with no effect but A works, and B is never given -- their data would be (1 
Missing 0 0).  Etc.
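Encoding a patient into this A B C D vector can be sketched as follows, assuming missing values are represented by Python's `None`; `DRUGS` and `to_vector` are hypothetical names:

```python
from typing import List, Optional, Sequence

DRUGS = ("A", "B", "C", "D")  # column order assumed in the post


def to_vector(tried: Sequence[str], worked: Optional[str]) -> List[Optional[int]]:
    """Encode one patient as 0 (not effective), 1 (effective), None (not given).

    `tried` lists the drugs given without effect; `worked` is the drug that
    worked, or None if none did.
    """
    vec: List[Optional[int]] = [None] * len(DRUGS)
    for drug in tried:
        vec[DRUGS.index(drug)] = 0
    if worked is not None:
        vec[DRUGS.index(worked)] = 1
    return vec
```

`to_vector(["A"], "B")` gives `[0, 1, None, None]` for patient 1, and `to_vector(["C", "D"], "A")` gives `[1, None, 0, 0]` for patient 2, matching the examples above.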

Assume the columns or entries of the vectors correspond to drugs A, B, C, D,
where the entry is 0 if not effective, 1 if effective, and missing if not
given.  Assume also that the order in which the drugs are given is random.
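Going the other way shows what the vector keeps: which drugs failed and which one worked, but not the order of administration, which is consistent with assuming the order is random. A sketch with hypothetical names, again using `None` for "not given":

```python
from typing import FrozenSet, Optional, Sequence, Tuple

DRUGS = ("A", "B", "C", "D")  # column order assumed in the post


def from_vector(vec: Sequence[Optional[int]]) -> Tuple[FrozenSet[str], Optional[str]]:
    """Recover (failed drugs, drug that worked) from an A/B/C/D vector.

    The order of administration is not stored in the vector, so it cannot be
    recovered; the result keeps only which drugs failed and which one worked.
    """
    failed = frozenset(d for d, v in zip(DRUGS, vec) if v == 0)
    worked = [d for d, v in zip(DRUGS, vec) if v == 1]
    if len(worked) > 1:
        raise ValueError("at most one drug can be effective per patient")
    return failed, (worked[0] if worked else None)
```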

It may be that the order and number of ineffective drugs given should be
ignored, and the distance based only on whether patients responded to the
same drug or to different drugs.
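A distance that ignores the ineffective drugs entirely reduces each patient to the drug they responded to and compares those as a nominal variable. A minimal sketch with hypothetical names; patients with no responder are treated as equal to each other, which is an assumption:

```python
from typing import Optional, Sequence

DRUGS = ("A", "B", "C", "D")  # column order assumed in the post


def responder(vec: Sequence[Optional[int]]) -> Optional[str]:
    """Drug whose entry is 1, or None if no drug was effective."""
    for drug, value in zip(DRUGS, vec):
        if value == 1:
            return drug
    return None


def responder_distance(u: Sequence[Optional[int]], v: Sequence[Optional[int]]) -> int:
    """0 if both patients responded to the same drug (or neither did), else 1.

    Ineffective drugs, their number and their order are ignored entirely.
    """
    return 0 if responder(u) == responder(v) else 1
```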

Thank you

Bill Shannon, PhD
Associate Prof. of Biostatistics in Medicine
Washington University School of Medicine
Director, Biostatistical Consulting Center
314-454-8356
________________________________________
From: Shannon, William
Sent: Wednesday, September 23, 2009 11:44 AM
To: class l list ([email protected])
Cc: Shannon, William; Farrokh Alemi
Subject: looking for a distance measure

Hi Everyone

I may be working with a data set that has the following structure and will need 
to develop a distance measure.  I have not had time to think carefully about it 
but am hoping someone might have already worked with data like this.

Patients present to the doctor with a disease and it is unknown which of four 
drugs they will respond to (the goal of this project is to improve the ability 
to predict and be able to give the correct drug first).  MDs treat these
patients empirically: give them drug A and see if they respond; if not, give
them drug B and see if they respond, etc.

We assume a patient either responds or does not, and that there is no carryover
or order effect of drugs (i.e., if you respond to drug B it is irrelevant
if you had already had drug A).  I also assume there is no set order on which 
drugs are given first.

The data for each patient will be a vector of 0's for non-response and a 1 for
response, with the number of 0's depending on how many drugs were given
empirically before a response occurred.

How do we calculate a pairwise distance matrix between patients with this
data?
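One generic way to fill a pairwise matrix from vectors with missing entries is a Gower-style simple matching coefficient: compare only the drugs given to both patients and take the proportion of disagreements. This is a swapped-in standard technique, not something proposed in the thread; `mismatch_distance`, `distance_matrix` and the fallback value are hypothetical:

```python
from typing import List, Optional, Sequence

Vector = Sequence[Optional[int]]  # 0, 1, or None (drug not given)


def mismatch_distance(u: Vector, v: Vector) -> float:
    """Share of jointly observed drugs on which two patients disagree.

    Entries missing (None) in either patient are skipped, in the spirit of
    Gower's coefficient. If no drug was given to both patients, 1.0 is
    returned as a hypothetical fallback.
    """
    shared = [(a, b) for a, b in zip(u, v) if a is not None and b is not None]
    if not shared:
        return 1.0
    return sum(a != b for a, b in shared) / len(shared)


def distance_matrix(patients: List[Vector]) -> List[List[float]]:
    """Symmetric pairwise distance matrix with zeros on the diagonal."""
    n = len(patients)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = mismatch_distance(patients[i], patients[j])
    return d
```

For patient 1 `[0, 1, None, None]`, patient 2 `[1, None, 0, 0]` and a hypothetical patient 3 `[0, None, None, 1]` (A failed, D worked), patients 1 and 3 get distance 0.0 because the only drug both received is A, even though they responded to different drugs. This is one reason a distance built around which drug worked, as in the reply above, may be more appropriate for this data.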


Thank you.

Bill Shannon, PhD
Associate Professor of Biostatistics in Medicine
Washington University School of Medicine
St. Louis, MO

314-454-8356
[email protected]

----------------------------------------------
CLASS-L list.
Instructions: http://www.classification-society.org/csna/lists.html#class-l


*** --- ***
Christian Hennig
University College London, Department of Statistical Science
Gower St., London WC1E 6BT, phone +44 207 679 1698
[email protected], www.homepages.ucl.ac.uk/~ucakche
