Hi Gus,

> > Gus,
> >
> > It would be possible to subsample any data set in order to produce
> > any effect desired. Until I know how you find the subset, I can not
> > know if that action is removing or obscuring the causal pattern.
>
> Neither do I. But I need to know whether the causal pattern is still
> there. I guess you allow the possibility that the causal relation is
> removed. I just can't see how.
> Let's say, I take the causal relation "Smoking causes cancer". The
> existence of this causal relation should not depend on how I pick my
> subjects, should it?

The causal pattern still exists in the victims of cancer, but the
numbers will not reflect it if the subjects are chosen in a way that
obscures the causal mechanism. If we subsampled a population by
retaining only those values that deviate widely from the expected
values (according to regression), then we would magnify the errors at
the expense of seeing the common variance. This would be a way of
obscuring the causal pattern. I am not saying this is what you are
doing. But samples can be collected that disinherit the causal
evidence. Sampling the causes from a normal distribution is itself a
way of disinheriting important causal data, since the extremes of the
causes will very rarely be combined. This is an example of how our
sampling can obscure the presence of causal patterns.

Sampling is a very dynamic thing; it is not the holy grail that so many
correlational and SEM folks make it out to be with their snapshot
samples. What nature throws up for the convenience of a scientist's
snapshot is in no way more valid than a systematically collected
sample, in which the scientist counterbalances and seeks information in
a more efficient and conscientious manner.

> Of course, if I end up with only nonsmokers in the subsample, I won't
> detect the relationship, but that does not invalidate its existence.
> What am I missing?

If you remove those with cancer then you will not be able to detect
cancer, and the data will have disinherited the cancer mechanism. The
folks with cancer still have it, but the sample is inadequate to
reflect the cancer. When we trim the ends off normal distributions, the
disjunctive cases still exist, but we remove them so that the
conjunctive pairings can exist. Sampling from normal distributions
already removes the conjunctive cases in the extremes. By trimming we
just continue by counterbalancing and removing the disjunctive cases as
well, leaving data that has not been filtered (the new extremes).

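To make the point about residual-based subsampling concrete, here is a
minimal simulation sketch in Python. Everything in it is an assumption
chosen for illustration, not anything taken from your data: a single
cause x, an effect y = x + noise, a sample of 5000, and the
hypothetical helper names simulate, fit_line, pearson and
subsample_by_residual. It keeps either the cases farthest from the
regression line or the cases closest to it, and compares the
correlation in each subsample with the correlation in the full sample.

```python
import random


def simulate(n=5000, slope=1.0, seed=0):
    """Assumed toy model: y = slope * x + unit-variance Gaussian noise."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [slope * x + rng.gauss(0.0, 1.0) for x in xs]
    return xs, ys


def fit_line(xs, ys):
    """Ordinary least-squares fit of y on x; returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / (sxx * syy) ** 0.5


def subsample_by_residual(xs, ys, fraction=0.2, keep="large"):
    """Keep the given fraction of cases with the largest (or smallest)
    absolute residuals from the fitted regression line."""
    a, b = fit_line(xs, ys)
    order = sorted(range(len(xs)), key=lambda i: abs(ys[i] - (a + b * xs[i])))
    k = int(fraction * len(xs))
    chosen = order[-k:] if keep == "large" else order[:k]
    return [xs[i] for i in chosen], [ys[i] for i in chosen]


if __name__ == "__main__":
    xs, ys = simulate()
    print("full sample r:     ", round(pearson(xs, ys), 3))
    for keep in ("large", "small"):
        sx, sy = subsample_by_residual(xs, ys, fraction=0.2, keep=keep)
        print(f"{keep} residuals r:".ljust(19), round(pearson(sx, sy), 3))
```

The same causal mechanism generates every case, yet under these
assumptions the subsample of large residuals shows a noticeably weaker
correlation than the full sample, and the subsample of small residuals
shows a much stronger one. The mechanism is untouched; only the
selection rule changes what the numbers reflect.
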
There is absolutely nothing wrong with picking your subjects so long as
you are not rigging the data to imply something that is not true. We
might, for example, pick subjects with lung cancer and subjects without
it. Then we might compare the number of years people from each group
smoked. There is nothing wrong with doing this. Similarly, when a
chemist experiments on some chemical, there is no reason he should be
forbidden from isolating that chemical from the other chemicals that
might correlate or occur with it in the wild. Purification is a common
practice in most sciences. We just have to be very explicit about what
we do. So far, I still do not know what you are doing.

Bill
