I would say a statistical inference is less generative than a theory. A theory 
in some sense asserts how things really work. Data mining may stumble across 
the crucial aspects of a mechanism (whether it is physical, sociological, etc.), 
but it may also just be seeing some quantity derived from other hidden 
variables. Perhaps there _is_ a reason why tying shoes one way or another 
is related to some mode of cognitive processing that is more efficient? Or 
maybe it arises because in some parts of the country folks tend to have that 
habit, and those parts of the country happen to have cleaner water or less air 
pollution or better schools or social constraints in their communities 
that lead individuals to navigate authoritarianism better than others?
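
A toy sketch of the hidden-variable worry, assuming a made-up population in 
which region drives both the shoe-tying habit and the outcome, while the habit 
itself does nothing. The probabilities and the names simulate and success_gap 
are invented for illustration and come from no real data.

```python
import random

def simulate(rng, n_people):
    """Toy population where region is a hidden common cause.

    All probabilities are invented for illustration: region shifts both the
    shoe-tying habit and the outcome, and the habit has no effect of its own.
    """
    people = []
    for _ in range(n_people):
        region = rng.randint(0, 1)
        habit = 1 if rng.random() < (0.8 if region else 0.2) else 0
        success = 1 if rng.random() < (0.7 if region else 0.3) else 0
        people.append((region, habit, success))
    return people

def success_gap(people):
    """Success rate with the habit minus success rate without it."""
    with_habit = [s for _, h, s in people if h == 1]
    without_habit = [s for _, h, s in people if h == 0]
    return sum(with_habit) / len(with_habit) - sum(without_habit) / len(without_habit)

rng = random.Random(9)
people = simulate(rng, 20000)

# Gap computed by someone who never recorded region.
overall_gap = success_gap(people)
# Gap computed separately inside each region.
within_gaps = [success_gap([p for p in people if p[0] == r]) for r in (0, 1)]

print("gap ignoring region:", round(overall_gap, 3))
print("gap within each region:", [round(g, 3) for g in within_gaps])
```

In this setup the habit looks predictive when region is ignored, and the gap 
should shrink toward zero once people are compared within the same region. 
The prediction still works on new samples from the same population; it just 
says nothing about mechanism.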

I think that data mining could be elaborated (and automated) to begin to create 
theories. For example, if a regression had an especially simple form that was 
also predictive, one could describe the variables with some ontology that says 
why they ought to relate in a deterministic fashion. Instead of just “the 
weather will be rainy tomorrow”, report “the weather will be rainy tomorrow 
because there is a low pressure system coming in from the west”, and then 
reference mathematical models for how weather systems behave, etc.

From: Friam [mailto:[email protected]] On Behalf Of Eric Charles
Sent: Friday, September 09, 2016 9:31 AM
To: The Friday Morning Applied Complexity Coffee Group <[email protected]>
Subject: Re: [FRIAM] speaking of analytics

Marcus,
That's an interesting distinction. Is it the case that by "theory" Nick was 
referring to something verbal and explicitly metaphorical, or would the results 
of data mining, which one sought to validate on a different sample, count as a 
"theory"?

So, for example, if my data mining of Marine data found that tying shoes 
left-to-right predicted success at Officer Candidate School, and I then went to 
test for that "prediction" in a later sample of incoming officer candidates, to 
what extent is my prediction based on "a theory"?

Of course, "data mining will be a useful way to uncover patterns" is itself a 
theory, applicable in some domains but not others (i.e., not all domains of 
inquiry will contain the sought after patterns in a long-term stable form).

Eric



-----------
Eric P. Charles, Ph.D.
Supervisory Survey Statistician
U.S. Marine Corps

On Fri, Sep 9, 2016 at 10:51 AM, Marcus Daniels 
<[email protected]<mailto:[email protected]>> wrote:
“I know that theories are really useful for making predictions, but can one 
actually make a prediction without one?”

Yes, that’s what data mining is:  Take a large corpus of data, find some 
statistically rare relationships, and then test for their predictive value on 
another large corpus of data. In this way one can predict things without 
really having any kind of theory or even domain knowledge.
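
A minimal sketch of that procedure, assuming synthetic binary data in which 
one arbitrary feature happens to carry signal. The names make_corpus, rate_gap 
and mine, and every number below, are hypothetical. The miner is never told 
which feature matters or why.

```python
import random

def make_corpus(rng, n_rows, n_features, signal_index):
    """Synthetic corpus: binary features plus a binary outcome.

    Only the feature at signal_index influences the outcome (an assumption
    of this toy setup, not something the miner is told).
    """
    rows = []
    for _ in range(n_rows):
        features = [rng.randint(0, 1) for _ in range(n_features)]
        p_success = 0.7 if features[signal_index] else 0.3
        outcome = 1 if rng.random() < p_success else 0
        rows.append((features, outcome))
    return rows

def rate_gap(rows, j):
    """Success rate when feature j is 1 minus success rate when it is 0."""
    with_f = [y for x, y in rows if x[j] == 1]
    without_f = [y for x, y in rows if x[j] == 0]
    if not with_f or not without_f:
        return 0.0
    return sum(with_f) / len(with_f) - sum(without_f) / len(without_f)

def mine(rows, n_features):
    """Return the feature index with the largest absolute rate gap."""
    return max(range(n_features), key=lambda j: abs(rate_gap(rows, j)))

rng = random.Random(2016)
N_FEATURES, SIGNAL = 20, 3
discovery = make_corpus(rng, 2000, N_FEATURES, SIGNAL)  # first corpus
holdout = make_corpus(rng, 2000, N_FEATURES, SIGNAL)    # second corpus

# Step 1: search the first corpus for the strongest relationship.
best = mine(discovery, N_FEATURES)
# Step 2: check whether that relationship holds up on the second corpus.
print("mined feature:", best)
print("gap in discovery sample:", round(rate_gap(discovery, best), 3))
print("gap in holdout sample:  ", round(rate_gap(holdout, best), 3))
```

The holdout check is what separates a usable prediction from a fluke of the 
search: with 20 candidate features, some gap will always be largest in the 
first corpus, whether or not it means anything.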

Marcus

============================================================
FRIAM Applied Complexity Group listserv
Meets Fridays 9a-11:30 at cafe at St. John's College
to unsubscribe http://redfish.com/mailman/listinfo/friam_redfish.com
