Nicholas Thompson wrote:

But what then about cladistics? Cladistics is a dark art of classification that uses a variety of obscure incantations to label relations amongst species without, so far as I understand, any reference to evolution. Yet, as I understand it, cladistics is not arbitrary.
In both cases it boils down to selecting a set of features and assigning them to a set of character states. With DNA, the job is done because the character states are A, G, C, or T in long strings. But one can also consider an encoding like C = has claws, !C = does not have claws, L = has lungs, !L = has no lungs, V = has vertebrae, !V = has no vertebrae, F = has fur, !F = has no fur, and so on. To make a taxonomy, distance-based similarity techniques such as neighbor-joining are often used. To go to the next step and consider an evolutionary model, things get complex fast because, for example, it is necessary to be able to say how a critter goes from having no hair to having it, or develops lungs, and to weigh the relative importance of those things. On the other hand, it is not nearly so hard if the transition you want to describe is an adenine changing to a guanine, which is chemistry.
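
As a minimal sketch of the encoding above (in Python), each critter becomes a row of present/absent character states, and a pairwise distance is just the number of characters on which two rows differ. The taxa and their scores here are made-up illustrations, not curated data, and the Hamming distance is only one simple choice; a method such as neighbor-joining would take a matrix like this as its input.

```python
from itertools import combinations

# Hypothetical character matrix: 1 = trait present, 0 = absent.
# Columns follow the encoding in the text:
# C (claws), L (lungs), V (vertebrae), F (fur).
# The taxa and scores are illustrative assumptions only.
CHARACTERS = ("C", "L", "V", "F")
TAXA = {
    "cat":     (1, 1, 1, 1),
    "lizard":  (1, 1, 1, 0),
    "trout":   (0, 0, 1, 0),
    "octopus": (0, 0, 0, 0),
}


def hamming(a, b):
    """Count the characters on which two taxa differ."""
    return sum(x != y for x, y in zip(a, b))


def distance_matrix(taxa):
    """Pairwise Hamming distances, keyed by alphabetically ordered name pairs."""
    return {
        (p, q): hamming(taxa[p], taxa[q])
        for p, q in combinations(sorted(taxa), 2)
    }


if __name__ == "__main__":
    for (p, q), d in distance_matrix(TAXA).items():
        print(f"{p:8s} {q:8s} {d}")
```

Note that nothing in this says how fur is gained or lost; it only measures similarity, which is the point of the distinction drawn above.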

I think a high-level description of conceptual model features (like those Joshua suggested) as character states would work for making similarity trees without an evolutionary model behind them. The main work there is deciding on the features. At the other extreme, one could probably come up with some very crude evolutionary model for local change of machine code, based on context and knowledge of common programming idioms and/or the source language and compiler. Even if you had that, though, one thing that most phylogenetics programs assume is a multiple alignment. That is, for any code fragment found anywhere in a given program, the same fragment can be found in any other, aligned down to the opcode. Then there's the small matter that horizontal gene transfer happens all the time in software, as third-party libraries get pulled in and dropped and software factoring goes on. In principle, I bet that with sufficient effort one could recover the revision history of some large project like GCC from various binaries of different ages. But it is better just to go to the revision control system and look at the history directly. With GCC it goes back 20 years or something.
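
To make the alignment assumption concrete, here is a rough Python sketch that pads two opcode streams with gaps so that matching opcodes share a column. The opcode lists are invented for illustration, and this is only a pairwise alignment built on the standard library's difflib.SequenceMatcher, not the multiple alignment a phylogenetics package would expect.

```python
from difflib import SequenceMatcher

# Hypothetical opcode streams for two builds of the same routine.
OLD = ["push", "mov", "call", "add", "ret"]
NEW = ["push", "mov", "xor", "call", "add", "ret"]


def align(a, b, gap="-"):
    """Pad two opcode lists with gaps so matched opcodes line up by column."""
    row_a, row_b = [], []
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    for _tag, i1, i2, j1, j2 in matcher.get_opcodes():
        left, right = a[i1:i2], b[j1:j2]
        width = max(len(left), len(right))
        # Whichever side is shorter in this block gets gap symbols.
        row_a += left + [gap] * (width - len(left))
        row_b += right + [gap] * (width - len(right))
    return row_a, row_b


if __name__ == "__main__":
    for row in align(OLD, NEW):
        print(" ".join(f"{op:5s}" for op in row))
```

A long run of gaps on one side is roughly what a library being pulled in or dropped would look like, which is why the horizontal-transfer problem makes a clean alignment hard to get.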

Marcus

============================================================
FRIAM Applied Complexity Group listserv
Meets Fridays 9a-11:30 at cafe at St. John's College
lectures, archives, unsubscribe, maps at http://www.friam.org
