Matthew Pocock wrote:
>
> Hi.
>
> On a related note to the languages, should we write a SymbolList-centric
> regexp package? It should not be too hard to do. Do people do many
> regexp searches over DNA strings?
>
I think so. Given a cDNA or RNA sequence, they do this regularly, for example when searching for TATA boxes and open reading frames (biologists' work), although the regexps can get very large in certain circumstances.

But I also think that a quick version of this is not a classic ten-line piece of code. Regular expression pattern matching is the same as scanning (as in JFlex), as I understand it. It is not enough just to build a deterministic finite automaton from a regexp object: there are also algorithms that, depending on the pattern of the regexp, can match at a cost that grows less than proportionally to the sequence length. And surely, if you work on this, you want to end up with something really fast.

> M
>
> ps Armin, are cytogenetic loci identical to sequences of DNA, or are
> they labels for regions of these sequences?

I hope I understand the question correctly. If so, then in a sense they are real pieces of DNA, and thus sequences.

First the basics again (to be complete): '1' is the whole of chromosome 1 as seen under the microscope, meaning the sequence and its complement. '1p' is the short arm of '1'; it is followed by the centromere, '1cen', and then by '1q'. So '1' is the sequence of '1p', followed by '1cen', followed by '1q' (or the reverse). There is nothing more and nothing less in '1'.

This continues further down with the regions ('1pter', '1p3', '1p2', ...). These regions are defined by alternating dark and light areas (so, for example, if 1p1 is dark, 1p2 is light). The subdivision goes on downward (1p1 -> 1p11, 1p12, ...): if you look long enough through your microscope, you can see the dark and light areas split again into finer alternating dark and light areas.

But it would be a real problem if at some point someone actually tried to map the cytogenetic loci onto true DNA sequences. The nature of the dark and light areas is not really understood (as far as I know).
They may vary a little in size (or in the range they cover on the real sequence) over time and/or between individuals (which gives fuzzy borders), and surely every organism has different sequences in them.

Regards,
Armin

_______________________________________________
Biojava-l mailing list - [EMAIL PROTECTED]
http://biojava.org/mailman/listinfo/biojava-l
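
As a footnote to the regexp question above, here is a minimal sketch of the kind of search meant there (TATA boxes and open reading frames), written against plain uppercase Java strings with the standard java.util.regex package. It is not a SymbolList-based implementation and does not use any BioJava API. The class name MotifScan, the helper findAll, and both patterns are assumptions made for illustration only; in particular, the TATA pattern is a simplified consensus and the ORF pattern only looks at the forward strand.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only (not part of BioJava).
final class MotifScan {

    // Simplified TATA-box consensus (assumption): TATA, A or T, A, A or T.
    static final Pattern TATA_BOX = Pattern.compile("TATA[AT]A[AT]");

    // Forward-strand ORF (assumption): a start codon, then whole codons,
    // up to the first in-frame stop codon. The reluctant quantifier *?
    // makes the matcher try a stop codon before consuming another codon.
    static final Pattern ORF =
            Pattern.compile("ATG(?:[ACGT]{3})*?(?:TAA|TAG|TGA)");

    // Returns {start, end} pairs (0-based, end exclusive) for every
    // non-overlapping match of the pattern, scanning left to right.
    static List<int[]> findAll(Pattern pattern, String dna) {
        List<int[]> hits = new ArrayList<>();
        Matcher m = pattern.matcher(dna);
        while (m.find()) {
            hits.add(new int[] {m.start(), m.end()});
        }
        return hits;
    }

    public static void main(String[] args) {
        String dna = "GGTATAAAAGGCATGAAATGACC"; // made-up example sequence
        for (int[] hit : findAll(TATA_BOX, dna)) {
            System.out.println("TATA box at " + hit[0] + ".." + hit[1]);
        }
        for (int[] hit : findAll(ORF, dna)) {
            System.out.println("ORF at " + hit[0] + ".." + hit[1]
                    + ": " + dna.substring(hit[0], hit[1]));
        }
    }
}
```

Because find() reports non-overlapping matches only, nested or overlapping ORFs and the reverse strand would need extra passes. A real SymbolList-centric package would also have to cope with ambiguity symbols, which this string-based sketch ignores.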