You know, Kevin, April Fools notwithstanding, your idea actually makes good sense in a tiny-url sort of way. There would of course be collisions, and thus the need for a global disambiguation registry, but society could do a whole lot worse than something like:
http://prot.seq.db/3fc28e91d74b39ec/a6

(translated: protein sequence hash #3fc28e91d74b39ec, registry index #a6) as a way of unambiguously storing, referring to, and retrieving known sequences. The URL, when requested, would of course simply return the registered sequence. Keeping the scope extremely narrow like that would be the key to the registry's success: just "natural 20" sequences with no annotations. Optimal details might differ, of course (CRC64 is suboptimal for ASCII sequences), but as a general concept, I do think you're on to something powerful here...

Cheers,
Warren

> -----Original Message-----
> From: CCP4 bulletin board [mailto:[email protected]] On Behalf Of
> Kevin Cowtan
> Sent: Wednesday, April 01, 2009 5:02 AM
> To: [email protected]
> Subject: Re: [ccp4bb] New human genome policy - please read.
>
> Why molecular weight? That's just arbitrary.
>
> There is a simple way of referring to proteins which avoids any
> ambiguity - by it's sequence. When referring to a protein, we should use
> its sequence as an identifier. No ambiguity.
>
> Now, I understand that some smart people in America are now solving
> proteins of more than a dozen aa in length. For these, quoting the whole
> sequence could be a bit long. Fortunately this is a solved problem: all
> we need to do is quote a CRC64 hash of the ascii representation of the
> protein sequence. This gives a name space big enough that we can name
> about 4 billion proteins before the probability of a name clash becomes
> significant.
>
>
> James Stroud wrote:
> > I think actually *naming* the proteins would be too extreme. Even the
> > current alpha-numeric system is overwrought. I liked it better when we
> > just called proteins "p75" or "p105". For instance, how many proteins in
> > the human genome are 75 kD, anyway? My guess is not enough to make the
> > situation ambiguous in any catastrophic way.
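P.S. For anyone who wants to play with the idea, here is a rough sketch of how such an identifier might be built, and of where Kevin's "about 4 billion" figure comes from. Several things here are assumptions of mine, not anything agreed in this thread: the CRC-64/ECMA-182 polynomial (the thread only says "CRC64"), upper-casing the sequence before hashing, writing the registry index in hex, and the function names themselves. The prot.seq.db host is of course hypothetical.

```python
import math

# Assumed CRC variant: CRC-64/ECMA-182 (MSB-first, zero initial value).
# The thread only says "CRC64", so this choice is illustrative.
POLY = 0x42F0E1EBA9EA3693
MASK = 0xFFFFFFFFFFFFFFFF


def crc64(data: bytes) -> int:
    """Bitwise CRC-64 over the given bytes (slow but easy to follow)."""
    crc = 0
    for byte in data:
        crc ^= byte << 56  # feed the next byte into the top of the register
        for _ in range(8):
            if crc & (1 << 63):
                crc = ((crc << 1) ^ POLY) & MASK
            else:
                crc = (crc << 1) & MASK
    return crc


def sequence_url(sequence: str, index: int = 0) -> str:
    """Build a hypothetical registry URL: 16 hex digits of hash, then a hex index."""
    digest = crc64(sequence.upper().encode("ascii"))
    return f"http://prot.seq.db/{digest:016x}/{index:x}"


def collision_probability(n: int, bits: int = 64) -> float:
    """Birthday approximation: chance that n random names contain a clash."""
    return -math.expm1(-n * (n - 1) / 2 / 2**bits)


if __name__ == "__main__":
    print(sequence_url("MKVLAAGIVGL", index=0xA6))
    # With 2**32 (about 4.3 billion) names, a clash is already quite likely.
    print(collision_probability(2**32))
```

The registry index is what resolves the inevitable clashes: two different sequences with the same hash would simply get different indices under the same hash path.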
