I can imagine Facebook friends sharing their Ancestry.com data.  Facebook 
compiles all of that and sells services to insurance companies so that they 
can anticipate risk.
There’s no bound on the stupidity of Facebook users.

From: Friam <[email protected]> on behalf of Roger Critchlow 
<[email protected]>
Reply-To: The Friday Morning Applied Complexity Coffee Group <[email protected]>
Date: Thursday, May 2, 2019 at 1:02 PM
To: The Friday Morning Applied Complexity Coffee Group <[email protected]>
Subject: Re: [FRIAM] More on levels of sequence organization

I did have some energy and it was a pretty entertaining read.

So 7/8ths of the authors for this paper are at Facebook's AI group, though one 
gives an email address @gmail.com.  The group that won the 
CASP13 (Critical Assessment of Structure Prediction) competition in December 
was from Google/DeepMind, as memorialized by 
https://moalquraishi.wordpress.com/2018/12/09/alphafold-casp13-what-just-happened/.
The DeepMind model, called AlphaFold, was supervised learning of 3D structure 
coordinates from amino acid sequences.  DeepMind has yet to publish a paper 
detailing the methods used by AlphaFold.

This model is unsupervised learning to predict a missing amino acid given the 
rest of the sequence, so you plug in a new protein sequence of N amino acids 
and it spits out an amino acid probability distribution for each of the N 
positions, an N*25 dimensional vector that represents everything it learned 
from the training set.  They report a series of tests that appear to support 
their claims, and there doesn't appear to be any major cherry picking or data 
censoring involved in the tests.  I'm not sure how they're encoding 25 amino 
acids, since Wikipedia is pretty sure that 22 is all there are in proteins.
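
To make the shape of that output concrete, here is a toy sketch in Python. It 
is not the paper's model: a smoothed left-neighbour count table stands in for 
the deep language model, and the 25-symbol alphabet (20 standard amino acids, 
plus U and O, plus the ambiguity codes B, Z, X) is only a guess at how one 
might get to 25.  The function names and the training strings are made up for 
illustration.

```python
from collections import Counter

# Assumed 25-symbol alphabet: 20 standard amino acids, plus U and O,
# plus the ambiguity codes B, Z, X. The paper's vocabulary may differ.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY" + "UO" + "BZX"
START = "^"  # marker for "no left neighbour"


def train_counts(sequences):
    """Count how often each residue follows each left neighbour."""
    counts = {a: Counter() for a in ALPHABET + START}
    for seq in sequences:
        prev = START
        for aa in seq:
            counts[prev][aa] += 1
            prev = aa
    return counts


def predict_masked(seq, counts):
    """Hide each position in turn; return one distribution per position."""
    rows = []
    for i in range(len(seq)):
        prev = seq[i - 1] if i > 0 else START
        c = counts[prev]
        # Add-one smoothing so every symbol gets nonzero probability.
        total = sum(c[a] for a in ALPHABET) + len(ALPHABET)
        rows.append([(c[a] + 1) / total for a in ALPHABET])
    return rows


# Made-up training sequences, purely illustrative.
toy_training = ["MKTAYIAKQR", "MKTLLVAGQR", "MKVAYLAKQR"]
counts = train_counts(toy_training)
dist = predict_masked("MKTAYLAGQR", counts)
print(len(dist), len(dist[0]))  # N rows, one probability per alphabet symbol
```

For a sequence of N residues the result is N rows of 25 probabilities, each 
row summing to 1, which is the N*25 vector described above.  The real model 
conditions on the whole rest of the sequence rather than one neighbour.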

But they don't actually extract the levels of organization from the model.  
They take the levels of organization as known facts and construct observations 
of the model that make predictions consistent with the levels.  So if there are 
levels of organization as yet unidentified, they are at least as obscure in the 
model as they are in reality.   And to claim that the levels of organization 
emerge from the model sort of ignores how much work went into constructing the 
observations.

On the other hand, one might be surprised that all these levels are implicit in 
the amino acid sequences, but life knew that already, that's why it only 
remembers the sequences.

The most complex model they fit learned 700 million parameters, and it wasn't 
overfit, so they're presumably gearing up to fit a series of bigger models to 
that exponentially growing database of known protein sequences.   AlphaFold, 
meanwhile, is stuck working with the more slowly growing database of known 
protein 3D structures.

-- rec --

On Tue, Apr 30, 2019 at 9:40 PM Marcus Daniels <[email protected]> wrote:
Cool!

“For synthetic biology, iteratively querying a model of the mutational fitness 
landscape could help efficiently guide the introduction of mutations to enhance 
protein function (Romero & Arnold, 2009), inform protein design using a 
combination of activating mutants (Hu et al., 2018), and make rational 
substitutions to optimize protein properties such as substrate specificity 
(Packer et al., 2017), stability (Tan et al., 2014), and binding (Ricatti et 
al., 2019).”

Get a few billion people to get full genome sequencing, and let the TPUs 
discover how we work!    Everyone gets a custom cocktail to improve stamina, 
fight off cancer, etc. etc.

Marcus

From: Friam <[email protected]> on behalf of Roger Critchlow 
<[email protected]>
Reply-To: The Friday Morning Applied Complexity Coffee Group <[email protected]>
Date: Tuesday, April 30, 2019 at 8:49 PM
To: The Friday Morning Applied Complexity Coffee Group <[email protected]>
Subject: [FRIAM] More on levels of sequence organization

This just turned up on hacker news:

   https://www.biorxiv.org/content/10.1101/622803v1

[...] To this end we use unsupervised learning to train a deep contextual 
language model on 86 billion amino acids across 250 million sequences spanning 
evolutionary diversity. The resulting model maps raw sequences to 
representations of biological properties without labels or prior domain 
knowledge. The learned representation space organizes sequences at multiple 
levels of biological granularity from the biochemical to proteomic levels. [...]

Don't know if I have the energy to plow through the text.

-- rec --
============================================================
FRIAM Applied Complexity Group listserv
Meets Fridays 9a-11:30 at cafe at St. John's College
to unsubscribe http://redfish.com/mailman/listinfo/friam_redfish.com
archives back to 2003: http://friam.471366.n2.nabble.com/
FRIAM-COMIC http://friam-comic.blogspot.com/ by Dr. Strangelove
============================================================