Re: [ccp4bb] Yes, there are now 214 *million* structures in the AlphaFold Protein Structure Database

2022-07-29 Thread Frank von Delft

Gerard, Sameer - that is highly impressive indeed.


About that Editorial (nice read!), it frames these developments as a 
"crisis" that heralds "the end of structural biology" - or at the very 
least, you're responding to people having said that.  I remain puzzled 
that such stuff receives any editorial oxygen - to me it's like saying 
that higher flux and automation at synchrotrons threaten 
crystallography.  Or something.  AlphaFold is just another tool - a 
sensationally powerful one, of course, but it doesn't Do Science.


Unless there have been noises from Funders - in which case we do need a 
call to arms, to scream some sense into them.


Btw, for whoever missed it, AlphaFold models are as accurate as 3.5 Å 
structures: 
https://twitter.com/LindorffLarsen/status/1527410977213403147. So 
they're amazing, but Real Crystallographers know the experimental 
distance between 3.5 Å and the 2.2 Å needed for structure-supported 
ligand design, to name just one application.


Frank




[ccp4bb] Yes, there are now 214 *million* structures in the AlphaFold Protein Structure Database

2022-07-28 Thread Gerard Kleywegt

Hi all,

I thought Sameer was burying the lead a tad in his message... :-) So, for 
those of you who -like me- are not on social media:


==> As of today, the AlphaFold Protein Structure Database contains 214 million 
models predicted with AlphaFold, covering almost all of UniProt. <==


So, if your favourite protein was not available in the database before today, 
it's worth checking in again at https://www.alphafold.ebi.ac.uk/ now.


See also:

- EMBL-EBI press release: 
https://www.ebi.ac.uk/about/news/technology-and-innovation/alphafold-200-million/

- Nature news: https://www.nature.com/articles/d41586-022-02083-2

- IUCr J guest editorial about the potential impact of all this on structural 
biologists (shameless plug): https://journals.iucr.org/m/issues/2022/04/00/me6185/index.html



Best wishes,

--Gerard





On Thu, 28 Jul 2022, Sameer Velankar wrote:


Dear All,

You may have seen our announcement today about expanding the AlphaFold Protein 
Structure Database to 214M predicted models. To enable this expansion, we’ve 
updated the Predicted Aligned Error (PAE) JSON format to make it compact (about 
4x smaller):

- The PAE JSON numbers are now rounded to the closest integer, giving a ~75% 
  reduction in compressed size. The integer resolution is sufficient for 
  analytical purposes.
- The indices are no longer stored, since we store the full 2D PAE matrix 
  rather than a sparse one, giving a ~4% reduction in compressed size.
- The “distances” field has been renamed to “predicted_aligned_error” and is 
  now stored as a 2D array of shape (num_res, num_res) rather than a 1D array. 
  We renamed the field on purpose so that existing code breaks rather than 
  silently returning wrong values.
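
To give a feel for why rounding saves so much space, here is a rough sketch that compares the JSON size of a small PAE matrix before and after rounding to integers. Everything in it is an assumption for illustration: the matrix is random made-up data, `json_size` is a hypothetical helper, and it measures uncompressed JSON length, whereas the figures quoted above refer to compressed sizes on the real files. Python's built-in `round` is used as a stand-in for "closest integer"; how the database handles exact ties is not stated.

```python
import json
import random


def json_size(obj) -> int:
    """Number of characters in the compact JSON encoding of obj."""
    return len(json.dumps(obj, separators=(",", ":")))


random.seed(0)
num_res = 50        # made-up protein length
max_pae = 31.75     # upper bound taken from the format example

# Made-up float PAE values with two decimals (NOT real AlphaFold output).
pae_float = [
    [round(random.uniform(0.0, max_pae), 2) for _ in range(num_res)]
    for _ in range(num_res)
]

# Round every value to the closest integer.
pae_int = [[round(value) for value in row] for row in pae_float]

size_float = json_size(pae_float)
size_int = json_size(pae_int)

# Largest absolute difference introduced by the rounding.
worst_error = max(
    abs(f - i)
    for row_f, row_i in zip(pae_float, pae_int)
    for f, i in zip(row_f, row_i)
)

print(f"float JSON: {size_float} chars, integer JSON: {size_int} chars")
print(f"largest rounding error: {worst_error:.2f}")
```

The rounding error can never exceed 0.5, which is small next to the 0 to 31.75 range of the values.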

For a protein of length num_res, the PAE JSON file now has the following format:

[{
 "predicted_aligned_error": [[0, 1, 4, 7, 9, ...], ...],  # Shape: (num_res, num_res).
 "max_predicted_aligned_error": 31.75  # Scalar.
}]
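
A minimal sketch of reading this format in Python, assuming the file is plain JSON (JSON has no comment syntax, so the `#` annotations in the listing above would not appear in a real file). The function name `parse_pae` and the tiny 3-residue example string are made up for illustration. The check on the field name mirrors the intent stated above: old-format input fails loudly instead of being misread.

```python
import json


def parse_pae(text: str):
    """Parse new-format PAE JSON text and return (matrix, max_pae)."""
    data = json.loads(text)
    entry = data[0]  # the top level is a one-element list
    if "predicted_aligned_error" not in entry:
        raise KeyError(
            "no 'predicted_aligned_error' field; this may be the old PAE format"
        )
    matrix = entry["predicted_aligned_error"]
    num_res = len(matrix)
    # The matrix should be square: num_res rows of num_res values each.
    if any(len(row) != num_res for row in matrix):
        raise ValueError("PAE matrix is not of shape (num_res, num_res)")
    return matrix, entry["max_predicted_aligned_error"]


# Made-up 3-residue example, not real database content.
example = (
    '[{"predicted_aligned_error": [[0, 1, 4], [1, 0, 2], [5, 3, 0]],'
    ' "max_predicted_aligned_error": 31.75}]'
)
matrix, max_pae = parse_pae(example)
print(len(matrix), max_pae)
```

For a downloaded file, the same function could be fed the file contents, e.g. `parse_pae(open(path).read())`, where `path` is wherever you saved the PAE JSON.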

The fields in the JSON file are:
- predicted_aligned_error: The PAE value of the residue pair, rounded to the 
  closest integer. For the PAE value at position (i, j), i is the residue on 
  which the structure is aligned for the predicted error, and j is the residue 
  on which the error is predicted.
- max_predicted_aligned_error: A number that denotes the largest possible 
  unrounded value of PAE that could occur in the PAE array. The smallest 
  possible value of PAE is 0.
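
Because the definition distinguishes i (aligned residue) from j (scored residue), entry (i, j) need not equal entry (j, i). The sketch below illustrates the indexing on a made-up 4-residue matrix by averaging the PAE between two hypothetical "domains" in both directions. The helper `mean_pae`, the domain assignments, and the use of 0-based indices (first residue at index 0) are all assumptions for illustration.

```python
def mean_pae(matrix, aligned, scored):
    """Mean of matrix[i][j] for i in aligned and j in scored (0-based indices)."""
    values = [matrix[i][j] for i in aligned for j in scored]
    return sum(values) / len(values)


# Made-up, deliberately asymmetric PAE matrix for 4 residues.
matrix = [
    [0, 1, 9, 9],
    [1, 0, 8, 9],
    [4, 5, 0, 1],
    [5, 4, 1, 0],
]

domain_a = [0, 1]  # hypothetical domain: residues at indices 0 and 1
domain_b = [2, 3]  # hypothetical domain: residues at indices 2 and 3

# Error on domain B residues when the structure is aligned on domain A.
a_to_b = mean_pae(matrix, domain_a, domain_b)
# Error on domain A residues when the structure is aligned on domain B.
b_to_a = mean_pae(matrix, domain_b, domain_a)

print(a_to_b, b_to_a)  # prints 8.75 4.5
```

Swapping the row and column roles changes the answer here, which is why it matters to keep track of which index is the aligned residue.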

The updated PAE format is only available from the AlphaFold Protein Structure 
Database. The PAE format from the AlphaFold Colab notebook is not updated.

If you require support with this change, please email alphaf...@deepmind.com 
and they may be able to assist.

Best Wishes,

Sameer Velankar


To unsubscribe from the CCP4BB list, click the following link:
https://www.jiscmail.ac.uk/cgi-bin/WA-JISC.exe?SUBED1=CCP4BB=1

This message was issued to members of www.jiscmail.ac.uk/CCP4BB, a mailing list 
hosted by www.jiscmail.ac.uk, terms & conditions are available at 
https://www.jiscmail.ac.uk/policyandsecurity/




Best wishes,

--Gerard

**
   Gerard J. Kleywegt

  http://xray.bmc.uu.se/gerard   mailto:ger...@xray.bmc.uu.se
**
   The opinions in this message are fictional.  Any similarity
   to actual opinions, living or dead, is purely coincidental.
**
   Little known gastromathematical curiosity: let "z" be the
   radius and "a" the thickness of a pizza. Then the volume
    of that pizza is equal to pi*z*z*a !
**


