Re: [ccp4bb] Experimental phasing Selenomethionine data collection etc. tips

2024-05-17 Thread James Holton

A few follow-up questions I got out-of-band:


how did you get to the 1:1 relationship between Bijvoet ratio and dose?

I got this from fitting a straight line to Table 1 of Banu's 2004 paper: 
doi:10.1107/S0907444904007917
Is this a rough estimate based on a singular result?  Of course it is!  
This is how we roll in radiation damage research.


Comment:  with the more modern pixel array detectors (e.g. Eiger), you 
can slice your dose even more finely than 0.1s, and not worry about 
the readout time.

yes.

With a bit of a caveat on how many photons/pixel you need for stable 
background subtraction. XDS starts having issues around 1 photon/pixel 
or less, and DIALS claims to be able to get to 0.01 photons/pixel, but I 
have not personally pushed it that far.  Not yet.


I have a plan to try and push zero-dose extrapolation to the 
one-photon-per-image level, but that is on another thread.

is it better to collect 360º or 720º at half the dose?

Nothing wrong with going longer than 360, especially if you want to do 
zero-dose extrapolation, because it is only by repeating the same phi 
range (and everything else) exactly that you get a genuinely "same" 
increment in dose.


However, once you go past 360 the "multiplicity" you gain starts turning 
into what you might call a "redundancy". What I mean by that is that in 
the first 360 each spot and its symmetry mates generally show up on 
different pixels.  Each pixel has about 1% to 3% calibration error 
associated with it (depending on the detector). So, for the 2nd 360 you 
will re-measure all the same spots with the same pixels again, repeating 
a systematic error.  You will also have the same sample self-absorption, 
etc. But, the pixel calibration error starts to really matter for 
anomalous at high "redundancy". To put it another way, if a particular 
pixel has a 1% calibration error, then counting more than 10,000 photons 
with it is a waste, because beyond that point the 1% systematic error 
dominates the total error. So, especially for anomalous, I recommend 
moving the detector between 360° passes. Sliding it horizontally is 
best, or you can use 2theta. But a small change in detector distance 
can usually do it, and is almost always an available option.
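
If you want to see that arithmetic, here is a little Python sketch (the 
1% figure is the per-pixel calibration error quoted above; the rest is 
just Poisson statistics):

import numpy as np

s = 0.01                                      # 1% per-pixel calibration error
for N in [100, 1000, 10000, 100000, 1000000]: # photons counted by one pixel
    poisson = 1.0/np.sqrt(N)                  # fractional counting error
    total = np.sqrt(1.0/N + s**2)             # both error sources, in quadrature
    print("%8d photons: Poisson %6.3f%%  total %6.3f%%" % (N, 100*poisson, 100*total))
# the crossover is at N = 1/s**2 = 10,000 photons: beyond that, the 1%
# systematic term dominates and more photons on the same pixel are wasted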


The only problem with all this "dose slicing" is the images get very 
very weak.


And that brings us back to the "weak image limit".  What if instead of 
images we just collected a list of x-y coordinates of photon hits vs 
time? Anyone have a suggestion for the name to give to the program that 
can process such data?


-James Holton
MAD Scientist


Re: [ccp4bb] Experimental phasing Selenomethionine data collection etc. tips

2024-05-15 Thread James Holton

Thank you to all who provided helpful suggestions so far.

A few things I'd recommend for this particular beamline (which I have 
been running for 20+ years)


Do NOT collect one wavelength at a time. This was a good strategy on old 
beamlines with noisy detectors and slow, drifty monochromators. This is 
not the case at any of the ALS beamlines today. With modern 
zero-read-noise detectors there is no penalty to spreading your photons 
over a lot more images, and round-robin changes between at least two 
wavelengths will double your phasing power for the same dose. With 
8.3.1's monochromator, wavelength changes take about 1 second and are 
reproducible to well within the intrinsic width of the Se peak.  So you 
don't need to worry about missing or drifting off the peak or 
inflection. The only thing you need to worry about is over-cooking your 
crystal before you get all the data you need.


No matter what beamline you use the number of photons your crystal will 
give off before it dies is a fixed number. All you get to do is decide 
how to spread them over the images. Doing two wavelengths within this 
photon budget doesn't hurt. You can always scale and merge them 
together. But keeping them separate gives you both kinds of anomalous 
differences, which are 90 deg apart. So, when one zigs the other zags. 
It is like having twice as many sites without the extra damage you would 
get from them. Also, by taking shorter/weaker exposures you maximize 
your chances of winning over radiation damage "in-post" by cutting off 
images that degrade your signal.


And before anybody says it: NO! Collecting fainter images does NOT 
degrade your resolution. I don't know where this idea comes from, but it 
never seems to die. It was true with film and image plates, but with 
pixel arrays and modern CCDs there is no penalty to weak images.  Don't 
believe me? Read the manual for your detector. Modern PADs actually sum 
a bunch of weak images internally before writing them to disk. You can 
do the same "in post" if you want to.


Yes, there are many cases where SAD is good enough, but my advice is 
never to tempt fate.


What I recommend is:
1) collect two wavelengths: remote, and halfway between the peak and 
inflection.

        this will maximize both kinds of anomalous differences
2) calculate your Bijvoet ratio here: 
https://bl831.als.lbl.gov/xtalsize.html
3) convert this into MGy: i.e., if your Bijvoet ratio is 3%, then 3 MGy 
is your dose budget (the numbers are worked through in the sketch after 
this list).

4) do a strategy and start at the recommended phi value
5) set delta-phi to be 1/3 of your estimated mosaic spread, or 0.2 deg, 
whichever is lower

    this is all done automatically by the "index" program at 8.3.1
6) set your exposure time to be 0.1 s or more.
    This is because the Pilatus 6M has a ~1 ms read-out and you want 
that to be no more than ~1% of the exposure.
7) attenuate the beam so that you will get complete data in less than 
1/2 your Bijvoet ratio in MGy.

    This is handled by the exposure_time program at 8.3.1
8) collect data in inverse beam and round-robin for both wavelengths (45 
deg wedges)

    In BLU-ICE, just enter the wavelengths into the list on the Collect tab
9) keep collecting until you get 360 deg for both wavelengths
10) move the detector up by ~5 mm, this puts the next sphere of spots 
onto new pixels

11) multiply your exposure or de-attenuate by a factor of 4
12) goto 8
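
If you like to see the numbers spelled out, here is a little Python 
sketch of steps 3, 6, 7 and 11 (the 1% Bijvoet ratio = 1 MGy rule of 
thumb is the one from this thread; the 3% crystal is hypothetical):

bijvoet_ratio = 3.0              # percent, from step 2 (xtalsize.html)
dose_budget   = bijvoet_ratio    # MGy, step 3: 1% of signal per MGy
first_pass    = dose_budget / 2  # MGy, step 7: complete data in < half the budget

exposure, readout = 0.1, 0.001   # seconds, step 6: read-out ~1% of exposure
assert readout / exposure <= 0.01

dose = first_pass
for i in range(1, 5):            # steps 8-12: repeat the 360s, 4x dose each time
    print("pass %d: ~%g MGy" % (i, dose))
    dose *= 4                    # step 11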

When the diffraction image is noticeably damaged, you are done with this 
crystal. If it is bigger than the beam, move to a fresh spot and do this 
again. When the crystal is all burnt up, mount the next one and do this 
again.


If you're lucky, the automatic processing will finish before you mount 
your next crystal and you can try SHELXC/D/E on the 448-core 
shared-memory computer we have for doing such things. I expect it might 
be faster than the cloud.


Sorry if any of this sounds gruff, I don't mean to shout anyone down, 
but I want the message to be clear. This is something Gerard B and I 
have struggled to communicate for decades:
Collecting one wavelength at a time is not MAD, but rather M-SAD. 
Multiple, non-isomorphous SAD data sets.


-James Holton
MAD Scientist

On 5/13/2024 10:23 PM, dbellini wrote:

Hi Marco,

A few suggestions that I like to follow for MAD experiments:

Before everything, check you have at least about 1 SeMet per 100 residues
Then before crystallisation check by MassSpec that SeMet is properly 
incorporated in your protein
After crystallisation collect first on the peak with (very) high 
redundancy and as little/gentle dose as possible
Collecting the other wavelengths should give you better starting 
phases/maps, which might be very helpful at your resolution of 2.8 
(especially if it is a very anisotropic 2.8...)


Automated pipelines are so good nowadays, if you collect good data 
they should solve it without problems (as long as your crystal is not 
suffering from other pathologies like twinning or

Re: [ccp4bb] Modeling Disulfide Bond Occupancies

2024-05-06 Thread James Holton
In the CCP4 program refmac5, you specify occupancy groups and then 
specify how to refine them.  The setup, afaik, is not automatic.


Documentation for how to do it is here:
http://www.ysbl.york.ac.uk/refmac/data/refmac_keywords.html#Occupancy

You may also find this link helpful?
https://www.mail-archive.com/ccp4bb@jiscmail.ac.uk/msg51707.html
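
For your specific case, the keyword input would look something along 
these lines (chain ID "A" is a guess on my part, so check the residue 
selections and the exact syntax against the documentation above):

occupancy group id 1 chain A residue 472 alt A
occupancy group id 1 chain A residue 384 alt A
occupancy group id 2 chain A residue 472 alt B
occupancy group id 2 chain A residue 384 alt B
occupancy group alts complete 1 2
occupancy refine

Putting both A conformers into the same group makes them refine with a 
single, shared occupancy, and the "alts complete" line constrains the 
two groups to sum to unity. That should keep the occupancies "aligned".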

-James Holton
MAD Scientist

On 5/6/2024 12:19 PM, Liliana Margent wrote:

Greetings everyone,

I'm currently in the process of modeling a disulfide bond in two structures. 
However, when I attempt to model single occupancy for the cysteines involved in 
the bond, negative density blobs emerge within the disulfide bond. This 
suggests the possibility of alternate conformations for the cysteines.

Yet, when I endeavor to model alternate conformations for both cysteines, their 
occupancies do not align despite running refinement with a bond parameter file 
that specifies the link. To illustrate, I initiate the refinement with 
occupancies for Cys472 as A0.75/B0.25 and for Cys384 as A0.75/B0.25, but 
post-refinement, the output occupancies appear as Cys472 A0.84/B0.16 and 
Cys384 A0.97/B0.03. The A conformers are the ones that participate in the S-S bond.

Has anyone else encountered this issue before, or does anyone have suggestions 
on how to refine these cysteines to achieve coherent occupancies?

Thank you for any insights you can provide.

Best,
Liliana




Re: [ccp4bb] Room temperature change from 25ºC to 20ºC

2024-04-01 Thread James Holton

Thank you for spearheading this, Mark,

But, I don't think you are going far enough. It has already been 
expressed by several on this thread that a 5°C change is an insufficient 
correction for many situations.  I propose a much more productive change 
of lowering "room temperature" to 4 Kelvin.  This would rapidly lead to 
Nobel Prize winning discoveries, such as room-temperature 
superconductors.  I think, in that light, the minor inconvenience of 
being more specific about temperature in our papers is well worth it.


And, as an aside, I hope you don't mind me pointing out that energy 
doesn't always have to come from burning carbon. Carbon doesn't even 
have the highest energy density.  You can see from this table:

https://en.wikipedia.org/wiki/Energy_density_Extended_Reference_Table
that other fuels are much more effective.

HTH,

-James Holton
MAD Scientist

On 4/1/2024 7:29 AM, Mark wrote:

Room temperature change from 25ºC to 20ºC

As a member of the inter-society standards commission St-Incent I have 
been asked to take the bearings of the structural biology community 
regarding a proposal to lower the universally understood room 
temperature from 25ºC (77º Fahrenheit) to 20ºC (68º Fahrenheit). 
Obvious advantages would be less heating necessary for experiments at 
this standard temperature. Given that laboratories nowadays are not 
commonly heated to this high temperature anyway, it does appear to 
make sense.


Members of tropical and subtropical countries have already expressed 
opposition to the proposal, because they have to reach room 
temperature by cooling rather than heating, so for them the proposal 
would mean more CO2 emissions, not less.


Please express opinions to this list today, so that I have time to 
collate them before the local deadline of 28 December.



Mark J van Raaij
Dpto de Estructura de Macromoleculas, lab 20B
Centro Nacional de Biotecnologia - CSIC
calle Darwin 3
E-28049 Madrid, Spain





Re: [ccp4bb] request for applications

2024-04-01 Thread James Holton
Of course, Frank!  No amount is too small if it makes a difference in 
the world.


Can you please provide a budget justification?


On 4/1/2024 1:22 AM, Frank Von Delft wrote:
Oh dear, your prime number oversupply crashed the crypto Ponzi 
scheme market.  Will you accept $10e2 proposals now?


Sent from tiny silly touch screen


Re: [ccp4bb] request for applications

2024-04-01 Thread James Holton
I'm sorry Phil, but your application has been administratively rejected 
because it did not conform to the bioscience-only stipulation that was 
clearly stated in the RFA.


We look forward to an improved version of your proposal in the future, 
and please try to read the instructions more carefully next time.


Best of luck,

-James Holton
MAD Scientist

On 4/1/2024 8:03 AM, Phil Jeffrey wrote:

:: I expect to have ~ $1e12 USD on current ledgers.

Presumably via the Bankman-Fried algorithm

Phil



Re: [ccp4bb] request for applications

2024-04-01 Thread James Holton

For you, Eleanor? Of course!  I look forward to it.

But do you have an "elevator pitch"?

I feel that a lively exchange of short messages conveys ideas much more 
efficiently and effectively than an annual exchange of hyper-dense 
documents.


Cheers,

-James Holton
MAD Scientist

On 4/1/2024 6:27 AM, Eleanor Dodson wrote:
It will probably take me a full year to draft the application - is 
that too slow?


On Mon, 1 Apr 2024 at 09:22, Frank Von Delft 
<bcb385fe5582-dmarc-requ...@jiscmail.ac.uk> wrote:


Oh dear, your prime number oversupply crashed the crypto Ponzi
scheme market.  Will you accept $10e2 proposals now?

Sent from tiny silly touch screen

[ccp4bb] request for applications

2024-04-01 Thread James Holton

Hey Everyone,

It may sound like an incredibly boring thing that there has never been a 
formal mathematical proof that finding the prime factors of very large 
numbers doesn't have a more efficient algorithm than simply trying every 
single one of them. Nevertheless, to this day, encryption keys and 
indeed blockchain-based cryptocurrencies hinge upon how computationally 
hard it is to find these large prime factors. And yet, no one has ever 
proven that there is not a more efficient way.


It occurred to me recently that cryptocurrencies (blockchains) are 
nothing more than a sequence of numbers, and Large Language Models 
fundamentally take a sequence of "words" and predict the next one in the 
series. So, they seem naturally suited to the task of finding a more 
efficient way. I spent some of my free time trying my hand at this. 
There were some twists and turns along the way, but as of today it seems 
to be working. Predictions are now coming pretty fast. By the end of 
April 1, I expect to have ~ $1e12 USD on current ledgers. This may have 
certain socioeconomic ramifications, but that is not what I want to 
discuss here. What I want to discuss is how to use this new source of 
scientific funding!


My question for the BB is: what would YOU do if you had $1e12 USD for 
your science? No non-scientific proposals please. There are plenty of 
other forums for those.  This BB is about biological structural science, 
so please stay on-topic.  OK?  And now: suggestions!


I am particularly interested in projects that can only be done with a 
large, cooperative $1e12 USD, but not by 10e6 independent and unrelated 
$100e3 projects. The Apollo moon missions, for example, cost $300e9 
(adjusted USD).  On a smaller scale, re-doing the whole PDB from cloning 
and expression to crystallization and structure solution would only cost 
about $500e6 USD. That would finally give us a good database of 
crystallization conditions for training an AI to tell you, given a 
sequence, what the crystallization conditions (if any) will be. That 
might take a lot of computing power, but there is plenty left over to 
buy 10 zettaflops of computing power (and the solar panels needed to 
power it). Or, if we really want to just divide it up, that would be 
$10e6 for each of the ~1e5 people on this planet who fit into the 
category of "biological scientist". That's not just PIs, but postdocs, 
grad students, techs. Everybody.


I'm sure this will solve a lot of problems, but not all of them. And, I 
like to get ahead of things. So, what are the non-financial problems 
that will remain?  I think these are the most important problems in 
science: the intellectual and technological hurdles that money can't 
overcome.  I'm hoping this will be an opportunity for all of us to focus 
on those.  I know we're all not used to thinking on this scale, but, at 
least for today, let's give it a try!


Looking forward to your applications,

-James Holton
MAD Scientist





Re: [ccp4bb] program to complete (or to change) side chains to specific rotamers

2024-02-06 Thread James Holton

Hey Jorge,

Did you know coot can be scripted and run without the GUI?  In general, 
all you need is to record a session and then edit the session file.  A 
shell script for doing something similar to what you want would look 
like this:


#! /bin/tcsh -f
#
# mutate one residue to a new type and let coot auto-fit it, no GUI needed
set pdbfile = wrong.pdb       # starting coordinates
set mapmtz = refmacout.mtz    # mtz containing the map coefficients
set chain = A                 # chain, residue and new type to build
set resnum = 123
set TYP = LEU

# write the coot python script, with the shell variables substituted in
cat << EOF >! mutate.py
imol_coords = handle_read_draw_molecule("$pdbfile")
imol_map = make_and_draw_map("$mapmtz","FWT","PHWT","",0,0)
set_go_to_atom_chain_residue_atom_name("${chain}",$resnum," CA ")
mutate_and_auto_fit(${resnum},"${chain}",imol_coords,imol_map,"${TYP}")
with_auto_accept([sphere_refine, 3.5])
save_coordinates(imol_coords,"coot.pdb")
coot_no_state_real_exit(0)
EOF

# run the program
coot --no-graphics --script mutate.py

A difference here is that instead of picking a particular rotamer, you 
let coot do it for you based on the density.


This might sound like a heavyweight approach, but coot actually spins up 
real fast when it doesn't need to build a graphics window.


I note you did ask for a way to specify a rotamer.  I say try doing that 
in coot and record the session.   However, if that gets too complicated, 
another way to do build-by-rotamer is with these awk programs that I 
wrote long before coot was a thing.  I still find them useful for 
certain tasks. I have created a git repo for them here:

https://github.com/jmholton/build_pdb

HTH,

-James Holton
MAD Scientist



On 2/6/2024 8:24 AM, Jorge Iulek wrote:

Dear all,


As I cannot find one by myself, maybe someone can point me to it here.
I would like a command-line program - not a GUI - where I simply 
indicate residues and the desired rotamer, and it changes the 
coordinate file to complete/change the side chains accordingly; say, 
for Glu251, make it rotamer tp10, and so on.
Less desirable would be to indicate a reference structure for the 
specific rotamer. I would really prefer to indicate the specific 
residue and the rotamer I want.

Is there such a program or any ideas for a combination of programs?
Thanks,

Jorge





Re: [ccp4bb] Introducing the UNTANGLE Challenge

2024-01-21 Thread James Holton

Thank you Herbert,

Yes, I call tunneling the "conformer swap trick", and I provide a jiffy 
script for doing this. Swapping conformer letter assignments is 
equivalent to changing the color of the ropes at a certain distance down 
their length.  When you give this to the refinement program all the 
bonds are created between same-color bits of rope only, so you 
effectively tunnel.
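
To make that concrete, here is a minimal Python sketch of the letter 
swap (the file names and residue numbers are made up; the residue list 
is whatever your strain analysis points at):

# swap altloc letters A<->B for chosen residues in a PDB file
swap_these = {("A", 123), ("A", 124)}    # (chain, resnum) pairs to re-color

with open("tangled.pdb") as fin, open("swapped.pdb", "w") as fout:
    for line in fin:
        if line.startswith(("ATOM", "HETATM")):
            alt, chain, resnum = line[16], line[21], int(line[22:26])
            if (chain, resnum) in swap_these and alt in "AB":
                line = line[:16] + ("B" if alt == "A" else "A") + line[17:]
        fout.write(line)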


The hard part is deciding which bits need to change color.   In my 
simple diagram it is easy, as the point of maximum stress is also the 
point where corrective action needs to be taken.  This is equivalent to 
"Level 1" of the UNTANGLE Challenge.  At Level 2 the atoms that need to 
be swapped are not the most strained, but nearby.  At Level 3 there are 
several groups of atoms that need to be swapped, but I can't do it 
myself without cheating because I already know what they are.


At Level 9 the atoms that need swapping are in large, connected groups. 
This type of correlated motion is probably the most biologically 
interesting.


Good news is: every "wrong interpretation" of the correlated motions 
that I have been able to contrive has markedly more strain than the 
ground truth. This implies the "right interpretation" of correlated 
motion is recognizable and provable. I find that motivating.


-James Holton
MAD Scientist

On 1/21/2024 4:07 AM, Herbert J. Bernstein wrote:
Have you considered the impact of tunneling?  Your rope crossings are 
not perfect barriers.



Re: [ccp4bb] Introducing the UNTANGLE Challenge

2024-01-20 Thread James Holton

Update:

I've gotten some feedback asking for clarity on what I mean by 
"tangled". I paste here a visual aid:



The protein chains in an ensemble model are like these ropes. If these 
ropes are the same length as the distance from floor to ceiling, then 
straight up-and-down is the global minimum in energy (left). The anchor 
points are analogous to the rest of the protein structure, which is the 
same in both diagrams. Imagine for a moment, however, after anchoring 
the dangling rope ends to the floor you look up and see the ropes are 
actually crossed (right). You got the end points right, but no amount of 
pulling on the ropes (energy minimization) is going to get you from the 
tangled structure to the global minimum. The tangled ropes are also 
strained, because they are being forced to be a little longer than they 
want to be. This strain in protein models manifests as geometry outliers 
and the automatic weighting in your refinement program responds to bad 
geometry by relaxing the x-ray weight, which alleviates some of the 
strain, but increases your Rfree.


The goal of this challenge is to eliminate these tangles, and do it 
efficiently. What we need is a topoisomerase! Something that can find 
the source of strain and let the ropes pass through each other at the 
appropriate place.  I've always wanted one of those for the wires behind 
my desk...


More details on the origins of tangling in ensemble models can be found 
here:

https://bl831.als.lbl.gov/~jamesh/challenge/twoconf/#tangle

-James Holton
MAD Scientist



[ccp4bb] Introducing the UNTANGLE Challenge

2024-01-18 Thread James Holton

Greetings Everybody,

I present to you a Challenge.

Structural biology would be far more powerful if we can get our models 
out of local minima, and together, I believe we can find a way to escape 
them.


tldr: I dare any one of you to build a model that scores better than my 
"best.pdb" model below. That is probably impossible, so I also dare you 
to approach or even match "best.pdb" by doing something more clever than 
just copying it. Difficulty levels range from 0 to 11. First one to 
match the best.pdb energy score and Rfree wins the challenge, and I'd 
like you to be on my paper. You have nine months.


Details of the challenge, scoring system, test data, and available 
starting points can be found here:

https://bl831.als.lbl.gov/~jamesh/challenge/twoconf/

Why am I doing this?
We all know that macromolecules adopt multiple conformations. That is 
how they function. And yet, ensemble refinement still has a hard time 
competing with conventional single-conformer-with-a-few-split-side-chain 
models when it comes to revealing correlated motions, or even just 
simultaneously satisfying density data and chemical restraints. That is, 
ensembles still suffer from the battle between R factors and geometry 
restraints. This is because the ensemble member chains cannot pass 
through each other, and get tangled. The tangling comes from the 
density, not the chemistry. Refinement in refmac, shelxl, phenix, 
simulated annealing, qFit, and even coot cannot untangle them.


The good news is: knowledge of chemistry, combined with R factors, 
appears to be a powerful indicator of how near a model is to being 
untangled. What is really exciting is that the genuine, underlying 
ensemble cannot be tangled. The true ensemble _defines_ the density; it 
is not being fit to it. The more untangled a model gets the closer it 
comes to the true ensemble, with deviations from reasonable chemistry 
becoming easier and easier to detect. In the end, when all alternative 
hypotheses have been eliminated, the model must match the truth.


Why can't we do this with real data? Because all ensemble models are 
tangled. Let's get to untangling them, shall we?


To demonstrate, I have created a series of examples that are 
progressively more difficult to solve, but the ground truth model and 
density is the same in all cases. Build the right model, and it will not 
only explain the data to within experimental error, and have the best 
possible validation stats, but it will reveal the true, underlying 
cooperative motion of the protein as well.


Unless, of course, you can prove me wrong?

-James Holton
MAD Scientist





Re: [ccp4bb] Automated refinement convergence

2024-01-18 Thread James Holton

Hey there Robert,

Refmac has a keyword called "kill" that I think is what you are looking 
for.  It is documented here:

https://www2.mrc-lmb.cam.ac.uk/groups/murshudov/content/refmac/refmac_keywords.html

  You can specify a conditional exit based on R factor, etc. Or you can 
just create a specified file containing "stop Y" from an external 
process.  I use it when running refmac on a cluster that has run time 
limits but difficult-to-predict CPU speeds.
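
A minimal sketch of that setup (the filename is arbitrary, and the exact 
keyword syntax is on the page linked above, so double-check it against 
your refmac version):

# in the refmac keyword input: name the file refmac should watch
kill killfile.txt

# then, from any other process, to make refmac exit cleanly:
echo "stop Y" > killfile.txt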


I don't think Phenix has a checkpointing feature. Not that I know of, anyway.

Amber does support checkpointing, and it now counts as a refinement program 
since support for structure-factor restraints was added in version 22.


Personally, when I do refinements I do dozens to hundreds of 
macro-macro-cycles. As in, take the pdb file output by one run and feed 
it into another run. There is an instantiation overhead to doing this, 
as you note, but I like my models to be super converged. I define 
convergence as: the x,y,z, B, and occupancy values in the pdb file are 
no longer changed by the refinement program. This does not happen quickly, but it 
does eventually happen. Yes, you can get oscillations, but one way to 
deal with those is to add a bit more damping, or to adjust the x-ray 
weight down and then up and then back to auto again. This "weight snap" 
tends to take things that were dangling from a cliff in the energy 
landscape and knock them to the ground. After that, the oscillations are 
less common.
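
If you want to automate that, here is a rough Python sketch of the loop 
(run_refinement.sh is a stand-in for however you invoke refmac or 
phenix.refine, and the 0.001 A threshold is just my taste):

import subprocess

def max_shift(pdb1, pdb2):
    # largest x,y,z change between two PDB files with identical atom order
    shifts = [0.0]
    for a, b in zip(open(pdb1), open(pdb2)):
        if a.startswith(("ATOM", "HETATM")) and b.startswith(("ATOM", "HETATM")):
            for i in (30, 38, 46):            # PDB x,y,z column starts
                shifts.append(abs(float(a[i:i+8]) - float(b[i:i+8])))
    return max(shifts)

current = "start.pdb"
for cycle in range(200):                      # dozens to hundreds of passes
    out = "macro_%03d.pdb" % cycle
    subprocess.run(["./run_refinement.sh", current, out], check=True)
    if max_shift(current, out) < 0.001:       # x,y,z no longer moving: converged
        break
    current = out                             # feed the output back in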


 And like an equilibrated chromatography column, an xyz-converged model 
is the best way to know that when you edit and re-refine, everything you 
see is due to the edit, and not some other process that just wasn't 
finished yet.


That's what I do. Maybe I just want to feel like I've got something 
cooking while I sleep...


Cheers,

-James Holton
MAD Scientist

On 1/18/2024 3:04 AM, Robert Oeffner wrote:

Hi,

I am wondering if authors of refinement programs would like to consider putting 
on their users' wish list the ability to automatically 
terminate once the refinement has reached convergence. Various refinement 
metrics such as R factors, CC or RMS values typically will reach a plateau once 
the refinement of a macromolecular structure with X-ray or EM-data has 
converged and further macro-cycles of refinement will no longer improve the 
structure. The default number of macro-cycles in programs such as Phenix-refine 
and Refmac are probably sensible for most cases but in some cases it would be 
nice if the programs automatically extended the number of macro-cycles as 
needed (or decreased the number).

The user can of course examine log files from refinement themselves and decide 
whether to continue refinement. But since starting a new session of refinement 
appears to always create an initial fluctuation in the refinement metrics 
before they align with the values of the last macro-cycles in the previous 
refinement session, the user is compelled to do at least, say 3 or more 
macrocycles in addition to whatever may be needed for reaching convergence. I 
guess it would therefore be more efficient if this was implemented directly in 
the refinement programs and presented as an option for the user to choose.

There could be cases where alternate conformations of a structure will 
repeatedly be oscillating in and out of density thus causing the refinement 
metrics also to oscillate. Hopefully such cases could be covered by gauging the 
level of fluctuations of the refinement metrics and terminate the refinement 
accordingly.

Many thanks,

Robert





Re: [ccp4bb] what is isomorphous?

2023-12-31 Thread James Holton
My advice: take cell-based metrics of "isomorphism" with a grain of salt.


It has already been pointed out that a pure scaling cell deformation 
(one that preserves all the fractional coordinates of all the atoms) 
does not change the structure factors. I would call such a pair of 
crystals isomorphous.


The origin of the cell-based rule of thumb quoted in Drenth is indeed 
the 1956 paper by Crick and Magdoff that John Cooper shared. But I must 
stress: their calculation, while groundbreaking, was incredibly 
simplistic. It was equivalent to changing the header of a PDB file to a 
different unit cell, leaving all the atoms at the same orthogonal x,y,z 
positions without regard for crystal packing and non-bond clashes. The 
non-physical-ness of this approach is perhaps why no one has ever 
re-visited it.  It is also maximally pessimistic, as real crystals are 
no doubt somewhere in between the harshly rigid approximation of Crick & 
Magdoff and the perfectly soft elasticity that yields no change in 
structure factors at all.


  To be fair, I suspect the computer used to do these calculations was 
named Beatrice Magdoff. That is, in 1956 a "computer" was a job 
description, not a device. Magdoff did some amazing things in her 
career, and this one was no doubt a lot of work.  I don't blame her and 
Crick for trying to keep it simple. I would have done the same. I also 
suspect Magdoff would agree that computers in 2024 are a bit more 
powerful than the fastest computers of 1956.


I expect in the coming year that barriers like non-isomorphism will 
start to be overcome. No doubt borrowing from our cryo-EM friends who 
have been stretching, pulling and sharpening 3D images for decades.


Happy New Year everyone!

-James Holton
MAD Scientist

On 12/21/2023 11:37 AM, Tom Peat wrote:

Hello All,

I think Randy makes a very good point here- it depends on what you are 
trying to do with your data sets.
If you are trying to merge them, 'isomorphous' is important for this 
to work. If you are using them for cross crystal averaging, being less 
isomorphous is better (more signal).


James Holton has a story of Louise Johnson collecting data on lysozyme 
(back in the 60's?) where she looked at one specific reflection to 
determine whether the data sets she was collecting would be 
isomorphous and scale. It turns out that although the cell was very 
similar, the dehydration state of the crystal was very important for 
two lysozyme data sets to scale together. The Rmerge for the two 
dehydration states was something crazy large, like 44%, even though 
under the standard 'rules' (more rules of thumb), one would have 
believed that these data sets should have been 'isomorphous'. For the 
data sets that had the same dehydration state, the data merged with 
'typical' statistics of lysozyme (like 3-4%).


James will have the details that I do not.
cheers, tom

*From:* CCP4 bulletin board  on behalf of Randy 
John Read 

*Sent:* Thursday, December 21, 2023 10:53 PM
*To:* CCP4BB@JISCMAIL.AC.UK 
*Subject:* Re: [ccp4bb] what is isomorphous?


I think we’ve strayed a bit from Doeke’s original question involving 
crystals A, B and C, where I think the consensus opinion would be that 
we would refer to crystal C as not being isomorphous to either A or B.


On the question of what “isomorphous” means in the context of related 
crystals, I’m not sure we have complete consensus. I would tend to say 
that any two crystals are isomorphous if they have related unit cells 
and similar fractional coordinates of the atoms, so that 
(operationally) their diffraction patterns are correlated. However, 
there might be differences of opinion on whether two crystals can be 
considered isomorphous if one has exact crystallographic symmetry and 
the other has pseudosymmetry. (I would probably be on the more 
permissive side here.)


In principle, I suppose being isomorphous (“same shape”) should be a 
binary decision, but in practice we’re interested in the implications 
of the degree to which perfect isomorphism is violated. So I would 
tend to use the term “poorly isomorphous” for a pair where the 
correlation between the diffraction patterns drops off well before the 
resolution limit. Crick was focused on percentage change in cell 
dimensions, but Bernhard is right that what matters is the ratio 
between the difference in cell lengths and the resolution of the data. 
It’s a bit counter-intuitive, but the effect of the difference between 
cell edges of 20 and 25 is the same as for cell edges of 200 and 205! 
By the way, the first time I learned this was from K. Cowtan and I 
hadn’t realised it’s also in Jan Drenth’s book.


For isomorphous replacement (something some of us dimly remember from 
the days before AlphaFold), being poorly isomorph

Re: [ccp4bb] nearestcell

2023-12-27 Thread James Holton

Are you thinking of "othercell"?

On 12/27/2023 10:38 AM, Kay Diederichs wrote:

Dear all,

I seem to remember a tool called "nearestcell", a command-line equivalent of 
the Oxford Nearest-Cell web server which appears to be offline.
However I cannot locate that tool in CCP4 nor elsewhere. Can anyone point me to 
it or give alternatives, please? (I did try the PDB's advanced search but it is 
not made for this purpose)

Thanks, and a happy and successful 2024 to everybody!

Kay





Re: [ccp4bb] solvent mask for partial ligands, addendum

2023-12-14 Thread James Holton
You guessed right.  Bulk solvent does not get a localized "occupancy". 
The solvent mask is set to "0" anywhere near modeled atoms, even if 
those atoms have an occupancy of 0.01.  Yes, it might seem sensible to do "occupancy" for 
the bulk, but in practice it is tricky.  I tried my hand at a "fuzzy" 
bulk solvent mask, which can be read in by refmac. It tends to perform 
better than the default masks, but is rather computationally intensive 
to make. Script here:

https://github.com/bl831/fuzzymask

This is perhaps a closer approximation to what you are thinking a bulk 
solvent mask should be?


You might also want to try the Babinet inverse type of bulk solvent 
model. refmac supports this if you use "scale type bulk".


-James Holton
MAD Scientist

On 12/14/2023 2:40 AM, Palm, Gottfried wrote:
I can only guess that the solvent at the place of the EPE is set to 0 
(instead of 0.5)






Re: [ccp4bb] happy/sad maps

2023-04-30 Thread James Holton

Thanks to all who replied both on- and off-list.

A few (such as Charles below) have reported that general-purpose 
FFT algorithms, such as those available in Matlab or NumPy, do, in fact, 
get the smiley face map back.  This is expected, because such algorithms 
do not have a resolution cutoff.  The programs I have found that do turn 
the smile into a frown are:

phenix.fft and phenix.map_to_structure_factors in Phenix
"fft", "sfall" and "refmac5" in the CCP4 Suite

The reason this happens is not that these programs are "buggy", but 
simply that they require the array of hkls in reciprocal space to be 
a sphere, not a cube.  That is, if h,k,l = 0,0,15 is at the resolution 
limit, then h,k,l = 15,15,15 is going to be outside of it, and 
discarded. The smiley face consists entirely of these high-order 
structure factors.  Rejecting these "corner" regions may seem like a 
good idea until you realize that they are the only aspect of the map 
that carries happiness. ;)


But all joking aside, the reason these programs reject the "corners" 
is that they "should" be zero.  And as long as you only ever work 
with maps that are generated from structure factors they will be zero.  
If they are not zero, why?  And what happens if you rotate the object 
before recording it?  Those non-zero "corner" values will then be out 
beyond the faces of the cube, and will therefore fold back into the 
lower-resolution data, messing it up.  This implies, of course, that the 
non-rotated object is also being messed up by noise beyond the Nyquist 
limit.  What I am suggesting is that non-zero "corner" regions may be a 
canary in our coal mine.
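
If you want to see the effect for yourself, here is a minimal NumPy 
sketch (illustration only: an arbitrary 64-cube grid, a made-up 
resolution cutoff of 15 grid units, and random numbers standing in for 
a real-space map):

import numpy as np

rho = np.random.rand(64, 64, 64)     # stand-in for a real-space map
F = np.fft.fftn(rho)

# spherical resolution cutoff: keep only h^2 + k^2 + l^2 <= 15^2
freq = np.fft.fftfreq(64) * 64
H, K, L = np.meshgrid(freq, freq, freq, indexing="ij")
sphere = (H**2 + K**2 + L**2) <= 15**2

full = np.fft.ifftn(F).real          # cube kept: round trip is exact
cut = np.fft.ifftn(F * sphere).real  # corners zeroed: round trip is lossy

print(abs(full - rho).max())         # ~1e-15, machine precision
print(abs(cut - rho).max())          # large: the "corner" terms are gone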


Paul introduced the interesting possibility of super-sampling, and what 
I'm really trying to get at here is: what is the best way to "feather" 
an all-or-nothing real-space map (such as a bulk solvent mask, or a 
single-pixel detector event), so that it does not spray noise all over 
reciprocal space?


Many of you may think this is a trick question and that I already have a 
beautiful answer I'm waiting to reveal.  That is not really the case.  
What I have is a handful of rather unsatisfactory solutions, and I'm 
wondering if I'm just not aware of an existing solution to this problem.
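
For concreteness, here is the most obvious of those unsatisfactory 
solutions: convolve the binary mask with a Gaussian so its edges roll 
off smoothly.  A NumPy/SciPy sketch (the mask shape, grid size and sigma 
are arbitrary choices for the illustration):

import numpy as np
from scipy.ndimage import gaussian_filter

mask = np.zeros((64, 64, 64))
mask[20:40, 20:40, 20:40] = 1.0               # an all-or-nothing mask

feathered = gaussian_filter(mask, sigma=2.0)  # same mask, soft edges

# fraction of Fourier amplitude at high spatial frequency:
# the sharp mask sprays far more of it across reciprocal space
for m in (mask, feathered):
    F = np.abs(np.fft.fftn(m))
    print(F[16:48, 16:48, 16:48].sum() / F.sum())  # indices near Nyquist

The price, of course, is that the feathering blurs the mask itself, 
which is exactly the kind of trade-off that keeps these solutions 
unsatisfactory.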


Hope everyone had a lovely weekend,

-James Holton
MAD Scientist


On 4/28/2023 1:35 PM, Sindelar, Charles wrote:


Hi James!  I’m not sure exactly what ccp4, coot and phenix do for 
their FFT’s, but I’m assuming they must mask off the ‘corners’ of the 
3D FFT data cube (where the frequency is greater than the 
Nyquist) during the forwards transform.


If you take your Cheshire map and do the FFT in matlab or octave, this 
is not done and I can confirm you get the smiley back. The cryo-EM 
software I am familiar with does filter away the FFT corners during 
the first 3D reconstruction step, so bypassing this issue.


Fun demo- this would be a great exercise for students!

Chuck

*From: *CCP4 bulletin board  on behalf of James 
Holton 

*Date: *Friday, April 28, 2023 at 11:49 AM
*To: *CCP4BB@JISCMAIL.AC.UK 
*Subject: *[ccp4bb] happy/sad maps

Its still April, but this one isn't a joke.

The smiley-face electron density in the left panel of the attached image
has the remarkable property that any attempt to sharpen or blur the map
turns it into the frowny-face on the right.  If you'd like to try this
yourself, the hidden_frown.map file is available in this tarball:
https://bl831.als.lbl.gov/~jamesh/bugreports/fft_042423.tgz


In fact, any use of an FFT, even with the sharpening B set to zero,
turns the smiley into a frowny face. There is no way to get the smiley
face back (except opening the file again).  Yes, that's right, even just
a simple back-and-forth FFT: turning this hidden_frown.map into
structure factors and then back into a map again, gives you a frowny
face.  This happens using coot, ccp4 and phenix.

Wait, what!?  Isn't a Fourier transform supposed to preserve
information? As in: you can jump back and forth between real and
reciprocal space with impunity? Without introducing error?  Well, yes,
it is SUPPOSED to work like that, but the 3D FFT algorithms of
structural biology have a ... quirk. If you start with structure factors
and make a map out of them, you can convert it back-and-forth as often
as you want with 100% preservation of information. However, if you
start with a real-space map (such as from cryoEM), a back-and-forth
conversion gives you a different map.

Re: [ccp4bb] happy/sad maps

2023-04-30 Thread James Holton

Thank you Paul.  This is interesting!

I have not played with super-sampling yet.  I am assuming you mean 
creating a new map 8x the size? If so, did you fill the interstitial 
grid with zeroes? Local maximum? Linear interpolation?  Tricubic spline?


And when you say "sharpen/blur" with a factor of 4: is 4 a scale factor? 
Or a B factor?


Cheers, and I hope you had a pleasant weekend,

-James

On 4/28/2023 5:29 PM, Paul Emsley wrote:

What fun!

super-sampled by a factor of 2 then sharpen/blur with a factor of 4 
gives a superposition


Paul.







[ccp4bb] happy/sad maps

2023-04-28 Thread James Holton
 
structural biology methods?


My question for the BB:  can someone explain how Nyquist folding is 
handled in cryoEM data processing?


-James Holton
MAD Scientist





Re: [ccp4bb] Structure prediction - waiting to happen

2023-04-14 Thread James Holton
e intensities because publication requirements for 
chemical crystallography R factors are low enough to be dominated by 
experimental noise only.  Nevertheless, despite the phase problem being 
cracked by direct methods in the 1980s, your local chemistry department 
has yet to shut down their diffractometer. Why? Because they need it. 
And for macromolecular structures, the systematic errors between refined 
coordinates and their corresponding data are about 4-5x larger than 
experimental error. So, don't delete your image data! Not for a while yet.


-James Holton
MAD Scientist


On 4/1/2023 7:57 AM, Subramanian, Ramaswamy wrote:

Ian,

Thank you.  This is not an April fools..
Rams
subra...@purdue.edu




On Apr 1, 2023, at 10:46 AM, Ian Tickle  wrote:

 *External Email*: Use caution with attachments, links, or 
sharing data 




Hi Ramaswamy

I assume this is an April Fool's but it's still a serious question 
because many reviewers who are not crystallographers or electron 
microscopists may not fully appreciate the difference currently 
between the precision of structures obtained by experimental and 
predictive methods, though the latter are certainly catching up.  The 
answer of course lies in the mean co-ordinate precision, related to 
the map resolution.


Quoting https://people.cryst.bbk.ac.uk/~ubcg05m/precgrant.html :

"The accuracy and precision required of an experimentally determined 
model of a macromolecule depends on the biological questions being 
asked of the structure.  Questions involving the overall fold of a 
protein, or its topological similarity to other proteins, can be 
answered by structures of fairly low precision such as those obtained 
from very low resolution X-ray crystal diffraction data [or 
AlphaFold]. Questions involving reaction mechanisms require much 
greater accuracy and precision as obtained from well-refined, 
high-resolution X-ray structures, including proper statistical 
analyses of the standard uncertainties (/s.u.'s/) of atomic positions 
and bond lengths.".


According to https://www.nature.com/articles/s41586-021-03819-2 :

The accuracy of AlphaFold structures at the time of writing (2021) 
was around 1.0 Ang. RMSD for main-chain and 1.5 Ang. RMSD for 
side-chain atoms and probably hasn't changed much since.  This is 
described as "highly accurate"; however this only means that 
AlphaFold's accuracy is much higher in comparison with other 
prediction methods, not in comparison with experimental methods.  
Also note that AlphaFold's accuracy is estimated by comparison with 
the X-ray structure which remains the "gold standard"; there's no way 
(AFAIK) of independently assessing AlphaFold's accuracy or precision.


Quoting https://scripts.iucr.org/cgi-bin/paper?S0907444998012645 :

"Data of 0.94 A resolution for the 237-residue protein concanavalin A 
are used in unrestrained and restrained full-matrix inversions to 
provide standard uncertainties sigma(r) for positions and sigma(l) 
for bond lengths. sigma(r) is as small as 0.01 A for atoms with low 
Debye B values but increases strongly with B."


There's a yawning gap between 1.0 - 1.5 Ang. and 0.01 Ang.!  Perhaps 
AlphaFold structures should be deposited using James Holton's new PDB 
format (now that is an April Fool's !).


One final suggestion for a reference in your grant application: 
https://www.biorxiv.org/content/10.1101/2022.03.08.483439v2 .


Cheers

-- Ian


On Sat, 1 Apr 2023 at 13:06, Subramanian, Ramaswamy 
 wrote:


Dear All,

I am unsure if all other groups will get it - but I am sure this
group will understand the frustration.

My NIH grant did not get funded.  A few genuine comments - they
make excellent sense. We will fix that.

One major comment is, “Structures can be predicted by AlphaFold
and other software accurately, so the effort put on the grant to
get structures by X-ray crystallography/cryo-EM is not justified.”

The problem is when a company with billions of $$s develops a
method and blasts it everywhere - the message is so pervasive…

*Question: I*s there a canned consensus paragraph that one can
add with references to grants with structural biology (especially
if the review group is not a structural biology group) to say why
the most modern structure prediction programs are not a
substitute for structural work?

Thanks.


Rams
subra...@purdue.edu







Re: [ccp4bb] new PDB file format

2023-04-03 Thread James Holton

Thanks to everyone for being such good sports!

It is good to know that there is still room for good-natured funny in 
what can be stressful times.


Truth be told, I actually did do some experiments rounding off PDB 
coordinates to the nearest A.  You can try it with this one-line shell 
command:


cat refined.pdb |\
awk '# pass non-coordinate lines through unchanged
  ! /^ATOM|^HETAT/ {print; next}
  { # PDB columns 31-54 hold x, y, z; keep everything else as-is
    pre = substr($0,1,30); post = substr($0,55)
    X = substr($0,31,8); Y = substr($0,39,8); Z = substr($0,47,8)
    # round each coordinate to the nearest Angstrom
    X = sprintf("%.0f",X); Y = sprintf("%.0f",Y); Z = sprintf("%.0f",Z)
    printf("%s%8.3f%8.3f%8.3f%s\n", pre, X, Y, Z, post)
  }' |\
cat > roundoff.pdb

  These rounded-off structures look ... weird. And yes they really do 
crash validation programs.  Food for thought perhaps on what 
"resolution", rmsd, and especially GDT_TS really mean?


-James Holton
MAD Scientist


On 4/1/2023 1:28 PM, Sweet, Robert wrote:

Knowing the author as I do, I checked the date and time, and wasn't fooled.


From: CCP4 bulletin board  on behalf of Carter, 
Charlie
Sent: Saturday, April 1, 2023 4:06 PM
To:CCP4BB@JISCMAIL.AC.UK
Subject: Re: [ccp4bb] new PDB file format

I fell for this momentarily, hence compliments to James. I was fooled by the 
absolutely sensible intro.

Charlie


On Apr 1, 2023, at 12:34 AM, James Holton  wrote:

Anyone who has ever had to lecture a student for writing their unit cell 
lengths to dozens of decimal places is going to love the new PDB format.  It is 
more compact, more realistic, and less misleading to the poor, downstream 
consumers of structural data.

Only a few structures in the PDB are better than 1.0 A, and none come even 
close to 0.1 A.  Nevertheless, the classic PDB file format always listed atomic 
coordinates to three decimal places!  That's implying a precision of 0.001 A, 
which is not supported by the resolution of the data.  At long last, this 
age-old error is being corrected.  From now on, coordinates will be listed to 
the nearest Angstrom only.

An unexpected consequence of this is that R-free of a typical structure is 
going to rise from the current ~20% to well into the 40%s.  This is, however, 
more consistent with high-impact structures published in big-named journals 
using modern, better data collection methods like XFELs and CryoEM, so we are 
going to call this an improvement.  Besides, R factors are just cosmetic anyway.

Updated molprobity scores are not yet available while the authors fix bugs in 
their programs.  Right now, they return errors with the new, improved 
coordinates, such as:
line 272: 57012 Segmentation fault  (core dumped)

So, just as we all must adapt to Python 3 this new standard I'm sure will earn 
us all the thanks of future generations. They will no doubt be very grateful 
that we took these pains to protect them from the dangers of too many decimal 
places.

-James Holton
MAD Scientist




[ccp4bb] new PDB file format

2023-03-31 Thread James Holton
Anyone who has ever had to lecture a student for writing their unit cell 
lengths to dozens of decimal places is going to love the new PDB 
format.  It is more compact, more realistic, and less misleading to the 
poor, downstream consumers of structural data.


Only a few structures in the PDB are better than 1.0 A, and none come 
even close to 0.1 A.  Nevertheless, the classic PDB file format always 
listed atomic coordinates to three decimal places!  That's implying a 
precision of 0.001 A, which is not supported by the resolution of the 
data.  At long last, this age-old error is being corrected.  From now 
on, coordinates will be listed to the nearest Angstrom only.


An unexpected consequence of this is that R-free of a typical structure 
is going to rise from the current ~20% to well into the 40%s.  This is, 
however, more consistent with high-impact structures published in 
big-named journals using modern, better data collection methods like 
XFELs and CryoEM, so we are going to call this an improvement.  Besides, 
R factors are just cosmetic anyway.


Updated molprobity scores are not yet available while the authors fix 
bugs in their programs.  Right now, they return errors with the new, 
improved coordinates, such as:

line 272: 57012 Segmentation fault  (core dumped)

So, just as we all must adapt to Python 3, this new standard I'm sure 
will earn us all the thanks of future generations. They will no doubt be 
very grateful that we took these pains to protect them from the dangers 
of too many decimal places.


-James Holton
MAD Scientist





Re: [ccp4bb] To Trim or Not to To Trim

2023-03-31 Thread James Holton
I have never met a computational chemist who did not "notice" when a 
side chain is modeled as more than one conformer.


-James Holton
MAD Scientist

On 3/31/2023 8:31 AM, Olga Moroz wrote:

Hi everyone,

I always thought it is better to truncate so that biologists looking 
at the structures are not misled?

Not sure it is the best approach though...

Olga



On 10 Mar 2023, at 02:45, Bernhard Lechtenberg 
<968307750321-dmarc-requ...@jiscmail.ac.uk> wrote:


Hi Rhys,
I am also all for leaving side chains and letting the B-factors deal 
with the weak/absent density.
I don’t think there is a consensus, but I kind of remember that 
somebody did a poll a few years ago and if I remember correctly the 
main approaches were the one described above, or trimming the 
side-chains.

Bernhard
*Bernhard C. Lechtenberg*PhD
NHMRC Emerging Leadership Fellow
Laboratory Head
Ubiquitin Signalling Division​​
elechtenber...@wehi.edu.au <mailto:lechtenber...@wehi.edu.au>
T +61 3 9345 2217

*From:*CCP4 bulletin board  on behalf of Rhys 
Grinter <22087c81e8c6-dmarc-requ...@jiscmail.ac.uk>

*Date:*Friday, 10 March 2023 at 12:26 pm
*To:*CCP4BB@JISCMAIL.AC.UK 
*Subject:*[ccp4bb] To Trim or Not to To Trim

Hi All,
I'm trying to crowdsource an opinion on how people deal with 
modelling side chains with poorly resolved electron or cryoEM density.
My preference is to model the sidechain and allow the B-factors to go 
high in refinement to represent that the side chain is flexible. 
However, I'm aware that some people truncate sidechains if density is 
not present to justify modelling. I've also seen models where the 
sidechain is modelled but with zero occupancy if density isn't present.
Is there a consensus and justifying arguments for why one approach is 
better?

Cheers,
Rhys




Re: [ccp4bb] To Trim or Not to To Trim

2023-03-30 Thread James Holton

Follow-up:

My refinements of all possible combinations of rotamers are complete. 
The results are remarkably flat!  Be it 10 conformers, 2 conformers, or 
anything in between, the correctness of the result, as indicated by 
CCtrue, is very similar: 0.972 +/- 0.001.  Table below. I define 
CCtrue here as the CC between the ground-truth map and the 2mFo-DFc map 
from a refmac refinement run with default bulk solvent settings.  The 
ground truth is a simulated 7-residue peptide with 10 equally-populated 
whole-molecule conformations, each having ideal geometry and a different 
rotamer for the one Lys side chain.


The biggest drop is between 2 and 1 conformers (case 9 vs 10). It is 23x 
larger than the variation among the first 9 rows. This drop comes from 
the main chain, not the side chain splitting. If the main chain has only 
one conformer, even with all ten side chain rotamers present (case 11), 
the CCtrue is actually worse than having just a stump (12).  
Multi-conformer stumps (13, 14) also fare better than any run with a 
single-conformer main chain (10, 11, 12), no matter how many side chain 
confs are included. Anisotropic B refinement of one full main & side
conformer (15) does not do as well as a 2-conf main-chain-only stump 
(13). This is in spite of having a whopping 7-sigma difference peak 
telling you the side chain is missing.


case  CCtrue  Rwork%  Rfree%  fo-fc(sigma)  description
 1    0.9699   7.45   11.98   3.1   10 conformers side & main together
 2    0.9722   6.93   10.60   3.4   9 confs (best of 10 combos)
 3    0.9726   6.54    8.87   3.7   8 confs (best of 45)
 4    0.9737   6.54    8.89   3.7   7 confs (best of 120)
 5    0.9736   6.54    7.88   3.9   6 confs (best of 210)
 6    0.9714   7.21   10.60   4.6   5 confs (best of 252)
 7    0.9720   6.81    9.07   4.7   4 confs (best of 210)
 8    0.9714   6.74    9.90   3.5   3 confs (best of 120 combos)
 9    0.9720   7.36   10.14   6.8   2 conformers (best of 45 combos)
10    0.9471  10.35   15.04   5.2   1 conformer (best of 10 choices)
11    0.9476   9.66   13.48   4.6   10 confs side, 1 conf main chain
12    0.9534   9.71   12.12   6.9   stump, 1 conf (best of 10)
13    0.9665   9.15   11.79   7.0   stump, 2 confs (best of 45)
14    0.9676   8.75   13.17   6.8   stump, 10 confs
15    0.9534   8.78   10.82   6.5   stump, 1 conf anisotropic
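
(For the curious: CCtrue above is nothing fancier than a Pearson 
correlation over map grid points.  A minimal sketch, assuming the two 
maps are NumPy arrays sampled on identical grids:)

import numpy as np

def map_cc(map1, map2):
    # Pearson CC over all grid points of two maps on the same grid
    return np.corrcoef(map1.ravel(), map2.ravel())[0, 1]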


Now, in my ground-truth model here I did shift the 10 conformers 
randomly relative to each other by rms 0.2 A.  This was to simulate the 
fact that atoms bonded to one another don't really vibrate 
independently. As Helen pointed out earlier, the multi-conformer nature 
of the main chain really does play a role.  Have you ever seen green and 
red blobs on your main chain?  This might be why.


Lijun suggested that perhaps a much larger source of error coming from 
somewhere else could mask this effect, and that is always true.  
However, because errors add in quadrature, if the biggest difference 
feature in your map is a side chain you haven't built, then all other 
considerations are secondary. What I hope is apparent here is the 
unfortunate situation that none of the currently common practices 
actually work very well, even under ideal conditions.


If anyone would like to try different parameters, here is a copy of each 
of these best refinements, as well as a script for generating them all 
from scratch:

https://bl831.als.lbl.gov/~jamesh/trimnotrim/testdata.tgz

-James Holton
MAD Scientist


On 3/19/2023 10:02 PM, Lijun Liu wrote:


Hi James:


First of all, I think all those trim-or-not-to-trim practices are kinds 
of compromises made when the data do not really offer local density 
strong enough to model the side chain reliably. So many smart pioneers 
could not reach a simple agreement, which means this science has a 
personality-dependent attribute too. Depending on how one interprets 
(model builder) and how one understands (reader), I am ok with a) or c) 
in your list and personally dislike the zero-occupancy treatment (sorry 
if this annoys anyone). The reason is that if a “side chain 
conformation” is modeled with a 0 occupancy, the information reads as: 
all other conformations are possible (occupancies sum to 1.0) except 
this modeled one (normally the preferred one?), which is simply 
impossible given its 0 occupancy, yet is the one particularly mentioned 
—— logically a paradox.



Your test modeling and refinement of a lysine is interesting, but the 
experiment may not be flawless when trying to compare those different 
strategies. The experiment may be based on a dataset that really 
presents side-chain modeling difficulty, given its very weak densities 
around the expected location. The huge difference of CCtrue in your 
experiment probably just means the model's completeness (4 out of 1

Re: [ccp4bb] Attention APS users - where will you go?

2023-03-28 Thread James Holton

It is time to revive this old thread...

Although the Advanced Photon Source (Chicago) will be going down for a 
year or so starting Apr 17, the Advanced Light Source (Berkeley) will 
continue to operate ~50% of the MX and SAXS beam pipes that continue 
delivering light in the USA. Made possible by generous support from an 
NIH NIGMS P30, we of the "ALS-ENABLE" program are pleased to announce an 
upcoming webinar on April 10.


This webinar will cover capabilities at the beamlines in macromolecular 
crystallography, small angle X-ray scattering, and X-ray footprinting, 
as well as information on how to apply for beamtime. For more 
information, see the announcement here:

http://als-enable.lbl.gov/wordpress/2023/03/14/als-enable-webinar-april-10th-2023/

Hope to see you there,

-James Holton
MAD Scientist


On 6/13/2022 8:16 AM, James Holton wrote:
Thank you everyone who responded to my little poll.  To summarize and 
paraphrase, most common response was:

"I haven't really thought about it."

A distant second place was:
"I don't collect data at APS, so I'll be fine." (aka "I'm not worried 
about all those APS users out-competing me for time at my favorite 
beamline.")



and 3rd/4th place:
"I will go 'somewhere else' "
and/or
"we have an old rotating anode we can dust off."

Noteworthy responses I did NOT get were:
"I will just use cryoEM instead for a year or two"
nor
"I will just use AlphaFold "

With all due respect to the amazing recent advances in those fields, 
it would appear X-rays still play an important role in structural 
science, and a year of no data doesn't seem to be an option for most 
labs.


However, it would appear there is not much concern in the community.  
Personally, I wonder if that is justified. From what I can tell 
looking at public-facing calendars, most MX beamlines are being used 
about 80% of the time, and the APS represents at least half of total 
capacity in the USA. So, in April, I expect demand will rise to ~160% 
of supply. That means ~60% of beam time request proposals will get 
turned down.


To try and help illustrate, we at ALS have been pasting together a 
master calendar we call the "fly chromosome chart" here:

https://als-enable.lbl.gov/wordpress/2022/05/19/dark-period/

The width of the bars is proportional to the number of beamlines 
available.  Yes, they vary widely in flux and other capabilities, but 
assignment of beam time is usually done in "shifts".  Now, try to 
picture next year when the "APS" bar is all black.  Also, what kind of 
pins and pucks do you use? For many beamlines you may have to buy 
different ones.


Looking forward to the June 21 APS/U town hall discussions, as well as 
the ACA's "Bridging the APS dark period" session.  We will definitely 
be discussing this at the Diffraction Methods GRC, which is July 
24-29, 2022.  Space is still available!


-James Holton
MAD Scientist



On 5/9/2022 3:12 PM, James Holton wrote:

Greetings all,

I was just thinking of taking a little poll. When the Advanced Photon 
Source at Argonne shuts down for the APS-U upgrade on April 17, 2023, 
it will take with it about 90,000 hours of X-ray beam time until well 
into 2024. So, if you are a routine user of APS, what are your 
plans?  Will you just stop collecting X-ray data for 12 months or so? 
Do you have a proposal lined up at another synchrotron? Is it in the 
USA? Europe? Asia? Or are you, like me, a big procrastinator and 
haven't really thought much about it?


Whatever it is, I'd like to hear from you. Either on- or off-list is 
fine. I expect this community will be interested in the digest.


-James Holton
MAD Scientist









Re: [ccp4bb] To Trim or Not to To Trim

2023-03-28 Thread James Holton
ppen.


Very important: DO NOT LOOK AT THE MAPS that come out of this kind of 
refinement. Not directly. They will look all weird and distorted. You 
need to average over all the ASUs to recover interpretable 2Fo-Fc and 
Fo-Fc maps. The strictly correct way to do this is to cut out each ASU, 
re-orient and align these maps, and then average them all together. 
Fortunately, there is another way to do this that is much easier and faster:


7) Re-index the refinement output mtz file with "reindex h/2,k/2,l/2", 
and in the same run of "reindex" change your space group back to what it 
was in your normal refinement.  Do NOT run this mtz file through cad.  
Cad will throw out all but one ASU. In fact, don't open this re-indexed 
mtz file with any program other than coot or "fft". The reason is this 
mtz file still has P1 data. It is "overcomplete". Just like when cad 
performed the "outlim space 1" there is more than one ASU in the mtz.  
Perhaps it is a quirk in the fft algorithm, but this all-ASU averaging 
happens automatically if the input reflection data is "overcomplete".


8) finally, you might want to write a script for reversing the procedure 
for populating your supercell so that you can align all the members into 
a single ASU and look at them.



But wait!? Isn't this "over-fitting"?  No, it is not.  Over-fitting in 
general is when you drive your residual (aka Rwork) essentially to 
zero.  That doesn't happen here because the geometry term holds you 
back.  You might think that with enough copies in the ensemble it should 
be possible to fit any density with geometrically reasonable molecules, 
but that is not what happens in practice. This is actually quite 
remarkable!  So many "free parameters" and yet the tug-o-war between R 
factors and geometry remains.  Now, of course, the real molecules in the 
real crystal are simultaneously obeying the laws of chemistry and 
generating the diffraction patterns that we see, but I have never found 
a way to make a macromolecular model that does both.  Neither has 
anybody else, of course, but wouldn't it be cool if someone figured out how?


-James Holton
MAD Scientist

On 3/18/2023 6:35 PM, Oganesyan, Vaheh wrote:


Hi Ben,

All copies created by multiplying cell dimensions will act exactly the 
same as the original one, mathematically exactly. Nick’s approach is 
better. Something similar to what Nick said was published around 
2002-2003. I was reviewing it. I did not understand then what the 
author was trying to achieve and kept thinking about it for a few 
months. The author split the model into 20 copies, each with 5% 
occupancy. After refinement he got an ensemble that looked like NMR 
structures. I’m not sure, however, that adding that uncertainty will 
help answer any question.


Vaheh

*From:* CCP4 bulletin board  *On Behalf Of 
*benjamin bax

*Sent:* Saturday, March 18, 2023 5:07 PM
*To:* CCP4BB@JISCMAIL.AC.UK
*Subject:* Re: [ccp4bb] To Trim or Not to To Trim

Hi,
Probably a stupid question.
Could you multiply a, b and c cell dimensions by 2 or 3 (to give 8 or 
27 structures) and restrain well defined parts of structure to be 
‘identical’ ? To give you a more NMR like chemically sensible ensemble 
of structures?

Ben


> On 18 Mar 2023, at 12:04, Helen Ginn  wrote:
>
> Models for crystallography have two purposes: refinement and 
interpretation. Here these two purposes are in conflict. Neither case 
is handled well by either trim or not trim scenario, but trimming 
results in a deficit for refinement and not-trimming results in a 
deficit for interpretation.

>
> Our computational tools are not “fixed” in the same way that the 
standard amino acids are “fixed” or your government’s bureaucracy 
pathways are “fixed”. They are open for debate and for adjustments. 
This is a fine example where it may be more productive to discuss the 
options for making changes to the model itself or its representation, 
to better account for awkward situations such as these. Otherwise we 
are left figuring out the best imperfect way to use an imperfect tool 
(as all tools are, to varying degrees!), which isn’t satisfying for 
enough people, enough of the time.

>
> I now appreciate the hypocrisy in the argument “do not trim, but 
also don’t model disordered regions”, even though I’d be keen to avoid 
trimming. This discussion has therefore softened my own viewpoint.

>
> My refinement models (as implemented in Vagabond) do away with the 
concept of B factors precisely for the anguish it causes here, and 
refines a distribution of protein conformations which is sampled to 
generate an ensemble. By describing the conformations through the 
torsion angles that comprise the protein, modelling flexibility of a 
disordered lysine is comparatively trivial, and indeed modelling all 
possible conformations of a disordered loop becomes feasible. Lysines 
end up looking like a frayed end of a rope.

Re: [ccp4bb] To Trim or Not to To Trim

2023-03-19 Thread James Holton
4 6.1 two conformers (best of 45 combos)
0.9471 10.35 15.04 5.1 single conformer (best of 10 choices)

If I add one CG, the other two chi1 positions light up.  So, I tried 
building in all 10 true CG positions, and let the refinement decide what 
to do with them. The clear indication was that a CD should be added. 
After adding all the CDs, the difference peaks were weaker, but still 
indicating more atoms were needed.  Rwork and Rfree, however, tell the 
opposite story.  They get worse the more atoms you add.  CCtrue, on the 
other hand, was best when cutting everything after CG.  Why is that?  
Well, every time you add another atom you fill in the difference 
density, but then that atom pushes back the bulk solvent model that was 
filling in the density for the next atom.  The atom-to-solvent distance 
is roughly twice that of a covalent bond.  So again, square pegs and 
round holes.


Three conformers coming out as the winner may be because it is a 
selective process with a noisy score. In the ground truth there are 10 
conformers at equal occupancy, so no one triplet is really any better 
than any other. However, one has a density shape that fits better than 
other combos. My search over all possible quartets is still running.


But what if we got the solvent "right"?  Well, here is what that looks like:

CCtrue  Rwork%  Rfree%  fo-fc(sigma)  description
0.9476   9.66   13.48   4.6   all atoms, all confs, refmac defaults
0.9696   6.15    8.88   3.7   all atoms, all confs, phenix.refine
0.9825   0.80    0.89   3.9   all atoms, all confs, true solvent
0.9824   0.92    1.26   7.3   true model, minus one H atom from ordered HIS side chain


You can see that the default solvent of phenix.refine fares better than 
refmac here, but since I generated the solvent with phenix.refine it may 
have an unfair advantage. Nevertheless, providing the "true solvent" 
here is quite a striking drop in R factors.  This is not surprising 
since this was the last systematic error in this ground truth.  In all 
cases, I provided the true atomic positions at the start of refinement, 
so there was no confusion about strain-inducing local minima, such as 
which rotamer goes with which main chain shift.  And yes, you can 
provide arbitrary bulk solvent maps to refmac5 using the "Fpart" 
feature.  I've had good luck with real data using bulk density derived 
from MD simulations.


What is more, once the R factors are this low I can remove just one 
hydrogen atom and it comes back as a 7.3-sigma difference peak. This 
corresponds to the protonation state of that His.  This kind of 
sensitivity is really attractive if you are looking for low-lying 
features, such as partially-occupied ligands.  Some may pooh-pooh R 
factors as "cosmetic" features of structures, but they are, in fact, 
nothing more or less than the % error between your model and your data.  
This % error translates directly into the noise level of your map.  At 
20% error there is no hope whatsoever of seeing 1-electron changes, 
because a hydrogen is only 17% of a carbon.  But at 3-5% error, which is 
a typical experimental error in crystallographic data, anything bigger 
than one electron is clear.


-James Holton
MAD Scientist



On 3/18/2023 2:10 PM, Nicholas Pearce wrote:
Not stupid, but essentially the same as modelling alt confs, though 
would probably give more overfitting. Alt confs can easily be 
converted to an ensemble (if done properly…).


Thanks,
Nick

———

Nicholas Pearce
Assistant Professor in Bioinformatics & DDLS Fellow
Linköping University
Sweden


*From:* CCP4 bulletin board  on behalf of 
benjamin bax 

*Sent:* Saturday, March 18, 2023 10:07:26 PM
*To:* CCP4BB@JISCMAIL.AC.UK 
*Subject:* Re: [ccp4bb] To Trim or Not to To Trim
Hi,
Probably a stupid question.
Could you multiply a, b and c cell dimensions by 2 or 3 (to give 8 or 
27 structures) and restrain well defined parts of structure to be 
‘identical’ ? To give you a more NMR like chemically sensible ensemble 
of structures?

Ben


> On 18 Mar 2023, at 12:04, Helen Ginn  wrote:
>
> Models for crystallography have two purposes: refinement and 
interpretation. Here these two purposes are in conflict. Neither case 
is handled well by either trim or not trim scenario, but trimming 
results in a deficit for refinement and not-trimming results in a 
deficit for interpretation.

>
> Our computational tools are not “fixed” in the same way that the 
standard amino acids are “fixed” or your government’s bureaucracy 
pathways are “fixed”. They are open for debate and for adjustments. 
This is a fine example where it may be more productive to discuss the 
options for making changes to the model itself or its representation, 
to better account for awkward situations such as these. Otherwise we 
are left figuring out the best imperfect way to use an imperfect to

Re: [ccp4bb] binding pockets...

2023-01-03 Thread James Holton

PanDDA ?

On 1/3/2023 7:14 AM, Harry Powell wrote:

Hi folks

I was wondering what people’s favourite program is to find binding pockets in 
proteins. I’ve had a look at a couple but each has its own idiosyncrasies.

HNY

Harry




[ccp4bb] Future Diffraction Methods

2022-12-16 Thread James Holton
I want to thank everyone who attended the 2022 Gordon Research 
Conference and Gordon Research Seminar on Diffraction Methods in 
Structural Biology, as well as all those who contributed to these great 
gatherings in the past.  It was an outstanding meeting if I do say so 
myself. Not just because it had been so long without in-person 
interaction, not just because we had zero covid cases (which I see as no 
small feat of Mind over Virus), but because of this amazing community. 
It is rare in this world to have such a strong spirit of collaboration, 
camaraderie and openness in undertakings as high-impact as this. 
Surmounting the barriers to atomic-detail imaging of biological systems 
has never been more exciting and more relevant.  I am proud to be a part 
of it, and honored to have served as Chair.


It is therefore with heavy heart that I report to this community that I 
was the last Chair of the Diffraction Methods GRC.


The GRC Conference Evaluation Committee 
(https://www.grc.org/about/conference-evaluation-committee/) voted this 
year to discontinue the Diffraction Methods GRC and GRS. This ends a 
46-year tradition that I feel played a vital, and vibrant role in the 
work of the people who answer questions on this BB.  The reason given 
was insufficient attendance.  All other metrics, such as evaluation 
surveys and demographics were very strong. I have tried to appeal, but 
I'm told the vote was unanimous and final. I understand that like so 
many conference organizing bodies the GRC is having to make tough 
financial decisions. I must say I disagree with this one, but it was not 
my decision to make.


Many of the past and elected Chairs have been gathering and discussing 
how to replace the Diffraction Methods GRC/GRS going forward. Many great 
ideas, advice and perspectives have been provided, but that is a select 
group. I feel it is now time to open up this discussion to the broader 
community of structural methods developers and practitioners. There are 
some important questions to ask:


* How do we define this community?
        Yes, many of us do cryoEM too, but is that one methods meeting? 
or two?

* Does this community need a new diffraction methods meeting?
        As in one meeting or zero?
* Should we merge with an existing meeting?
    It would make logistics easier, but a typical GRC has 22 hours 
of in-depth presentations over 5 days.  The GRS is 7 hours over 2 days. 
As Chair, I found that was not nearly enough.

* Where do you think structural methods are going?
        I think I know, but I may be biased.
* Should the name change?
        From 1976 to 2000, it was "Diffraction Methods in Molecular 
Biology". The word "diffraction", BTW, comes from the Latin for 
"shattering of rays", and was originally used to describe the 
iridescence of bird feathers. That's spectroscopy!

How about:
 "Structural Methods for the Departing of Rays"

I'm sure there are many more questions, and better suggestions.  I look 
forward to enlightening discussions!  GRCs have always been about 
discussion, and I hope to keep that tradition alive in this community.


-James Holton
MAD Scientist





Re: [ccp4bb] xds crashes

2022-11-28 Thread James Holton
Sounds like your kernel might think "forkxds" is creating a "fork 
bomb".  Too many sub-processes firing off in too short a time triggers 
this.  On one of my systems, I fixed it by editing 
/etc/security/limits.d/20-nproc.conf so that users are allowed ~10x more 
pids than normal.  This tends to prevent these mysterious "Killed" errors.
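
For reference, the format of those limits files is one rule per line: 
<domain> <type> <item> <value>.  The edit was something along these 
lines (illustrative values only -- pick limits that make sense for your 
system):

*    soft    nproc    65536
*    hard    nproc    65536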


HTH?

-James Holton
MAD Scientist


On 11/28/2022 7:44 AM, Demetres D. Leonidas wrote:

Dear all,

I do not know if this is the right list and I would like to apologize 
if it is not.


We have repeatedly experienced xds crashes in machines running ubuntu 
22.04.1 with the following message repeated several times at the 
INTEGRATE step


/usr/local/bin/forkxds: line 60:  3427 Done echo "$itask"

  3428 Killed  | $amain

We are trying to process data from P13 at EMBL-Hamburg.

We are running XDS version Jan 10, 2022 BUILT=20220820

Any ideas ?

Demetres







Re: [ccp4bb] outliers

2022-11-08 Thread James Holton

Thank you for this.

Hmmm.

Interesting, and good to know the expected distribution of extreme values.

However, what I'm more worried about is how to evaluate the other 999 
points?  Let's say I'm trying to compare two 1000-member sets (A and B) 
that both have an extreme value of 3, but for the other 999 they are all 
2 sigma in "A" and 1 sigma in "B".  Clearly, "B" is better than "A", but 
how to quantify?
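
One candidate (a sketch, not a claim that this is the right statistic): 
pool the whole set and compare the sum of squared deviates to a 
chi-squared distribution with N degrees of freedom.  Assuming 
independent, standard-normal deviates:

import numpy as np
from scipy.stats import chi2

def whole_set_pvalue(deviates):
    # sum of squares of N standard-normal deviates ~ chi-squared, N dof
    dev = np.asarray(deviates, dtype=float)
    return chi2.sf((dev**2).sum(), df=dev.size)

print(whole_set_pvalue([2.0]*999 + [3.0]))  # set "A": vanishingly small
print(whole_set_pvalue([1.0]*999 + [3.0]))  # set "B": ~0.4, unremarkable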



On 11/8/2022 3:34 PM, Petrus Zwart wrote:

Hi James,

This is what you need.

https://en.wikipedia.org/wiki/Generalized_extreme_value_distribution

The distribution of a maximum of 1k random variates looks like this, 
and the (fitted by eye) analytical distribution associated with it 
seems to have a decent fit - as expected.


[image: histogram of the maximum of 1000 random variates, with a fitted 
GEV curve]

The idea of a p-value to judge the quality of a structure is 
interesting. xtriage uses this mechanism to flag suspicious normalized 
intensities, the idea being that in a small dataset it is less likely 
to see a large E value as compared to in a large dataset.
The issue of course is that the total intensity of a normalized 
intensity is bounded by the number of atoms, while the underlying 
assumption used is that it can be potentially infinitely large. It 
is still a decent metric, I think.


P


On Tue, Nov 8, 2022 at 3:25 PM James Holton  wrote:

Thank you Ian for your quick response!

I suppose what I'm really trying to do is put a p-value on the
"geometry" of a given PDB file.  As in: what are the odds the
deviations from ideality of this model are due to chance?

I am leaning toward the need to take all the deviations in the
structure together as a set, but, as Joao just noted, it just
"feels wrong" to tolerate a 3-sigma deviate. Even more wrong to
tolerate 4 sigma, 5 sigma. And 6 sigma deviates are really
difficult to swallow unless you have trillions of data points.

To put it down in equations, is the p-value of a structure with
1000 bonds in it with one 3-sigma deviate given by:

a)  p = 1-erf(3/sqrt(2))
or
b)  p = 1-erf(3/sqrt(2))**1000
or
c) something else?



On 11/8/2022 2:56 PM, Ian Tickle wrote:

Hi James

I don't think it's meaningful to ask whether the deviation of a
single bond length (or anything else that's single) from its
expected value is significant, since as you say there's always
some finite probability that it occurred purely by chance. 
Statistics can only meaningfully be applied to samples of a
'reasonable' size.  I know there are statistics designed for
small samples but not for samples of size 1 !  It's more
meaningful to talk about distributions.  For example if 1% of the
sample contained deviations > 3 sigma when you expected there to
be only 0.3 %, that is probably significant (but it still has a
finite probability of occurring by chance), as would be finding
no deviations > 3 sigma (for a reasonably large sample to avoid
sampling errors).

Cheers

-- Ian


On Tue, Nov 8, 2022, 22:22 James Holton  wrote:

OK, so lets suppose there is this bond in your structure that is
stretched a bit.  Is that for real? Or just a random fluke? 
Let's say
for example its a CA-CB bond that is supposed to be 1.529 A
long, but in
your model its 1.579 A.  This is 0.05 A too long. Doesn't
seem like
much, right? But the "sigma" given to such a bond in our
geometry
libraries is 0.016 A.  These sigmas are typically derived from a
database of observed bonds of similar type found in highly
accurate
structures, like small molecules. So, that makes this a
3-sigma outlier.
Assuming the distribution of deviations is Gaussian, that's a
pretty
unlikely thing to happen. You expect 3-sigma deviates to
appear less
than 0.3% of the time.  So, is that significant?

But, then again, there are lots of other bonds in the
structure. Lets
say there are 1000. With that many samplings from a Gaussian
distribution you generally expect to see a 3-sigma deviate at
least
once.  That is, do an "experiment" where you pick 1000
Gaussian-random
numbers from a distribution with a standard deviation of 1.0.
Then, look
for the maximum over all 1000 trials. Is that one > 3 sigma?
It probably
is. If you do this "experiment" millions of times it turns
out seeing at
least one 3-sigma deviate in 1000 tries is very common.
Specifically,
about 93% of the time. It is rare indeed to have every member
of a
1000-deviate set all lie within 3 sigmas.  So, we have gone
from one
3-sigma deviate being highly unlikely to being a virtual
certainty if

Re: [ccp4bb] outliers

2022-11-08 Thread James Holton

Thank you Ian for your quick response!

I suppose what I'm really trying to do is put a p-value on the 
"geometry" of a given PDB file.  As in: what are the odds the deviations 
from ideality of this model are due to chance?


I am leaning toward the need to take all the deviations in the structure 
together as a set, but, as Joao just noted, it just "feels wrong" 
to tolerate a 3-sigma deviate.  Even more wrong to tolerate 4 sigma, 5 
sigma. And 6 sigma deviates are really difficult to swallow unless you 
have trillions of data points.


To put it down in equations, is the p-value of a structure with 1000 
bonds in it with one 3-sigma deviate given by:


a)  p = 1-erf(3/sqrt(2))
or
b)  p = 1-erf(3/sqrt(2))**1000
or
c) something else?
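
(For anyone playing along at home, the two candidates are easy to 
evaluate numerically; a quick check, assuming independent Gaussian 
deviates:)

import math

p_a = 1 - math.erf(3 / math.sqrt(2))            # option a): ~0.0027
p_b = 1 - math.erf(3 / math.sqrt(2)) ** 1000    # option b): ~0.93
print(p_a, p_b)

Note that option b) reproduces the ~93% chance of seeing at least one 
3-sigma deviate in 1000 tries from the thought experiment below.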



On 11/8/2022 2:56 PM, Ian Tickle wrote:

Hi James

I don't think it's meaningful to ask whether the deviation of a single 
bond length (or anything else that's single) from its expected value 
is significant, since as you say there's always some finite 
probability that it occurred purely by chance.  Statistics can only 
meaningfully be applied to samples of a 'reasonable' size.  I know 
there are statistics designed for small samples but not for samples of 
size 1 !  It's more meaningful to talk about distributions.  For 
example if 1% of the sample contained deviations > 3 sigma when you 
expected there to be only 0.3 %, that is probably significant (but it 
still has a finite probability of occurring by chance), as would be 
finding no deviations > 3 sigma (for a reasonably large sample to 
avoid sampling errors).


Cheers

-- Ian


On Tue, Nov 8, 2022, 22:22 James Holton  wrote:

OK, so lets suppose there is this bond in your structure that is
stretched a bit.  Is that for real? Or just a random fluke?  Let's
say
for example its a CA-CB bond that is supposed to be 1.529 A long,
but in
your model its 1.579 A.  This is 0.05 A too long. Doesn't seem like
much, right? But the "sigma" given to such a bond in our geometry
libraries is 0.016 A.  These sigmas are typically derived from a
database of observed bonds of similar type found in highly accurate
structures, like small molecules. So, that makes this a 3-sigma
outlier.
Assuming the distribution of deviations is Gaussian, that's a pretty
unlikely thing to happen. You expect 3-sigma deviates to appear less
than 0.3% of the time.  So, is that significant?

But, then again, there are lots of other bonds in the structure. Lets
say there are 1000. With that many samplings from a Gaussian
distribution you generally expect to see a 3-sigma deviate at least
once.  That is, do an "experiment" where you pick 1000
Gaussian-random
numbers from a distribution with a standard deviation of 1.0.
Then, look
for the maximum over all 1000 trials. Is that one > 3 sigma? It
probably
is. If you do this "experiment" millions of times it turns out
seeing at
least one 3-sigma deviate in 1000 tries is very common. Specifically,
about 93% of the time. It is rare indeed to have every member of a
1000-deviate set all lie within 3 sigmas.  So, we have gone from one
3-sigma deviate being highly unlikely to being a virtual certainty if
you look at enough samples.

So, my question is: is a 3-sigma deviate significant?  Is it
significant
only if you have one bond in the structure?  What about angles?
What if
you have 500 bonds and 500 angles?  Do they count as 1000 deviates
together? Or separately?

I'm sure the more mathematically inclined out there will have some
intelligent answers for the rest of us, however, if you are not a
mathematician, how about a vote?  Is a 3-sigma bond length deviation
significant? Or not?

    Looking forward to both kinds of responses,

-James Holton
MAD Scientist





[ccp4bb] outliers

2022-11-08 Thread James Holton
OK, so let's suppose there is this bond in your structure that is 
stretched a bit.  Is that for real? Or just a random fluke?  Let's say 
for example it's a CA-CB bond that is supposed to be 1.529 A long, but in 
your model it's 1.579 A.  This is 0.05 A too long. Doesn't seem like 
much, right? But the "sigma" given to such a bond in our geometry 
libraries is 0.016 A.  These sigmas are typically derived from a 
database of observed bonds of similar type found in highly accurate 
structures, like small molecules. So, that makes this a 3-sigma outlier. 
Assuming the distribution of deviations is Gaussian, that's a pretty 
unlikely thing to happen. You expect 3-sigma deviates to appear less 
than 0.3% of the time.  So, is that significant?


But, then again, there are lots of other bonds in the structure. Let's 
say there are 1000. With that many samplings from a Gaussian 
distribution you generally expect to see a 3-sigma deviate at least 
once.  That is, do an "experiment" where you pick 1000 Gaussian-random 
numbers from a distribution with a standard deviation of 1.0. Then, look 
for the maximum over all 1000 trials. Is that one > 3 sigma? It probably 
is. If you do this "experiment" millions of times it turns out seeing at 
least one 3-sigma deviate in 1000 tries is very common. Specifically, 
about 93% of the time. It is rare indeed to have every member of a 
1000-deviate set all lie within 3 sigmas.  So, we have gone from one 
3-sigma deviate being highly unlikely to being a virtual certainty if 
you look at enough samples.
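
(That 93% figure is easy to verify; a quick NumPy simulation:)

import numpy as np

rng = np.random.default_rng(0)
ntrial, nbond = 10_000, 1000
# each row is one "experiment": 1000 Gaussian deviates with sigma = 1
worst = np.abs(rng.standard_normal((ntrial, nbond))).max(axis=1)
print((worst > 3).mean())   # ~0.93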


So, my question is: is a 3-sigma deviate significant?  Is it significant 
only if you have one bond in the structure?  What about angles? What if 
you have 500 bonds and 500 angles?  Do they count as 1000 deviates 
together? Or separately?


I'm sure the more mathematically inclined out there will have some 
intelligent answers for the rest of us, however, if you are not a 
mathematician, how about a vote?  Is a 3-sigma bond length deviation 
significant? Or not?


Looking forward to both kinds of responses,

-James Holton
MAD Scientist





Re: [ccp4bb] Refmac5 occupancy group by residue name

2022-09-13 Thread James Holton

Maybe this will help?
https://bl831.als.lbl.gov/~jamesh/scripts/refmac_occupancy_setup.com

Make a pdb file of the residues you want to occupancy-refine and put it 
on the command line of this script, along with the word "allatoms".


This will generate a file called "refmac_opts_occ.txt" that you can 
import into your refmac run.  Possibly using the "@" keyword?
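
For what it's worth, the generated refmac_opts_occ.txt is just a stack 
of occupancy keywords.  If memory serves, the refmac5 syntax looks 
something like this (illustrative chain/residue values, not the script's 
exact output):

occupancy group id 1 chain A residue 401
occupancy group id 2 chain B residue 401
occupancy refine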


HTH

-James Holton
MAD Scientist


On 9/13/2022 6:59 AM, Evgenii Osipov wrote:

Dear CCP4 community,

  I am refining several structures of multimeric protein-ligand complexes and I 
wanted to refine occupancy of the ligand. Manual definition of groups would be 
tedious and error prone considering that ASU contains 10 protein chains and 1-8 
bound ligand molecules. Hence my idea was to define occupancy group for ligands 
using residue name, e.g. LIG.

According to the manual page 
(http://www.ysbl.york.ac.uk/refmac/data/refmac_keywords.html) I can specify an 
occupancy refinement group using chain, residue intervals, atom names and alt 
code. However, residue names are not mentioned in the “Occupancy refinement” 
paragraph.

Is it possible to define occupancy groups in refmac5 using residue name?


Kind regards,







Re: [ccp4bb] Quantifying electron density inside of a given volume

2022-08-25 Thread James Holton

Hey Pavel,

Thank you for your thoughtful comments and excellent references.

 I also just realized I made a mistake in an earlier message.  When you 
"integrate" a map using its average value you want to multiply by the 
volume of the asymmetric unit, not the volume of the unit cell. If you 
multiply by the cell volume you are integrating over the symmetry mates 
as well. You probably just want to know the number of electrons in one 
blob (one ASU).
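
In code form the bookkeeping is a one-liner (a sketch, assuming a numpy 
array sampling the whole unit cell on an absolute electrons/A^3 scale):

import numpy as np

def blob_electrons(rho, cell_volume_A3, n_symops):
    # integral over one ASU = mean density over the cell * (cell volume / n_symops)
    return float(np.mean(rho)) * cell_volume_A3 / n_symops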


One quick response to Pavel's first comment below too.

On 8/22/2022 9:48 PM, Pavel Afonine wrote:

Hi James,


- Where exactly inside the blob of density do you place these
dummy atoms?

Where? At the peaks.


Peaks? This means you need to have atomic resolution data and also 
blobs representing ordered atoms, so you actually have peaks!


Not so!  Unless the map is completely featureless there is always a 
highest point.  That is where you put the first atom. Once placed, you 
subtract (or otherwise remove) the density of that atom from the map. In 
this new, modified, map somewhere else is now the new highest point. 
This is where you put atom #2. Etc. With each iteration you transfer 
density from the map into a model. I've tried to adopt a strategy that 
keeps the total number of electrons (model + map) fixed, but that 
creates some interesting problems. The trick is that as you remove 
positive density you don't want to introduce negative density.  And by 
"negative" I mean dipping below the vacuum level. This vacuum level may 
seem arbitrary at first, but in the calculated map it is a very real 
thing that cannot be neglected. No amount of adjustment in xyz, 
occupancy or B factor can generate a negative peak in the calculated 
map. In a way, this positivity constraint is another reason why 
occupancy refinement is a good way to integrate density.
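
In schematic Python the loop looks like this. A sketch only, and 
assumptions abound: the map is a plain numpy array, the "atom" is a 
single Gaussian of guessed width, and a real implementation would 
subtract a proper atomic density, handle symmetry, and work in 
electron units:

import numpy as np

def pick_and_subtract(rho, n_atoms, sigma_vox=1.5, vacuum=0.0):
    # Greedy peak transfer: place an "atom" at the current highest point,
    # subtract its density without dipping below the vacuum level, repeat.
    rho = rho.copy()
    idx = np.indices(rho.shape)
    atoms = []
    for _ in range(n_atoms):
        ijk = np.unravel_index(np.argmax(rho), rho.shape)
        r2 = sum((idx[d] - ijk[d]) ** 2 for d in range(3))
        g = np.exp(-r2 / (2.0 * sigma_vox ** 2))   # unit-height Gaussian at the peak
        # largest scale that keeps (rho - scale*g) >= vacuum everywhere
        scale = float(np.min(np.where(g > 1e-6, (rho - vacuum) / np.maximum(g, 1e-6), np.inf)))
        scale = max(0.0, scale)
        rho -= scale * g
        atoms.append((ijk, scale * g.sum()))  # peak position, integrated density (grid units)
    return atoms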


Truth be told, although my "divot" approach seems to work fairly well, 
I'm still not entirely happy with it. I don't think the general problem 
of finding a minimally complex constellation of atoms that explains a 
given blob is solved. But I imagine this would make for a good AI project?


Cheers,

-James Holton
MAD Scientist





Re: [ccp4bb] Quantifying electron density inside of a given volume

2022-08-16 Thread James Holton
because the default step 
size in refinement programs is too large for this "crowded atom" situation.
- I guess O or C as an atom type should do it, but what about B factor 
(would you refine B as well?)?
I find oxygen will do in peak-picking cases, but for grid atoms under 
very smooth density and closely-spaced atoms I have found the individual 
occupancies get rather small, the round-off error of occupancy from 0.01 
to 0.00 creates a granularity of ~0.1 electron, and this can start 
creating artificial noise in the fit.  For cases like this I have gone 
to "liquid helium", or modelling dummy atoms as He. Yes, you might think 
that hydrogen would be better, but H atoms have so many special checks 
and whiz-bangs for how they are treated I eventually gave up and went to 
He.  (One could also argue that at low enough temperature He atoms are 
allowed to "overlap" anyway. ;) )


Formally, it shouldn't matter (much) what the B factors refine to, since 
B does not affect the number of electrons in the atom. However, B 
factors most certainly do affect peak heights, and the "tails" can start 
to be lost when the B factor gets big.  This is because the program that 
generates the calculated map used to make Fcalc only plots each atom out 
to a certain radius.  I'm not sure what it is in phenix or refmac, but 
in CCP4's sfall it is 4.5 A. Sfall doesn't support B>99 because atoms 
with higher B factors start to lose significant density beyond that 
radius. For B=500 you need a radius of at least 10 A. You'd think that 
the practice of deleting B=500 atoms or slowly lowering their 
occupancies would be a good strategy, but oddly enough I have not found 
that to be the case in practice.  Sometimes a big, flat atom is what the 
density wants. I have had some limited success with breaking up high-B 
atoms into a 2x2x2 grid of new atoms with lower B factors, and that 
seems to work fairly well. Exactly what threshold to use is still a good 
question.  SHELX, I believe, has a mechanism for splitting highly 
anisotropic B factors into two atoms, so there is precedent.
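
You can put numbers on that tail loss using the usual conversion 
sigma^2 = B/(8*pi^2). A sketch, treating the atom as a single isotropic 
Gaussian (real atoms have some intrinsic width on top of B, so these 
fractions are slightly optimistic):

import math

def enclosed_fraction(B, radius):
    # fraction of a B-smeared Gaussian atom's electrons inside 'radius' (A),
    # i.e. the chi-3 CDF: erf(x/sqrt(2)) - sqrt(2/pi)*x*exp(-x^2/2)
    x = radius / math.sqrt(B / (8 * math.pi**2))
    return math.erf(x / math.sqrt(2)) - math.sqrt(2 / math.pi) * x * math.exp(-x**2 / 2)

print(enclosed_fraction(99, 4.5))    # ~0.999 -- fine at sfall's 4.5 A radius
print(enclosed_fraction(500, 4.5))   # ~0.64  -- a third of the atom is cut off
print(enclosed_fraction(500, 10.0))  # ~0.999 -- hence the ~10 A radius at B=500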
- if you refine B, how do you deconvolute occupancy from refined B 
values (and eventually from effects of positional errors of your DAs)?
For the grid-of-atoms approach I've found a fixed overall B factor and 
xyz positions at first is a good place to start. I then add B 
refinement, then xyz.  For a small blob this can be stable, but if you 
try to do this for the whole bulk solvent region, for example, it blows up.

-
- How all these choices are going to affect the result?
I always recommend doing multi-start refinement, trying a variety of 
strategies to see how much the result jumps around.  One might even want 
to compare integrated density to occupancy-refined recovered electron 
count?  There is the old adage that "a person with two thermocouples 
never knows what temperature it is", but personally, I'd rather know the 
error bars.
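
The mean and spread over a handful of restarts is all I mean by error 
bars (a sketch; the numbers below are made up purely for illustration):

import statistics
counts = [41.2, 38.9, 44.1, 40.3]   # recovered electrons from several multi-start runs (made-up)
print(statistics.mean(counts), statistics.stdev(counts))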


And the overall accuracy? I doubt it will ever be less than Rfree, which 
for macromolecules is still in the 20%s.  For now.


Cheers!

-James Holton
MAD Scientist




On Mon, Aug 15, 2022 at 4:38 PM James Holton  wrote:

There are several programs for integrating electron density, but
please let me assure you that it is almost always the wrong thing
to do.

A much better strategy is occupancy refinement.  Throw in dummy
atoms, turn off non-bonded interactions to them, and refine their
occupancy until it a) stops changing (may be more than one round),
and b) there are no Fo-Fc differences left in the region of
interest.  Then all you do is add up the occupancies, multiply by
the relevant atomic number (usually 8), and voila! you get the
best-fit number of electrons in your blob. You may want to try
re-running with random starting points to get an idea of the error
bars.

What is wrong with integrating density?  Well, for one, it is hard
to know where to set the boundaries. Integrated density can be
VERY sensitive to the choice of radius, making your arbitrary
decision of which radius to use a source of error. Too small and
you miss stuff. Too big and you add unnecessary noise. Also,
neighboring atoms have tails, and if you don't subtract them
properly, that is another source of error. Also, because of the
missing F000 term, there is an offset, which adds a term
proportional to the integration volume.  For example, an integral
resulting in zero "electrons" does NOT mean you have vacuum. It
just means that the area you integrated has the same average
density as the entire map. This may not be the number you want.

The beauty of occupancy refinement is that it automatically
handles all these problems. The "vacuum level" and F000 are known
quantities in the calculated map. The B factors given to the dummy
atoms also allow

Re: [ccp4bb] Quantifying electron density inside of a given volume

2022-08-15 Thread James Holton
well-ordered 
helix, extract those atoms, calculate a map using "sfall", and add it to 
the masked-off difference map before doing the "new Fobs" structure 
factor calculation. Include these same atoms in the new refinement. They 
won't move, but they will keep the scale stable. Oh, and don't forget to 
turn off the bulk solvent correction!  The bulk solvent has already been 
subtracted in the mFo-DFc map.


Hope this all makes sense and feel free to ask follow-up questions,

-James Holton
MAD Scientist


On 8/10/2022 9:59 AM, Neno Vuksanovic wrote:

Dear All,

I would like to quantify electron density inside of positive Fo-Fc 
blobs in active sites of multiple protomers in the map and compare 
them. I am aware that I can interpolate maps and obtain density values 
at coordinate points using either MapMan, Chimera or Coot, but I would 
like to know if there is a tool that would let me designate a sphere 
of a certain volume and calculate total electron density inside it?


Best Regards,
Neno











Re: [ccp4bb] Diffraction Methods in Structural Biology - Gordon Research Conference - in-person! - July 24-29, 2022

2022-07-01 Thread James Holton
Oh, and if you are concerned about virus transmission at in-person 
conferences, you are not alone! I would not be going forward with this 
if I didn't feel sufficient controls were in place. To date, I have 
heard of no reported cases of transmission at GRC meetings, so I feel 
they are doing their job. Vaccination is required, and 
masking/distancing rules are not draconian but sensible. Outdoor dining 
is available, and room and board are included with registration.


I will also be bringing my air quality meters (CO2 and particulate) for 
monitoring how well the air is being replaced and filtered. If you don't 
already own such devices, I highly recommend them. Bring them on the 
plane, and be sure to keep your mask tight during boarding and 
un-boarding when air filtration is minimal. Exhaled breath has 100x the 
CO2 of outside air, and so makes a pessimistic proxy for how much of the 
ambient air was recently in someone else's lungs. If CO2 is high, filter 
it. Do that with HEPA filters, a mask, or both. PM2.5 and other particle 
counters reading zero means your HEPA filters are working. But, if CO2 
reads 400 ppm you are effectively outside.  We are scientists. We know 
how to do this.
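
If you want to turn a CO2 reading into something quantitative, the 
rebreathed-air arithmetic is simple (a sketch; 400 ppm outdoors and 
~40,000 ppm in exhaled breath are the round numbers used above):

def rebreathed_fraction(co2_ppm, outdoor_ppm=400.0, exhaled_ppm=40000.0):
    # fraction of inhaled air that was recently in someone's lungs
    return max(0.0, (co2_ppm - outdoor_ppm) / (exhaled_ppm - outdoor_ppm))

print(rebreathed_fraction(400))    # 0.00 -- effectively outside
print(rebreathed_fraction(1200))   # ~0.02 -- 2% of every breath is second-hand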


The official GRC COVID-19 policies are detailed here:
https://www.grc.org/covid-19-protocols-and-travel-information/

the latest venue policies (Bates College) can be found here:
https://www.grc.org/_resources/common/userfiles/file/Bates%20&%20GRC-COVID%20and%20Venue%20Information%20.pdf

Looking forward to a safe and productive GRC!

-James Holton
MAD Scientist

On 5/7/2022 10:11 AM, James Holton wrote:

One more thing:

Some may also recall that in 2020 we were accepting tax-deductible 
donations to help attendees from underrepresented groups overcome the 
financial barriers to GRC attendance. Those funds are still available, 
and donations are also still possible. I ask that applicants who feel 
they may qualify please self-identify to me, off-list, in an email. It 
is my goal to bring as many diverse backgrounds and points of view as 
possible into this meeting, because that is what makes for the most 
productive discussions.


-James Holton
MAD Scientist


On 5/2/2022 12:35 PM, James Holton wrote:


Many of you may recall approximately 1000 years ago we were looking 
forward to getting together for another great Diffraction Methods 
GRC. Now, after a 4-year break, the meeting is on!
https://www.grc.org/diffraction-methods-in-structural-biology-conference/2022/ 



It will be in-person at Bates College in Lewiston, ME, USA, on July 
24-29 of 2022. Strange how strange it feels to be considering meeting in 
person, but recent GRCs have proven they can be conducted safely. 
We've learned a lot about viruses in recent years, both in our lives 
and in our labs. Artificial Intelligence has come a long way, and the 
role of biological structure, and indeed science in general, is 
impacting the everyday lives of human beings more than ever before.


It is time we got together to talk about all this. Yes, we've gotten 
a lot of work done remotely, but some things just have to wait until 
you are face-to-face. Preferably over a Maine lobster dinner. GRCs 
are not about listening to talks, they are about the discussion that 
comes after. Newcomers and Veterans sharing and debating ideas until 
far too late at night. It is my sincere hope that fighting this 
virus, and looking toward a brighter future, will inspire even more 
visionary and collaborative ideas for the role structure will play in 
that future. I can't imagine a better theme of discussion for this 
next meeting.


-James Holton
MAD Scientist and Chair of the 2020/2022 Diffraction Methods GRC








Re: [ccp4bb] Diffraction Methods in Structural Biology - Gordon Research Conference - in-person! - July 24-29, 2022

2022-07-01 Thread James Holton
OK folks!  Now is the time. Registration deadline for the Diffraction 
Methods GRC is tomorrow, July 3.


Please if you haven't registered yet but have only been accepted as an 
attendee, you need to actually pay registration or your application will 
be cancelled on July 5.  I'd hate to miss you!


If you need help paying for registration please do reach out to me 
off-list and I will let you know what kinds of assistance are 
available.  In some cases a financial need itself can qualify.


Cheers, and hope to see you there,

-James Holton
MAD Scientist


On 5/3/2022 7:35 AM, James Holton wrote:


Many of you may recall approximately 1000 years ago we were looking 
forward to getting together for another great Diffraction Methods GRC. 
Now, after a 4-year break, the meeting is on!
https://www.grc.org/diffraction-methods-in-structural-biology-conference/2022/ 



It will be in-person at Bates College in Lewiston, ME, USA, on July 
24-29 of 2022. Strange how strange it feels to be considering meeting in 
person, but recent GRCs have proven they can be conducted safely. 
We've learned a lot about viruses in recent years, both in our lives 
and in our labs. Artificial Intelligence has come a long way, and the 
role of biological structure, and indeed science in general, is 
impacting the everyday lives of human beings more than ever before.


It is time we got together to talk about all this. Yes, we've gotten a 
lot of work done remotely, but some things just have to wait until you 
are face-to-face. Preferably over a Maine lobster dinner. GRCs are not 
about listening to talks, they are about the discussion that comes 
after. Newcomers and Veterans sharing and debating ideas until far too 
late at night. It is my sincere hope that fighting this virus, and 
looking toward a brighter future, will inspire even more visionary and 
collaborative ideas for the role structure will play in that future. I 
can't imagine a better theme of discussion for this next meeting.


-James Holton
MAD Scientist and Chair of the 2020/2022 Diffraction Methods GRC







Re: [ccp4bb] open review?

2022-06-27 Thread James Holton

Hey Debanu,

Hmm. Last time I did it I didn't have to go through any IP lawyers to 
upload a pre-print to biorxiv.  What I was thinking of is something 
similar to that.  Researchers, on their own, deciding to upload their 
applications and reviews.  What would be the motivation? Well, I imagine 
it is not an uncommon situation where you might want help from more than 
just the reviewers on how to revise your application. I know I always 
try to get all the help I can get.


Might even be able to use biorxiv to do it?  Or am I missing something?

-James Holton
MAD Scientist


On 6/27/2022 12:16 PM, Debanu Das wrote:
Thinking about it some more, I think all the materials (patentable IP 
or trade secrets, which in the US are IP and under Defense of Trade 
Secrets Act) of a researcher are owned by the university. So just 
getting across tech transfer/IP of individual univs would be a massive 
hurdle before thinking of being able to upload grant proposals for 
sharing.


And funding agencies would first also have to negotiate (and convince) 
with all univs to allow it, even if somehow taxpayers and funding 
agencies could be first convinced about the need or value in doing 
this. In fact, in that scenario, there would actually be no need for a 
new system to share proposals. All funding agencies just have to open 
up a portal to access submitted grants (and I'm quite sure the 
agencies already have massive security around hacking attempts to 
access all this material).


Cheers,
Debanu

On Mon, Jun 27, 2022 at 11:58 AM Debanu Das  wrote:

Dear John,

For sure it is an aspiration as a society and as a civilization:
to think beyond individual nations. And for that we have some
examples as you mentioned at the scientific (IUCr, PDB) and
political level (UN). We also have the EU, ASEAN, NATO, etc.

However, despite having these organizations, I think even within
most of them, for critical strategic information that dictates
competitiveness and preparation, sharing is restricted to within
the group (at least for the political ones). For that matter, even
individual agencies within countries often have restrictions in
data and materials sharing.

I think if we solve the issue of national competitiveness, social
inequality, etc first, we will not even have to discuss if there
could be issues openly and globally sharing grant proposals. I
guess the counter proposal could be made that maybe more sharing
of more information will eventually lead to equity everywhere
(which to some extent is reflected in the open sharing of
publications).

But for now, I think there are practicality hurdles to cross on
these, which is why I mentioned "workable" in my initial response.
Just in the last few years, we have seen examples of more and more
focus on IP theft, computer hacking to steal research data from
organizations and companies, more focus on ensuring
confidentiality of the peer review process, and computer security
to avoid leaks of material, and so on.

Not trying to be cynical here, I think it is great for us as a
community to always have an eye on a larger and nobler purpose
while working within current practicalities and frameworks.

Thank you.

Best regards,
Debanu

On Mon, Jun 27, 2022 at 11:18 AM John R Helliwell
 wrote:

Dear Debanu,
There is indeed much at stake here.
Would I do it now, share my proposals? No.
Would I do it if funders’ rules required it? Yes.
When might funders’ rules require it? E.g. when taxpayers insist
that the priority is achieving societal goals asap. Might that
happen in the foreseeable future? I don’t think so: because we
as scientists are good at thinking so far out of the box (such
as the internet, or, from the 19th century, electricity and
magnetism), the taxpayer sees the benefit of an individual’s
curiosity-driven research.
The bigger point is can we also think beyond individual nations?
We know we can: the UN, International Council for Science, IUCr……
So, it probably isn’t a one size fits all idea that James has
put forward…
Best wishes,
John



Emeritus Professor John R Helliwell DSc





On 27 Jun 2022, at 19:03, Debanu Das 
wrote:


>So, 2nd question is: would you do it? Would you upload your application
>into the public domain for all to see? What about the reviewer comments?
>If not, why not?  Afraid people will steal your ideas? Well, once
>something is public, it's pretty clear who got the idea first.

I do not think this ("upload your application into the public
domain for all to see") is a workable or desirable idea for a
variety of reasons. There are far greater issues

Re: [ccp4bb] arginine sidechain planarity issues

2022-06-27 Thread James Holton

It's the library:
https://doi.org/10.1107/S2059798320013534

And it is not something that is terribly easy to fix. Arginine just has 
internal clashes so that plane is only flat sometimes. This is a lot of 
what led to the "Conformation Dependent Library", which I don't think is 
set up under CCP4.


What I do is copy $CLIBD/monomers/a/ARG.cif, edit it to increase the 
sigmas of plan-2 and the NH1-CZ-NH2 angle, and also edit the other two 
guanidino angles as described in the above paper. Provide this edited 
cif file as if it were a ligand to your refinement. It helps. Not as 
good as using CDL, but it helps.
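
For orientation, the records to edit look like the loop below. The 
field names are the standard monomer-library layout, but the atom list 
and the loosened esd of 0.05 are purely illustrative -- take the real 
targets and sigmas from the paper above:

loop_
_chem_comp_plane_atom.comp_id
_chem_comp_plane_atom.plane_id
_chem_comp_plane_atom.atom_id
_chem_comp_plane_atom.dist_esd
 ARG  plan-2  NE   0.05
 ARG  plan-2  CZ   0.05
 ARG  plan-2  NH1  0.05
 ARG  plan-2  NH2  0.05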


-James Holton
MAD Scientist

On 6/27/2022 8:38 AM, SUBSCRIBE CCP4BB rkrishnan wrote:

I am refining a structure at 2.2A resolution. I find that all my arginines show 
distortion after refinement with refmac5 even though they fit into the density 
really well.
This is the latest version of CCP4 (8.0) running on Windows 10.  Is my library 
messed up in refmac5? Any other ideas?









[ccp4bb] open review?

2022-06-22 Thread James Holton

Greetings all,

I'd like to ask a question that I expect might generate some spirited 
discussion.


We have seen recently a groundswell of support for openness and 
transparency in peer review. Not only are pre-prints popular, but we are 
also seeing reviewer comments getting published along with the papers 
themselves. Sometimes even signed by the reviewers, who would have 
traditionally remained anonymous.


My question is: why don't we also do this for grant proposals?

I know this is not the norm. However, after thinking about it, why 
wouldn't we want the process of how funding is awarded in science to be 
at least as transparent as the process of publishing the results? Not 
that the current process isn't transparent, but it could be more so. 
What if applications, and their reviewer comments, were made public? 
Perhaps after an embargo period?  There could be great benefits here. 
New investigators especially, would have a much clearer picture of 
format, audience, context and convention. I expect unsuccessful 
applications might be even more valuable than successful ones. And yet, 
in reality, those old proposals and especially the comments almost never 
see the light of day. Monumental amounts of work go into them, on both 
sides, but they then get tucked away into the darkest corners of our hard drives.


So, 2nd question is: would you do it? Would you upload your application 
into the public domain for all to see? What about the reviewer comments? 
If not, why not?  Afraid people will steal your ideas? Well, once 
something is public, its pretty clear who got the idea first.


3rd question: what if the service were semi-private, and you got to get 
comments on your proposal before submitting it to your funding agency? 
Would that be helpful? What if in exchange for that service you had to 
review 2-3 other applications?  Would that be worth it?


Or, perhaps, I'm being far too naive about all this. For all I know 
there are rules against doing this that I'm not aware of.  Either way, 
I'm interested in what this community thinks. Please share your views!  
On- or off-list is fine.


-James Holton
MAD Scientist





Re: [ccp4bb] Attention APS users - where will you go?

2022-06-13 Thread James Holton
Thank you everyone who responded to my little poll.  To summarize and 
paraphrase, the most common response was:

"I haven't really thought about it."

A distant second place was:
"I don't collect data at APS, so I'll be fine." (aka "I'm not worried 
about all those APS users out-competing me for time at my favorite beamline.")



and 3rd/4th place:
"I will go 'somewhere else' "
and/or
"we have an old rotating anode we can dust off."

Noteworthy responses I did NOT get were:
"I will just use cryoEM instead for a year or two"
nor
"I will just use AlphaFold "

With all due respect to the amazing recent advances in those fields, it 
would appear X-rays still play an important role in structural science, 
and a year of no data doesn't seem to be an option for most labs.


However, it would appear there is not much concern in the community.  
Personally, I wonder if that is justified. From what I can tell looking 
at public-facing calendars, most MX beamlines are being used about 80% 
of the time, and the APS represents at least half of total capacity in 
the USA. So, in April, I expect demand will rise to ~160% of supply. 
That means only ~60% of beam time requests can be satisfied, so nearly 
40% of proposals will get turned down.


To try and help illustrate, we at ALS have been pasting together a 
master calendar we call the "fly chromosome chart" here:

https://als-enable.lbl.gov/wordpress/2022/05/19/dark-period/

The width of the bars is proportional to the number of beamlines 
available.  Yes, they vary widely in flux and other capabilities, but 
assignment of beam time is usually done in "shifts".  Now, try to 
picture next year when the "APS" bar is all black.  Also, what kind of 
pins and pucks do you use? For many beamlines you may have to buy 
different ones.


Looking forward to the June 21 APS/U town hall discussions, as well as 
the ACA's "Bridging the APS dark period" session.  We will definitely be 
discussing this at the Diffraction Methods GRC, which is July 24-29, 
2022.  Space is still available!


-James Holton
MAD Scientist



On 5/9/2022 3:12 PM, James Holton wrote:

Greetings all,

I was just thinking of taking a little poll. When the Advanced Photon 
Source at Argonne shuts down for the APS-U upgrade on April 17, 2023, 
it will take with it about 90,000 hours of X-ray beam time until well 
into 2024. So, if you are a routine user of APS, what are your plans?  
Will you just stop collecting X-ray data for 12 months or so? Do you 
have a proposal lined up at another synchrotron? Is it in the USA? 
Europe? Asia? Or are you, like me, a big procrastinator and haven't 
really thought much about it?


Whatever it is, I'd like to hear from you. Either on- or off-list is 
fine. I expect this community will be interested in the digest.


-James Holton
MAD Scientist







[ccp4bb] Dave Case gives John Lawrence Seminar- starting now!

2022-05-10 Thread James Holton

starting now: Dave Case on simulating macromolecular crystals with AMBER :

https://lbnl.zoom.us/j/99255547757?pwd=TWozZm9NNDBZZFpIZER6U0JmMmYvQT09





[ccp4bb] Attention APS users - where will you go?

2022-05-09 Thread James Holton

Greetings all,

I was just thinking of taking a little poll. When the Advanced Photon 
Source at Argonne shuts down for the APS-U upgrade on April 17, 2023, it 
will take with it about 90,000 hours of X-ray beam time until well into 
2024. So, if you are a routine user of APS, what are your plans?  Will 
you just stop collecting X-ray data for 12 months or so? Do you have a 
proposal lined up at another synchrotron? Is it in the USA? Europe? 
Asia? Or are you, like me, a big procrastinator and haven't really 
thought much about it?


Whatever it is, I'd like to hear from you. Either on- or off-list is 
fine. I expect this community will be interested in the digest.


-James Holton
MAD Scientist





Re: [ccp4bb] Diffraction Methods in Structural Biology - Gordon Research Conference - in-person! - July 24-29, 2022

2022-05-07 Thread James Holton

One more thing:

Some may also recall that in 2020 we were accepting tax-deductible 
donations to help attendees from underrepresented groups overcome the 
financial barriers to GRC attendance. Those funds are still available, 
and donations are also still possible. I ask that applicants who feel 
they may qualify please self-identify to me, off-list, in an email. It 
is my goal to bring as many diverse backgrounds and points of view as 
possible into this meeting, because that is what makes for the most 
productive discussions.


-James Holton
MAD Scientist


On 5/2/2022 12:35 PM, James Holton wrote:


Many of you may recall approximately 1000 years ago we were looking 
forward to getting together for another great Diffraction Methods GRC. 
Now, after a 4-year break, the meeting is on!
https://www.grc.org/diffraction-methods-in-structural-biology-conference/2022/ 



It will be in-person at Bates College in Lewiston, ME, USA, on July 
24-29 of 2022. Strange how strange it feels to be considering meeting in 
person, but recent GRCs have proven they can be conducted safely. 
We've learned a lot about viruses in recent years, both in our lives 
and in our labs. Artificial Intelligence has come a long way, and the 
role of biological structure, and indeed science in general, is 
impacting the everyday lives of human beings more than ever before.


It is time we got together to talk about all this. Yes, we've gotten a 
lot of work done remotely, but some things just have to wait until you 
are face-to-face. Preferably over a Maine lobster dinner. GRCs are not 
about listening to talks, they are about the discussion that comes 
after. Newcomers and Veterans sharing and debating ideas until far too 
late at night. It is my sincere hope that fighting this virus, and 
looking toward a brighter future, will inspire even more visionary and 
collaborative ideas for the role structure will play in that future. I 
can't imagine a better theme of discussion for this next meeting.


-James Holton
MAD Scientist and Chair of the 2020/2022 Diffraction Methods GRC






[ccp4bb] Diffraction Methods in Structural Biology - Gordon Research Conference - in-person! - July 24-29, 2022

2022-05-02 Thread James Holton
Many of you may recall approximately 1000 years ago we were looking 
forward to getting together for another great Diffraction Methods GRC. 
Now, after a 4-year break, the meeting is on!

https://www.grc.org/diffraction-methods-in-structural-biology-conference/2022/

It will be in-person at Bates College in Lewiston, ME, USA, on July 
24-29 of 2022. Strange how strange it feels to be considering meeting in 
person, but recent GRCs have proven they can be conducted safely. We've 
learned a lot about viruses in recent years, both in our lives and in 
our labs. Artificial Intelligence has come a long way, and the role of 
biological structure, and indeed science in general, is impacting the 
everyday lives of human beings more than ever before.


It is time we got together to talk about all this. Yes, we've gotten a 
lot of work done remotely, but some things just have to wait until you 
are face-to-face. Preferably over a Maine lobster dinner. GRCs are not 
about listening to talks, they are about the discussion that comes 
after. Newcomers and Veterans sharing and debating ideas until far too 
late at night. It is my sincere hope that fighting this virus, and 
looking toward a brighter future, will inspire even more visionary and 
collaborative ideas for the role structure will play in that future. I 
can't imagine a better theme of discussion for this next meeting.


-James Holton
MAD Scientist and Chair of the 2020/2022 Diffraction Methods GRC





Re: [ccp4bb] sftools

2022-04-05 Thread James Holton
I have found I usually need to run an mtz through "CAD" in order to 
sanitize it after doing things in sftools and other such programs. CAD 
is kind of my "make it a canonical mtz again" program.


HTH

-James Holton
MAD Scientist

On 4/5/2022 6:47 AM, Eleanor Dodson wrote:



Does ANYONE know how to use this useful but ultra-frustrating program??

I have an mtz file which lacks WAVElength AND Dataset name.

I try to follow the sftools documentation, and get an output file which -
 lacks WAVElength AND Dataset name.









Re: [ccp4bb] SHELXD - Limit Number of CPUs?

2022-03-16 Thread James Holton
Most cluster queueing systems have a way to limit such things. Which one 
are you using?


You can also set the environment variable OMP_NUM_THREADS to tell SHELXD 
to limit its CPU usage.  You do this with either:

setenv OMP_NUM_THREADS 10    (csh/tcsh)
or
export OMP_NUM_THREADS=10    (sh/bash)

on the command line before running shelxd.

-James Holton
MAD Scientist

On 3/16/2022 5:21 PM, Jessica Bruhn wrote:

Hi all,

I am wondering if there is a way to limit the number of CPUs that can 
be used by SHELXD. It seems that this program uses all that are 
available until it hits the NTRY you specified or it finds a .fin 
file. Is there a way to limit its CPU and MEM usage? I am running this 
on a large cluster along with other jobs and don't want to get myself 
into trouble.


Thanks so much!

Best,
Jessica

--
Jessica Bruhn, Ph.D
Scientific Group Leader, MicroED
NanoImaging Services, Inc.
4940 Carroll Canyon Road, Suite 115
San Diego, CA 92121
Phone #: (888) 675-8261
www.nanoimagingservices.com











Re: [ccp4bb] freezing in ethane / propane - unipucks ?

2022-01-26 Thread James Holton
I think the issue with propane at at least some light sources is that it 
is flammable.  Makes shipping and safety more complicated.


How hard would it be to let the propane melt off in a cryo gas stream at 
your home lab?  Then use tongs to transfer the pin into liquid N2 for 
handling as usual?  If you don't have a working N2 gas stream and are on 
a budget they are not that hard to build:

https://doi.org/10.1107/S0021889894006357

Cheers,

-James Holton
MAD Scientist


On 1/26/2022 9:25 AM, Guenter Fritz wrote:

Dear Dom,

thanks a lot. Yes, this might work sending a combipuck alongside with 
a good bottle for the local contact.

I was wondering whether the grippers can handle a block of propane?

Best wishes,
guenter

Dear Guenter,

Would the use of vials inside combi-pucks 
(https://www.mitegen.com/product/combipuck-system/) and some 
arrangements with your local contact at the other end, perhaps help 
with using propane remotely?


BW,

D

On 26/01/2022 16:53, Guenter Fritz wrote:

Dear all,

we have some  delicate crystals which might benefit from freezing in 
propane. In former times (when I was still travelling to the 
beamlines) we waited until the propane was solid in the vial and 
then let the propane thaw in the cryo stream at the beamline.


But how can we  do this in these days with unipucks and no manual 
mounting?


I was thinking about freezing in ethane and then transferring the loops 
to liquid nitrogen, similar to how we handle grids for cryo-EM. Has 
anybody tried that? Any experience, tips & tricks would be very 
welcome! Thanks in advance and best regards,


Guenter

 















Re: [ccp4bb] Validation of structure prediction

2022-01-15 Thread James Holton


On 1/13/2022 11:14 AM, Tristan Croll wrote:

(please don’t actually do this)


Too late!  I've been doing that for years.  What happens, of course, is 
the "geometry" improves, but the R factors go through the roof. This I 
expect comes as no surprise to anyone who has played with the "weight" 
parameters in refinement, but maybe it should?  What is it about our 
knowledge of chemical bond lengths, angles, and radii that is 
inconsistent with the electron density of macromolecules, but not small 
molecules?  Why do macro-models have a burning desire to leap away from 
the configuration we know they adopt in reality?  If you zoom in on 
those "bad clashes" individually, they don't look like something that is 
supposed to happen. There is a LOT of energy stored up in those little 
springs.  I have a hard time thinking that's for real. The molecule is 
no doubt doing something else and we're just not capturing it properly.  
There is information to be had here, a lot of information.


This is why I too am looking for an all-encompassing "geometry score". 
Right now I'm multiplying other scores together:


score = (1+Clashscore) * sin(worst_omega) * 1/(1+worst_rama) * 1/(1+worst_rota)
        * Cbetadev * worst_nonbond * worst_bond * worst_angle
        * worst_dihedral * worst_chir * worst_plane

where, for example, worst_rama is the "%score" given to the worst 
Ramachandran angle by phenix.ramalyze, and worst_bond is the largest 
"residual" reported among all the bonds in the structure by molprobity 
or phenix.geometry_minimization.  For "worst_nonbond" I'm plugging the 
observed and ideal distances into a Lennard-Jones 6-12 potential to 
convert it into an "energy" that is always positive.
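
In Python form (hedged heavily: the variable names are mine, 
worst_omega is assumed to be in degrees, and each worst_* term is 
whatever "residual" or "%score" the validation program reports, as 
described above):

import math

def lj_energy(r_obs, r_ideal):
    # 6-12 potential shifted so its minimum (at r_obs == r_ideal) is zero,
    # making the nonbond "energy" always positive
    x = r_ideal / r_obs
    return (x**6 - 1.0)**2

def geometry_score(clashscore, worst_omega_deg, worst_rama, worst_rota,
                   cbetadev, worst_nonbond, worst_bond, worst_angle,
                   worst_dihedral, worst_chir, worst_plane):
    # literal transcription of the badness product above
    return ((1 + clashscore) * math.sin(math.radians(worst_omega_deg))
            / (1 + worst_rama) / (1 + worst_rota)
            * cbetadev * worst_nonbond * worst_bond * worst_angle
            * worst_dihedral * worst_chir * worst_plane)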


With x-ray data in hand, I've been multiplying this whole thing by Rwork 
and trying to find clever ways to minimize the product.  Rfree is then, 
as always, the cross-check.


Or does someone have a better idea?

-James Holton
MAD Scientist


On 1/13/2022 11:14 AM, Tristan Croll wrote:
Hard but not impossible - even when you *are* fitting to low-res 
density. See 
https://twitter.com/crolltristan/status/1381258326223290373?s=21 for 
example - no Ramachandran outliers, 1.3% sidechain outliers, 
clashscore of 2... yet multiple regions out of register by anywhere up 
to 15 residues! I never publicly named the structure (although I did 
share my rebuilt model with the authors), but the videos and images in 
that thread should be enough to illustrate the scale of the problem.


And that was *with* a map to fit! Take away the map, and run some MD 
energy minimisation (perhaps with added Ramachandran and rotamer 
restraints), and I think it would be easy to get your model to fool 
most “simple” validation metrics (please don’t actually do this). The 
upshot is that I still think validation of predicted models in the 
absence of at least moderate-resolution experimental data is still a 
major challenge requiring very careful thought.


— Tristan

On 13 Jan 2022, at 18:41, James Holton  wrote:


Agree with Pavel.

Something I think worth adding is a reminder that the MolProbity 
score only looks at bad clashes, ramachandran and rotamer outliers.


MPscore = 0.426*ln(1+clashscore) + 0.33*ln(1+max(0, rota_out-1)) + 0.25*ln(1+max(0, rama_iffy-2)) + 0.5

 It pays no attention whatsoever to twisted peptide bonds, C-beta 
deviations, and, for that matter, bond lengths and bond angles. If 
you tweak your weights right you can get excellent MP scores, but 
horrible "geometry" in the traditional bonds-and-angles sense. The 
logic behind this kind of validation is that normally nonbonds and 
torsions are much softer than bond and angle restraints and therefore 
fertile ground for detecting problems.  Thus far, I am not aware of 
any "Grand Unified Score" that combines all geometric considerations, 
but perhaps it is time for one?


Tristan's trivial solution aside, it is actually very hard to make 
all the "geometry" ideal for a real-world fold, and especially 
difficult to do without also screwing up the agreement with density 
(R factor).  I would argue that if you don't have an R factor then 
you should get one, but I am interested in opinions about alternatives.


I.E. What if we could train an AI to predict Rfree by looking at the 
coordinates?


-James Holton
MAD Scientist

On 12/21/2021 9:25 AM, Pavel Afonine wrote:

Hi Reza,

If you think about it this way... Validation is making sure that the 
model makes sense, data make sense and model-to-data fit make sense, 
then the answer to your question is obvious: in your case you do not 
have experimental data (at least in a way we used to think of it) 
and so then of these three validation items you only have one, 
which, for example, means you don’t have to report things like 
R-factors or completeness in high-resolution shell.


Really, the geometry of an alpha helix does not depend on how you 
determined it: using X-

Re: [ccp4bb] Validation of structure prediction

2022-01-13 Thread James Holton

Agree with Pavel.

Something I think worth adding is a reminder that the MolProbity score 
only looks at bad clashes, ramachandran and rotamer outliers.


MPscore = 0.426*ln(1+clashscore) + 0.33*ln(1+max(0, rota_out-1)) + 0.25*ln(1+max(0, rama_iffy-2)) + 0.5

 It pays no attention whatsoever to twisted peptide bonds, C-beta 
deviations, and, for that matter, bond lengths and bond angles. If you 
tweak your weights right you can get excellent MP scores, but horrible 
"geometry" in the traditional bonds-and-angles sense. The logic behind 
this kind of validation is that normally nonbonds and torsions are much 
softer than bond and angle restraints and therefore fertile ground for 
detecting problems.  Thus far, I am not aware of any "Grand Unified 
Score" that combines all geometric considerations, but perhaps it is 
time for one?
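
As a function (a direct transcription of the formula above; rota_out 
and rama_iffy are percentages and clashscore is clashes per 1000 atoms, 
as MolProbity reports them):

import math

def molprobity_score(clashscore, rota_out, rama_iffy):
    return (0.426 * math.log(1 + clashscore)
            + 0.33 * math.log(1 + max(0, rota_out - 1))
            + 0.25 * math.log(1 + max(0, rama_iffy - 2))
            + 0.5)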


Tristan's trivial solution aside, it is actually very hard to make all 
the "geometry" ideal for a real-world fold, and especially difficult to 
do without also screwing up the agreement with density (R factor).  I 
would argue that if you don't have an R factor then you should get one, 
but I am interested in opinions about alternatives.


I.E. What if we could train an AI to predict Rfree by looking at the 
coordinates?


-James Holton
MAD Scientist

On 12/21/2021 9:25 AM, Pavel Afonine wrote:

Hi Reza,

If you think about it this way... Validation is making sure that the 
model makes sense, data make sense and model-to-data fit make sense, 
then the answer to your question is obvious: in your case you do not 
have experimental data (at least in a way we used to think of it) and 
so then of these three validation items you only have one, which, for 
example, means you don’t have to report things like R-factors or 
completeness in high-resolution shell.


Really, the geometry of an alpha helix does not depend on how you 
determined it: using X-rays or cryo-EM or something else! So, most (if 
not all) model validation tools still apply.


Pavel


On Mon, Dec 20, 2021 at 8:10 AM Reza Khayat  wrote:

Hi,


Can anyone suggest how to validate a predicted structure?
Something similar to wwPDB validation without the need for
refinement statistics. I realize this is a strange question given
that the geometry of the model is anticipated to be fine if the
structure was predicted by a server that minimizes the geometry to
improve its statistics. Nonetheless, the journal has asked me for
such a report. Thanks.


Best wishes,

Reza


Reza Khayat, PhD
Associate Professor
City College of New York
Department of Chemistry and Biochemistry
New York, NY 10031















[ccp4bb] Elspeth Garman speaking at ALS Colloquium in 3 hours

2022-01-12 Thread James Holton

Greetings all!

At 10am PST Jan 12, 2022 (California time), 2.75 hours from now!

Special time slot to make attendance easier for Europe, Dr. Garman will 
be speaking on the topic:


"Estimating Doses for Synchrotron Experiments: Why and How"

Thought that might be of interest if you have the time!

Free and open to all, and be sure to mute your mic.
Here is the link:

https://lbnl.zoom.us/j/91028154433

-James Holton
MAD Scientist





Re: [ccp4bb] Erica Saphire speaking at ALS Colloquium - recording available

2021-12-09 Thread James Holton
Thank you everyone for your interest in Erica's talk, which went great!  
Not up on the website yet, but here is a quick gdrive link to the 
video.  Enjoy!


https://drive.google.com/file/d/1uyCJpOkJxE77l5jsMzgfEp_BNBPdhxTI/view

-James Holton
MAD Scientist

On 12/8/2021 9:06 AM, James Holton wrote:

Greetings all!

At 3pm PST Dec 8, 2021 (California time),

I managed to get Erica Saphire to speak about her recent work at the 
ALS Colloquium.  It is a forum for users and staff of the Advanced 
Light Source to communicate with each other about how we use the 
light. I asked Erica because she is a long-time, highly productive 
user who is doing some very timely work. Her title:


A Global Consortium, Next-Generation SARS-CoV-2 Antibody Therapeutics 
and Stabilized Spike


I think it absolutely brilliant how she's gotten so many different 
entities to work together.


Thought that might be of interest if you have the time!

Free and open to all, and I'm not sure if this is being recorded or 
not.  Here are the links:


https://lbnl.zoom.us/j/97680529569
https://als.lbl.gov/news-events/seminars/

-James Holton
MAD Scientist







Re: [ccp4bb] Erica Saphire speaking at ALS Colloquium in 10 minutes!

2021-12-08 Thread James Holton

Starting in 10 min!
https://lbnl.zoom.us/j/97680529569

On 12/8/2021 9:06 AM, James Holton wrote:

Greetings all!

At 3pm PST Dec 8, 2021 (California time),

I managed to get Erica Saphire to speak about her recent work at the 
ALS Colloquium.  It is a forum for users and staff of the Advanced 
Light Source to communicate with each other about how we use the 
light. I asked Erica because she is a long-time, highly productive 
user who is doing some very timely work. Her title:


A Global Consortium, Next-Generation SARS-CoV-2 Antibody Therapeutics 
and Stabilized Spike


I think it absolutely brilliant how she's gotten so many different 
entities to work together.


Thought that might be of interest if you have the time!

Free and open to all, and I'm not sure if this is being recorded or 
not.  Here are the links:


https://lbnl.zoom.us/j/97680529569
https://als.lbl.gov/news-events/seminars/

-James Holton
MAD Scientist







Re: [ccp4bb] Erica Saphire speaking at ALS Colloquium in 6 hours

2021-12-08 Thread James Holton
Good news!  Erica's talk will be recorded!  I will update on this thread 
when the link is up.


Goodnight Europe,

-James Holton
MAD Scientist


On 12/8/2021 11:16 AM, Gerlind Sulzenbacher wrote:

Dear James,

thank you so much for putting into place all these outstanding seminars.

Here is a plea to access not only Erica Saphire's talk, but all the 
previous talks.


I had a look at https://als.lbl.gov/news-events/seminars/, great list 
of seminars, but no way to access any recording.


I know this plea just puts extra work on you.

If you recorded it, or will record it, many thanks for sharing, 
within the limits of legal boundaries.


Otherwise, just forget it, and thanks again for always being there to push 
forward great science.


With best wishes,
Gerlind





On 08/12/2021 18:38, Gerard Bricogne wrote:

Dear James,

  Thank you for this notification about what should be a captivating 
and inspiring talk.

  The time of 3pm PST is a little bit on the rough side for most 
European would-be listeners: is there a chance of putting in a plea 
that this talk be recorded and made available on demand, for at least 
a short period of time, through a link that you would broadcast soon 
after the event?

  Thank you in advance!


  With best wishes,

   Gerard.

--
On Wed, Dec 08, 2021 at 09:06:53AM -0800, James Holton wrote:

Greetings all!

At 3pm PST Dec 8, 2021 (California time),

I managed to get Erica Saphire to speak about her recent work at the 
ALS Colloquium.  It is a forum for users and staff of the Advanced 
Light Source to communicate with each other about how we use the 
light. I asked Erica because she is a long-time, highly productive 
user who is doing some very timely work. Her title:

A Global Consortium, Next-Generation SARS-CoV-2 Antibody Therapeutics 
and Stabilized Spike

I think it absolutely brilliant how she's gotten so many different 
entities to work together.

Thought that might be of interest if you have the time!

Free and open to all, and I'm not sure if this is being recorded or 
not.  Here are the links:

https://lbnl.zoom.us/j/97680529569
https://als.lbl.gov/news-events/seminars/

-James Holton
MAD Scientist

 















Re: [ccp4bb] Erica Saphire speaking at ALS Colloquium in 6 hours

2021-12-08 Thread James Holton

Thank you all,

I appreciate the convenience if it can be recorded.  I know Jennifer 
Doudna's talk was recorded, but there seems to have been some delay 
getting it hosted.


I will provide updates ASAP.  Hopefully before Europe falls asleep tonight.

-James Holton
MAD scientist

On 12/8/2021 11:16 AM, Gerlind Sulzenbacher wrote:

Dear James,

thank you so much for putting into place all these outstanding seminars.

Here is a plea to access not only Erica Saphire's talk, but all the 
previous talks.


I had a look at https://als.lbl.gov/news-events/seminars/, great list 
of seminars, but no way to access any recording.


I know this plea just puts extra work on you.

If you recorded it, or will record it, many thanks for sharing, 
within the limits of legal boundaries.


Otherwise, just forget it, and thanks again for always being there to push 
forward great science.


With best wishes,
Gerlind





On 08/12/2021 18:38, Gerard Bricogne wrote:

Dear James,

  Thank you for this notification about what should be a captivating 
and inspiring talk.

  The time of 3pm PST is a little bit on the rough side for most 
European would-be listeners: is there a chance of putting in a plea 
that this talk be recorded and made available on demand, for at least 
a short period of time, through a link that you would broadcast soon 
after the event?

  Thank you in advance!


  With best wishes,

   Gerard.

--
On Wed, Dec 08, 2021 at 09:06:53AM -0800, James Holton wrote:

Greetings all!

At 3pm PST Dec 8, 2021 (California time),

I managed to get Erica Saphire to speak about her recent work at the 
ALS
Colloquium.  It is a forum for users and staff of the Advanced Light 
Source
to communicate with each other about how we use the light. I asked 
Erica
because she is a long-time, highly productive user who is doing some 
very

timely work. Her title:

A Global Consortium, Next-Generation SARS-CoV-2 Antibody 
Therapeutics and

Stabilized Spike

I think it absolutely brilliant how she's gotten so many different 
entities

to work together.

Thought that might be of interest if you have the time!

Free and open to all, and I'm not sure if this is being recorded or 
not.

Here are the links:

https://lbnl.zoom.us/j/97680529569
https://als.lbl.gov/news-events/seminars/

-James Holton
MAD Scientist

 





[ccp4bb] Erica Saphire speaking at ALS Colloquium in 6 hours

2021-12-08 Thread James Holton

Greetings all!

At 3pm PST Dec 8, 2021 (California time),

I managed to get Erica Saphire to speak about her recent work at the ALS 
Colloquium.  It is a forum for users and staff of the Advanced Light 
Source to communicate with each other about how we use the light. I 
asked Erica because she is a long-time, highly productive user who is 
doing some very timely work. Her title:


A Global Consortium, Next-Generation SARS-CoV-2 Antibody Therapeutics 
and Stabilized Spike


I think it absolutely brilliant how she's gotten so many different 
entities to work together.


Thought that might be of interest if you have the time!

Free and open to all, and I'm not sure if this is being recorded or 
not.  Here are the links:


https://lbnl.zoom.us/j/97680529569
https://als.lbl.gov/news-events/seminars/

-James Holton
MAD Scientist





Re: [ccp4bb] what would be the best metric to assess the quality of an mtz file?

2021-11-04 Thread James Holton

Ahh, waters.  Where would structure-related debate be without them?

I see. So if your default refinement procedure is to add an unspecified 
number of waters, then yes Rwork might not be all that useful, as it 
will depend on how the building goes.


Again, it all depends on what you want your data to do.  If you are 
looking for subtle difference features, such as a bound ligand, 
mutation, etc. then clear identification of weak density should be your 
"score".  So, I say:

1) pick some weak density
2) omit the model under it
3) refine to convergence
4) measure the Fo-Fc difference peak
  I tend to use the MAPMAN "peek" function for this, but I'm sure there 
are other ways.  I say use the SAME starting model each time, one where 
you have built in the obvious waters, ions, etc, but borderline or 
otherwise inconsistent ones, leave them out.  Then pick a low-lying 
feature as your test density.


Do not use molecular replacement. Use pointless with your starting model 
on "xyzin" to re-index the data so that it matches the model. Super fast 
and easy to do. No origin issues, and it doesn't modify the pdb.


Aside:  I actually have a program for removing unneeded waters I call 
"watershed".  It is not fast, but it is thorough, and you only need to 
do it for your reference structure. You will need these programs:

https://github.com/fraser-lab/holton_scripts/tree/master/watershed
https://github.com/fraser-lab/holton_scripts/blob/master/converge_refmac.com
You will also need a pdb, an mtz, and a file called refmac_opts.txt that contains all the 
refmac5 keywords you want to use (if any).  You will also want a lot of 
CPUs; the script works with the PBS and SGE clusters I have access 
to (and I'm working on Slurm). What watershed does is delete waters one 
at a time and re-refine to convergence.  Also, as a control, you want to 
refine the starting structure for the same number of cycles. Each "minus 
one water" structure gets its own CPU. Once everything settles, you look 
at the final Rwork values. If deleting a water ends up making Rwork 
better? ... then you probably shouldn't have built it in the first 
place. That water is evil and must go. After throwing out the worst 
water, you now have a new starting point. In some published structures 
more than 100 waters can be eliminated this way. Almost always brings 
Rwork and Rfree closer together, even though Rfree does not enter into 
any automated decisions.
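
The logic, as a toy Python sketch (the real thing is the csh script at 
the links above; refine_rwork() here is a hypothetical stand-in for 
"refine to convergence with refmac5 and report Rwork"):

def watershed(waters, refine_rwork):
    """Greedily delete waters whose removal improves Rwork.

    waters: list of water identifiers in the reference model.
    refine_rwork: callable that rebuilds the model with just those
    waters, refines it to convergence, and returns Rwork.
    """
    while waters:
        # control: refine the unmodified model for the same number of cycles
        baseline = refine_rwork(waters)
        # one "minus one water" refinement per water (one CPU each in practice)
        trials = {w: refine_rwork([x for x in waters if x != w])
                  for w in waters}
        worst, best_r = min(trials.items(), key=lambda kv: kv[1])
        if best_r >= baseline:
            break                # no deletion improves Rwork: done
        waters = [x for x in waters if x != worst]   # evict the evil water
    return waters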


 Using simulated data (where I know the ground truth) I find the 
watershed procedure tends to un-do all the horrible things that happen 
after you get over-aggressive and stuff waters into every little peak 
you see. Eventually, as you add more noise waters, Rfree starts to go 
up, and the map starts to look less like the ground truth, but Rwork 
keeps going down the more waters you add.  What watershed does pretty 
reliably is bring you back to the point just before where Rfree started 
to take a turn for the worse, and you can do this without ever looking 
at Rfree!


Of course, it is always better to not put in bad waters in the first 
place, but sometimes it's hard to tell.


Anyway, I suggest using a watershed-ed model as your reference.

Hope that is helpful in some way?

-James Holton
MAD Scientist


On 11/2/2021 5:01 PM, Murpholino Peligro wrote:

That's exactly what I am doing...
citing David...

"I expect the A and B data sets to be quite similar, but I would like 
to evaluate which protocol was "better", and I want to do this 
quickly, ideally looking at a single number."


and

"I do want to find a way to assess the various tweaks I can try in 
data processing for a single case"


Why not do all those things with Rwork?
I thought that comparing the R-free rather than the R-work was going 
to be easier, because last week the structure was dehydrated, so 
the refinement program added "strong waters"; due to a thousand or 
so extra reflections I could have a dozen or so extra waters, and the 
difference in R-work value between protocols due to the extra waters was 
going to be a little bit more difficult to compare. I now have the 
final structure, so I could very well compare the R-work by doing another 
round of refinement, maybe randomizing ADPs at the beginning or 
something.


Thanks a lot.









El lun, 1 de nov. de 2021 a la(s) 03:22, David Waterman 
(dgwater...@gmail.com) escribió:


Hi James,

What you wrote makes lots of sense. I had not heard about Rsleep,
so that looks like interesting reading, thanks.

I have often used Rfree as a simple tool to compare two protocols.
If I am not actually optimising against Rfree but just using it
for a one-off comparison then that is okay, right?

Let's say I have two data processing protocols, A and B. Between
these I might be exploring some difference in options within one
data processing program, perhaps differ

Re: [ccp4bb] what would be the best metric to assess the quality of an mtz file?

2021-11-01 Thread James Holton

Hi David,

Why not do all those things with Rwork? It is much less noisy than 
Rfree. Have you ever seen a case in such analysis where Rwork didn't 
tell you the same thing Rfree did?  If so, did you believe the difference?


Once, when I was playing with lossy image compression, if I picked just 
the right compression ratio I could get a slightly better Rfree. But that 
is not something I'd recommend as a good idea.


-James Holton
MAD Scientist

On 11/1/2021 2:22 AM, David Waterman wrote:

Hi James,

What you wrote makes lots of sense. I had not heard about Rsleep, so 
that looks like interesting reading, thanks.


I have often used Rfree as a simple tool to compare two protocols. If 
I am not actually optimising against Rfree but just using it for a 
one-off comparison then that is okay, right?


Let's say I have two data processing protocols, A and B. Between these 
I might be exploring some difference in options within one data 
processing program, perhaps different geometry refinement parameters, 
or scaling options. I expect the A and B data sets to be quite 
similar, but I would like to evaluate which protocol was "better", and 
I want to do this quickly, ideally looking at a single number. I don't 
like I/sigI because I don't trust the sigmas, CC1/2 is often noisy, 
and I'm totally sworn off merging R statistics for these purposes. I 
tend to use Rfree as an easily-available metric, independent from the 
data processing program and the merging stats. It also allows a 
comparison of A and B in terms of the "product" of crystallography, 
namely the refined structure. In this I am lucky because I'm not 
trying to solve a structure. I may be looking at lysozyme or 
proteinase K: something where I can download a pretty good 
approximation to the truth from the PDB.


So, what I do is process the data by A and process by B, ensure the 
data sets have the same free set, then refine to convergence (or at 
least, a lot of cycles) starting from a PDB structure. I then evaluate 
A vs B in terms of Rfree, though without an error bar on Rfree I don't 
read too much into small differences.


Does this procedure seem sound? Perhaps it could be improved by 
randomly jiggling the atoms in the starting structure, in case the PDB 
deposition had already followed an A- or B-like protocol. Perhaps the 
whole approach is suspect. Certainly I wouldn't want to generalise by 
saying that A or B is better in all cases, but I do want to find a way 
to assess the various tweaks I can try in data processing for a single 
case.


Any thoughts? I appreciate the wisdom of the BB here.

Cheers

-- David


On Fri, 29 Oct 2021 at 15:50, James Holton  wrote:


Well, of all the possible metrics you could use to assess data
quality, Rfree is probably the worst one.  This is because it is a
cross-validation metric, and cross-validations don't work if you
use them as an optimization target. You can try, and might even
make a little headway, but then your free set is burnt. If you
have a third set of observations, as suggested for Rsleep
(doi:10.1107/S0907444907033458), then you have a chance at another
round of cross-validation. Crystallographers don't usually do
this, but it has become standard practice in machine learning
(training=Rwork, validation=Rfree and testing=Rsleep).

So, unless you have an Rsleep set, any time you contemplate doing
a bunch of random things and picking the best Rfree ... don't. 
Just don't.  There madness lies.

What happens after doing this is you will be initially happy about
your lower Rfree, but everything you do after that will make it go
up more than it would have had you not performed your Rfree
optimization. This is because the changes in the data that made
Rfree randomly better were actually noise, and as the structure
becomes more correct it will move away from that noise. It's
always better to optimize on something else, and then check your
Rfree as infrequently as possible. Remember it is the control for
your experiment. Never mix your positive control with your sample.

As for the best metric to assess data quality?  Well, what are you
doing with the data? There are always compromises in data
processing and reduction that favor one application over another. 
If this is a "I just want the structure" project, then score on
the resolution where CC1/2 hits your favorite value. For some that
is 0.5, others 0.3. I tend to use 0.0 so I can cut it later
without re-processing. Whatever you do just make it consistent.

If it's for anomalous, score on CCanom, or if that's too noisy the
Imean/sigma in the lowest-angle resolution or highest-intensity
bin. This is because for anomalous you want to minimize relative
error. The end-all-be-all of anomalous signal strength is the
phased anomalous difference Fourier. You need phases to do one,
but if 

Re: [ccp4bb] what would be the best metric to assess the quality of an mtz file?

2021-10-29 Thread James Holton


Well, of all the possible metrics you could use to assess data quality, 
Rfree is probably the worst one.  This is because it is a 
cross-validation metric, and cross-validations don't work if you use 
them as an optimization target. You can try, and might even make a 
little headway, but then your free set is burnt. If you have a third set 
of observations, as suggested for Rsleep 
(doi:10.1107/S0907444907033458), then you have a chance at another round 
of cross-validation. Crystallographers don't usually do this, but it has 
become standard practice in machine learning (training=Rwork, 
validation=Rfree and testing=Rsleep).


So, unless you have an Rsleep set, any time you contemplate doing a 
bunch of random things and picking the best Rfree ... don't.  Just 
don't.  There madness lies.


What happens after doing this is you will be initially happy about your 
lower Rfree, but everything you do after that will make it go up more 
than it would have had you not performed your Rfree optimization. This 
is because the changes in the data that made Rfree randomly better were 
actually noise, and as the structure becomes more correct it will move 
away from that noise. It's always better to optimize on something else, 
and then check your Rfree as infrequently as possible. Remember it is 
the control for your experiment. Never mix your positive control with 
your sample.


As for the best metric to assess data quality?  Well, what are you doing 
with the data? There are always compromises in data processing and 
reduction that favor one application over another.  If this is a "I just 
want the structure" project, then score on the resolution where CC1/2 
hits your favorite value. For some that is 0.5, others 0.3. I tend to 
use 0.0 so I can cut it later without re-processing. Whatever you do 
just make it consistent.
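
For example, a trivial sketch of that scoring rule, assuming per-shell 
(d_min, CC1/2) pairs already pulled from a processing log (the numbers 
below are invented):

shells = [(3.0, 0.99), (2.5, 0.95), (2.2, 0.80),
          (2.0, 0.55), (1.9, 0.31), (1.8, 0.05)]   # (d_min in A, CC1/2)

def resolution_score(shells, cc_cut=0.3):
    """d_min of the highest-resolution shell with CC1/2 >= cc_cut."""
    passing = [d for d, cc in shells if cc >= cc_cut]
    return min(passing) if passing else None

print(resolution_score(shells, 0.5))   # -> 2.0 A
print(resolution_score(shells, 0.3))   # -> 1.9 A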


If it's for anomalous, score on CCanom, or if that's too noisy the 
Imean/sigma in the lowest-angle resolution or highest-intensity bin. 
This is because for anomalous you want to minimize relative error. The 
end-all-be-all of anomalous signal strength is the phased anomalous 
difference Fourier. You need phases to do one, but if you have a 
structure just omit an anomalous scatterer of interest, refine to 
convergence, and then measure the peak height at the position of the 
omitted anomalous atom.  Instructions for doing anomalous refinement in 
refmac5 are here:

https://www2.mrc-lmb.cam.ac.uk/groups/murshudov/content/refmac/refmac_keywords.html

If you're looking for a ligand you probably want isomorphism, and in 
that case refining with a reference structure looking for low Rwork is 
not a bad strategy. This will tend to select for crystals containing a 
molecule that looks like the one you are refining.  But be careful! If 
it is an apo structure your ligand-bound crystals will have higher Rwork 
due to the very difference density you are looking for.


But if it's the same data just being processed in different ways, first 
make a choice about what you are interested in, and then optimize on 
that.  Just don't optimize on Rfree!


-James Holton
MAD Scientist


On 10/27/2021 8:44 AM, Murpholino Peligro wrote:
Let's say I ran autoproc with different combinations of options for a 
specific dataset, producing dozens of different (but not so different) 
mtz files...
Then I ran phenix.refine with the same options for the same structure 
but with all my mtz zoo

What would be the best metric to say "hey this combo works the best!"?
R-free?
Thanks

M. Peligro





Re: [ccp4bb] ALS Colloquium Seminar Tomorrow - Jennifer Doudna Lecture

2021-10-27 Thread James Holton

Hello all,

Jennifer Doudna Lecture starting in 30 minutes!  3 pm CA time (PDT).

Zoom link: https://lbnl.zoom.us/j/96095492358

See you there,

-James Holton
MAD Scientist



On 26 Oct 2021, at 22:19, James Holton  wrote:

Greetings all,

This year I have the honor of co-organizing the ALS Colloquium Seminar 
series. This is normally a local but broad-audience forum for users of 
the Advanced Light Source to learn more about what other users are 
doing. But, now that its gone virtual you can all join in and listen 
to what I think will be some very interesting talks. Maybe even ask a 
question? Its not easy finding speakers that appeal to a broad range 
of scientist ranging from biologists to gravel monkeys and chemists, 
but what we have in common is we all need light.


Tomorrow's speaker is one of my beamline users!
Jennifer Doudna will be on at 3pm Oct 27, 2021 California time (25 
hours from now). Her title is: "CRISPR: The Science and Opportunity of 
Genome Editing"


Zoom links on this page:
https://als.lbl.gov/news-events/seminars/

It is free for anyone to view, but we are expecting a large audience 
so please remember to mute your mic.








[ccp4bb] ALS Colloquium Seminar Tomorrow - Jennifer Doudna Lecture

2021-10-26 Thread James Holton

Greetings all,

This year I have the honor of co-organizing the ALS Colloquium Seminar 
series. This is normally a local but broad-audience forum for users of 
the Advanced Light Source to learn more about what other users are 
doing. But, now that it's gone virtual you can all join in and listen to 
what I think will be some very interesting talks. Maybe even ask a 
question? It's not easy finding speakers that appeal to a broad range of 
scientists ranging from biologists to gravel monkeys and chemists, but 
what we have in common is we all need light.


Tomorrow's speaker is one of my beamline users!
Jennifer Doudna will be on at 3pm Oct 27, 2021 California time (25 
hours from now). Her title is: "CRISPR: The Science and Opportunity of 
Genome Editing"


Zoom links on this page:
https://als.lbl.gov/news-events/seminars/

It is free for anyone to view, but we are expecting a large audience so 
please remember to mute your mic.



Coming soon are many other exciting speakers I am proud to have brought 
to this series, including one of my other users:
Erica Saphire will be on at 3pm Dec 8, 2021, talking about "A Global 
Consortium, Next-Generation SARS-CoV-2 Antibody Therapeutics and 
Stabilized Spike"


See you in the zoom!

-James Holton
MAD Scientist






Re: [ccp4bb] am I doing this right?

2021-10-19 Thread James Holton

Thank you Gergely,

Oh, don't worry, I am not concerned about belief. Neither the model nor 
the data care what I believe.


What I am really asking is: what is the proper way to combine weak 
observations?


Right now, in pretty much all structural sciences we are not used to 
doing this, but we are entering an era where we will have to.


I was trying to ask a simple question with the 10x10 pixel patch because 
(as Graeme, Ian and others pointed out) it highlights how the solution 
must also apply to two patches of 50 pixels.  In reality, unfortunately, 
those two patches might not be next to each other and will have 
different Lorentz factors, polarization factors, absorption factors, and 
probably different partiality as well. These values are knowable, but 
they are not integers. The way we currently deal with all this is to 
first convert patches of pixels into an expectation and variance, then 
apply all the corrections, and finally "merge" everything with error 
propagation into simple list of h,k,l,Iobs,sigIobs that we can compare 
to a PDB file.


You are absolutely right that the best thing to do would be fitting a 
model of the whole diffractometer and crystal, structure factors 
included, directly and pixel-by-pixel to the image data.  Some 
colleagues and I managed to do this recently 
(https://doi.org/10.1107/s2052252520013007). It is rather 
computationally expensive, but seems to be working.


I hope this will be a useful tool, but I don't think such an approach 
will ever completely supplant data reduction, as there are many 
advantages to the latter.  But only if you do the statistics right!  
This is why I asked the community so that folks cleverer and more 
experienced than I in such matters (such as yourself) can correct me if 
I'm getting something wrong.  And the community benefits from the 
discussion.


Thank you for your thoughtful and thought-provoking insights!

-James Holton
MAD Scientist


On 10/19/2021 2:05 AM, Gergely Katona wrote:

Dear James,

I am sorry to nitpick, but this is the answer to "what is my belief of expectation 
and variance if I observe a 10x10 patch of pixels with zero counts?" This will 
heavily depend on my model.
When I make predictions like this, my intention is not to replace the data with 
"new and improved" data that is closer to the Truth and deposit it in some 
database from a position of authority.

I would simply use it to validate my model. Well, my model expects the Iobs to 
be 0.01, but in fact it is 0. This may make me slightly worried, but then I 
look at the posterior distribution and I see 0 with the highest posterior 
probability, so I relax a bit, knowing I do not have to throw out my model outright. 
Still, a better model may be out there.
For a Bayesian the data is fixed and holy; the model may change. And since the question rarely manifests 
like that, one does not have to spend a lot of time pondering whether a uniform distribution of the 
rate is compatible with my belief in some quantum process. Bayesian folks are pragmatic. Your 
question about "what is my belief about the slope and intercept of a line that is the basis of 
some time-dependent random process given my observations" is more relevant. It is 
straightforward to implement as a Bayesian network to answer this question, and it will give you 
predictions that look deceptively like the data. Here, you only care about your prior belief about 
the magnitude of slope and intercept; the belief about what the rate may be independent of time is 
quite irrelevant, and so are the predictions it may make. And I guess you would not intend to 
deposit images that were generated by the predictions of these posterior models as the "new 
and improved data".

Best wishes,

Gergely


Gergely Katona, Professor, Chairman of the Chemistry Program Council
Department of Chemistry and Molecular Biology, University of Gothenburg
Box 462, 40530 Göteborg, Sweden
Tel: +46-31-786-3959 / M: +46-70-912-3309 / Fax: +46-31-786-3910
Web: http://katonalab.eu, Email: gergely.kat...@gu.se

-----Original Message-----
From: CCP4 bulletin board  On Behalf Of James Holton
Sent: 18 October, 2021 21:41
To: CCP4BB@JISCMAIL.AC.UK
Subject: Re: [ccp4bb] am I doing this right?

Thank you very much for this Kay!

So, to summarize, you are saying the answer to my question "what is the expectation 
and variance if I observe a 10x10 patch of pixels with zero counts?" is:
Iobs = 0.01
sigIobs = 0.01 (defining sigIobs = sqrt(variance(Iobs)))

And for the one-pixel case:
Iobs = 1
sigIobs = 1

but in both cases the distribution is NOT Gaussian, but rather exponential. And 
that means adding variances may not be the way to propagate error.

Is that right?

-James Holton
MAD Scientist



On 10/18/2021 7:00 AM, Kay Diederichs wrote:

Hi James,

I'm a bit behind ...

My answer about the basic question ("a patch of 100 pixels each with zero counts - 
what is the vari

Re: [ccp4bb] am I doing this right?

2021-10-18 Thread James Holton
HDF5 is still "framing", but using better compression than the "byte 
offset" one implemented in Pilatus CBFs, which has a minimum of one byte 
per pixel. Very fast, but not designed for near-blank images.


Assuming entropy-limited compression, the ultimate data rate is the 
number of photons/s hitting the detector multiplied by log2(Npix), where 
Npix is the number of pixels. The reason it's log2() is because that's 
the number of bits needed to store the address of which pixel got the 
photon, and since the arrival of each photon is basically random, further 
compression is generally not possible without loss of information.  
There might be some additional bits about the time interval, but it 
might be more efficient to store that implicitly in the framing. As long 
as storing "no photons" only takes up one bit that would probably be 
more efficient.


So, for a 100 micron thick sample, flux = 1e12 photons/s, and a detector 
~4000 pixels across, you get ~3.4 GB/s of perfectly and losslessly compressed data.  
Making it smaller than that requires throwing away information.
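
The arithmetic behind that, as a sketch (the 4096x4096 pixel count and 
taking the solid-angle factor as ~1 are my assumptions here; the 
1.2e-5/micron scatter fraction is the one given elsewhere in this thread):

import math

flux = 1e12              # incident photons/s
thickness = 100.0        # microns of sample
npix = 4096 * 4096       # assumed detector pixel count
# scattered photons/s reaching the detector (solid-angle factor taken as ~1):
photons_per_s = 1.2e-5 * flux * thickness        # ~1.2e9 photons/s
bits_per_photon = math.log2(npix)                # ~24 bits: the pixel address
print(photons_per_s * bits_per_photon / 8 / 1e9) # ~3.6 GB/s

A solid-angle factor below 1 scales that down proportionally.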


I'm starting to think this might be the best prior. If you start out 
assuming nothing (not even uniform), then the variance of 0 photons may 
well be infinite. However, it is perhaps safe to assume that the dataset 
as a whole as at least one photon in it. And then if you happen to know 
the whole data set contains N photons and you have F images of Q pixels, 
then maybe a reasonable prior distribution is Poissonian with 
mean=variance= N/F/Q photons/pixel ?


-James Holton
MAD Scientist

On 10/17/2021 11:30 PM, Frank von Delft wrote:
Thanks, I learnt two things now - one of which being that I'm credited 
with coining that word!  Stap me vittals...


If it's single photon events you're after, isn't it quantum statistics 
where you need to go find that prior?  (Or is that what you're doing 
in this thread - I wouldn't be able to tell.)


Also:  should the detectors change how they read out things, then?  
Just write out the events with timestamp, rather than dumping all 
pixels all the time into these arbitrary containers called "image".  
Or is that what's already happening in HDF5 (which I don't understand 
one bit, I should add).


Frank




On 17/10/2021 18:12, James Holton wrote:


Well Frank, I think it comes down to something I believe you were the 
first to call "dose slicing".


Like fine phi slicing, collecting a larger number of weaker images 
records the same photons, but with more information about the sample 
before it dies. In fine phi slicing the extra information allows you 
to do better background rejection, and in "dose slicing" the extra 
information is about radiation damage. We lose that information when 
we use longer exposures per image, and if you burn up the entire 
useful life of your crystal in one shot, then all information about 
how the spots decayed during the exposure is lost. Your data are also 
rather incomplete.


How much information is lost? Well, how much more disk space would be 
taken up, even after compression, if you collected only 1 photon per 
image?  And kept collecting all the way out to 30 MGy in dose? That's 
about 1 million photons (images) per cubic micron of crystal.  So, 
I'd say the amount of information lost is "quite a bit".


But what makes matters worse is that if you did collect this data set 
and preserved all information available from your crystal you'd have 
no way to process it. This is not because it's impossible, it's just 
that we don't have the software. Your only choice would be to go find 
images with the same "phi" value and add them together until you have 
enough photons/pixel to index it. Once you've got an indexing 
solution you can map every photon hit to a position in reciprocal 
space as well as give it a time/dose stamp. What do you do with 
that?  You can do zero-dose extrapolation, of course!  Damage-free 
data! Wouldn't that be nice. Or can you?  The data you will have in 
hand for each reciprocal-space pixel might look something like:
tic tic .. tic . tic ... tic tictic ... 
tictic.


So. Eight photons.  With time-of-arrival information.  How do you fit 
a straight line to that?  You could "bin" the data or do some kind of 
smoothing thing, but then you are losing information again. Perhaps 
also making ill-founded assumptions. You need error bars of some 
kind, and, better yet, the shape of the distribution implied by those 
error bars.


And all this makes me think somebody must have already done this. I'm 
willing to bet probably some time in the late 1700s to early 1800s. 
All we're really talking about here is augmenting maximum-likelihood 
estimation of an average value to maximum-likelihood estimation of a 
straight line. That is, slope and intercept, with sigmas on both. I 
suspect the proper approach is to first bring everything down

Re: [ccp4bb] am I doing this right?

2021-10-18 Thread James Holton

Thank you very much for this Kay!

So, to summarize, you are saying the answer to my question "what is the 
expectation and variance if I observe a 10x10 patch of pixels with zero 
counts?" is:

Iobs = 0.01
sigIobs = 0.01 (defining sigIobs = sqrt(variance(Iobs)))

And for the one-pixel case:
Iobs = 1
sigIobs = 1

but in both cases the distribution is NOT Gaussian, but rather 
exponential. And that means adding variances may not be the way to 
propagate error.


Is that right?

-James Holton
MAD Scientist



On 10/18/2021 7:00 AM, Kay Diederichs wrote:

Hi James,

I'm a bit behind ...

My answer about the basic question ("a patch of 100 pixels each with zero counts - 
what is the variance?") you ask is the following:

1) we all know the Poisson PDF (Probability Distribution Function)  P(k|l) = 
l^k*e^(-l)/k!  (where k stands for an integer >=0 and l is lambda) which 
tells us the probability of observing k counts if we know l. The PDF is 
normalized: SUM_over_k P(k|l) = 1 for k = 0...infinity.
2) you don't know before the experiment what l is, and you assume it is some number x 
with 0<=x<=xmax (the xmax limit can be calculated by looking at the physics of 
the experiment; it is finite and less than the overload value of the pixel, otherwise 
you should do a different experiment). Since you don't know that number, all the x 
values are equally likely - you use a uniform prior.
3) what is the PDF P(l|k) of l if we observe k counts?  That can be found with Bayes 
theorem, and it turns out that (due to the uniform prior) the right hand side of the 
formula looks the same as in 1) : P(l|k) = l^k*e^(-l)/k! (again, the ! stands for the 
factorial, it is not a semantic exclamation mark). This is eqs. 7.42 and 7.43 in D'Agostini, 
"Bayesian Reasoning in Data Analysis".
3a) side note: if we calculate the expectation value for l, by multiplying with 
l and integrating over l from 0 to infinity, we obtain E(P(l|k))=k+1, and 
similarly for the variance (D'Agostini eqs 7.45 and 7.46)
4) for k=0 (zero counts observed in a single pixel), this reduces to 
P(l|0)=e^(-l) for a single observation (pixel). (This is basic math; see also 
§7.4.1 of D'Agostini.)
5) since we have 100 independent pixels, we must multiply the individual PDFs 
to get the overall PDF f, and also normalize to make the integral over that PDF 
to be 1: the result is f(l|all 100 pixels are 0)=n*e^(-n*l). (basic math). A 
more Bayesian procedure would be to realize that the posterior PDF 
P(l|0)=e^(-l) of the first pixel should be used as the prior for the second 
pixel, and so forth until the 100th pixel. This has the same result f(l|all 100 
pixels are 0)=n*e^(-n*l) (D'Agostini § 7.7.2)!
6) the expectation value INTEGRAL_0_to_infinity over l*n*e^(-n*l) dl is 1/n .  
This is 1 if n=1 as we know from 3a), and 1/100 for 100 pixels with 0 counts.
7) the variance is then INTEGRAL_0_to_infinity over (l-1/n)^2*n*e^(-n*l) dl . 
This is 1/n^2
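
Steps 6) and 7) are easy to verify numerically, e.g. with a quick sketch:

import numpy as np

for n in (1, 100):
    l = np.linspace(0.0, 50.0 / n, 1_000_001)   # grid far into the tail
    f = n * np.exp(-n * l)                      # f(l | n pixels all zero)
    mean = np.trapz(l * f, l)
    var = np.trapz((l - mean) ** 2 * f, l)
    print(n, mean, var)                         # -> ~1/n and ~1/n**2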

I find these results quite satisfactory. Please note that they deviate from the 
MLE result: expectation value=0, variance=0 . The problem appears to be that a 
Maximum Likelihood Estimator may give wrong results for small n; something that 
I've read a couple of times but which appears not to be universally 
known/taught. Clearly, the result in 6) and 7) for large n converges towards 0, 
as it should be.
What this also means is that one should really work out the PDF instead of just 
adding expectation values and variances (and arriving at 100 if all 100 pixels 
have zero counts) because it is contradictory to use a uniform prior for all 
the pixels if OTOH these agree perfectly in being 0!

What this means for zero-dose extrapolation I have not thought about. At least 
it prevents infinite weights!

Best,
Kay










Re: [ccp4bb] am I doing this right?

2021-10-17 Thread James Holton

Thank you Gergely.  That is interesting!

I don't mind at all making this Bayesian, as long as it works!

Something I'm not quite sure about: does the prior distribution HAVE to 
be a gamma distribution? Not that that really narrows things down since 
there are an infinite number of them, but is that really the "i have no 
idea" prior? Or just a convenient closed-form choice? I've only just 
recently heard of conjugate priors.


Much appreciate any thoughts you may have on this,

-James


On 10/16/2021 3:48 PM, Gergely Katona wrote:

Dear James,

If I understand correctly you are looking for a single rate parameter to 
describe the pixels in a block. It would also be possible to estimate the rates 
for individual pixels or estimate the thickness of the sample from the counts 
if you have a good model, that is where Bayesian methods really shine. I tested 
the simplest first Bayesian network with 10 and 100 zero count pixels, 
respectively:

https://colab.research.google.com/drive/1TGJx2YT9I-qyOT1D9_HCC7G7as1KXg2e?usp=sharing


The two posterior distributions are markedly different even if they start from 
the same prior distribution, which I find more intuitive than the frequentist 
treatment of uncertainty. You can test different parameters for the gamma prior 
or change to another prior distribution. It is possible to reduce the posterior 
distributions to their mean or posterior maximum, if needed. If you are looking 
for an alternative to the Bayesian perspective then this will not help, 
unfortunately.

Best wishes,

Gergely

-----Original Message-----
From: CCP4 bulletin board  On Behalf Of James Holton
Sent: den 16 oktober 2021 21:01
To: CCP4BB@JISCMAIL.AC.UK
Subject: Re: [ccp4bb] am I doing this right?

Thank you everyone for your thoughtful and thought-provoking responses!

But, I am starting to think I was not as clear as I could have been about my 
question.  I am actually concerning myself with background, not necessarily 
Bragg peaks.  With Bragg photons you want the sum, but for background you want 
the average.

What I'm getting at is: how does one properly weight a zero-photon observation 
when it comes time to combine it with others?  Hopefully they are not all zero. 
 If they are, check your shutter.

So, ignoring Bragg photons for the moment (let us suppose it is a systematic 
absence) what I am asking is: what is the variance, or, better yet,what is the 
WEIGHT one should assign to the observation of zero photons in a patch of 10x10 
pixels?

In the absence of any prior knowledge this is a difficult question, but a 
question we kind of need to answer if we want to properly measure data from 
weak images.  So, what do we do?

Well, with the "I have no idea" uniform prior, it would seem that expectation 
(Epix) and variance (Vpix) would be k+1 = 1 for each pixel, and therefore the sum of Epix 
and Vpix over the 100 independent pixels is:

Epatch=Vpatch=100 photons

I know that seems weird to assume 100 photons should have hit when we actually 
saw none, but consider what that zero-photon count, all by itself, is really 
telling you:
a) Epix > 20 ? No way. That is "right out". Given we know it's Poisson 
distributed, and that background is flat, it is VERY unlikely you have E that big when you 
saw zero. Cross all those E values off your list.
b) Epix=0 ? Well, that CAN be true, but other things are possible and all of them 
are E>0. So, most likely E is not 0, but at least a little bit higher.
c) Epix=1e-6 ?  Yeah, sure, why not?
d) Epix= -1e-6 ?  No. Don't be silly.
e) If I had to guess? Meh. 1 photon per pixel?  That would be k+1

I suppose my objection to E=V=0 is because V=0 implies infinite confidence in 
the value of E, and that we don't have. Yes, it is true that we are quite 
confident in the fact that we did not see any photons this time, but 
remember that E and V are the mean and variance that you would see if you did a 
million experiments under the same conditions. We are trying to guess those 
from what we've got. Just because you've seen zero a hundred times doesn't mean 
the 101st experiment won't give you a count.  If it does, then maybe 
Epatch=0.01 and Epix=0.0001?  But what do you do before you see your first 
photon?
All you can really do is bracket it.

But what if you come up with a better prior than "I have no idea" ?
Well, we do have other pixels on the detector, and presuming the background is 
flat, or at least smooth, maybe the average counts/pixel is a better prior?

So, let us consider an ideal detector with 1e6 independent pixels. Let us 
further say that 1e5 background photons have hit that detector.  I want to 
still ignore Bragg photons because those have a very different prior 
distribution to the background.  Let us say we have masked off all the Bragg 
areas.

The average overall background is then 0.1 photons/pixel. Let us assign that to 
the prior probability Ppix = 0.1.  Now let us look again a

Re: [ccp4bb] am I doing this right?

2021-10-17 Thread James Holton


Well Frank, I think it comes down to something I believe you were the 
first to call "dose slicing".


Like fine phi slicing, collecting a larger number of weaker images 
records the same photons, but with more information about the sample 
before it dies. In fine phi slicing the extra information allows you to 
do better background rejection, and in "dose slicing" the extra 
information is about radiation damage. We lose that information when we 
use longer exposures per image, and if you burn up the entire useful 
life of your crystal in one shot, then all information about how the 
spots decayed during the exposure is lost. Your data are also rather 
incomplete.


How much information is lost? Well, how much more disk space would be 
taken up, even after compression, if you collected only 1 photon per 
image?  And kept collecting all the way out to 30 MGy in dose? That's 
about 1 million photons (images) per cubic micron of crystal.  So, I'd 
say the amount of information lost is "quite a bit".


But what makes matters worse is that if you did collect this data set 
and preserved all information available from your crystal you'd have no 
way to process it. This is not because it's impossible, it's just that we 
don't have the software. Your only choice would be to go find images 
with the same "phi" value and add them together until you have enough 
photons/pixel to index it. Once you've got an indexing solution you can 
map every photon hit to a position in reciprocal space as well as give 
it a time/dose stamp. What do you do with that?  You can do zero-dose 
extrapolation, of course! Damage-free data! Wouldn't that be nice. Or 
can you?  The data you will have in hand for each reciprocal-space pixel 
might look something like:
tic tic .. tic . tic ... tic tictic ... 
tictic.


So. Eight photons.  With time-of-arrival information.  How do you fit a 
straight line to that?  You could "bin" the data or do some kind of 
smoothing thing, but then you are losing information again. Perhaps also 
making ill-founded assumptions. You need error bars of some kind, and, 
better yet, the shape of the distribution implied by those error bars.


And all this makes me think somebody must have already done this. I'm 
willing to bet probably some time in the late 1700s to early 1800s. All 
we're really talking about here is augmenting maximum-likelihood 
estimation of an average value to maximum-likelihood estimation of a 
straight line. That is, slope and intercept, with sigmas on both. I 
suspect the proper approach is to first bring everything down to the 
exact information content of a single photon (or lack of a photon), and 
build up from there.  If you are lucky enough to have a large number of 
photons then linear regression will work, and you are back to Diederichs 
(2003). But when you're photon-starved the statistics of single photons 
become more and more important.  This led me to: is it k? or k+1 ?  When 
k=0 getting this wrong could introduce a factor of infinity.
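
To make that concrete, here is a minimal sketch of the straight-line fit 
as maximum likelihood on an inhomogeneous Poisson process with rate 
r(t) = a + b*t, for which log L = SUM_i log r(t_i) - INTEGRAL_0^T r(t) dt. 
The eight arrival times are invented for illustration:

import numpy as np
from scipy.optimize import minimize

t = np.array([0.4, 1.1, 3.0, 4.2, 6.1, 8.0, 8.3, 9.6])  # photon arrival times
T = 10.0                                                 # exposure window (s)

def nll(p):
    """Negative log-likelihood of rate r(t) = a + b*t given arrivals t."""
    a, b = p
    if a <= 0 or a + b * T <= 0:
        return np.inf            # the rate must stay positive on [0, T]
    return -(np.log(a + b * t).sum() - (a * T + 0.5 * b * T * T))

fit = minimize(nll, x0=[len(t) / T, 0.0], method="Nelder-Mead")
print("intercept, slope:", fit.x)   # sigmas would come from the Hessian of nll

With only eight photons the error bars from that Hessian will be large 
and decidedly non-Gaussian, which is rather the point.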


So, perhaps the big "consequence of getting it wrong" is embarrassing 
myself by re-making a 200-year old mistake I am not currently aware of. 
I am confident a solution exists, but only recently started working on 
this.  So, I figured ... ask the world?


-James Holton
MAD Scientist


On 10/17/2021 1:51 AM, Frank Von Delft wrote:
James, I've been watching the thread with fascination, but also the 
confusion of wild ignorance. I've finally realised why.


What I've missed is: what exactly makes the question so important?  
I've understood what brought it up, of course, but not the consequence 
of getting it wrong.


Frank

Sent from tiny silly touch screen
--------
*From:* James Holton 
*Sent:* Saturday, 16 October 2021 20:01
*To:* CCP4BB@JISCMAIL.AC.UK
*Subject:* Re: [ccp4bb] am I doing this right?

Thank you everyone for your thoughtful and thought-provoking responses!

But, I am starting to think I was not as clear as I could have been
about my question.  I am actually concerning myself with background, not
necessarily Bragg peaks.  With Bragg photons you want the sum, but for
background you want the average.

What I'm getting at is: how does one properly weight a zero-photon
observation when it comes time to combine it with others? Hopefully
they are not all zero.  If they are, check your shutter.

So, ignoring Bragg photons for the moment (let us suppose it is a
systematic absence) what I am asking is: what is the variance, or,
better yet,what is the WEIGHT one should assign to the observation of
zero photons in a patch of 10x10 pixels?

In the absence of any prior knowledge this is a difficult question, but
a question we kind of need to answer if we want to properly measure data
from weak images.  So, what do we do?

Well, with the "I have no idea" uniform 

Re: [ccp4bb] am I doing this right?

2021-10-16 Thread James Holton

Thank you everyone for your thoughtful and thought-provoking responses!

But, I am starting to think I was not as clear as I could have been 
about my question.  I am actually concerning myself with background, not 
necessarily Bragg peaks.  With Bragg photons you want the sum, but for 
background you want the average.


What I'm getting at is: how does one properly weight a zero-photon 
observation when it comes time to combine it with others?  Hopefully 
they are not all zero.  If they are, check your shutter.


So, ignoring Bragg photons for the moment (let us suppose it is a 
systematic absence) what I am asking is: what is the variance, or, 
better yet,what is the WEIGHT one should assign to the observation of 
zero photons in a patch of 10x10 pixels?


In the absence of any prior knowledge this is a difficult question, but 
a question we kind of need to answer if we want to properly measure data 
from weak images.  So, what do we do?


Well, with the "I have no idea" uniform prior, it would seem that 
expectation (Epix) and variance (Vpix) would be k+1 = 1 for each pixel, 
and therefore the sum of Epix and Vpix over the 100 independent pixels is:


Epatch=Vpatch=100 photons

I know that seems weird to assume 100 photons should have hit when we 
actually saw none, but consider what that zero-photon count, all by 
itself, is really telling you:
a) Epix > 20 ? No way. That is "right out". Given we know it's Poisson 
distributed, and that background is flat, it is VERY unlikely you have E 
that big when you saw zero. Cross all those E values off your list.
b) Epix=0 ? Well, that CAN be true, but other things are possible and 
all of them are E>0. So, most likely E is not 0, but at least a little 
bit higher.

c) Epix=1e-6 ?  Yeah, sure, why not?
d) Epix= -1e-6 ?  No. Don't be silly.
e) If I had to guess? Meh. 1 photon per pixel?  That would be k+1

I suppose my objection to E=V=0 is because V=0 implies infinite 
confidence in the value of E, and that we don't have. Yes, it is true 
that we are quite confident in the fact that we did not see any photons 
this time, but remember that E and V are the mean and variance that 
you would see if you did a million experiments under the same 
conditions. We are trying to guess those from what we've got. Just 
because you've seen zero a hundred times doesn't mean the 101st 
experiment won't give you a count.  If it does, then maybe Epatch=0.01 
and Epix=0.0001?  But what do you do before you see your first photon? 
All you can really do is bracket it.


But what if you come up with a better prior than "I have no idea" ? 
Well, we do have other pixels on the detector, and presuming the 
background is flat, or at least smooth, maybe the average counts/pixel 
is a better prior?


So, let us consider an ideal detector with 1e6 independent pixels. Let 
us further say that 1e5 background photons have hit that detector.  I 
want to still ignore Bragg photons because those have a very different 
prior distribution to the background.  Let us say we have masked off all 
the Bragg areas.


The average overall background is then 0.1 photons/pixel. Let us assign 
that to the prior probability Ppix = 0.1.  Now let us look again at that 
patch of 10x10 pixels with zero counts on it.  We expected to see 10, 
but got 0.  What are the odds of that?  Pretty remote: e^-10, or about 
1 in 22,000.


I suspect in this situation where such an unlikely event has occurred it 
should perhaps be given a variance larger than 100. Perhaps quite a bit 
larger?  Subsequent "sigma-weighted" summation would then squash its 
contribution down to effectively 0. So, relative to any other 
observation with even a shred of merit it would have no impact. Giving 
it V=0, however? That can't be right.


But what if Ppix=0.01 ?  Then we expect to see zero counts on our 
100-pixel patch about 1/3 of the time. Same for 1-photon observations. 
Giving these two kinds of observations the same weight seems more 
sensible, given the prior.
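
Those odds come straight from the Poisson PDF:

import math
# P(0 counts | m expected) = exp(-m)
print(math.exp(-10.0))       # 100 pixels at Ppix=0.1:  ~4.5e-5 (~1 in 22,000)
print(math.exp(-1.0))        # 100 pixels at Ppix=0.01: ~0.37 (about 1/3)
print(1.0 * math.exp(-1.0))  # exactly 1 photon when m=1: also ~0.37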


Another prior might be to take the flux and sample thickness into 
account.  Given the cross section of light elements the expected 
photons/pixel on most any detector would be:


Ppix = 1.2e-5*flux*exposure*thickness*omega/Npixels
where:
Ppix = expected photons/pixel
Npixels = number of pixels on the detector
omega  = fraction of scattered photons that hit it (about 0.5)
thickness = thickness of sample and loop in microns
exposure = exposure time in seconds
flux = incident beam flux in photons/s
1.2e-5 = 1e-4 cm/um * 1.2 g/cm^3 * 0.2 cm^2/g (cross section of oxygen)

If you don't know anything else about the sample, you can at least know 
that.
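
As a function, a sketch using only the numbers above (the example 
detector size is my own assumption):

def expected_photons_per_pixel(flux, exposure, thickness,
                               omega=0.5, npixels=4096 * 4096):
    """Ppix = 1.2e-5 * flux * exposure * thickness * omega / Npixels"""
    return 1.2e-5 * flux * exposure * thickness * omega / npixels

# e.g. 1e12 photons/s for 0.1 s through 100 microns on a 4k x 4k detector:
print(expected_photons_per_pixel(1e12, 0.1, 100.0))  # ~3.6 photons/pixel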


Or am I missing something?

-James Holton
MAD Scientist


On 10/16/2021 12:47 AM, Kay Diederichs wrote:

Dear Gergely,

with " 10 x 10 patch of pixels ", I believe James means that he observes 100 
neighbouring pixels each with 0 counts. Thus the frequentist view can be tak

Re: [ccp4bb] am I doing this right?

2021-10-15 Thread James Holton

Well I'll be...

Kay Diederichs pointed out to me off-list that the k+1 expectation and 
variance from observing k photons is in "Bayesian Reasoning in Data 
Analysis: A Critical Introduction" by Giulio D'Agostini. Granted, that 
is with a uniform prior, which I take as the Bayesian equivalent of "I 
have no idea".


So, if I'm looking to integrate a 10 x 10 patch of pixels on a weak 
detector image, and I find that area has zero counts, what variance 
shall I put on that observation?  Is it:


a) zero
b) 1.0
c) 100

Wish I could say there are no wrong answers, but I think at least two of 
those are incorrect,


-James Holton
MAD Scientist

On 10/13/2021 2:34 PM, Filipe Maia wrote:
I forgot to add probably the most important. James is correct, the 
expected value of u, the true mean, given a single observation k is 
indeed k+1 and k+1 is also the mean square error of using k+1 as the 
estimator of the true mean.


Cheers,
Filipe

On Wed, 13 Oct 2021 at 23:17, Filipe Maia wrote:


Hi,

The maximum likelihood estimator for a Poisson distributed
variable is equal to the mean of the observations. In the case of
a single observation, it will be equal to that observation. As
Graeme suggested, you can calculate the probability mass function
for a given observation with different Poisson parameters (i.e.
true means) and see that function peaks when the parameter matches
the observation.

The root mean squared error of the estimation of the true mean
from a single observation k seems to be sqrt(k+2). Or to put it in
another way, mean squared error, that is the expected value of
(k-u)**2, for an observation k and a true mean u, is equal to k+2.

You can see some example calculations at

https://colab.research.google.com/drive/1eoaNrDqaPnP-4FTGiNZxMllP7SFHkQuS?usp=sharing


Cheers,
Filipe

On Wed, 13 Oct 2021 at 17:14, Winter, Graeme (DLSLtd,RAL,LSCI)
wrote:

This rang a bell to me last night, and I think you can derive
this from first principles

If you assume an observation of N counts, you can calculate
the probability of such an observation for a given Poisson
rate constant X. If you then integrate over all possible values
of X to work out the central value of the rate constant which
is most likely to result in an observation of N, I think you
get X = N+1.
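
In symbols, with a uniform prior on X, the napkin calculation is:

\[
E[X \mid N] = \frac{\int_0^\infty X \, \frac{X^N e^{-X}}{N!} \, dX}
                   {\int_0^\infty \frac{X^N e^{-X}}{N!} \, dX}
            = \frac{(N+1)!}{N!} = N + 1
\]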

I think it is the kind of calculation you can perform on a
napkin, if memory serves

All the best Graeme


On 13 Oct 2021, at 16:10, Andrew Leslie - MRC LMB
wrote:

Hi Ian, James,

                      I have a strong feeling that I have
seen this result before, and it was due to Andy Hammersley at
ESRF. I’ve done a literature search and there is a paper
relating to errors in analysis of counting statistics (see
below), but I had a quick look at this and could not find the
(N+1) correction, so it must have been somewhere else. I have
cc'd Andy on this Email (hoping that this Email address from
2016 still works) and maybe he can throw more light on this.
What I remember at the time I saw this was the simplicity of
the correction.

Cheers,

Andrew


Reducing bias in the analysis of counting statistics data
Hammersley, A.P. & Antoniadis, A. (1997). Nuclear Instruments & Methods in Physics 
Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment, 
394(1-2), 219-224. DOI: 10.1016/S0168-9002(97)00668-2


On 12 Oct 2021, at 18:55, Ian Tickle wrote:


Hi James

What the Poisson distribution tells you is that if the true
count is N then the expectation and variance are also N. 
That's not the same thing as saying that for an observed
count N the expectation and variance are N.  Consider all
those cases where the observed count is exactly zero.  That
can arise from any number of true counts, though as you
noted larger values become increasingly unlikely.  However
those true counts are all >= 0 which means that the mean and

[ccp4bb] am I doing this right?

2021-10-12 Thread James Holton
All my life I have believed that if you're counting photons then the 
error of observing N counts is sqrt(N).  However, a calculation I just 
performed suggests it's actually sqrt(N+1).


My purpose here is to understand the weak-image limit of data 
processing. Question is: for a given pixel, if one photon is all you 
got, what do you "know"?


I simulated millions of 1-second experiments. For each I used a "true" 
beam intensity (Itrue) between 0.001 and 20 photons/s. That is, for 
Itrue= 0.001 the average over a very long exposure would be 1 photon 
every 1000 seconds or so. For a 1-second exposure the observed count (N) 
is almost always zero. About 1 in 1000 of them will see one photon, and 
roughly 1 in a million will get N=2. I do 10,000 such experiments and 
put the results into a pile.  I then repeat with Itrue=0.002, 
Itrue=0.003, etc. All the way up to Itrue = 20. At Itrue > 20 I never 
see N=1, not even in 1e7 experiments. With Itrue=0, I also see no N=1 
events.
Now I go through my pile of results and extract those with N=1, and 
count up the number of times a given Itrue produced such an event. The 
histogram of Itrue values in this subset is itself Poisson, but with 
mean = 2 ! If I similarly count up events where 2 and only 2 photons 
were seen, the mean Itrue is 3. And if I look at only zero-count events 
the mean and standard deviation is unity.


Does that mean the error of observing N counts is really sqrt(N+1) ?

I admit that this little exercise assumes that the distribution of Itrue 
is uniform between 0.001 and 20, but given that one photon has been 
observed Itrue values outside this range are highly unlikely. The 
Itrue=0.001 and N=1 events are only a tiny fraction of the whole.  So, I 
would say that even if the prior distribution is not uniform, it is 
certainly bracketed. Now, Itrue=0 is possible if the shutter didn't 
open, but if the rest of the detector pixels have N=~1, doesn't this 
affect the prior distribution of Itrue on our pixel of interest?


Of course, two or more photons are better than one, but these days with 
small crystals and big detectors N=1 is no longer a trivial situation.  
I look forward to hearing your take on this.  And no, this is not a trick.
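
For anyone who wants to reproduce this, a compact sketch of the same 
numerical experiment (drawing Itrue continuously rather than in 0.001 
steps, which amounts to the same thing):

import numpy as np

rng = np.random.default_rng(0)
itrue = rng.uniform(0.001, 20.0, size=10_000_000)  # "true" beam intensities
n = rng.poisson(itrue)                             # one 1-second exposure each
for k in (0, 1, 2):
    sel = itrue[n == k]
    print(k, sel.mean(), sel.std())   # mean ~k+1, std ~sqrt(k+1)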


-James Holton
MAD Scientist





Re: [ccp4bb] ALS Colloquium Seminar - starting now!

2021-09-29 Thread James Holton

Talk on laser plasma x-ray light sources starting in 5 min!

Here's the zoom link. Open to all.

https://lbnl.zoom.us/j/95338327083



On 9/29/2021 12:24 PM, James Holton wrote:

Greetings all,

This year I have the honor of co-organizing the ALS Colloquium Seminar 
series. This is normally a local but broad-audience forum for users of 
the Advanced Light Source to learn more about what other users are 
doing. But, now that it's gone virtual you can all join in and listen 
to what I think will be some very interesting talks. Maybe even ask a 
question? It's not easy finding speakers who appeal to a broad range 
of scientists, ranging from biologists to gravel monkeys and chemists, 
but what we have in common is we all need light.  Today's speaker at 
3pm California time is LBL's own Jeroen van Tilborg talking about his 
Laser Plasma Accelerators. Yes, those are a real thing, and 
potentially could be ultra-compact light sources in the future. He 
will tell you about the latest developments.

Zoom links on this page:
https://als.lbl.gov/news-events/seminars/

Coming soon are many other exciting speakers I am proud to have 
brought to this series, including two of my users:
Jennifer Doudna will be on at 3pm Oct 27, 2021, talking about her 
Nobel Prize-winning research on CRISPR.
Erica Saphire will be on at 3pm Dec 8, 2021, talking about bringing 
antibody technology from all over the world together to fight SARS-CoV-2.


I think we've got a pretty good lineup!

-James Holton
MAD Scientist







[ccp4bb] ALS Colloquium Seminar today! - in 2.5 hours

2021-09-29 Thread James Holton

Greetings all,

This year I have the honor of co-organizing the ALS Colloquium Seminar 
series. This is normally a local but broad-audience forum for users of 
the Advanced Light Source to learn more about what other users are 
doing. But, now that it's gone virtual you can all join in and listen to 
what I think will be some very interesting talks. Maybe even ask a 
question? It's not easy finding speakers who appeal to a broad range of 
scientists, ranging from biologists to gravel monkeys and chemists, but 
what we have in common is we all need light.  Today's speaker at 3pm 
California time is LBL's own Jeroen van Tilborg talking about his Laser 
Plasma Accelerators. Yes, those are a real thing, and potentially could 
be ultra-compact light sources in the future. He will tell you about the 
latest developments.

Zoom links on this page:
https://als.lbl.gov/news-events/seminars/

Coming soon are many other exciting speakers I am proud to have brought 
to this series, including two of my users:
Jennifer Doudna will be on at 3pm Oct 27, 2021, talking about her Nobel 
Prize-winning research on CRISPR.
Erica Saphire will be on at 3pm Dec 8, 2021, talking about bringing 
antibody technology from all over the world together to fight SARS-CoV-2.


I think we've got a pretty good lineup!

-James Holton
MAD Scientist





Re: [ccp4bb] AW: [ccp4bb] Antwort: Re: [ccp4bb] chain on 2-fold axis?

2021-09-01 Thread James Holton

Clearly, the code can change without anyone knowing!

For this reason, and others, I try to remember to do it manually and to 
check.  In cases of pseudo-symmetry, for example P21 with a 90.2 deg 
beta angle and high tNCS, the software may or may not decide it should 
pick free flags in the P222 setting, but it probably "should".


To do it manually, you want to re-index to the higher space group using 
"reindex"

Then pick your free flags
Use "cad" with "outlim" feature to symmetry-expand the data
Use "cad" again to change the space group back
One final run through "cad" to clean things up and make sure you have 
the right asymmetric unit.
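
A minimal sketch of that recipe in script form (Python driving the 
CCP4 binaries; the file names and space groups are placeholders, and 
the keyword spellings are from memory of the reindex/freerflag/cad 
documentation, so treat them as assumptions and check them before 
trusting them):

import subprocess

def ccp4(args, keywords):
    # CCP4 programs read their keywords from standard input
    subprocess.run(args, input=keywords, text=True, check=True)

# 1. re-index into the higher-symmetry setting
ccp4(["reindex", "hklin", "data.mtz", "hklout", "high.mtz"],
     "symmetry 'P 2 2 2'\nend\n")
# 2. pick free flags in that setting
ccp4(["freerflag", "hklin", "high.mtz", "hklout", "free.mtz"],
     "freerfrac 0.05\nend\n")
# 3. symmetry-expand the data with cad's outlim feature
ccp4(["cad", "hklin1", "free.mtz", "hklout", "expand.mtz"],
     "labin file 1 all\noutlim spacegroup 'P 1'\nend\n")
# 4. change the space group back with one more cad run
ccp4(["cad", "hklin1", "expand.mtz", "hklout", "final.mtz"],
     "labin file 1 all\nsymmetry 'P 1 21 1'\nend\n")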


That's what I do, anyway,

-James Holton
MAD Scientist

On 8/31/2021 3:04 AM, Eleanor Dodson wrote:

Espec. Jan & Kay,

I had better check this out - ville wrote the code I believe, and I 
have never actually checked the distribution!

Cheers Eleanor

On Tue, 31 Aug 2021 at 08:01, Jan Dohnalek <dohnalek...@gmail.com> wrote:


This is good to know indeed.
Will improve my teaching now, I also did not know this is now done
automatically. Thanks for pointing it out.

Jan


On Mon, Aug 30, 2021 at 9:44 PM Kay Diederichs
<kay.diederi...@uni-konstanz.de> wrote:

Dear Eleanor,

Thanks for pointing out that CCP4 FreeRflag selects the test
set in the highest possible symmetry for the crystal class! I
didn't know that.

The following sentences (which are somewhat difficult to
understand for me) in
https://www.ccp4.ac.uk/html/freerflag.html appear to
document that:
"The FreeR_flag is randomly and uniformly distributed
reflexion-by-reflexion, but, additionally, if the keyword
NOSYM is not set, all reflections that are equivalent by the
symmetry of the point group of the twin lattice (assuming the
data is twinned), obtain the same flag. This includes both the
possibility of merohedral and pseudomerohedral twinning. In
the latter case, the obliquity parameter can be set using the
keyword OBL."

I wonder since which CCP4 version (or date) this is the
default behaviour.

best wishes,
Kay

On Mon, 30 Aug 2021 18:28:23 +0100, Eleanor Dodson
<eleanor.dod...@york.ac.uk> wrote:

>Back to FreeR factors - Phenix, and I believe FreeRflag now
select FreeRs
>in the highest possible symmetry for the crystal class - eh
P6/mmm for a
>trigonal crystal, and expand the set to fill the actual space
group. This
>means the Free R assignment is suitable if later the crystal
symmetry is
>reassigned. But this was not always done in the past so if
you are trying
>to reuse free/work assignments from an old project there are
possibilities
>of not getting this. Maybe the best solution is to just
generate a new Free
>R set ?
>Eleanor
>
...






-- 
Jan Dohnalek, Ph.D

Institute of Biotechnology
Academy of Sciences of the Czech Republic
Biocev
Prumyslova 595
252 50 Vestec near Prague
Czech Republic

Tel. +420 325 873 758





Re: [ccp4bb] Some questions on tools for CCP4i2cloud: pdbset, pdbcur, coordconv, sftools

2021-08-30 Thread James Holton
I use pdbset, coordconv and sftools a LOT.  Usually for things no other 
programs do, like converting to fractional coordinates, and performing 
math operations on mtz data. Always from command line scripts.
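
For the mtz-math part, this is the sort of thing I mean (sketched here 
from Python; the sftools CALC syntax is reverse-Polish and the column 
names are made up, so treat this as a starting point and check the 
sftools documentation before trusting it):

import subprocess

keywords = """
read data.mtz
calc F col FSUM = col F1 col F2 +
write sum.mtz
quit
"""
# sftools, like most CCP4 programs, reads commands from standard input
subprocess.run(["sftools"], input=keywords, text=True, check=True)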


-James Holton
MAD Scientist

On 8/26/2021 3:29 AM, Robbie Joosten wrote:

Dear CCP4 users,

We (as in, the CCP4 developers) are investigating some (potentially) missing 
functionality in CCP4i2 and/or Cloud with respect to the programs pdbset, 
pdbcur, coordconv, and sftools. Some of these tools are quite old and may need 
to be replaced by other tools with similar functionality. Could you answer a 
few questions:

- Do you use any of these tools?
- If so, how often? (Few times a week, month, year, or less than once a year).
- Which functionality of program X do you use?
- Would you like a graphical interface to that functionality or are you happy 
to use the command line?

Personal example:
I use pdbset a few times a month, but only the "noise" function. I don't need a graphical 
interface for it (because it is used in the context of pdb-redo). I also use sftools, "reduce 
-> merge average" a few times a year. Again, only from the command line.


Feel free to send your answers directly to me or to the bulletin board if you 
want to start a discussion. Tips on alternative CCP4 tools to achieve similar 
effects are probably also interesting for other BB users.

Cheers,
Robbie


  










[ccp4bb] pdb2ins "cannot open self"

2021-06-28 Thread James Holton

Greetings all,

I hope everyone had a nice weekend.

I'm also writing to report what might be a bug in the version of pdb2ins 
distributed with CCP4.  On some (but not all) of my linux systems I'm 
getting this error message from pdb2ins at random.  If I run the same 
command ~10 times in a row, then once or twice it actually works.

The rest of the runs I get error messages like this:

Cannot open self /home/programs/ccp4-7.1/bin/pdb2ins\326\376\177  or
archive /home/programs/ccp4-7.1/bin/pdb2ins\326\376\177 .pkg

Where I'm showing the octal codes "\326\376\177" instead of the raw 
binary characters that actually appear on my screen.  The last byte is 
always \177, but the first two are essentially random number generators.


This is happening only on my newest systems with Xeon W or Xeon Platinum 
processors running CentOS 7.9, but not on older boxes.


Any help or advice is much appreciated.

Thank you for maintaining pdb2ins !

-James Holton
MAD Scientist





Re: [ccp4bb] Looking for proteins for undergraduate biochemistry lab

2021-06-18 Thread James Holton
What about proteins where the crystals don't diffract well?  I've found 
in my teaching experience that students take your advice much more 
seriously when they see it work on 3.5 Å data than they do when they see 
it work on 1.5 Å data.


-James Holton
MAD Scientist

On 6/17/2021 7:40 AM, Tanner, John J. wrote:


We developed a senior undergraduate biochemistry lab around an acid 
phosphatase from Francisella tularensis (FtHAP).


https://pubmed.ncbi.nlm.nih.gov/27980518/


The enzyme is easy to purify and crystallize, and the crystals 
diffract well. L-tartrate and phosphate ion are inexpensive 
inhibitors, which can be included in crystallization (3IT0/1). We have 
the students grow crystals and collect X-ray diffraction data 
remotely. Then they view maps in Coot. The enzyme assay is a simple 
colorimetric test with p-nitrophenylphosphate as the substrate.


--

John J. Tanner

Professor of Biochemistry and Chemistry

Associate Chair of Biochemistry

Department of Biochemistry

University of Missouri
117 Schweitzer Hall

503 S College Avenue
Columbia, MO 65211
Phone: 573-884-1280

Email: tanne...@missouri.edu
https://cafnrfaculty.missouri.edu/tannerlab/


Lab: Schlundt Annex rooms 3,6,9, 203B, 203C

Office: Schlundt Annex 203A

From: CCP4 bulletin board on behalf of P. H
Date: Wednesday, June 16, 2021 at 5:19 PM
To: CCP4BB@JISCMAIL.AC.UK
Subject: [ccp4bb] Looking for proteins for undergraduate biochemistry lab




Hello All,

We are looking for some candidate proteins for an undergraduate level 
advanced biochemistry lab. They should be expressed in bacteria, 
simple enough to purify and it will be nice to perform some simple 
characterization experiments(binding assays, enzymatic assays).


Any suggestions?

Thank you in advance.

Prerna gupta


















Re: [ccp4bb] Problem with peakmax

2021-06-10 Thread James Holton


I'm afraid the only bash command I know is: "tcsh"

However, in addition to suggestions so far, I expect this will work:

echo "" | peakmax MAPIN "myfile.ccp4" XYZOUT "myfiles_omit_atom.pdb"

Your peakmax is waiting for options to come in on "standard input". The 
vertical bar "|" is a "pipe" that takes the "standard output" of the 
first command (echo ""), and re-directs it to the standard input of the 
next command.  Pipes are a very powerful tool.  They save a lot of 
cutting-and-pasting.  Fortunately, both bash and tcsh recognize "|" in 
the same way. Unfortunately, they differ on how to re-direct output to a 
file.


Oh, and that magic little "#!" that you put at the start of the first 
line of the script is called a "shebang".
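
And since the question was about scripting: Ctrl-D is not a character 
you put in a file, it just signals end-of-input, which the empty pipe 
above supplies. If you ever drive these programs from Python instead, 
the equivalent trick is handing the child process an empty standard 
input (a minimal sketch, assuming peakmax is on your PATH):

import subprocess

# run peakmax with an empty standard input; the child sees end-of-file
# immediately, exactly as if you had typed Ctrl-D at the prompt
subprocess.run(
    ["peakmax", "MAPIN", "myfile.ccp4", "XYZOUT", "myfiles_omit_atom.pdb"],
    input="", text=True, check=True)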


-James Holton
MAD Scientist

On 6/8/2021 11:06 AM, Mohamed Ibrahim wrote:

Thanks Paul,
Is it possible to add Ctrl-D to my bash script?

Best regards,

On Tue, Jun 8, 2021 at 8:03 PM Mohamed Ibrahim
<mohamed.ibra...@hu-berlin.de> wrote:


Thanks Paul,
Is it possible to add Ctrl-D to my bash script?

Best regards,

On Tue, Jun 8, 2021 at 7:51 PM Paul Emsley
<pems...@mrc-lmb.cam.ac.uk> wrote:

On Tue, 2021-06-08 at 19:34 +0200, Mohamed Ibrahim wrote:
>
> I am trying to get the peak values from omit maps. The
peakmax from the GUI works fine. However, I tried to run
> peakmax from the terminal, and it stuck when returning P1.
Same map file works fine, when I use the GUI. Any ideas,
> how to solve this problem?
>
> command used
> peakmax MAPIN "myfile.ccp4" XYZOUT "myfiles_omit_atom.pdb"
>

Ctrl-D






--
Dr. Mohamed Ibrahim
Postdoctoral Researcher
Humboldt University
Berlin, Germany
Tel: +49 30 209347931



--
Dr. Mohamed Ibrahim
Postdoctoral Researcher
Humboldt University
Berlin, Germany
Tel: +49 30 209347931












Re: [ccp4bb] How can I refine the occupancy with Refmac?

2021-06-07 Thread James Holton

Certainly, Eleanor.  Here are a few:

# typical lysozyme
% refmac_occupancy_setup.com 193l.pdb
occupancy refine
occupancy group id 1 chain A residue 1 alt A
occupancy group id 2 chain A residue 1 alt B
occupancy group id 3 chain A residue 59 alt A
occupancy group id 4 chain A residue 59 alt B
occupancy group id 5 chain A residue 86 alt A
occupancy group id 6 chain A residue 86 alt B
occupancy group id 7 chain A residue 109 alt A
occupancy group id 8 chain A residue 109 alt B
occupancy group alts complete  1 2
occupancy group alts complete  3 4
occupancy group alts complete  5 6
occupancy group alts complete  7 8

#
# another small RT structure

refmac_occupancy_setup.com 1aho.pdb
occupancy refine
occupancy group id 1 chain A residue 9 alt A
occupancy group id 2 chain A residue 9 alt B
occupancy group id 3 chain A residue 12 alt A
occupancy group id 4 chain A residue 12 alt B
occupancy group id 5 chain A residue 24 alt A
occupancy group id 6 chain A residue 24 alt B
occupancy group id 7 chain A residue 63 alt A
occupancy group id 8 chain A residue 63 alt B
occupancy group alts complete  1 2
occupancy group alts complete  3 4
occupancy group alts complete  5 6
occupancy group alts complete  7 8

#
# just the sodium in lysozyme
% egrep "NA|LYS" 2hu3.pdb >! temp.pdb
% refmac_occupancy_setup.com temp.pdb allhet
occupancy refine
occupancy group id 1 residue 1 chain A alt A
occupancy group id 2 residue 1 chain A alt B
occupancy group id 3 residue 33 chain A alt A
occupancy group id 4 residue 33 chain A alt B
occupancy group id 5 residue 97 chain A alt A
occupancy group id 6 residue 97 chain A alt B
occupancy group id 7 residue 9001 chain A
occupancy group id 8 residue 9002 chain A
occupancy group id 9 residue 9003 chain A
occupancy group id 10 residue 9004 chain A
occupancy group id 11 residue 9005 chain A
occupancy group alts complete  5 6
occupancy group alts complete  1 2
occupancy group alts complete  3 4
occupancy group alts incomplete 7
occupancy group alts incomplete 8
occupancy group alts incomplete 9
occupancy group alts incomplete 10
occupancy group alts incomplete 11

#--
Where, for this last one I demonstrate how you can select for various 
things by making a scratch PDB file that contains only the atoms you 
want to occupancy-refine.
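
For the curious, the core of what the script does can be sketched in a 
few lines of Python (a minimal re-implementation of the idea, not the 
script itself; it only handles the two-conformer case, assumes exactly 
two altlocs per residue, and relies on the fixed-column PDB format):

from collections import OrderedDict

groups = OrderedDict()   # (chain, resnum, altloc) -> occupancy group id
with open("193l.pdb") as pdb:
    for line in pdb:
        if line.startswith(("ATOM", "HETATM")) and line[16] != " ":
            key = (line[21], int(line[22:26]), line[16])
            groups.setdefault(key, len(groups) + 1)

print("occupancy refine")
for (chain, resnum, alt), gid in groups.items():
    print(f"occupancy group id {gid} chain {chain} residue {resnum} alt {alt}")

# pair consecutive groups so each residue's two conformers sum to full
ids = list(groups.values())
for a, b in zip(ids[0::2], ids[1::2]):
    print(f"occupancy group alts complete  {a} {b}")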


Hope this helps!

Cheers,

-James Holton
MAD Scientist


On 6/7/2021 11:16 AM, Eleanor Dodson wrote:
James - could you send me a few examples to add to the documentation? 
Lockdown means I can't easily access my own examples - all trapped on 
the lab desktop..

Eleanor

On Mon, 7 Jun 2021 at 17:18, James Holton <jmhol...@lbl.gov> wrote:


I wrote a script for auto-generating occupancy refinement
relationships
for refmac. It is perhaps not as sophisticated as what phenix does
internally, but it gets common things right, like if you have a
2-headed
side chains or partial-occupancy metals.

https://bl831.als.lbl.gov/~jamesh/scripts/refmac_occupancy_setup.com

It is a csh/awk shell script. You run it with the name of your
input pdb
file on the command line and it dumps the refmac keywords to a file
called refmac_opts_occ.txt and to the terminal.  If you want every
"hetatm" atom to be refined put "allhet" as a second command-line
argument.

You can then perhaps copy-and-paste this to the relevant GUI window.

-James Holton
MAD Scientist

On 6/7/2021 3:35 AM, Marina Gárdonyi wrote:
> Hi,
>
> I didn't know that I can also enter keywords without a file!
That is a
> good note, thank you!!
>
> Best regards,
> Marina
>
Quoting Jon Cooper <jon.b.coo...@protonmail.com>:
>
>> Hello, the keywords can be entered in refmac gui (in one of the
>> dropdown things), so you don't need a file, as such, but it's
useful
>> to keep a record.
>>
>> Sent from ProtonMail mobile
>>
>>  Original Message 
>> On 5 Jun 2021, 16:39, Marina Gárdonyi wrote:
>>
>>> Hello everyone,
>>>
>>> I am trying to refine structures with Refmac.
>>>
>>> The problem is that nobody from my working group is familiar
with this
>>> programm. They are using Phenix exclusively.
>>>
>>> That's why I need your help. I would like to refine the
occupancy. I
>>> know that I need a keyword file for this, but I have no idea
how to
>>> create such a keyword file.
>>>
>>> Can someone maybe send me a sample file? I think that would
help me. I
>>> have found essential keywords, but I don't know how to build

Re: [ccp4bb] How can I refine the occupancy with Refmac?

2021-06-07 Thread James Holton
I wrote a script for auto-generating occupancy refinement relationships 
for refmac. It is perhaps not as sophisticated as what phenix does 
internally, but it gets common things right, like if you have a 2-headed 
side chains or partial-occupancy metals.


https://bl831.als.lbl.gov/~jamesh/scripts/refmac_occupancy_setup.com

It is a csh/awk shell script. You run it with the name of your input pdb 
file on the command line and it dumps the refmac keywords to a file 
called refmac_opts_occ.txt and to the terminal.  If you want every 
"hetatm" atom to be refined put "allhet" as a second command-line argument.


You can then perhaps copy-and-paste this to the relevant GUI window.

-James Holton
MAD Scientist

On 6/7/2021 3:35 AM, Marina Gárdonyi wrote:

Hi,

I didn't know that I can also enter keywords without a file! That is a 
good note, thank you!!


Best regards,
Marina

Quoting Jon Cooper:

Hello, the keywords can be entered in refmac gui (in one of the 
dropdown things), so you don't need a file, as such, but it's useful 
to keep a record.


Sent from ProtonMail mobile

 Original Message 
On 5 Jun 2021, 16:39, Marina Gárdonyi wrote:


Hello everyone,

I am trying to refine structures with Refmac.

The problem is that nobody from my working group is familiar with this
programm. They are using Phenix exclusively.

That's why I need your help. I would like to refine the occupancy. I
know that I need a keyword file for this, but I have no idea how to
create such a keyword file.

Can someone maybe send me a sample file? I think that would help me. I
have found essential keywords, but I don't know how to build up such a
file.

Thank you very much in advance!

Best regards,
Marina

--
Marina Gárdonyi

PhD Student, Research Group Professor Dr. Klebe

Department of Pharmaceutical Chemistry

Philipps-University Marburg

Marbacher Weg 6, 35032 Marburg, Germany

Phone: +49 6421 28 21392

E-Mail: marina@pharmazie.uni-marburg.de

http://www.agklebe.de/

 













Re: [ccp4bb] (R)MS

2021-05-28 Thread James Holton
I feel I should point out here that B-factors are NOT a measure of 
uncertainty.  They are a width.  This width itself may be uncertain, as 
may be the position of the center of the peak, but just because your 
peak is broad doesn't mean you don't know where the middle of it is.


As for why leave the mean variation squared?  I expect it is because it 
is supposed to be proportional to temperature. Hence the name 
"temperature factor".


-James Holton
MAD Scientist

On 5/27/2021 11:09 AM, Gergely Katona wrote:


Dear Jonathan,

In 1D sd may be intuitive, but in 3D it is not so much. The square 
root of a symmetric covariance matrix is not universally defined and 
it is not intuitive to me.


Best wishes,

Gergely

Gergely Katona, Professor, Chairman of the Chemistry Program Council

Department of Chemistry and Molecular Biology, University of Gothenburg

Box 462, 40530 Göteborg, Sweden

Tel: +46-31-786-3959 / M: +46-70-912-3309 / Fax: +46-31-786-3910

Web: http://katonalab.eu, Email: gergely.kat...@gu.se

From: CCP4 bulletin board On Behalf Of Hughes, Jonathan
Sent: 27 May 2021 18:53
To: CCP4BB@JISCMAIL.AC.UK
Subject: [ccp4bb] AW: [ccp4bb] (R)MS

hey!

thank y'all for the informative (and swift!) answers! but, if the B 
factor (as defined) appears in a mathematical formulation, that 
doesn't make it an "appropriate" parameter for mobility/uncertainty. 
wouldn't √B be better, in the same way that, for humans, standard 
deviation (RMSD) is a more appropriate parameter of variability than 
variance? or am i missing something?


cheers

j

From: Ian Tickle <ianj...@gmail.com>
Sent: Thursday, 27 May 2021 18:32
To: Hughes, Jonathan <jon.hug...@bot3.bio.uni-giessen.de>
Cc: CCP4BB@JISCMAIL.AC.UK
Subject: Re: [ccp4bb] (R)MS

Hi Jonathan

It's historical, it's simply how it appears in the expression for the 
Debye-Waller factor, i.e. exp(-B sin^2(theta)/lambda^2).  So it must 
have the same units as lambda^2.


Cheers

-- Ian

On Thu, 27 May 2021 at 13:25, Hughes, Jonathan 
<jon.hug...@bot3.bio.uni-giessen.de> wrote:


o yes! but maybe the crystal people could explain to me why the B
factor is the variance (with units of Ų) rather than the standard
deviation (i.e. RMS, with units of Å) when, to my simple mind, the
latter would seem be the more appropriate description of
variability in space?

cheers

jon

From: CCP4 bulletin board <CCP4BB@JISCMAIL.AC.UK> On Behalf Of Pearce, N.M. (Nick)
Sent: Thursday, 27 May 2021 12:38
To: CCP4BB@JISCMAIL.AC.UK
Subject: Re: [ccp4bb] Analysis of NMR ensembles

If you want something comparable to B-factors don’t forget to put
the MSF in the B-factor column, not the RMSF. Will change the
scaling of the tube radius considerably!

Nick

On 27 May 2021, at 11:16, Harry Powell - CCP4BB
<193323b1e616-dmarc-requ...@jiscmail.ac.uk> wrote:

Cool…

Purely for visualisation this does look like the approved CCP4
way -



Harry

On 27 May 2021, at 10:01, Stuart McNicholas
<19a0c5f649e5-dmarc-requ...@jiscmail.ac.uk> wrote:

Drawing style (right menu in display table) -> Worm scaled
by -> Worm
scaled by NMR variability

in ccp4mg?

This changes the size of the worm but not the colour.

On Thu, 27 May 2021 at 09:56, Harry Powell - CCP4BB
<193323b1e616-dmarc-requ...@jiscmail.ac.uk> wrote:


Anyway, thanks to all those who answered my original
question - especially

       Tristan: ChimeraX (+ his attached script)
       Michal, Scott: Theseus (https://theobald.brandeis.edu/theseus/)
       Bernhard: Molmol (https://pubmed.ncbi.nlm.nih.gov/8744573/)
       Rasmus: CYRANGE (http://www.bpc.uni-frankfurt.de/cyrange.html) and https://www.ccpn.ac.uk/ (of course…)
       Andrew: uwmn (not sure if this is buildable on a modern box)
       Smita: PyMol (not sure if I’m allowed to say that on ccp4bb…)

or I could script it and use Gesamt or Su

Re: [ccp4bb] The oldest science

2021-03-31 Thread James Holton

Shiny!

On 3/31/2021 10:36 AM, David Schuller wrote:


https://www.haaretz.com/archaeology/.premium-somebody-in-the-kalahari-had-a-crystal-collection-105-000-years-ago-1.9670735


  Somebody in the Kalahari Had a Crystal Collection 105,000 Years Ago


--
===
All Things Serve the Beam
===
David J. Schuller
modern man in a post-modern world
MacCHESS, Cornell University
schul...@cornell.edu













Re: [ccp4bb] External: Re: [ccp4bb] AlphaFold: more thinking and less pipetting (?)

2020-12-11 Thread James Holton
Well, that problem was solved a long time ago.  An excellent 
function-from-sequence predictor is here:

https://blast.ncbi.nlm.nih.gov/Blast.cgi

AlphaFold2 is doing rather much the same thing.  Just with a 3D output 
rather than 1D, and an underlying model with a LOT more fittable parameters.


-James Holton
MAD Scientist

On 12/11/2020 4:42 AM, Phil Evans wrote:

Alpha-fold looks great and is clearly a long way towards answering the question 
“this is the sequence, what is the structure?”

But I’ve always thought the more interesting question is “this is the 
structure, what does it do?”  Is there any progress on that question?

Phil



On 11 Dec 2020, at 12:12, Tristan Croll  wrote:

I'm not Randy, but I do have an answer: like this. This is T1049-D1. AlphaFold 
prediction in red, experimental structure (6y4f) in green. Agreement is close 
to perfect, apart from the C-terminal tail which is way off - but clearly 
flexible and only resolved in this conformation in the crystal due to packing 
interactions. GDT_TS is 93.1; RMS_CA is 3.68 - but if you exclude those tail 
residues, it's 0.79. With an alignment cutoff of 1 A, you can align 109 of 134 
CAs with an RMSD of 0.46 A.
From: CCP4 bulletin board  on behalf of Leonid Sazanov 

Sent: 11 December 2020 10:36
To: CCP4BB@JISCMAIL.AC.UK 
Subject: Re: [ccp4bb] External: Re: [ccp4bb] AlphaFold: more thinking and less 
pipetting (?)
  
Dear Randy,


Can you comment on why for some of AplhaFold2 models with GDT_TS > 90 (supposedly 
as good as experimental model) the RMS_CA (backbone) is > 3.0 Angstrom? Such a 
deviation can hardly be described as good as experimental. Could it be that GDT_TS is 
kind of designed to evaluate how well the general sub-domain level fold is predicted, 
rather than overall detail?

Thanks,
Leonid


Several people have mentioned lack of peer review as a reason to doubt the 
significance of the AlphaFold2 results.  There are different routes to peer 
review and, while the results have not been published in a peer review journal, 
I would have to say (as someone who has been an assessor for two CASPs, as well 
as having editorial responsibilities for a peer-reviewed journal), the peer 
review at CASP is much more rigorous than the peer review that most papers 
undergo.  The targets are selected from structures that have recently been 
solved but not published or disseminated, and even just tweeting a C-alpha 
trace is probably enough to get a target cancelled.  In some cases (as we’ve 
heard here) the people determining the structure are overly optimistic about 
when their structure solution will be finished, so even they may not know the 
structure at the time it is predicted.  The assessors are blinded to the 
identities of the predictors, and they carry out months of calculations and 
inspections of the models, computing ranking scores before they find out who 
made the predictions.  Most assessors try to bring something new to the 
assessment, because the criteria should get more stringent as the predictions 
get better, and they have new ideas of what to look for, but there’s always 
some overlap with “traditional” measures such as GDT-TS, GDT-HA (more stringent 
high-accuracy version of GDT) and lDDT.



Of course we’d all like to know the details of how AlphaFold2 works, and the 
DeepMind people could have been (and should be) much more forthcoming, but 
their results are real.  They didn’t have any way of cheating, being selective 
about what they reported, or gaming the system in any other way that the other 
groups couldn’t do.  (And yes, when we learned that DeepMind was behind the 
exceptionally good results two years ago at CASP13, we made the same half-jokes 
about whether Gmail had been in the database they were mining!)



Best wishes,



Randy Read













[ccp4bb] when does non-isomorphism become a habit?

2020-12-08 Thread James Holton
I have a semantics question, and I know how much this forum loves 
discussing semantics.


We've all experienced non-isomorphism, where two crystals, perhaps even 
grown from the same drop, yield different data. Different enough so that 
merging them makes your overall data quality worse. I'd say that is a 
fairly reasonable definition of non-isomorphism? Most of the time unit 
cell changes are telling, but in the end it is the I/sigma and 
resolution limit that we care about the most.


Now, of course, even for non-isomorphous data sets you can usually 
"solve" the non-isomorphous data without actually doing molecular 
replacement.  All you usually need to do is run pointless using the PDB 
file from the first crystal as a reference, and it will re-index the 
data to match the model.  Then you just do a few cycles of rigid body 
and you're off and running.  A nice side-effect of this is that all your 
PDB files will line up when you load them into coot.  No worries about 
indexing ambiguities, space group assignment, or origin choice. Phaser 
is a great program, but you don't have to run it on everything.


My question is: what about when you DO have to run Phaser to solve that 
other crystal from the same drop?  What if the space group is the same, 
the unit cell is kinda-sorta the same, but the coordinates have moved 
enough so as to be outside the radius of convergence of rigid-body 
refinement?  Does that qualify as a different "crystal form" or 
different "crystal habit"?  Or is it the same form, and just really 
non-isomorphous?


Opinions?

-James Holton
MAD Scientist





Re: [ccp4bb] SARS-CoV-2 test on a smartphone

2020-12-07 Thread James Holton
o 
detect the RNA in my kitchen.


The make-or-break here comes down to the at-home fluorometer.  Key 
question is: how many dye molecules can be detected by a device costing 
$1 (+ smartphone) made of parts that can scale to 1 billion units? I've 
looked into this a bit. One very important thing I have learned is that 
Schott glass sucks. I had hoped that by using colored glass filters I 
could avoid the need for any lenses. The "flash" LED on a smartphone is 
very bright, but also highly divergent. The theoretically best 
combination of filters and dyes to use are a Hoya B-370 excitation 
filter, ATTO-465 dye, and a Schott OG-515 emission filter. However, the 
OG-515 glass is itself fluorescent.  Degree of fluorescence varies from 
batch to batch, but it is bright enough to see by eye, and therefore 
useless. Maybe I can make some jewelry out of it.


Since colored glass filters are out, that leaves interference filters. 
These need parallel light, and that means lenses.  I have found, 
however, that ball lenses could do the trick. Spheres are easy to 
manufacture.  A ~1 cm ball lens held in contact with the window of the 
"flash" LED of a smartphone renders the light parallel enough for an 
interference filter to work.  A second ball lens then focuses the 
filtered blue light onto a ~1 mm wide sample ~2.5 mm from the ball's 
surface. A third ball lens after the sample picks up the fluorescent 
light and parallelizes it through the emission filter.  A final, fourth 
ball lens focuses the fluorescent photons into the smartphone camera. 
Now, scientific-grade ball lenses and interference filters are not the 
cheapest optical components, but then again, neither is anything else 
when you buy it from a scientific supply company.  A 1 cm N-BK7 glass 
ball lens set me back $44, but I also got a bag of one thousand 3/8" 
acrylic ball bearings for $10. Both lens types work equally well in my 
hands. I'm still learning about how interference filters are 
manufactured, but all they really are is a glass plate with some coating 
on it.  Lots of places can do optical coatings, I think.  A billion test 
kits will require a total of 1 km^2, but it doesn't have to ever be all 
one sheet.
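
As a sanity check on those numbers, the thick-lens formulas for a ball 
lens are simple enough to put in a few lines of Python (EFL = nD/(4(n-1)) 
measured from the center of the ball, BFD = EFL - D/2 from the back 
surface; the refractive indices are textbook values, not measurements):

# focal properties of a ball lens of diameter D and refractive index n
def ball_lens(D_mm, n):
    efl = n * D_mm / (4.0 * (n - 1.0))  # from the center of the ball
    bfd = efl - D_mm / 2.0              # from the back surface
    return efl, bfd

for name, n in [("N-BK7 glass", 1.517), ("acrylic", 1.49)]:
    efl, bfd = ball_lens(10.0, n)
    print(f"{name}: EFL = {efl:.2f} mm, focus {bfd:.2f} mm past the surface")
# N-BK7 gives a focus ~2.3 mm past the surface, consistent with the
# ~2.5 mm working distance quoted above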


Then you need something to hold the optics together.  My favorite right 
now is black rubber hose. It is light tight, the matte finish minimizes 
specular reflections, and with a little pressure the rubber forms good 
light-tight seals all by itself. You can also align the optics 
peristaltically. What's been a little difficult is finding the right way 
to cut a rubber tube and get a nice, smooth edge. Freezing in liquid 
nitrogen would do it, but I don't have any of that in my kitchen. 
Ordinary tubing cutters are OK, but not great. Anyone got a favorite 
trick for this?


And in general, any suggestions or comments, or best of all home 
experiment results would be great to hear.  I believe that if we 
collectively work the problem we will inspire more breakthroughs like 
the Fozouni et al. paper below, and that will have a strong positive 
impact on all of us.


-James Holton
MAD Scientist


On 12/5/2020 8:05 AM, Eugene Osipov wrote:

Hi everyone,
I wanted to revive this discussion with a fresh paper from Cell 
journal: https://www.cell.com/cell/fulltext/S0092-8674(20)31623-8 
<https://www.cell.com/cell/fulltext/S0092-8674(20)31623-8>
So one could use a smartphone camera for SARS-CoV-2 detection but you 
still need some extra tools, like 488 nm laser.


On Fri, 10 Apr 2020 at 20:14, James Holton <jmhol...@lbl.gov> wrote:



It looks to me that in this norovirus test the phone is acting as
nothing more than a camara attached to a conventional microscope. 
Light source is 3rd party, and the microscope body is 3D printed. 
3D printing is cool and all, but it does not scale well. 
Antibodies are also expensive to make.  You will go through a lot
of rabbits to make the 1 kg needed for a billion tests. This isn't
quite the price point I had in mind.

I agree that agglomeration of fluorescent beads is very
sensitive.  However, my experience with beads and other small
objects is that they love to stick together for all kinds of
reasons. And once they do it is hard to get them to separate. 
Assaying for virus particles in otherwise pure water is one thing,
it is quite another when there is other stuff around.

Personally, I've tried several different phone-based microscopes
and the hardest thing about them is aligning the camera.  I'm a
beamline scientist, so aligning things is second nature, but your
average person might have a hard time. The most annoying part is
if you bump it you have to start over.  Image quality is also
never all that great, I expect because the optics of a smartphone
camera are wide-angle, and you are fighting against that. 
Eventually I bought a self-contained wifi microscope for $50, an

Re: [ccp4bb] AlphaFold: more thinking and less pipetting (?)

2020-12-04 Thread James Holton
Run it for more cycles.  Doesn't take long to drift far enough for it to 
not find its way back when you turn x-ray back on.


This isn't just a problem in refmac, or phenix, or x-plor, or even MD 
programs like AMBER. The problem is that in order to make a structure 
fit into density you have to distort the geometry.  Turn the geometry 
weight up too high and your R factors blow up.  Turn the X-ray weight up 
too high and you get badly distorted geometry. I think we've all 
experienced that?
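
(Schematically, most refinement programs minimize something like 
T = E_geom + w*E_xray. My point is that neither extreme of w gives 
you a sensible molecule, which it would if E_geom alone really 
encoded everything that holds molecules together.)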


-James Holton
MAD Scientist

On 12/3/2020 8:29 PM, Jon Cooper wrote:
Hello James, that's really strange - I've used refmac et al. to do 
poor man's energy minimizations of models and they've generally come 
out fine, unless the restraints, etc., are wildly off-target. I wasn't 
playing with X-ray weights though, since there never was a dataset, of 
course.


Cheers, Jon.C.

Sent from ProtonMail mobile



 Original Message 
On 4 Dec 2020, 01:34, James Holton <jmhol...@lbl.gov> wrote:


It is a major leap forward for structure prediction for sure.  A
hearty congratulations to all those teams over all those years.

The part I don't understand is the accuracy.  If we understand
what holds molecules together so well, then why is it that when I
refine an X-ray structure and turn the X-ray weight term down to
zero ... the molecule blows up in my face?

-James Holton
MAD Scientist


On 12/3/2020 3:17 AM, Isabel Garcia-Saez wrote:

Dear all,

Just commenting that after the stunning performance of AlphaFold
that uses AI from Google maybe some of us we could dedicate
ourselves to the noble art of gardening, baking, doing Chinese
Calligraphy, enjoying the clouds pass or everything together
(just in case I have already prepared my subscription to Netflix).

https://www.nature.com/articles/d41586-020-03348-4

Well, I suppose that we still have the structures of complexes
(at the moment). I am wondering how the labs will have access to
this technology in the future (would it be for free coming from
the company DeepMind - Google?). It seems that they have already
published some code. Well, exciting times.

Cheers,

Isabel


Isabel Garcia-Saez, PhD
Institut de Biologie Structurale
Viral Infection and Cancer Group (VIC)-Cell Division Team
71, Avenue des Martyrs
CS 10090
38044 Grenoble Cedex 9
France
Tel.: 00 33 (0) 457 42 86 15
e-mail: isabel.gar...@ibs.fr
FAX: 00 33 (0) 476 50 18 90
http://www.ibs.fr/






















Re: [ccp4bb] AlphaFold: more thinking and less pipetting (?)

2020-12-03 Thread James Holton
It is a major leap forward for structure prediction for sure.  A hearty 
congratulations to all those teams over all those years.


The part I don't understand is the accuracy.  If we understand what 
holds molecules together so well, then why is it that when I refine an 
X-ray structure and turn the X-ray weight term down to zero ... the 
molecule blows up in my face?


-James Holton
MAD Scientist


On 12/3/2020 3:17 AM, Isabel Garcia-Saez wrote:

Dear all,

Just commenting that after the stunning performance of AlphaFold that 
uses AI from Google maybe some of us we could dedicate ourselves to 
the noble art of gardening, baking, doing Chinese Calligraphy, 
enjoying the clouds pass or everything together (just in case I have 
already prepared my subscription to Netflix).


https://www.nature.com/articles/d41586-020-03348-4


Well, I suppose that we still have the structures of complexes (at the 
moment). I am wondering how the labs will have access to this 
technology in the future (would it be for free coming from the company 
DeepMind - Google?). It seems that they have already published some 
code. Well, exciting times.


Cheers,

Isabel


Isabel Garcia-Saez, PhD
Institut de Biologie Structurale
Viral Infection and Cancer Group (VIC)-Cell Division Team
71, Avenue des Martyrs
CS 10090
38044 Grenoble Cedex 9
France
Tel.: 00 33 (0) 457 42 86 15
e-mail: isabel.gar...@ibs.fr
FAX: 00 33 (0) 476 50 18 90
http://www.ibs.fr/













Re: [ccp4bb] Contouring Patterson map?

2020-10-10 Thread James Holton

I use mapslicer

-James Holton
MAD Scientist

On 10/8/2020 2:21 PM, Gloria Borgstahl wrote:

What is the best way to display Harker sections... these days?











[ccp4bb] virtual ACA meeting - Ronglie Award presentation

2020-08-06 Thread James Holton

Hello all!

For those who may have missed it, the ACA meeting this year has gone 
virtual:

https://www.acameeting.com/
Much cheaper to register than ever before, and there are a few days left!

Yesterday morning, I had the honor of receiving the ACA's Ronglie Award, 
and I am sharing my presentation here:

https://www.dropbox.com/s/x8zirb4p4f3zfnp/ACA_Rognlie_2020.mp4
alternate location and slides:
https://bl831.als.lbl.gov/~jamesh/powerpoint/ACA_Rognlie_2020.mp4
https://bl831.als.lbl.gov/~jamesh/powerpoint/ACA_Rognlie_2020.pptx

My title is "If I had a trillion dollars ...", and I encourage everyone 
to contemplate the same fortune every now and then. I think our 
community is stronger when we think outside normal financial 
constraints, and it can be fun!


In the last 10 minutes I outline a design for an inexpensive and 
scalable SARS-CoV-2 test I have been thinking about, and I'd like to ask 
the members of this community to contribute to it. Goal is to keep the 
per-test price below $1 and scale to 7 billion tests/day. Do you see any 
show-stoppers? Has anyone ever tried doing fluorescence detection 
without optics?  How about in your own kitchen?


I think if we work together this might actually work.  Please do let me 
know if I missed anything!


-James Holton
MAD Scientist





Re: [ccp4bb] real real-space-refinement

2020-07-31 Thread James Holton


Not in CCP4, no. And, technically, not in Phenix either.  The real-space 
refinement in Phenix simply picks peaks in the density and then pulls 
nearby atoms toward them.  Like a black hole gobbling up nearby planets. 
It took me a while to realize that! If you manage to turn off geometry 
restraints (as I eventually did) all the atoms end up on top of each 
other. Might seem like a horrible idea, but for poor resolution data and 
reasonably good geometry restraints it has a high radius of convergence 
and is incredibly fast when compared to "real real-space refinement".  
Refining against map voxels directly is a very very slow process.


But, if you really want to do real real-space, then I suppose coot 
is doing that?  I'm actually not sure.


The isolde suggestion already made is an excellent one.  The hardest 
part of that is getting the right version of chimeraX working.  But, 
once you've done that its pretty straightforward.


One program that has not been mentioned, but does "real real" space 
refinement is: "rsref"

https://chapman.missouri.edu/wp-content/uploads/sites/2/software/rsref/html/rsref_doc.html

It is not too hard to install and use. I can't say I've gotten results 
appreciably different from reciprocal-space refinement, and that led me 
to ask myself why exactly I thought it would be different.  The Fourier 
transform is symmetric after all.  But I do expect that if you have 
unmodeled regions, such as big, spiky metals, or large tracts of 
disordered, ropy stuff, then localizing the refinement could be beneficial.


Now, of course, you can also do localized refinement in reciprocal space 
by just smoothing out parts of the map that are "uninteresting". The 
vast area of noise around the protein in a cryoEM map, for example, is 
perhaps a candidate for noise suppression. The only trick is how to 
suppress noise without creating systematic error. For example, if your 
model does not have "bulk solvent" then this area will be modeled as 
vacuum, but if you simply set the map voxel values to 0.00, you will 
have effectively created more bulk solvent, not eliminated it.  This is 
because 0.00 is usually the average voxel value, not the "vacuum level". 
Then there is the "edge" between the modified and unmodified areas. 
Unless you smooth it in some way this edge will be very sharp and 
therefore have significant Fourier coefficients at a wide range of 
resolutions.  So, if you are not careful, your "noise suppression" can 
create a lot more error than it eliminates.


As for what to do?  The scale factor given to the "bulk solvent" model 
is perhaps the best value to use to replace the "bulk" solvent region.  
The bulk solvent mask itself, ranging from 0 to 1, might also be a 
reasonable weighting function for combining your original map with a 
single-valued map.  That is, don't change the protein, but flatten the 
solvent. You can get this map out of refmac using the MSKOUT feature. 
You then smooth it in reciprocal space by applying refmac's best-fit 
solvent B factor using sfall and fft, then finally scale it with 
mapmask. I should admit, however, that I have not tried this in a 
while.  Let me know if it works!
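
The weighting idea in that last paragraph is simple enough to state as 
code (a minimal numpy sketch, assuming you have already managed to get 
the map and the 0-to-1 solvent mask onto the same grid as arrays; the 
file I/O and the B-factor smoothing step are left out):

import numpy as np

def flatten_solvent(rho, mask, solvent_level):
    # mask = 1 inside the bulk-solvent region, 0 on the protein,
    # with smooth values in between; rho is the map on the same grid
    return mask * solvent_level + (1.0 - mask) * rho

# toy demonstration on random numbers
rng = np.random.default_rng(0)
rho = rng.normal(size=(8, 8, 8))
mask = rng.random(size=(8, 8, 8))
flat = flatten_solvent(rho, mask, solvent_level=0.33)

The point of the smooth 0-to-1 weighting is exactly the edge problem 
described above: a hard binary mask would introduce sharp edges, and 
hence spurious Fourier coefficients at all resolutions.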


HTH

-James Holton
MAD Scientist

On 7/29/2020 8:20 AM, Schreuder, Herman /DE wrote:


Dear BB,

I would like to do a real real-space-refinement of a protein against a 
cryo-EM map; not the mtz-based Refmac approach. A quick internet 
search produced a lot of Phenix hits, but little ccp4 hits. Does 
somebody know how to do this using ccp4 programs, or has someone a 
Coot script to do this?


Thank you for your help!

Herman












Re: [ccp4bb] a question regarding SIMBAD

2020-07-26 Thread James Holton
Yes, I've done this quite a few times since it came out.  Amazing how 
quick it can be to search the whole PDB.  Less than 8 hours on some of 
my boxes.


You probably have to run the simbad-database program?  Used to be you 
had to download morda separately, but now it seems automatic. Only 
problem is you need to be root or some other user who can write to the 
CCP4 installation directory.


# as root:
source /programs/ccp4-7.0/bin/ccp4.setup-csh
simbad-database lattice
simbad-database morda



However, I should admit that now that I try to run it again it seems to 
be broken.  Doesn't seem to be a way to point simbad to the new-style 
copy of the database.  However, if I point it to the one I made in 2018 
it works fine:


simbad-morda ./data.mtz -nproc 448 -morda_db ${CCP4}/simbad-morda-db



But if I run:

simbad-morda ./data.mtz

it crashes after ~90 s with:

SIMBAD EXITING AT...
Traceback (most recent call last):
  File "/home/programs/ccp4-7.0/lib/py2/simbad/command_line/simbad_morda.py", line 91, in <module>
    main()
  File "/home/programs/ccp4-7.0/lib/py2/simbad/command_line/simbad_morda.py", line 66, in main
    solution_found = simbad.command_line._simbad_morda_search(args)
  File "/home/programs/ccp4-7.0/lib/py2/simbad/command_line/__init__.py", line 456, in _simbad_morda_search
    chunk_size=args.chunk_size
  File "/home/programs/ccp4-7.0/lib/py2/simbad/rotsearch/amore_search.py", line 194, in run
    pdb_struct.from_file(dat_model)
  File "/home/programs/ccp4-7.0/lib/py2/simbad/util/pdb_util.py", line 36, in from_file
    self.assert_hierarchy()
  File "/home/programs/ccp4-7.0/lib/py2/simbad/util/pdb_util.py", line 57, in assert_hierarchy
    assert len(self.hierarchy.models()) > 0, 'No models found in hierarchy'
AssertionError: No models found in hierarchy

Ronan?

-James Holton
MAD Scientist


On 7/25/2020 3:55 PM, Peat, Tom (Manufacturing, Parkville) wrote:

Hello All,

I would like to run SIMBAD to do a brute force MR on a data set that I 
have (running ContaMiner didn't come up with any known contaminants 
and running SIMBAD with the Lattice and Contaminants search didn't 
give me anything).
As I understand the documentation, SIMBAD can be run using the MORDA 
database (which I have on my computer).
I've managed to run SIMBAD doing the Lattice and Contaminants search, 
but haven't managed to get it to run using the MORDA database for the 
brute force MR.
As far as I can tell, the servers only run the L & C and not the brute 
force MR.
Does anyone have experience (hopefully positive) doing this brute 
force MR?

cheers, tom

Tom Peat
Proteins Group
Biomedical Program, CSIRO
343 Royal Parade
Parkville, VIC, 3052
+613 9662 7304
+614 57 539 419
tom.p...@csiro.au









To unsubscribe from the CCP4BB list, click the following link:
https://www.jiscmail.ac.uk/cgi-bin/WA-JISC.exe?SUBED1=CCP4BB=1

This message was issued to members of www.jiscmail.ac.uk/CCP4BB, a mailing list 
hosted by www.jiscmail.ac.uk, terms & conditions are available at 
https://www.jiscmail.ac.uk/policyandsecurity/


Re: [ccp4bb] Question about P3, H3 and R3 space groups

2020-07-22 Thread James Holton
It took me a long time to realize that the rhombohedral system is 
actually a form of centering.  Just like C, F, and I, the R lattice type 
has extra translation-related points.  What is unique about rhombohedral 
lattices is that there are two different kinds of "extra" centers.  In 
C, all you have is an extra point in the middle of one face, and for F 
it is all the faces.  For I it is an extra point at the exact center of 
the whole cell, but for R there are two extra points in the middle of 
the cell.


The reason there are two is analogous to how an ellipse has two "foci" 
and a circle only has one.  Rhombohedral is just a distortion of a cubic 
cell: grab opposing corners and pull.  The relationship to hexagonal and 
trigonal is clear if you look at a cube down the corner-to-corner 
diagonal.  It looks like a hexagon.  And as it turns out if you 
re-define the lattice centering operations to be "symmetry operators", 
that are internal to the cell rather than relating different cells, you 
can map rhombohedral onto trigonal. That's where "H" comes from.  Most 
any high-symmetry space group can be mapped onto another by turning 
centering into a translation-only symmetry operator.  However, if you 
want to follow the proper definition of a unit cell, anything that 
relates the whole lattice back onto itself via translation only is not a 
"symmetry operator", it is a whole-cell shift.


For a lecture once, I made some movies showing how the different 
centering types relate. I make movies because 3D concepts are hard to 
show in 2D, and so why not provide some motion to give perspective? Some 
of you may find them useful in teaching.

https://bl831.als.lbl.gov/~jamesh/powerpoint/CSHL_spacegroups_2019.pptx

I find it can be more intuitive to students to go in "reverse": start 
with a cubic lattice, which has only one parameter to think about, then 
introduce centering.  After that rotations, and eventually arrive at P1, 
which has six degrees of freedom. Strange how we consider P1 to be the 
"simplest" unit cell, when to the unindoctrinated it certainly is not.


-James Holton
MAD Scientist


On 7/22/2020 9:50 AM, Ian Tickle wrote:


The original reference for the H cell is the very first edition of 
Int. Tab.:


Hermann, C. (1935).  Internationale Tabellen zur Bestimmung von 
Kristallstrukturen.  Berlin: Gebrueder Borntraeger.


Cheers

-- Ian


On Wed, 22 Jul 2020 at 17:34, Eleanor Dodson 
<176a9d5ebad7-dmarc-requ...@jiscmail.ac.uk> wrote:


Well - yes. I am a true devotee of doctored cells to match
something already in existence in a higher symmetry which has
become approximate in some new manifestation! But I hadn't realised
there were official versions of doctoring...
Eleanor

On Wed, 22 Jul 2020 at 16:29, Jeremy Karl Cockcroft
<jeremyk...@gmail.com> wrote:

Dear Eleanor,
What you say is absolutely spot-on! An H3 cell can be reduced
down to the smaller P3 cell as you pointed out.

However, sometimes it may be useful to use a larger unit cell.
I can't give an example for trigonal space groups or from
protein crystallography, but in a recent paper, I used a unit
cell in C-1 (i.e. a doubled P-1 triclinic cell) as this
related to the C-centred monoclinic cell as exhibited in a
higher temperature phase. I could have used P-1, but I knew
that chemists would see the relationship between the phases
more easily by using an enlarged cell. I have done this sort
of thing many times, e.g. I used F2/d for the low temperature
phase of DI (HI) many years ago instead of C2/c as this
related to the face-centred cubic form. As I am interested in
phase transitions, I  tabulated a range of space-group
settings for enlarged unit cells on my site.

I am not sure that this will make the CCP4 list as I am not
subscribed to it - please feel free to echo it on there.
Best regards,
Jeremy Karl.
***
Dr Jeremy Karl Cockcroft
Department of Chemistry
(University College London)
Christopher Ingold Laboratories
20 Gordon Street
London WC1H 0AJ
+44 (0) 20 7679 1004 (laboratory)
+44 (0) 7981 875 829 (cell/mobile)
j.k.cockcr...@ucl.ac.uk or jeremyk...@gmail.com
http://img.chem.ucl.ac.uk/www/cockcroft/homepage.htm
***
6 Wellington Road
Horsham
West Sussex
RH12 1DD
+44 (0) 1403 256946 (home)
***


  

Re: [ccp4bb] Sad News

2020-07-19 Thread James Holton
Ward was the Program Official for all my grants!  My life will not be 
the same without him.  He was always so supportive and helpful with 
advice on how to navigate the sometimes convoluted system that is the 
NIH.  From the Protein Structure Initiative to today his hard work has 
made possible my entire scientific career.


"missed" doesn't seem to cover it. May your rest be a peaceful one, Ward.

-James Holton
MAD Scientist

On 7/18/2020 4:36 AM, Sweet, Robert wrote:

I'm writing to acknowledge the passing of Ward Smith during the weekend of 5 
July. Ward got his PhD with Martha Ludwig at U. of Michigan, and then came to 
UCLA in 1977 to join Dave Eisenberg’s group as a postdoc. During the course of 
things, he met Cheryl Janson, a Paul Boyer postdoc, and they were married in 
1980.  During his time at UCLA Ward became expert in operation of the 
Xuong/Hamlin/Nielsen multi-wire system at UCSD and tutored the UCLA users in 
its use. In 1985 Ward and Cheryl left UCLA and went to Monsanto in St. Louis, 
where Ward worked as a structural biologist. In 1987 they went to Agouron 
Pharmaceuticals in San Diego. And then in 1995 went cross-country to SmithKline 
Beecham (which later became GlaxoSmithKline after merging with Glaxo Wellcome).

Ward was very involved with getting IMCA set up as a functional facility for 
pharmaceuticals at Argonne as SmithKline's representative. This experience gave 
him significant credibility in synchrotron macromolecular crystallography, and 
in 2003 he joined the GM/CA-CAT beamlines at the APS to help Bob Fischetti and 
others construct that excellent facility. During this time Cheryl worked at 
Shamrock Structures.

Ward moved to the NIH headquarters in 2007. There he took some responsibility 
for the Protein Structure Initiative, also playing an important role in 
supporting NIH synchrotron facilities. In 2010 he became the branch chief for 
the Structural Genomics and Proteomics Technology Branch in the Division of 
Cell Biology and Biophysics.  He remained in that position through 2017. At the 
2018 NIGMS re-organization Ward went to the Biophysics, Biomedical Technology, 
and Computational Biosciences division as the branch chief for the Biomedical 
Technology Branch.

Ward helped oversee the big NIH-funded, $45 M construction of three major 
beamlines at NSLS-II, a project called ABBIX that ran 2011-2017. In 2017 he 
became program director for NIH support of structural biology beamlines at 
NSLS-II and other DOE synchrotrons.

Many knew Ward for his always calm, reasoned demeanor; he was unflappable, 
resilient, and friendly. He was well read and devoted to his family.


   Robert M. Sweet   E-Dress:  sw...@bnl.gov
   Scientific Advisor, CBMS: The Center for BioMolecular
 Structure at NSLS-II
   Photon Sciences and Biology Dept
   Brookhaven Nat'l Lab.
   Upton, NY  11973 U.S.A.
   Phones: 631 344 3401  (Office)
 631 338 7302  (Mobile)





Re: [ccp4bb] flow rate for cooling stream?

2020-07-03 Thread James Holton
To add: I think what Dr. Garman means by "match" is not necessarily to 
set the two flow rates to the same value.  The optimal settings will 
depend on the size, shape, and even orientation of your nozzle.  You 
need to fiddle with the outer stream flow rate to find the one that 
minimizes the turbulence.


Mounting up a large, long-necked nylon loop with a big drop of liquid in 
it makes an excellent vibrometer.  You can often see the loop vibration 
under the video microscope if your frame rate is high enough.  And, of 
course, the most systematic, sensitive and relevant assay is to put a 
crystal in it and collect some data:

https://doi.org/10.1107/S0021889808032536

-James Holton
MAD Scientist

On 7/3/2020 3:09 AM, Elspeth Garman wrote:


Yes, if you don’t match the inner and outer stream rates you get 
turbulent flow at the boundary between them and ice build up on the 
sample.


Best wishes

Elspeth

*From:* CCP4 bulletin board <CCP4BB@JISCMAIL.AC.UK> *On Behalf Of* Marcus Winter

*Sent:* 03 July 2020 09:29
*To:* CCP4BB@JISCMAIL.AC.UK
*Subject:* [ccp4bb] flow rate for cooling stream?

Dear Patrick, Herman,

Regarding the Cryostream, the Cryostream 800 brochure indicates that 
the N2 (gas) flow rates employed are in the range of 5 - 10 litres / 
minute, and, I guess, therefore something similar for the outer dry 
air 'shield' stream.  Obviously, it's best to check with the 
manufacturers (Oxford Cryosystems) directly on the details.


All The Best,

Marcus Winter

(Rigaku)

 -Original Message-
From: CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] On Behalf Of 
Schreuder, Herman /DE

Sent: Friday, July 03, 2020 7:44 AM
To: CCP4BB@JISCMAIL.AC.UK
Subject: [ccp4bb] AW: [EXTERNAL] [ccp4bb] flow rate for cooling stream?

Dear Patrick,

if I recall correctly, our systems run at 10-15 ml/min (gas). I will 
check on Monday when I am back in the lab.


The original cryostreams would run for several days on a tank of 
liquid nitrogen. However, they had significant hardware to dry the 
nitrogen and to ensure a constant flow.


Best, Herman

-Original Message-

From: CCP4 bulletin board <CCP4BB@JISCMAIL.AC.UK> On Behalf Of Patrick Loll

Sent: Thursday, 2 July 2020 22:03

To: CCP4BB@JISCMAIL.AC.UK

Subject: [EXTERNAL] [ccp4bb] flow rate for cooling stream?

EXTERNAL : Real sender is owner-ccp...@jiscmail.ac.uk


Sorry, way off topic:

Does anyone have an estimate for the flow rate one would typically use 
for the cold nitrogen stream passing over a protein crystal in a 
standard data collection?


Background: Our nitrogen “generator” has gone belly-up and the vendor 
no longer services it, so I’m testing the feasibility of using the 
boil-off from a liquid nitrogen tank to provide the gas to support a 
short data collection (this nitrogen gas would serve as the feedstock 
into our helium cryostat). But I don’t know the flow rate required, so 
I can’t calculate if one tank has enough nitrogen to support a day or 
so of data collection. There are flow meters for the warm and cold 
stream on the nitrogen generator, but these flow meters have no 
apparent units anywhere on them, so I have no idea of the rate at 
which the gas would be consumed.
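
For what it is worth, a back-of-the-envelope sketch in Python (my 
numbers, not Pat's: it assumes a ~700:1 liquid-to-gas expansion ratio 
for nitrogen, a 160 L dewar, and the ~10 L/min gas flow quoted elsewhere 
in this thread; substitute your own values):

tank_liquid_L = 160.0     # assumed dewar capacity, litres of liquid N2
expansion = 700.0         # litres of gas per litre of liquid (approximate)
flow_L_per_min = 10.0     # cold-stream gas consumption, litres per minute
minutes = tank_liquid_L * expansion / flow_L_per_min
print("tank lasts about %.1f days" % (minutes / 60 / 24))  # ~7.8 days

At these rates a single tank would comfortably cover a day or so of 
data collection.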


Thanks for any useful tidbits.

And for those of you in the US, best wishes for a happy “Holy crap, 
even MORE fireworks?!?!!” Day


Pat

---

Patrick J. Loll, Ph. D.

Professor of Biochemistry & Molecular Biology Drexel University 
College of Medicine Room 10-102 New College Building


245 N. 15th St., Mailstop 497

Philadelphia, PA 19102  USA

(215) 762-7706

pjl...@gmail.com

pj...@drexel.edu




Re: [ccp4bb] number of frames to get a full dataset?

2020-07-01 Thread James Holton
ing because you will probably not be corrected for calling a 
"courgette" a "zucchini", especially if you are Italian. However, a 
native Hindi speaker might feel compelled to correct your pronunciation 
of "shampoo".  I am not singling out any one culture here, we have all 
given in to the temptation to "correct" someone, perhaps even while 
visiting their home.  Ahh, the errors of my youth.


All that said, I don't think this forum is the place to discuss cultural 
differences.  This is especially true once we start using words like 
"correct"/"incorrect" and "right"/"wrong", as these tend to generate far 
more heat than light.  However, I do think it important to identify and 
describe cultural differences when they start to impede scientific 
discussion.  It is OK to disagree.  But let it be over interpretation of 
complete information that both parties possess, not preconceived notions 
nor ignorance of the complete picture. If we understand WHY another 
person thinks in a way we find disagreeable, then perhaps we have a 
better chance of moving forward and enjoying the upcoming celebrations 
of Independence/GoodRiddanceUngratefulColonials Day.


Whatever you call it, an eggplant or an aubergine, its odour/odor and 
flavour/flavor are the same.  I apologize/apologise to my 
neighbours/neighbors across the Lake/Pond for my behaviour/behavior if 
you are not enamoured/enamored with my endeavour/endeavor at 
humor/humour.  It is not my specialty/speciality.  fullstop/period.


-James Holton
MAD Scientist


On 6/29/2020 3:36 PM, Bernhard Rupp wrote:


I think it is time to escalate that discussion to crystallographic 
definition purists like Massimo or to a logical consistency proponent 
like Ian who abhors definitional vacuum 


Cheers, BR

*From:* CCP4 bulletin board <CCP4BB@JISCMAIL.AC.UK> *On Behalf Of* Andreas Förster

*Sent:* Monday, June 29, 2020 15:24
*To:* CCP4BB@JISCMAIL.AC.UK
*Subject:* Re: [ccp4bb] number of frames to get a full dataset?

I like to think that the reflections I carefully measured at high 
multiplicity are not redundant, which the dictionary on my computer 
defines as "not or no longer needed or useful; superfluous" and the 
American Heritage Dictionary as "exceeding what is necessary or 
natural; superfluous" and "needlessly repetitive; verbose".


Please don't use the term "needless repetitivity" in your Table 1.  It 
sends the wrong message.  Multiplicity is good.


All best.

Andreas

On Tue, Jun 30, 2020 at 12:03 AM James Holton <jmhol...@lbl.gov> wrote:


I have found that the use of "redundancy" vs "multiplicity"
correlates very well with the speaker's favorite processing
software.  The Denzo/HKL program scalepack outputs "redundancy",
whereas scala/aimless and other more Europe-centric programs
output "multiplicity".

At least it is not as bad as "intensity", which is so ambiguous as
to be almost useless as a word on its own.

-James Holton
MAD Scientist

On 6/24/2020 10:27 AM, Bernhard Rupp wrote:

> Oh, and some of us prefer the word 'multiplicity' ;-0

Hmmm…maybe not. ‘Multiplicity’ in crystallography is context
sensitive, and not uniquely defined. It can refer to

 1. the position multiplicity (number of equivalent sites per
unit cell, aka Wyckoff-Multiplicity), the only (!) cif use
of multiplicity
 2. the multiplicity of the reflection, which means the
superposition of reflections with the same /d/  (mostly
powder diffraction)
 3. the multiplicity of observations, aka redundancy.

While (a) and (b) are clearly defined, (c) is an arbitrary
experimental number.

How from (a) real space symmetry follows (b) in reciprocal
space (including the epsilon zones, another ‘multiplicity’) is
explained here

https://scripts.iucr.org/cgi-bin/paper?a14080

and also on page 306 in BMC.

Too much multiplicity might create duplicity…

Cheers, BR

Jon Cooper

On 23 Jun 2020 22:04, "Peat, Tom (Manufacturing, Parkville)"
<tom.p...@csiro.au> wrote:

I would just like to point out that for those of us who
have worked too many times with P1 or P21 that even 360
degrees will not give you 'super' anomalous differences.

I'm not a minimalist when it comes to data- redundancy is
a good thing to have.

cheers, tom

Tom Peat
Proteins Group
Biomedical Program, CSIRO
343 Royal Parade
Parkville, VIC, 3052
+613 9662 7304
+614 57 539 419
tom.p...@csiro.au


-

Re: [ccp4bb] [EXTERNAL] Re: [ccp4bb] number of frames to get a full dataset?

2020-06-29 Thread James Holton

What could possibly go wrong?

-James Holton
MAD Scientist

On 6/29/2020 6:17 PM, Edward A. Berry wrote:
Now can we get rid of all the superfluous disks in our RAID? Or at 
least not replace them when they fail?


On 06/29/2020 06:24 PM, Andreas Förster wrote:
I like to think that the reflections I carefully measured at high 
multiplicity are not redundant, which the dictionary on my computer 
defines as "not or no longer needed or useful; superfluous" and the 
American Heritage Dictionary as "exceeding what is necessary or 
natural; superfluous" and "needlessly repetitive; verbose".


Please don't use the term "needless repetitivity" in your Table 1.  It 
sends the wrong message.  Multiplicity is good.


All best.


Andreas



On Tue, Jun 30, 2020 at 12:03 AM James Holton <jmhol...@lbl.gov> wrote:


    I have found that the use of "redundancy" vs "multiplicity" 
correlates very well with the speaker's favorite processing 
software.  The Denzo/HKL program scalepack outputs "redundancy", 
whereas scala/aimless and other more Europe-centric programs output 
"multiplicity".


    At least it is not as bad as "intensity", which is so ambiguous 
as to be almost useless as a word on its own.


    -James Holton
    MAD Scientist

    On 6/24/2020 10:27 AM, Bernhard Rupp wrote:


    > Oh, and some of us prefer the word 'multiplicity' ;-0

    Hmmm…maybe not. ‘Multiplicity’ in crystallography is context 
sensitive, and not uniquely defined. It can refer to 


 1. the position multiplicity (number of equivalent sites per 
unit cell, aka Wyckoff-Multiplicity), the only (!) cif use of 
multiplicity
 2. the multiplicity of the reflection, which means the 
superposition of reflections with the same /d/  (mostly powder 
diffraction) 

 3. the multiplicity of observations, aka redundancy.

    While (a) and (b) are clearly defined, (c) is an arbitrary 
experimental number.


    How from (a) real space symmetry follows (b) in reciprocal space 
(including the epsilon zones, another ‘multiplicity’) is explained 
here 


    https://scripts.iucr.org/cgi-bin/paper?a14080



    and also on page 306 in BMC.

    Too much multiplicity might create duplicity… 

    Cheers, BR


    Jon Cooper


    On 23 Jun 2020 22:04, "Peat, Tom (Manufacturing, Parkville)" 
<tom.p...@csiro.au> wrote:


    I would just like to point out that for those of us who have 
worked too many times with P1 or P21 that even 360 degrees will not 
give you 'super' anomalous differences. 


    I'm not a minimalist when it comes to data- redundancy is a 
good thing to have. 


    cheers, tom 


    Tom Peat
    Proteins Group
    Biomedical Program, CSIRO
    343 Royal Parade
    Parkville, VIC, 3052
    +613 9662 7304
    +614 57 539 419
    tom.p...@csiro.au




--


    *From:* CCP4 bulletin board <CCP4BB@JISCMAIL.AC.UK> on behalf of 
0c2488af9525-dmarc-requ...@jiscmail.ac.uk

    *Sent:* Wednesday, June 24, 2020 1:10 AM
    *To:* CCP4BB@JISCMAIL.AC.UK
    *Subject:* Re: [ccp4bb] number of frames to get a full 
dataset? 


    

    Someone told me there is a cubic space group where you can 
get away with something like 11 degrees of data. It would be 
interesting if that's correct. These minimu

Re: [ccp4bb] number of frames to get a full dataset?

2020-06-29 Thread James Holton
I have found that the use of "redundancy" vs "multiplicity" correlates 
very well with the speaker's favorite processing software.  The 
Denzo/HKL program scalepack outputs "redundancy", whereas scala/aimless 
and other more Europe-centric programs output "multiplicity".


At least it is not as bad as "intensity", which is so ambiguous as to be 
almost useless as a word on its own.


-James Holton
MAD Scientist

On 6/24/2020 10:27 AM, Bernhard Rupp wrote:


> Oh, and some of us prefer the word 'multiplicity' ;-0

Hmmm…maybe not. ‘Multiplicity’ in crystallography is context 
sensitive, and not uniquely defined. It can refer to


 1. the position multiplicity (number of equivalent sites per unit
cell, aka Wyckoff-Multiplicity), the only (!) cif use of multiplicity
 2. the multiplicity of the reflection, which means the superposition
of reflections with the same /d/  (mostly powder diffraction)
 3. the multiplicity of observations, aka redundancy.

While (a) and (b) are clearly defined, (c) is an arbitrary 
experimental number.


How from (a) real space symmetry follows (b) in reciprocal space 
(including the epsilon zones, another ‘multiplicity’) is explained here


https://scripts.iucr.org/cgi-bin/paper?a14080

and also on page 306 in BMC.

Too much multiplicity might create duplicity…

Cheers, BR

Jon Cooper

On 23 Jun 2020 22:04, "Peat, Tom (Manufacturing, Parkville)" 
<tom.p...@csiro.au> wrote:


I would just like to point out that for those of us who have
worked too many times with P1 or P21 that even 360 degrees will
not give you 'super' anomalous differences.

I'm not a minimalist when it comes to data- redundancy is a good
thing to have.

cheers, tom

Tom Peat
Proteins Group
Biomedical Program, CSIRO
343 Royal Parade
Parkville, VIC, 3052
+613 9662 7304
+614 57 539 419
tom.p...@csiro.au



*From:* CCP4 bulletin board <CCP4BB@JISCMAIL.AC.UK> on behalf of
0c2488af9525-dmarc-requ...@jiscmail.ac.uk
*Sent:* Wednesday, June 24, 2020 1:10 AM
*To:* CCP4BB@JISCMAIL.AC.UK
*Subject:* Re: [ccp4bb] number of frames to get a full dataset?

Someone told me there is a cubic space group where you can get
away with something like 11 degrees of data. It would be
interesting if that's correct. These minimum ranges for data
collection rely on the crystal being pre-oriented, which is
unheard-of these days, although they can help if someone is
nagging you to get off the beam line or if your diffraction fades
quickly. Going for 180 degrees always makes sense for a
well-behaved crystal, or 360 degrees if you want super anomalous
differences. Hope this helps a bit.

Jon Cooper

On 23 Jun 2020 07:29, Andreas Förster
<andreas.foers...@dectris.com> wrote:

Hi Murpholino,

in my opinion (*), the question is neither number of frames
nor degrees.  The only thing that matters to your crystal is
dose.  How many photons does your crystal take before it
dies?  Consequently, the question to ask is How best to use
photons.  Some people have done exactly that.

https://doi.org/10.1107/S2059798319003528


All best.

Andreas

(*) Disclaimer:  I benefit when you use PILATUS or EIGER - but
I want you to use them to your advantage.

On Tue, Jun 23, 2020 at 12:04 AM Murpholino Peligro
<murpholi...@gmail.com> wrote:

Hi.
Quick question...

I have seen *somewhere* that to get a 'full dataset we
need to collect n frames':

at least 180 frames if symmetry is X

at least 90 frames if symmetry is Y

at least 45 frames if symmetry is Z

Can somebody point where is *somewhere*?

...also...

what other factors can change n... besides symmetry and
radiation damage?

Thanks








-- 


Andreas Förster, Ph.D.

Application Scientist Crystallography, Area Sales Manager Asia
& Pacific

Phone: +41 56 500 21 00| Direct: +41 56 500 21 76| Email:
andreas.foers...@dectris.com

Re: [ccp4bb] How many microfocus beamlines are in the world?

2020-06-24 Thread James Holton

Define "micro focus"?

-James Holton
MAD Scientist

On 6/24/2020 9:18 AM, Murpholino Peligro wrote:

I would like to know how many MX beamlines are micro focus?


Thanks.





Re: [ccp4bb] Question about small molecule crystallography

2020-06-13 Thread James Holton

It's not all that hard to exceed it with a protein crystal too.

A 50 um wide lysozyme crystal sitting in a 50x50um beam will scatter 
into a single spot up to:


I = 7e-14*flux*(F/mosaic)^2

Where I is in photons/s
flux is incident photons/s
mosaic is in deg
F is the structure factor of the relevant hkl in electrons.

This is the peak photon arrival rate when the hkl is exactly on the 
Ewald sphere.  So, if we have F=130, mosaic=0.02 deg (typical for room 
temp), and flux = 1e12 ph/s we expect a peak count rate of 3e6 ph/s.  If 
that is a 1-pixel spot, then it will exceed the maximum count rate of 
Pilatus2 and Eiger1 detectors (2e6 ph/s).  For lysozyme, 35% of all hkls 
to 2.0A have F > 130.
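
Plugging those numbers into the rule of thumb, as a minimal Python 
sketch (mine, for illustration; not a fragment of any beamline software):

flux = 1e12      # incident photons/s
F = 130.0        # structure factor of the hkl, electrons
mosaic = 0.02    # mosaic spread, degrees (typical for room temp)
I_peak = 7e-14 * flux * (F / mosaic)**2
print("peak spot intensity: %.2g photons/s" % I_peak)  # ~3e6 ph/s
max_rate = 2e6   # approximate Pilatus2/Eiger1 per-pixel limit quoted above
if I_peak > max_rate:
    print("a one-pixel spot would exceed the maximum count rate")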


That said, the "instant retrigger" feature of Pilatus3 and Eiger2 does a 
much better job of correcting for this. Also, spots are usually larger 
than 1 pixel. I often advise room temperature collection with 1 deg 
images on my Pilatus3 because this allows us to run unattenuated. The 
error from the retrigger correction is significantly smaller than the 
error incurred by photons lost in the 1-2 ms gap between images on 
Pilatus.  This second error is all but eliminated by Eiger's much shorter 
read-out period, but only Eiger2 has an instant retrigger feature.


And no, Dectris didn't pay me to say that.

The long and short of it is that whenever you use a counting device your 
intensity data are fundamentally non-linear.  Instant retrigger is just 
one of the various things to try to correct the non-linearity.


How much impact does non-linearity have?  Surprisingly, not that much!  
As long as the non-linearity is uniform across the detector face the 
impact on the more popular data quality metrics is hard to detect if you 
don't know what you're looking for.  It is not hard to test this for 
yourself.  All you need to do is take your favorite dataset's images and 
run the pixels through some non-linear function.  I just tried this with 
a lysozyme dataset using what should be a horrible thing to do: 
new_pixel = 10*sqrt(old_pixel). After doing the same processing and 
refinement protocol both before (normal) and after sqrt-ing the pixel 
values I get:


stat      normal  sqrt-ed
Rwork     17.4    22.0
Rfree     21.7    25.4
CC1/2     99.8    97.6
dmin      1.47    1.55
ISa       15      315
low-res bin:
Rmeas     4.7     8.8
I/SIGMA   32.1    32.8
CCano     59      36

Ok. ISa is weird, and stats are generally poorer after making the data 
hugely non-linear, but not so poor as to make you suspect something so 
massively wrong with the data.  The anomalous signal is lower, but 
amazingly still there. I suspect this is because anomalous differences 
are relative differences and even wiht a non-linear detector small 
relative differences can still be measured. Food for thought I suppose.
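
If you want to try the same abuse on your own images, a minimal sketch 
(assuming the fabio library can read and write your CBF format; the file 
name is hypothetical, and this is not the exact script used for the 
numbers above):

import fabio

img = fabio.open("lyso_0001.cbf")        # hypothetical image file
data = img.data.astype(float)
valid = data >= 0                        # leave bad-pixel/gap flags (<0) alone
data[valid] = 10.0 * data[valid]**0.5    # new_pixel = 10*sqrt(old_pixel)
img.data = data.astype(img.data.dtype)
img.write("sqrted_0001.cbf")

Then process the "sqrted" images with exactly the same protocol as the 
originals and compare the statistics.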


Oh, and read-out noise also doesn't hurt resolution nearly as much as 
you might think.  You can also try this for yourself by adding random 
noise to your pixels, or by simply adding pure background images to your 
data images.  You have to add quite a lot of background before you start 
to notice its impact. This is especially true for poorly-diffracting 
crystals (high WilsonB factor) where the drop in intensity with 
increasing Bragg angle is very steep.  The spots just "shut off" over a 
very narrow range in resolution.  High background can shift the limit 
around in this narrow range, but not by much. Anomalous differences are 
even less sensitive to background than resolution.  This is because the 
"background" for anomalous differences is the spot photons themselves.


You don't believe me, do you?  Try it.  Use merge2cbf to add images 
together. You will find it in your XDS program directory.
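
If you would rather fake it in numpy first, a rough sketch of the idea 
using synthetic arrays (not real detector frames):

import numpy as np

rng = np.random.default_rng(1234)
frame = rng.poisson(5.0, size=(100, 100))         # stand-in for a data image
background = rng.poisson(50.0, size=frame.shape)  # ~50 extra photons/pixel
noisier = frame + background
# write these back out as images and re-process: note how much background
# it takes before the resolution limit actually moves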


-James Holton
MAD Scientist

On 6/8/2020 1:35 PM, Winter, Graeme (DLSLtd,RAL,LSCI) wrote:

Hi Jon

Ambiguous phrasing, perhaps - the detector has a maximum count rate, 
as events per second, and it is easy to exceed this with a good 
quality small molecule crystal on an undulator beamline, thus 
under-recording the intensity of strong reflections.


Best wishes Graeme

On 8 Jun 2020, at 20:55, bogba...@yahoo.co.uk 
wrote:


Re: "it turns out to be very very easy to exceed the count rate where 
the detector electronics can keep up."


Sorry if this is obvious, but I take it you mean "_can't_" keep up?

Jon Cooper

On 4 Jun 2020 13:06, "Winter, Graeme (DLSLtd,RAL,LSCI)" 
<graeme.win...@diamond.ac.uk> wrote:


Dear All,

A small word of caution regarding chemical crystallography on an
MX-like beamline - if you have a bright source, a well
diffracting crystal and a pixel array detector it is perfectly
possible to lose counts in the strongest reflections without
noticing - certainly without going over the nominal detector
count limits if your mosaic spread is very small

At Diamond we faced this issue with i19, which 

Re: [ccp4bb] visual mask editor - why

2020-06-13 Thread James Holton

Bernhard,

Sounds like you are plotting something similar to what I was tinkering 
with once.  A script you may find useful is this one:

https://bl831.als.lbl.gov/~jamesh/bin_stuff/map_func.com

I wrote this because although mapmask, maprot, etc have very useful 
functionalities I found I wanted additional features, such as dividing 
one map by another, or taking a square root.  These are important if you 
are trying to derive the "signal-to-noise ratio", for example.


Once you have a map of rho/sigma(rho) you can convert that into a 
"probability" by passing it through the "erf()" and "pow()" functions.  
This can be a good way of estimating the "probability something is 
there" or P(rho) for a given map voxel. Specifically:


P(rho) = pow(erf(abs(rho/sigma(rho))/sqrt(2)),V/2/d^3)

where:
rho is the electron density map value (preferably from a Fo-Fc map)
sigma(rho) is the error on that voxel
V is the unit cell volume (A^3)
d is the resolution in A
erf() is called the "error function"
pow(m,e) is the raise-to-a-power function: m^e

The erf() function by itself turns a rho/sigma(rho)=3 peak into 0.997, 
and a 1-sigma peak into 0.683. What that means is: assuming the noise is 
Gaussian, you expect voxels in the range -1-sigma to +1-sigma to be 
~68.3% of the total.  The probability that noise alone reaches that 
level (sometimes called a "p-value") is then 1-0.683 = 0.32.  That is a 
pretty high chance of being just noise for a 1-sigma peak, and remember 
that the map is not just a single observation but thousands.  So, the 
question you really want to ask is: if I generate 100x100x100 = 1e6 
Gaussian-random numbers, what are the odds that a 4-sigma peak occurred 
at random?  The answer is: pretty much guaranteed.  In any collection of 1 million 
Gaussian-random numbers with rms=1 it is virtually impossible to not 
have at least one of them > 4. Trust me, I have tried. This is where the 
"pow()" function comes in.  You need to multiply all the individual 
voxel probabilities together to get the probability of at least one 
>4-sigma peak happening at random.


But, then again, map voxels are hardly independent observations. Finite 
resolution means that neighboring pixels are highly correlated.  So, 
rather than map grid points, we should be considering "Shannon voxels".  
This is just the number of blobs of diameter d, where d is the 
resolution, that can fit into the volume of the map. For example, if we 
have a 100 A edge on a cubic cell and 3 A resolution, then we have about 
3.7e4 independent "observations" of density, so the probability that 
noise alone produces no 4-sigma peak anywhere is:

P(rho) = erf(4/sqrt(2))**(((100./3)**3)/2) = 0.31

That is, if you make a zillion maps of random data using different seeds 
each time, 3.7e4 Shannon voxels each, and draw from a Gaussian 
distribution with rms=1, you expect 31% of these maps to have nothing 
above 4 sigma.  The other 69% will contain at least one 4-sigma peak, 
purely at random.  So, if you see a 4-sigma peak in a map with 3.7e4 
Shannon voxels I'd say it is real only about 31% of the time.  You might 
consider 0.31 to be a decent "weight" to give such a 4-sigma peak.  A 
5-sigma peak under the same circumstances gets P(rho) = 0.989, and a 
3-sigma peak gets P(rho) = 1e-22.  aka: probably noise.  It is perhaps 
worth remembering that at a given resolution large-sigma noise peaks are 
more common in bigger cells than small ones.
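
Those numbers are easy to check with a minimal sketch using only the 
Python standard library (mine, not part of map_func.com):

from math import erf, sqrt

def p_real(sigmas, cell_edge=100.0, d=3.0):
    # probability that Gaussian noise alone puts NO peak this high
    # anywhere in a map with V/2/d^3 Shannon voxels (cubic cell assumed)
    n_shannon = cell_edge**3 / 2.0 / d**3
    return erf(sigmas / sqrt(2.0)) ** n_shannon

for s in (3, 4, 5):
    print("%d-sigma peak: P(rho) = %.3g" % (s, p_real(s)))
# prints roughly 2e-22, 0.31 and 0.989, matching the values above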


So, how do you get sigma(rho)?  To be honest, the rms value of the 
mFo-DFc map is a pretty good estimate.  The rms value of the 2mFo-DFc 
map is not (Lang et al. PNAS 2013).  I usually get sigma(rho) 
empirically.  That is, by making a stack of maps: start with your 
favorite refined structure and introduce random noise from whatever 
source you want to test, e.g. Gaussian error proportional to SigI is an 
obvious one.  A more realistic one is the Fo-Fc difference itself.  After 
adding this "extra" noise to the data, re-refine the structure and 
generate a 2mFo-DFc map.  Do this ~50 times with different random number 
seeds. Then take those 50 maps and compute the mean and standard 
deviation of "rho" at every voxel.  You can do this with mapmask's ADD, 
MULT and SCALE features, but you can't do the last step, which is taking 
the square root of the variance.  Hence: map_func.com
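
The per-voxel statistics step looks something like this in Python (a 
sketch assuming the gemmi library for CCP4 map I/O and hypothetical file 
names; map_func.com does the same job in script form):

import numpy as np
import gemmi

# load the ~50 re-refined maps; all must be on identical grids
stack = np.array([np.array(gemmi.read_ccp4_map("noisy_%02d.map" % i).grid)
                  for i in range(50)])
mean_rho = stack.mean(axis=0)   # per-voxel mean density
sigma_rho = stack.std(axis=0)   # per-voxel rms deviation: sigma(rho)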


There are lots of other functions supported, including random number 
generation, etc.  Run the script with no arguments to get a list.


Oh, but don't try it on an mtz file!  mtz files are not maps.

-James Holton
MAD Scientist


On 5/28/2020 12:11 PM, Bernhard Rupp wrote:


Yes I have already pilfered useful parts of it in the scripts…

Thx, BR

*From:* Boaz Shaanan 
*Sent:* Thursday, May 28, 2020 11:59
*To:* b...@hofkristallamt.org
*Cc:* CCP4BB@JISCMAIL.AC.UK
*Subject:* Re: [ccp4bb] visual mask editor - why

Hi Bernhard,

Did you consider trying 'polder' in t
