Re: [ccp4bb] Off topic: 'Difficult' Datasets for Processing Practice

2018-10-02 Thread Whitley, Matthew J
Hi Andrew,


Thanks very much for your reply.


I have taken a look at the example data sets you referred me to, and I am most 
interested in example 7, the case with multiple lattices.  I will process this 
one myself and might include it during my tutorials.  This is for the Cold 
Spring Harbor x-ray course which takes place later this month.


Sincerely,

Matthew


---
Matthew J. Whitley, Ph.D.
Research Instructor
Department of Pharmacology & Chemical Biology
University of Pittsburgh School of Medicine



From: Andrew Leslie 
Sent: Thursday, September 27, 2018 5:41 AM
To: Whitley, Matthew J
Cc: ccp4bb
Subject: Re: [ccp4bb] Off topic: 'Difficult' Datasets for Processing Practice

Dear Matthew,

   I am also late in responding to this, but as part of a 
Nature Protocols paper on iMosflm (Supplementary Information for Nature 
Protocols 12, 1310-1325, 2017) I provided a number of examples of “problem 
datasets”. Some of these are just two images, to show issues in indexing, 
others are complete datasets showing a variety of pathologies.

All the images and a tutorial on how best to process them (with iMosflm) are 
available at the following URL:

www.mrc-lmb.cam.ac.uk/harry/imosflm/examples


Best wishes,

Andrew


On 26 Sep 2018, at 03:15, Whitley, Matthew J wrote:

For some reason, the September 19th ccp4bb digest got caught in my spam filter 
and didn't come through until a few minutes ago, so I didn't see several 
responses concerning interesting datasets for processing until just now.

Therefore, thanks also to Kay Diederichs, Eugene Osipov, and David Waterman for 
responding (and also to everyone else who responded if I am still overlooking 
anyone.)

As I mentioned before, I will be happy to compile a list of suggested datasets 
and make it available via this list.

Matthew


---
Matthew J. Whitley, Ph.D.
Research Instructor
Department of Pharmacology & Chemical Biology
University of Pittsburgh School of Medicine



To unsubscribe from the CCP4BB list, click the following link:
https://www.jiscmail.ac.uk/cgi-bin/webadmin?SUBED1=CCP4BB&A=1






Re: [ccp4bb] Off topic: 'Difficult' Datasets for Processing Practice

2018-09-27 Thread Gloria Borgstahl
Hi Matthew, I am also a bit late in responding. I have a few
incommensurately modulated protein crystal datasets that you would be
welcome to use in your course.  It would be neat for students to at
least know that this type of diffraction exists.  As far as I know,
they can only be processed with Eval software.  Please let me know if
you are interested and we can transfer the images to you.  Thanks, Gloria


On Thu, Sep 27, 2018 at 4:42 AM Andrew Leslie  wrote:
>
> Dear Matthew,
>
>    I am also late in responding to this, but as part of a 
> Nature Protocols paper on iMosflm (Supplementary Information for Nature 
> Protocols 12, 1310-1325, 2017) I provided a number of examples of “problem 
> datasets”. Some of these are just two images, to show issues in indexing, 
> others are complete datasets showing a variety of pathologies.
>
> All the images and a tutorial on how best to process them (with iMosflm) are 
> available at the following URL:
>
> www.mrc-lmb.cam.ac.uk/harry/imosflm/examples
>
>
> Best wishes,
>
> Andrew
>
>
> On 26 Sep 2018, at 03:15, Whitley, Matthew J wrote:
>
> For some reason, the September 19th ccp4bb digest got caught in my spam 
> filter and didn't come through until a few minutes ago, so I didn't see 
> several responses concerning interesting datasets for processing until just 
> now.
>
> Therefore, thanks also to Kay Diederichs, Eugene Osipov, and David Waterman 
> for responding (and also to everyone else who responded if I am still 
> overlooking anyone.)
>
> As I mentioned before, I will be happy to compile a list of suggested 
> datasets and make it available via this list.
>
> Matthew
>
> ---
> Matthew J. Whitley, Ph.D.
> Research Instructor
> Department of Pharmacology & Chemical Biology
> University of Pittsburgh School of Medicine
>


Re: [ccp4bb] Off topic: 'Difficult' Datasets for Processing Practice

2018-09-27 Thread Andrew Leslie
Dear Matthew,

   I am also late in responding to this, but as part of a 
Nature Protocols paper on iMosflm (Supplementary Information for Nature 
Protocols 12, 1310-1325, 2017) I provided a number of examples of “problem 
datasets”. Some of these are just two images, to show issues in indexing, 
others are complete datasets showing a variety of pathologies.

All the images and a tutorial on how best to process them (with iMosflm) are 
available at the following URL:

www.mrc-lmb.cam.ac.uk/harry/imosflm/examples 



Best wishes,

Andrew


> On 26 Sep 2018, at 03:15, Whitley, Matthew J  wrote:
> 
> For some reason, the September 19th ccp4bb digest got caught in my spam 
> filter and didn't come through until a few minutes ago, so I didn't see 
> several responses concerning interesting datasets for processing until just 
> now.
> 
> Therefore, thanks also to Kay Diederichs, Eugene Osipov, and David Waterman 
> for responding (and also to everyone else who responded if I am still 
> overlooking anyone.)
> 
> As I mentioned before, I will be happy to compile a list of suggested 
> datasets and make it available via this list.
> 
> Matthew
> 
>  ---
> Matthew J. Whitley, Ph.D.
> Research Instructor
> Department of Pharmacology & Chemical Biology
> University of Pittsburgh School of Medicine
> 


Re: [ccp4bb] Off topic: 'Difficult' Datasets for Processing Practice

2018-09-25 Thread Whitley, Matthew J
For some reason, the September 19th ccp4bb digest got caught in my spam filter 
and didn't come through until a few minutes ago, so I didn't see several 
responses concerning interesting datasets for processing until just now.

Therefore, thanks also to Kay Diederichs, Eugene Osipov, and David Waterman for 
responding (and also to everyone else who responded if I am still overlooking 
anyone.)

As I mentioned before, I will be happy to compile a list of suggested datasets 
and make it available via this list.

Matthew


---
Matthew J. Whitley, Ph.D.
Research Instructor
Department of Pharmacology & Chemical Biology
University of Pittsburgh School of Medicine





Re: [ccp4bb] Off topic: 'Difficult' Datasets for Processing Practice

2018-09-25 Thread David Waterman
Hi Matthew,

I'm a little late to the thread, but I thought I would still like to add
DPF3b, kindly provided by Wolfram Tempel. This dataset is available on
zenodo and forms the basis of a tutorial for using DIALS:
https://dials.github.io/documentation/tutorials/correcting_poor_initial_geometry_tutorial.html.
As the tutorial states,

This is a challenging dataset to process. There are a combination of
problems, including:

   - A ‘reversed’ rotation axis
   - Incorrect beam centre recorded in the image headers
   - Split spots
   - Multiple lattices
   - Systematically weak spots that may correspond to pseudocentring

Cheers
-- David


On Tue, 25 Sep 2018 at 18:27, Whitley, Matthew J  wrote:

> Dear colleagues,
>
> I want to thank the following people for providing suggestions and
> comments about ‘difficult’ datasets suitable for teaching data processing:
>
> Tim Craig
> Jacob Keller
> Graeme Winter
> Aleksandar Bijelic
> Clemens Vonrhein
> Loes Kroon-Batenburg
> James Holton
>
> If anyone else has suggestions for good datasets for teaching processing,
> I would still be happy to hear them.
>
> Finally, several people asked me to make available a list of all the
> dataset suggestions I receive.  I am happy to do so, and I will post a
> message to this list when the information is up and available, probably
> later in the fall.
>
>
> Sincerely,
> Matthew
>
>
>
> ---
> Matthew J. Whitley, Ph.D.
> Research Instructor
> Department of Pharmacology & Chemical Biology
> University of Pittsburgh School of Medicine
>
>
>
>
> On 9/19/2018 5:15 PM, Whitley, Matthew J wrote:
>
> Dear colleagues,
>
> For teaching purposes, I am looking for a small number (< 5) of
> macromolecular diffraction datasets (raw images) that might be
> considered 'difficult' for a beginning crystallography student to
> process.  By 'difficult' I generally mean not able to be processed
> automatically by a common processing package (XDS, Mosflm, DIALS, etc)
> using default settings, i.e., no black box "click and done" processing.
> The datasets I am looking for would have some stumbling block such as
> incorrect experimental parameters recorded in the image headers,
> multiple lattices that cause indexing to fail, datasets for which
> determining the correct space group is tricky, datasets for experiments
> in which the crystal slipped or moved in the beam, or anything else you
> can think of.  The idea is for these beginning students to examine
> several datasets that highlight various phenomena that can lead one
> astray during processing.
>
> A good candidate dataset would also ideally comprise a modest number of
> images so as to keep integration time to a minimum.  Factors that are
> mostly irrelevant for my purpose: resolution (as long as better than
> ~3.5 Å), source (home vs synchrotron), presence/absence of anomalous
> scattering,  presence/absence of ligands, monomeric vs oligomeric
> structures, etc.  Also, to be clear, I am not looking for datasets that
> have so many pathologies that they would require many long hours of work
> for an expert to process correctly.
>
> I have checked public repositories such as proteindiffraction.org and
> SBGrid databank, but all of the datasets I acquired from these sources
> process satisfactorily with little effort, and in any event I know of no
> way to search for 'challenging' datasets.  (I also wonder whether
> anybody is in the habit of depositing, shall we say, less-than-pristine
> images to public repositories?)
>
> If you know of such a dataset that is already publicly available, or if
> you have such a dataset that you are willing to share for solely
> educational purposes, I would appreciate hearing from you, either on- or
> off-list.
>
> Thank you in advance for your suggestions.
>
> Matthew
>
>
>
>


Re: [ccp4bb] Off topic: 'Difficult' Datasets for Processing Practice

2018-09-25 Thread Whitley, Matthew J
Dear colleagues,

I want to thank the following people for providing suggestions and comments 
about ‘difficult’ datasets suitable for teaching data processing:

Tim Craig
Jacob Keller
Graeme Winter
Aleksandar Bijelic
Clemens Vonrhein
Loes Kroon-Batenburg
James Holton

If anyone else has suggestions for good datasets for teaching processing, I 
would still be happy to hear them.

Finally, several people asked me to make available a list of all the dataset 
suggestions I receive.  I am happy to do so, and I will post a message to this 
list when the information is up and available, probably later in the fall.


Sincerely,
Matthew



---
Matthew J. Whitley, Ph.D.
Research Instructor
Department of Pharmacology & Chemical Biology
University of Pittsburgh School of Medicine




On 9/19/2018 5:15 PM, Whitley, Matthew J wrote:
Dear colleagues,

For teaching purposes, I am looking for a small number (< 5) of
macromolecular diffraction datasets (raw images) that might be
considered 'difficult' for a beginning crystallography student to
process.  By 'difficult' I generally mean not able to be processed
automatically by a common processing package (XDS, Mosflm, DIALS, etc)
using default settings, i.e., no black box "click and done" processing.
The datasets I am looking for would have some stumbling block such as
incorrect experimental parameters recorded in the image headers,
multiple lattices that cause indexing to fail, datasets for which
determining the correct space group is tricky, datasets for experiments
in which the crystal slipped or moved in the beam, or anything else you
can think of.  The idea is for these beginning students to examine
several datasets that highlight various phenomena that can lead one
astray during processing.

A good candidate dataset would also ideally comprise a modest number of
images so as to keep integration time to a minimum.  Factors that are
mostly irrelevant for my purpose: resolution (as long as better than
~3.5 Å), source (home vs synchrotron), presence/absence of anomalous
scattering,  presence/absence of ligands, monomeric vs oligomeric
structures, etc.  Also, to be clear, I am not looking for datasets that
have so many pathologies that they would require many long hours of work
for an expert to process correctly.

I have checked public repositories such as proteindiffraction.org and
SBGrid databank, but all of the datasets I acquired from these sources
process satisfactorily with little effort, and in any event I know of no
way to search for 'challenging' datasets.  (I also wonder whether
anybody is in the habit of depositing, shall we say, less-than-pristine
images to public repositories?)

If you know of such a dataset that is already publicly available, or if
you have such a dataset that you are willing to share for solely
educational purposes, I would appreciate hearing from you, either on- or
off-list.

Thank you in advance for your suggestions.

Matthew








Re: [ccp4bb] Off topic: 'Difficult' Datasets for Processing Practice

2018-09-22 Thread James Holton
It was brought to my attention that the link to the preprint I provided 
below doesn't work, but this one does:


https://www.biorxiv.org/content/early/2018/08/18/394965

Thanks to Folmer Fredslund for pointing this out to me!

-James Holton
MAD Scientist

On 9/21/2018 3:50 PM, James Holton wrote:
For teaching purposes I have found that controlled pairs of data sets 
are most instructive.  You are right that an easy one-button-push 
processing run tells you nothing, but so does a 
bang-it-crashed-now-what data set.  Most useful are two data sets that 
are identical in every respect but one, and that one thing is the 
point you are trying to get across.  It's hard to collect such 
perfectly paired data sets, so I ended up just simulating them. I 
deliberately chose a high-symmetry space group to keep the download 
size small. You can download them from here:


http://bl831.als.lbl.gov/~jamesh/workshop/

These five datasets represent the four biggest problems I see users 
have when trying to solve structures: 1) poor anomalous signal, 2) 
overlaps from a bad crystal orientation, 3) hidden radiation damage to 
sites, and 4) ice rings.  The 5th "goodsignal" dataset is the positive 
control.


The web page contains everything from images to processed MTZ files, 
maps and the "right answer" in pdb and mtz format.  A slightly more 
"realistic" version with a bigger download size is here:


http://bl831.als.lbl.gov/~jamesh/workshop2/

This is the one I used for my "weak anomalous challenge" a few years 
back. The teaching advantage is that you can use the image-mixer 
script to modulate the severity of problems like ice rings and 
anomalous signal.  If you make a competition of it, people tend to get 
more interested.


When it comes to beam centers, it is not all that hard to take a data 
set with a "correct" beam center and just edit the headers. How you do 
this depends on the file format, but I have some instructions for 
editing images in general here:


http://bl831.als.lbl.gov/~jamesh/bin_stuff/

In general, you can usually separate the header from the data with the 
unix command "head" or "dd", edit the header with your favorite text 
editor, and then put the two parts back together with "cat". As for 
which beam center is "correct", it is important to tell your students 
that that depends on which software you are using.  I wrote all this 
down in the last paragraph on page 7 of this doc:


https://submit.biorxiv.org/submission/pdf?msid=BIORXIV/2018/394965

This doc also describes another simulated data set that demonstrates 
the challenges of combining lots of short wedges together.  May or may 
not be too advanced a topic for your students?  Or maybe not. As you 
can guess I'm experimenting with biorxiv.  So far, no comments.


Good luck with your class!

-James Holton
MAD Scientist


On 9/19/2018 5:15 PM, Whitley, Matthew J wrote:

Dear colleagues,

For teaching purposes, I am looking for a small number (< 5) of
macromolecular diffraction datasets (raw images) that might be
considered 'difficult' for a beginning crystallography student to
process.  By 'difficult' I generally mean not able to be processed
automatically by a common processing package (XDS, Mosflm, DIALS, etc)
using default settings, i.e., no black box "click and done" processing.
The datasets I am looking for would have some stumbling block such as
incorrect experimental parameters recorded in the image headers,
multiple lattices that cause indexing to fail, datasets for which
determining the correct space group is tricky, datasets for experiments
in which the crystal slipped or moved in the beam, or anything else you
can think of.  The idea is for these beginning students to examine
several datasets that highlight various phenomena that can lead one
astray during processing.

A good candidate dataset would also ideally comprise a modest number of
images so as to keep integration time to a minimum.  Factors that are
mostly irrelevant for my purpose: resolution (as long as better than
~3.5 Å), source (home vs synchrotron), presence/absence of anomalous
scattering,  presence/absence of ligands, monomeric vs oligomeric
structures, etc.  Also, to be clear, I am not looking for datasets that
have so many pathologies that they would require many long hours of work
for an expert to process correctly.

I have checked public repositories such as proteindiffraction.org and
SBGrid databank, but all of the datasets I acquired from these sources
process satisfactorily with little effort, and in any event I know of no
way to search for 'challenging' datasets.  (I also wonder whether
anybody is in the habit of depositing, shall we say, less-than-pristine
images to public repositories?)

If you know of such a dataset that is already publicly available, or if
you have such a dataset that you are willing to share for solely
educational purposes, I would appreciate hearing from you, either on- or
off-list.

Thank you in advance for your suggestions.


Re: [ccp4bb] Off topic: 'Difficult' Datasets for Processing Practice

2018-09-22 Thread Andreas Förster
Hi James,

you’re probably aware of this but you can edit CBF headers in place with
sed. That’s what I do when I make the detector on my diffractometer go
closer than the hardware limit.
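
For readers without sed to hand, here is a minimal Python sketch of the same kind of header edit. It assumes a Pilatus-style miniCBF file whose text header contains a comment line of the form "# Detector_distance 0.25000 m" and whose binary section starts at the "--CIF-BINARY-FORMAT-SECTION--" marker. The field name and the function name set_cbf_distance are illustrative assumptions, not something taken from this thread, so inspect your own headers before relying on them.

```python
import re

# Marker that introduces the binary image data in a CBF file.
BINARY_MARKER = b"--CIF-BINARY-FORMAT-SECTION--"


def set_cbf_distance(image: bytes, distance_m: float) -> bytes:
    """Return a copy of a miniCBF image with the header distance replaced.

    Only the text before the binary marker is searched, so the compressed
    pixel data can never be altered by the substitution.
    """
    split = image.find(BINARY_MARKER)
    if split < 0:
        raise ValueError("binary section marker not found; not a CBF file?")
    header, rest = image[:split], image[split:]

    # Assumed header line: "# Detector_distance <number> m"
    pattern = re.compile(rb"(?m)^(# Detector_distance\s+)\S+(\s+m\b)")
    new_value = b"%.5f" % distance_m
    new_header, count = pattern.subn(
        lambda m: m.group(1) + new_value + m.group(2), header, count=1
    )
    if count == 0:
        raise KeyError("no '# Detector_distance' line found in header")
    return new_header + rest
```

To apply this to a file, read it with pathlib.Path.read_bytes(), pass the bytes through the function, and write the result with write_bytes(), preferably to a copy rather than the original image. Unlike fixed-length headers, a CBF text header can change length without padding, which is why a simple substitution is enough here.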

All best - Andreas

On Sat, 22 Sep 2018 at 00:51, James Holton wrote:

> For teaching purposes I have found that controlled pairs of data sets
> are most instructive.  You are right that an easy one-button-push
> processing run tells you nothing, but so does a bang-it-crashed-now-what
> data set.  Most useful are two data sets that are identical in every
> respect but one, and that one thing is the point you are trying to get
> across.  It's hard to collect such perfectly paired data sets, so I
> ended up just simulating them. I deliberately chose a high-symmetry
> space group to keep the download size small. You can download them from
> here:
>
> http://bl831.als.lbl.gov/~jamesh/workshop/
>
> These five datasets represent the four biggest problems I see users have
> when trying to solve structures: 1) poor anomalous signal, 2) overlaps
> from a bad crystal orientation, 3) hidden radiation damage to sites, and
> 4) ice rings.  The 5th "goodsignal" dataset is the positive control.
>
> The web page contains everything from images to processed MTZ files,
> maps and the "right answer" in pdb and mtz format.  A slightly more
> "realistic" version with a bigger download size is here:
>
> http://bl831.als.lbl.gov/~jamesh/workshop2/
>
> This is the one I used for my "weak anomalous challenge" a few years
> back. The teaching advantage is that you can use the image-mixer script
> to modulate the severity of problems like ice rings and anomalous
> signal.  If you make a competition of it, people tend to get more
> interested.
>
> When it comes to beam centers, it is not all that hard to take a data
> set with a "correct" beam center and just edit the headers. How you do
> this depends on the file format, but I have some instructions for
> editing images in general here:
>
> http://bl831.als.lbl.gov/~jamesh/bin_stuff/
>
> In general, you can usually separate the header from the data with the
> unix command "head" or "dd", edit the header with your favorite text
> editor, and then put the two parts back together with "cat". As for
> which beam center is "correct", it is important to tell your students
> that that depends on which software you are using.  I wrote all this
> down in the last paragraph on page 7 of this doc:
>
> https://submit.biorxiv.org/submission/pdf?msid=BIORXIV/2018/394965
>
> This doc also describes another simulated data set that demonstrates the
> challenges of combining lots of short wedges together.  May or may not
> be too advanced a topic for your students?  Or maybe not. As you can
> guess I'm experimenting with biorxiv.  So far, no comments.
>
> Good luck with your class!
>
> -James Holton
> MAD Scientist
>
>
> On 9/19/2018 5:15 PM, Whitley, Matthew J wrote:
> > Dear colleagues,
> >
> > For teaching purposes, I am looking for a small number (< 5) of
> > macromolecular diffraction datasets (raw images) that might be
> > considered 'difficult' for a beginning crystallography student to
> > process.  By 'difficult' I generally mean not able to be processed
> > automatically by a common processing package (XDS, Mosflm, DIALS, etc)
> > using default settings, i.e., no black box "click and done" processing.
> > The datasets I am looking for would have some stumbling block such as
> > incorrect experimental parameters recorded in the image headers,
> > multiple lattices that cause indexing to fail, datasets for which
> > determining the correct space group is tricky, datasets for experiments
> > in which the crystal slipped or moved in the beam, or anything else you
> > can think of.  The idea is for these beginning students to examine
> > several datasets that highlight various phenomena that can lead one
> > astray during processing.
> >
> > A good candidate dataset would also ideally comprise a modest number of
> > images so as to keep integration time to a minimum.  Factors that are
> > mostly irrelevant for my purpose: resolution (as long as better than
> > ~3.5 Å), source (home vs synchrotron), presence/absence of anomalous
> > scattering,  presence/absence of ligands, monomeric vs oligomeric
> > structures, etc.  Also, to be clear, I am not looking for datasets that
> > have so many pathologies that they would require many long hours of work
> > for an expert to process correctly.
> >
> > I have checked public repositories such as proteindiffraction.org and
> > SBGrid databank, but all of the datasets I acquired from these sources
> > process satisfactorily with little effort, and in any event I know of no
> > way to search for 'challenging' datasets.  (I also wonder whether
> > anybody is in the habit of depositing, shall we say, less-than-pristine
> > images to public repositories?)
> >
> > If you know of such a dataset that is already publicly available, or if
> > you have such a dataset that you are 

Re: [ccp4bb] Off topic: 'Difficult' Datasets for Processing Practice

2018-09-21 Thread James Holton
For teaching purposes I have found that controlled pairs of data sets 
are most instructive.  You are right that an easy one-button-push 
processing run tells you nothing, but so does a bang-it-crashed-now-what 
data set.  Most useful are two data sets that are identical in every 
respect but one, and that one thing is the point you are trying to get 
across.  It's hard to collect such perfectly paired data sets, so I 
ended up just simulating them. I deliberately chose a high-symmetry 
space group to keep the download size small. You can download them from 
here:


http://bl831.als.lbl.gov/~jamesh/workshop/

These five datasets represent the four biggest problems I see users have 
when trying to solve structures: 1) poor anomalous signal, 2) overlaps 
from a bad crystal orientation, 3) hidden radiation damage to sites, and 
4) ice rings.  The 5th "goodsignal" dataset is the positive control.


The web page contains everything from images to processed MTZ files, 
maps and the "right answer" in pdb and mtz format.  A slightly more 
"realistic" version with a bigger download size is here:


http://bl831.als.lbl.gov/~jamesh/workshop2/

This is the one I used for my "weak anomalous challenge" a few years 
back. The teaching advantage is that you can use the image-mixer script 
to modulate the severity of problems like ice rings and anomalous 
signal.  If you make a competition of it, people tend to get more 
interested.


When it comes to beam centers, it is not all that hard to take a data 
set with a "correct" beam center and just edit the headers. How you do 
this depends on the file format, but I have some instructions for 
editing images in general here:


http://bl831.als.lbl.gov/~jamesh/bin_stuff/

In general, you can usually separate the header from the data with the 
unix command "head" or "dd", edit the header with your favorite text 
editor, and then put the two parts back together with "cat". As for 
which beam center is "correct", it is important to tell your students 
that that depends on which software you are using.  I wrote all this 
down in the last paragraph on page 7 of this doc:


https://submit.biorxiv.org/submission/pdf?msid=BIORXIV/2018/394965

This doc also describes another simulated data set that demonstrates the 
challenges of combining lots of short wedges together.  May or may not 
be too advanced a topic for your students?  Or maybe not. As you can 
guess I'm experimenting with biorxiv.  So far, no comments.
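
A minimal sketch of the split/edit/rejoin recipe described above, written in Python rather than with head, dd and cat. It assumes an SMV-style image (as written by ADSC detectors) whose text header states its own length in a HEADER_BYTES field and stores the beam centre in a field such as BEAM_CENTER_X. Both field names and the function name edit_smv_header are assumptions for illustration, so check the header of your own images first.

```python
import re


def edit_smv_header(image: bytes, key: str, value: str) -> bytes:
    """Return a copy of an SMV-style image with one header field changed.

    The header keeps its original length, so the pixel data that follows
    stays at the same byte offset.
    """
    # The header announces its own size, e.g. "HEADER_BYTES=  512;".
    size = re.search(rb"HEADER_BYTES=\s*(\d+);", image[:1024])
    if size is None:
        raise ValueError("no HEADER_BYTES field; not an SMV-style image?")
    header_bytes = int(size.group(1))

    # Split header from data (the job done by head/dd in the shell recipe).
    header, data = image[:header_bytes], image[header_bytes:]

    # Replace "KEY=old;" with "KEY=new;" (the text-editor step).
    pattern = re.compile(rb"(?m)^" + re.escape(key.encode()) + rb"=[^;]*;")
    new_field = key.encode() + b"=" + value.encode() + b";"
    new_header, count = pattern.subn(lambda m: new_field, header, count=1)
    if count == 0:
        raise KeyError(key)

    # Restore the original header length by trimming or adding padding.
    if new_header[header_bytes:].strip(b" \x00"):
        raise ValueError("edited header no longer fits in HEADER_BYTES")
    new_header = new_header[:header_bytes].ljust(header_bytes, b" ")

    # Rejoin header and data (the job done by cat).
    return new_header + data
```

Deliberately writing a wrong beam centre this way gives a controlled "broken" copy of a dataset that otherwise processes cleanly. Which value counts as wrong still depends on the convention of the processing program, as noted above.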


Good luck with your class!

-James Holton
MAD Scientist


On 9/19/2018 5:15 PM, Whitley, Matthew J wrote:

Dear colleagues,

For teaching purposes, I am looking for a small number (< 5) of
macromolecular diffraction datasets (raw images) that might be
considered 'difficult' for a beginning crystallography student to
process.  By 'difficult' I generally mean not able to be processed
automatically by a common processing package (XDS, Mosflm, DIALS, etc)
using default settings, i.e., no black box "click and done" processing.
The datasets I am looking for would have some stumbling block such as
incorrect experimental parameters recorded in the image headers,
multiple lattices that cause indexing to fail, datasets for which
determining the correct space group is tricky, datasets for experiments
in which the crystal slipped or moved in the beam, or anything else you
can think of.  The idea is for these beginning students to examine
several datasets that highlight various phenomena that can lead one
astray during processing.

A good candidate dataset would also ideally comprise a modest number of
images so as to keep integration time to a minimum.  Factors that are
mostly irrelevant for my purpose: resolution (as long as better than
~3.5 Å), source (home vs synchrotron), presence/absence of anomalous
scattering,  presence/absence of ligands, monomeric vs oligomeric
structures, etc.  Also, to be clear, I am not looking for datasets that
have so many pathologies that they would require many long hours of work
for an expert to process correctly.

I have checked public repositories such as proteindiffraction.org and
SBGrid databank, but all of the datasets I acquired from these sources
process satisfactorily with little effort, and in any event I know of no
way to search for 'challenging' datasets.  (I also wonder whether
anybody is in the habit of depositing, shall we say, less-than-pristine
images to public repositories?)

If you know of such a dataset that is already publicly available, or if
you have such a dataset that you are willing to share for solely
educational purposes, I would appreciate hearing from you, either on- or
off-list.

Thank you in advance for your suggestions.

Matthew







Re: [ccp4bb] Off topic: 'Difficult' Datasets for Processing Practice

2018-09-20 Thread Loes Kroon-Batenburg

Dear Matthew,

In my search for the validity of meta data, I went through several data 
sets in SBGrid and proteindiffraction.org (IRRMC), especially those 
where automatic processing did not succeed or gave different results 
with different processing software. We reprocessed those data sets with 
EVAL and located some issues with the meta data. Most of the problems 
are related to wrong or absent primary beam positions.
SBGrid is very useful to find problematic data sets, since it lists four 
attempts to re-process the data.


Some interesting cases could be:
SBGrid:
373: automatic processing failed to see that the crystal is C- (or I-)centered 
monoclinic
254: unit cell not so easy to index and inaccurate (and therefore space 
group wrong) because of the presence of a second and third lattice
426: beam center very wrong (has to be set manually)
415: another beam center problem; caused large errors in unit cell 
dimensions in automatic processing (only xia2-3dii has it right)
383: a very challenging case that might be too complicated; at least two 
lattices and C-centered orthorhombic (which none of the automatic 
processing software found)


IRRMC
3m8t: apparent monoclinic C2 symmetry, but is in fact triclinic. Also a 
second smaller fragment present.

4i08: wrong beam center. Rotation direction may have to be reversed.

Store.Synchrotron:
public data set https://store.synchrotron.org.au/experiment/view/1037/ 
(one long unit cell axis; quite a challenge combined with an inaccurate 
beam center).
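
For anyone compiling these suggestions into a teaching list, the cases above could be kept in a small lookup structure, sketched below in Python. The issue tags are my own shorthand paraphrases of the descriptions above, and the names Dataset, CATALOGUE and with_issue are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    source: str        # repository the images come from
    ident: str         # dataset number or PDB code as listed above
    issues: frozenset  # shorthand tags for the problems described


CATALOGUE = [
    Dataset("SBGrid", "373", frozenset({"centring"})),
    Dataset("SBGrid", "254", frozenset({"multiple lattices"})),
    Dataset("SBGrid", "426", frozenset({"beam centre"})),
    Dataset("SBGrid", "415", frozenset({"beam centre"})),
    Dataset("SBGrid", "383", frozenset({"multiple lattices", "centring"})),
    Dataset("IRRMC", "3m8t", frozenset({"pseudo-symmetry", "multiple lattices"})),
    Dataset("IRRMC", "4i08", frozenset({"beam centre", "rotation direction"})),
    Dataset("Store.Synchrotron", "1037", frozenset({"long axis", "beam centre"})),
]


def with_issue(issue: str) -> list:
    """Return 'source ident' labels for every dataset tagged with the issue."""
    return [f"{d.source} {d.ident}" for d in CATALOGUE if issue in d.issues]


print(with_issue("beam centre"))
```

Tagging each dataset by problem makes it easy to pick one example per topic, for instance a single beam centre case and a single multiple-lattice case for a short tutorial.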


Let me add: several initiatives exist or are on the way to publish and 
describe raw data for which the structure could not be solved, for all 
kinds of reasons: indexing fails, the phase problem cannot be solved, lots 
of additional reflections are present, or diffuse scattering.  See the 
progress report and references therein 
(http://forums.iucr.org/viewtopic.php?f=21=396) of the IUCr working 
group DDDWG, now continued as the COMMDAT committee.


Best wishes,
Loes

--

__

Dr. Loes Kroon-Batenburg
Dept. of Crystal and Structural Chemistry
Bijvoet Center for Biomolecular Research
Utrecht University
Padualaan 8, 3584 CH Utrecht
The Netherlands

E-mail : l.m.j.kroon-batenb...@uu.nl
phone  : +31-30-2532865
fax: +31-30-2533940
__





Re: [ccp4bb] Off topic: 'Difficult' Datasets for Processing Practice

2018-09-20 Thread Kay Diederichs
Hi Matthew,

I have some notes which indicate that SBGrid data sets 5, 62, 78, 117, 218 
posed problems for "automatic" processing using generate_XDS.INP / XDS when I 
saw them for the first time. 
Some of these problems (mainly the conversion of header values to ORGX ORGY) 
are taken care of in current generate_XDS.INP; others remain (defaults for 
MAXIMUM_ERROR_OF_SPOT_POSITION, MAXIMUM_ERROR_OF_SPINDLE_POSITION too low for 
some data sets) or cannot be fixed (radiation damage or detector defects? in 
data set 68). 
This of course reflects general progress: what was difficult some time 
ago may be almost trivial now.
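
As an illustration of the header-to-ORGX/ORGY conversion mentioned above, here 
is a minimal Python sketch. It is not how generate_XDS.INP does it; the 
HeaderBeam fields, the convention names and the numbers are assumptions made up 
for the example. The point is only that a beam centre stored in mm has to be 
divided by the pixel size to give the pixel values XDS expects, and that axis 
order and origin can differ between detectors and beamlines, so more than one 
candidate may need to be checked against the images.

```python
from dataclasses import dataclass


@dataclass
class HeaderBeam:
    """Hypothetical subset of image-header values (field names are assumptions)."""
    beam_x_mm: float
    beam_y_mm: float
    pixel_size_mm: float
    ny_pixels: int  # detector size along Y, used by the flipped conventions


def candidate_origins(h: HeaderBeam) -> dict:
    """Return possible (ORGX, ORGY) pairs in pixels, keyed by convention name.

    The four conventions are illustrative; which one applies depends on the
    detector and beamline and has to be verified, e.g. in an image viewer.
    """
    x = h.beam_x_mm / h.pixel_size_mm  # mm -> pixels
    y = h.beam_y_mm / h.pixel_size_mm
    return {
        "as_is": (x, y),
        "swapped": (y, x),
        "flipped_y": (x, h.ny_pixels - y),
        "swapped_flipped_y": (y, h.ny_pixels - x),
    }


def xds_origin_line(orgx: float, orgy: float) -> str:
    """Format the two keywords as they would appear in XDS.INP."""
    return f"ORGX= {orgx:.2f} ORGY= {orgy:.2f}"


if __name__ == "__main__":
    # Invented header values, for demonstration only.
    header = HeaderBeam(beam_x_mm=157.5, beam_y_mm=160.0,
                        pixel_size_mm=0.1, ny_pixels=3072)
    for name, (orgx, orgy) in candidate_origins(header).items():
        print(f"{name:>18}: {xds_origin_line(orgx, orgy)}")
```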

My advice would be to go all the way from data processing to refinement. 
Otherwise you end up with a heap of numbers that might tell you something about 
the precision of the data, but nothing about their accuracy. 
And if you do that, keep in mind that the RCSB-deposited PDBs may not be 
optimal interpretations of the data. Also consider the re-refined PDBs from the 
PDB_REDO website.

best,

Kay





Re: [ccp4bb] Off topic: 'Difficult' Datasets for Processing Practice

2018-09-19 Thread graeme.win...@diamond.ac.uk
Matthew,

One SBGrid example I used for a workshop was

https://data.sbgrid.org/dataset/218/

This has the “wrong” beam centre as understood by (me/dials/xia2), which causes 
a certain amount of fun. It is a nice example for using the reciprocal lattice 
viewer and image viewer in DIALS to make sensible choices.

Best wishes Graeme
















Re: [ccp4bb] Off topic: 'Difficult' Datasets for Processing Practice

2018-09-19 Thread Keller, Jacob
I deposited a dataset on SBGrid from a calmodulin-peptide complex which has 
some "nice" features: merohedral twinning (with variable twin fraction in the 
same dataset) and unavoidable detector cutoffs. It's relatively easy to 
integrate, but solving is somewhat harder. There's a paper on it too which 
gives the "answers" I came up with.

JPK

+
Jacob Pearson Keller
Research Scientist / Looger Lab
HHMI Janelia Research Campus
19700 Helix Dr, Ashburn, VA 20147
Desk: (571)209-4000 x3159
Cell: (301)592-7004
+








[ccp4bb] Off topic: 'Difficult' Datasets for Processing Practice

2018-09-19 Thread Whitley, Matthew J
Dear colleagues,

For teaching purposes, I am looking for a small number (< 5) of 
macromolecular diffraction datasets (raw images) that might be 
considered 'difficult' for a beginning crystallography student to 
process.  By 'difficult' I generally mean not able to be processed 
automatically by a common processing package (XDS, Mosflm, DIALS, etc) 
using default settings, i.e., no black box "click and done" processing.  
The datasets I am looking for would have some stumbling block such as 
incorrect experimental parameters recorded in the image headers, 
multiple lattices that cause indexing to fail, datasets for which 
determining the correct space group is tricky, datasets for experiments 
in which the crystal slipped or moved in the beam, or anything else you 
can think of.  The idea is for these beginning students to examine 
several datasets that highlight various phenomena that can lead one 
astray during processing.

A good candidate dataset would also ideally comprise a modest number of 
images so as to keep integration time to a minimum.  Factors that are 
mostly irrelevant for my purpose: resolution (as long as better than 
~3.5 Å), source (home vs synchrotron), presence/absence of anomalous 
scattering,  presence/absence of ligands, monomeric vs oligomeric 
structures, etc.  Also, to be clear, I am not looking for datasets that 
have so many pathologies that they would require many long hours of work 
for an expert to process correctly.

I have checked public repositories such as proteindiffraction.org and 
SBGrid databank, but all of the datasets I acquired from these sources 
process satisfactorily with little effort, and in any event I know of no 
way to search for 'challenging' datasets.  (I also wonder whether 
anybody is in the habit of depositing, shall we say, less-than-pristine 
images to public repositories?)

If you know of such a dataset that is already publicly available, or if 
you have such a dataset that you are willing to share for solely 
educational purposes, I would appreciate hearing from you, either on- or 
off-list.

Thank you in advance for your suggestions.

Matthew

-- 
Matthew J. Whitley, Ph.D.
Department of Pharmacology & Chemical Biology
University of Pittsburgh School of Medicine



