[Jmol-developers] [ jmol-Feature Requests-1656702 ] molecule name shown for PDB

SourceForge.net Tue, 20 Feb 2007 04:53:45 -0800

Feature Requests item #1656702, was opened at 2007-02-10 07:19
Message generated for change (Settings changed) made by hansonr
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=379136&aid=1656702&group_id=23629


Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Interface Improvements
Group: None
>Status: Closed
Priority: 5
Private: No
Submitted By: Angel Herraez (aherraez)
Assigned to: Bob Hanson (hansonr)
Summary: molecule name shown for PDB

Initial Comment:
A proposal to retrieve a molecule name from PDB files and show it in the 
Jmol title and menu.
Current state:

** With MOL files: 
- Jmol app window title: shows the filename + extension, followed by a dash and 
the contents of line 1 (which usually is the molecule name or description). 
- Both app and applet: show the molecule name (line 1) as the topmost entry in 
the popup menu.
- Applet: shows the filename as the bottommost sub-entry inside the former.

If line 1 is empty, defaults to filename.

** With PDB files, the same degree of information would be desirable. However, 
currently only the filename is used:
- Jmol app window title: shows the filename + extension, followed by a dash and 
(again) the filename without extension. 
- Both app and applet: show the filename (without extension) as the topmost 
menu entry in the popup menu. 
- Applet: shows the filename + extension as the bottommost sub-entry inside the 
former.

Proposal: interpret the molecule name or description in the PDB file and use it 
the same way as the first line of MOL files. The current use of the filename 
is redundant.
I'm not an expert on the PDB format, and the structure of PDB entries varies, so 
this is open to discussion and is likely to need a fair amount of parsing. There 
are several candidate fields: COMPND, TITLE, HEADER; if one is missing, another 
could take its place.
HEADER is the easiest and seems logical, but it tends to contain a generic name 
or category that is often not specific to the molecule (e.g. 
"Hydrolase(O-Glycosyl)" for lysozyme 123L.pdb); I'd prefer COMPND.

1) If "COMPND   2 MOLECULE:" field is present, use it; extract portion after 
MOLECULE: up to semicolon.
2) Else, use first line with "COMPND" field.
3) Else, use HEADER (trimming the deposition date and pdbcode).
4) Else, use TITLE (maybe trimmed, tends to be too long).
5) Use filename if the others fail.
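
The fallback order listed above could be sketched roughly as follows. This is only an illustration in Python, not Jmol code: the function name pick_display_name and the max_title parameter are hypothetical, and the column slices assume standard fixed-column PDB records (text starting at column 11, HEADER classification in columns 11-50).

```python
import os

def pick_display_name(pdb_text, filename, max_title=60):
    """Choose a display name using the proposed fallback order (hypothetical helper)."""
    compnd, title, header = [], [], None
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        if record == "COMPND":
            compnd.append(line[10:80].strip())
        elif record == "TITLE":
            title.append(line[10:80].strip())
        elif record == "HEADER" and header is None:
            # Columns 11-50 hold the classification; date and ID code follow.
            header = line[10:50].strip()

    # 1) Text after "MOLECULE:" up to the next semicolon.
    joined = " ".join(compnd)
    if "MOLECULE:" in joined:
        name = joined.split("MOLECULE:", 1)[1].split(";", 1)[0].strip()
        if name:
            return name
    # 2) First COMPND line.
    if compnd and compnd[0]:
        return compnd[0].rstrip(";").strip()
    # 3) HEADER classification, without deposition date and ID code.
    if header:
        return header
    # 4) TITLE, trimmed to an assumed maximum length.
    if title:
        return " ".join(title)[:max_title].strip()
    # 5) Filename without directory or extension.
    return os.path.splitext(os.path.basename(filename))[0]
```

Note that for files whose first COMPND line is "MOL_ID: 1;", step 2 would return that token rather than a name, so step 1 does most of the work for such files.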



----------------------------------------------------------------------

Comment By: Angel Herraez (aherraez)
Date: 2007-02-13 10:24

Message:
Logged In: YES 
user_id=1065324
Originator: YES

OK, what is being used for PDB files is completely reasonable. It seems
that I was misled by doing my testing with PDB files that have the
first line truncated or otherwise manipulated, so the filename is being
used for lack of a proper field at columns 63-66.
It could be a nice addition to have Jmol use the content of the following
columns too (say, from 63 to 72), since the official format does not use
them; that would leave room to put a longer name manually in
the file when it has been modified from the original PDB file.
Apart from that, I think you shouldn't change anything. Sorry!

Regarding CIF and mmCIF, I am not familiar with those formats, but giving
the equivalent info looks like the thing to do.


----------------------------------------------------------------------

Comment By: Bob Hanson (hansonr)
Date: 2007-02-13 09:58

Message:
Logged In: YES 
user_id=1082841
Originator: NO

OK, I looked into this some more. 

PDB files:

The model name is derived from the four-character PDB ID code in columns
63-66 of the PDB HEADER line.
This is what shows up in the app title after the dash and in the
top line of the pop-up menu.
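
As a rough illustration of that lookup (a sketch in Python, not the actual Jmol code), the ID can be sliced out of the HEADER line by column, falling back to the filename when the field is missing. The function name and the start_col/end_col parameters are hypothetical; widening end_col is one way the extra columns could be included.

```python
import os

def model_name_from_header(first_line, filename, start_col=63, end_col=66):
    """Return the ID field of a HEADER line, or the file stem if it is absent."""
    if first_line.startswith("HEADER"):
        # PDB columns are 1-based and inclusive; Python slices are 0-based.
        code = first_line[start_col - 1:end_col].strip()
        if code:
            return code
    # Truncated or missing HEADER: fall back to the filename without extension.
    return os.path.splitext(os.path.basename(filename))[0]
```

With a truncated first line the slice is empty, which would explain why the filename ends up being shown for edited files.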

mmCIF files:

The model name is derived from the "data" line as in

data_1MBO
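
A minimal Python sketch of reading that name, assuming the first data block header is the one of interest (the function name is hypothetical and this is not Jmol's reader):

```python
def model_name_from_cif(cif_text):
    """Return the name following the first 'data_' block header, or None."""
    for line in cif_text.splitlines():
        stripped = line.strip()
        # The data_ keyword is matched case-insensitively.
        if stripped.lower().startswith("data_"):
            return stripped[5:] or None
    return None
```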


So this isn't a redundancy, really. The files just happen to be named this
way usually.
The two places that would make sense for this information would be the
very top line of the first submenu item, the one with the 4-character PDB
designation.

CIF files: Maybe we should start with these, because they are more
standardized. 
There is a standard block _struct

_struct.entry_id             1XY2 
_struct.title                
;CRYSTAL STRUCTURE ANALYSIS OF DEAMINO-OXYTOCIN. CONFORMATIONAL
FLEXIBILITY AND RECEPTOR BINDING
;
_struct.pdbx_descriptor      '1 BETA-MERCAPTOPROPIONATE-OXYTOCIN (DRY FORM)' 


Of these, entry_id is required; title is not. So that's fine. 
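
To make that concrete, here is a small Python sketch that reads a single item such as _struct.title and falls back to _struct.entry_id when the title is absent. The helper names are hypothetical, and the parser is deliberately simplified: it handles unquoted values, quoted values on the same line, and semicolon-delimited text fields, but not loop_ blocks or quoted values placed on the following line.

```python
def read_cif_item(cif_text, tag):
    """Return the value of a non-looped CIF item, or None if it is not found."""
    lines = cif_text.splitlines()
    for i, line in enumerate(lines):
        parts = line.split(None, 1)
        if not parts or parts[0] != tag:
            continue
        rest = parts[1].strip() if len(parts) > 1 else ""
        if rest:
            # Value on the same line, possibly wrapped in quotes.
            if len(rest) >= 2 and rest[0] in "'\"" and rest[-1] == rest[0]:
                return rest[1:-1]
            return rest
        # Otherwise look for a text field opened by ';' in column 1.
        if i + 1 < len(lines) and lines[i + 1].startswith(";"):
            chunk = [lines[i + 1][1:]]
            for follow in lines[i + 2:]:
                if follow.startswith(";"):
                    break
                chunk.append(follow)
            return " ".join(part.strip() for part in chunk if part.strip())
        return None
    return None

def model_title(cif_text):
    """Prefer the optional title; fall back to the required entry_id."""
    return (read_cif_item(cif_text, "_struct.title")
            or read_cif_item(cif_text, "_struct.entry_id"))
```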

In addition, we have:

    loop_
    _pdbx_entity_name.entity_id
    _pdbx_entity_name.name
    _pdbx_entity_name.name_type
    1   "PLASTOCYANIN"        'SWS-NAME'
    1   "Electron transport" 'SWS-KEYWORD'

I think you can see that the PDB information from HEADER and TITLE is
being put in these locations. It wouldn't be hard to add a "model title"
as well as "model name" and then use the title in various places. 

The problem I see with using COMPND is that it is really a list of
components that could be in any order. That "2" in "COMPND 2" is just a
continuation line marker, so that's not reliable, and it isn't represented
by anything in the CIF file. For example, for 1hje we have:

CIF:

_struct.entry_id             1HJE 
_struct.title                'CRYSTAL STRUCTURE OF ALPHA-CONOTOXIN SI' 
_struct.pdbx_descriptor      'ALPHA-CONOTOXIN SI' 

_struct_keywords.entry_id        1HJE 
_struct_keywords.pdbx_keywords   CONOTOXIN 
_struct_keywords.text            'CONOTOXIN, NICOTINIC ACETYLCHOLINE RECEPTOR, TOXIN, VENOM' 

PDB:

HEADER    CONOTOXIN                               15-JAN-01   1HJE
TITLE     CRYSTAL STRUCTURE OF ALPHA-CONOTOXIN SI

CIF:

loop_
_entity.id 
_entity.type 
_entity.src_method 
_entity.pdbx_description 
_entity.formula_weight 
_entity.pdbx_number_of_molecules 
_entity.details 
_entity.pdbx_mutation 
_entity.pdbx_fragment 
_entity.pdbx_ec 
1 polymer syn 'ALPHA-CONOTOXIN SI' 1359.654 1  'AMIDATED C-TERMINUS' ? 'RESIDUES 50-62' ? 
2 water   nat water                18.015   27 ?                     ? ? ? 

loop_
_entity_name_com.entity_id 
_entity_name_com.name 
1 'SI (2-7,3-13)' 
2 ?               

and then in the PDB we have:

COMPND    MOL_ID: 1;
COMPND   2 MOLECULE: ALPHA-CONOTOXIN SI;
COMPND   3 CHAIN: A;
COMPND   4 SYNONYM: SI (2-7,3-13);
COMPND   5 FRAGMENT: RESIDUES 50-62;
COMPND   6 OTHER_DETAILS: AMIDATED C-TERMINUS

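Since the continuation number carries no meaning, one way to read these records is to join the lines and key on the "NAME: value;" tokens instead. The Python sketch below is only an illustration under that assumption; parse_compnd is a hypothetical name, and the simple split on ";" would misread values that themselves contain semicolons.

```python
def parse_compnd(pdb_text):
    """Collect COMPND 'KEY: value;' tokens into one dict per MOL_ID."""
    # Text starts at column 11; columns 8-10 hold only the continuation number.
    text = " ".join(
        line[10:80].strip()
        for line in pdb_text.splitlines()
        if line.startswith("COMPND")
    )
    molecules = []
    for token in text.split(";"):
        key, sep, value = token.partition(":")
        if not sep:
            continue  # free text without a "KEY:" prefix is skipped
        key, value = key.strip(), value.strip()
        # A MOL_ID token starts a new component.
        if key == "MOL_ID" or not molecules:
            molecules.append({})
        molecules[-1][key] = value
    return molecules
```

For the 1HJE lines above this yields a single component whose MOLECULE entry is "ALPHA-CONOTOXIN SI", regardless of which continuation line the token appears on.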

Does this help?

Bob





----------------------------------------------------------------------

Comment By: Bob Hanson (hansonr)
Date: 2007-02-10 08:52

Message:
Logged In: YES 
user_id=1082841
Originator: NO

Whatever we do for PDB, we should do for CIF, or at least mmCIF.

----------------------------------------------------------------------
