Hi David.
Answer below - I started to write this then got distracted by other
things, as usual.
On 24/01/2014 21:21, David Roldán Martínez wrote:
I'm trying to figure out how to populate dbRefs. Seeing the file I
guess that dbRefs are /db_xref items. Right?
Not necessarily - the semantics are not quite the same, even though the
label sounds similar. A DBRefEntry in jalview is an accession Id to some
external database that somehow relates to the sequence entry.
If this is the case, to build a DBRefEntry I need source, version,
accessionId and mapping.
Take a look at the DBRefEntry constructor javadoc:
<http://source.jalview.org/gitweb/?p=jalview.git;a=blob;f=src/jalview/datamodel/DBRefEntry.java;hb=HEAD#l39>

  /**
   *
   * @param source
   *          canonical source (uppercase only)
   * @param version
   *          (source dependent version string)
   * @param accessionId
   *          (source dependent accession number string)
   * @param map
   *          (mapping from local sequence numbering to source accession
   *          numbering)
   */
  public DBRefEntry(String source, String version, String accessionId,
          Mapping map)

source and accessionId must be non-null, but version and map may be null.
'source' is typically a canonical name for a database source - the
jalview.datamodel.DBRefSource class includes some hardcoded constants for
sources that have special meaning to Jalview (at some point the
hardcoded strings will be replaced by a more rigorous canonical name
lookup system).
In something like this:
     source          1..5028
                     /organism="Saccharomyces cerevisiae"
                     /db_xref="taxon:4932"
                     /chromosome="IX"
                     /map="9"
Is it possible to consider source=organism, accessionId=db_xref,
version=xx and mapping=yy?
No. 'organism' isn't a source name. In this record 'source' is a feature
key: annotation describing the source of the sequence data in the record.
The only DBRefEntry that might come out of this is a cross reference to
the NCBI taxon database - so source == 'taxon' and accessionId == '4932'
for the DBRefEntry.
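
To make that concrete, here is a rough sketch of splitting a /db_xref
value into the two strings a DBRefEntry needs. DbXref is a hypothetical
helper, not something that exists in Jalview, and upper-casing the
database name is only my reading of the "uppercase only" note in the
javadoc above.

```java
import java.util.Locale;

// Hypothetical helper (not part of Jalview): splits a GenBank /db_xref
// value such as "taxon:4932" into a database name and an identifier.
final class DbXref
{
  final String source;      // database name, upper-cased (assumption, see lead-in)
  final String accessionId; // identifier within that database

  private DbXref(String source, String accessionId)
  {
    this.source = source;
    this.accessionId = accessionId;
  }

  /** Parses "DATABASE:identifier"; returns null if either part is missing. */
  static DbXref parse(String value)
  {
    if (value == null)
    {
      return null;
    }
    String v = value.trim();
    // Tolerate the surrounding double quotes used in the flat file.
    if (v.length() >= 2 && v.startsWith("\"") && v.endsWith("\""))
    {
      v = v.substring(1, v.length() - 1);
    }
    // Split on the first colon only; the identifier may contain more.
    int colon = v.indexOf(':');
    if (colon <= 0 || colon == v.length() - 1)
    {
      return null;
    }
    String db = v.substring(0, colon).trim().toUpperCase(Locale.ROOT);
    String id = v.substring(colon + 1).trim();
    return new DbXref(db, id);
  }
}
```

With the result x of DbXref.parse("taxon:4932"), the entry could then be
built as new DBRefEntry(x.source, null, x.accessionId, null), since
version and map may be null.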
And if it is:
     CDS             <1..206
                     /codon_start=3
                     /product="TCP1-beta"
                     /protein_id="AAA98665.1"
                     /db_xref="GI:1293614"
Is it possible to consider source=protein_id, accessionId=db_xref,
version=product and mapping=xx? However, at the explanation of the
Sample Record, when talking about protein_id..."A protein sequence
identification number, similar to the Version
<http://www.ncbi.nlm.nih.gov/Sitemap/samplerecord.html#VersionB>
number of a nucleotide sequence. Protein IDs consist of three letters
followed by five digits, a dot, and a version number." Should I parse
protein_id and use the result to populate version (1 in this case)?
That seems right.
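
A minimal sketch of that split, assuming the version is whatever run of
digits follows the last dot. ProteinId is a hypothetical class for
illustration only, not Jalview code.

```java
// Hypothetical helper (not part of Jalview): separates a protein_id such
// as "AAA98665.1" into its accession and its version string.
final class ProteinId
{
  final String accession; // part before the final dot, e.g. "AAA98665"
  final String version;   // digits after the final dot, or null if absent

  private ProteinId(String accession, String version)
  {
    this.accession = accession;
    this.version = version;
  }

  static ProteinId parse(String proteinId)
  {
    String id = proteinId.trim();
    int dot = id.lastIndexOf('.');
    // Only treat the suffix as a version if it is non-empty and all digits.
    if (dot > 0 && dot < id.length() - 1 && isAllDigits(id.substring(dot + 1)))
    {
      return new ProteinId(id.substring(0, dot), id.substring(dot + 1));
    }
    return new ProteinId(id, null);
  }

  private static boolean isAllDigits(String s)
  {
    for (int i = 0; i < s.length(); i++)
    {
      if (!Character.isDigit(s.charAt(i)))
      {
        return false;
      }
    }
    return true;
  }
}
```

Leaving version as null when there is no numeric suffix fits the
constructor, which accepts a null version.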
CDS annotations define the coding regions for proteins. The logic in this
method:
<http://source.jalview.org/gitweb/?p=jalview.git;a=blob;f=src/jalview/datamodel/xdb/embl/EmblEntry.java;h=0ae49b998d1d27a1f8dac69e6eb10a4a476d4944;hb=HEAD>

  /**
   * attempt to extract coding region and product from a feature and properly
   * decorate it with annotations.
   *
   * @param feature
   *          coding feature
   * @param sourceDb
   *          source database for the EMBLXML
   * @param seqs
   *          place where sequences go
   * @param dna
   *          parent dna sequence for this record
   * @param noPeptide
   *          flag for generation of Peptide sequence objects
   */
  private void parseCodingFeature(EmblFeature feature, String sourceDb,
          Vector seqs, Sequence dna, boolean noPeptide)
  {
    boolean isEmblCdna = sourceDb.equals(DBRefSource.EMBLCDS);

does the transformation for the XML version.
And, finally, I see some correspondence between gene and CDS entries.
Each time there is a gene entry, you'll find a CDS entry following the
first, and you'll be able to relate them using /gene and A..B fields.
     gene            687..3158
                     /gene="AXL2"
     CDS             687..3158
                     /gene="AXL2"
                     /note="plasma membrane glycoprotein"
                     /codon_start=1
                     /function="required for axial budding pattern of S.
                     cerevisiae"
                     /product="Axl2p"
                     /protein_id="AAA98666.1"
                     /db_xref="GI:1293615"
                     /translation="..."
Is it possible to consider source=gene, accessionId=db_xref,
version=product and mapping=xx?
Again - db_xref is parsed into source='GI' and accessionId='1293615'
here, for one DBRefEntry.
Have a look at the logic in the parseCodingFeature method. The easiest
way may be to adapt that code to work with the classes you populate from
the GenBank record, but you might even consider going the other way and
adapting the GenBank 'parse' routine to create objects from the
jalview.datamodel.xdb.embl package.
Sorry for the delay in replying.. I'm just trying to get a Jalview
release out of the door...
Jim.
_______________________________________________
Jalview-dev mailing list
[email protected]
http://www.compbio.dundee.ac.uk/mailman/listinfo/jalview-dev