Andreas Tille pushed to branch upstream at Debian Med / kaptive


Commits:
356ee2d6 by Andreas Tille at 2020-09-13T07:09:13+02:00
New upstream version 0.7.3
- - - - -


4 changed files:

- README.md
- kaptive.py
- reference_database/Acinetobacter_baumannii_OC_locus_primary_reference.gbk
- reference_database/Klebsiella_k_locus_primary_reference.gbk


Changes:

=====================================
README.md
=====================================
@@ -12,7 +12,7 @@ Given a novel genome and a database of known loci (K, O or 
OC), Kaptive will hel
 In cases where your input assembly closely matches a known locus, Kaptive 
should make that obvious. When your assembly has a novel type, that too should 
be clear. However, Kaptive cannot reliably extract or annotate locus sequences 
for totally novel types – if it indicates a novel locus is present then 
extracting and annotating the sequence is up to you! Very poor assemblies can 
confound the results, so be sure to closely examine any case where the locus 
sequence in your assembly is broken into multiple pieces.
 If you think you have found a novel locus that should be added to one of the 
databases distributed with Kaptive please [contact 
us](mailto:[email protected]).
 
-Read more about Kaptive, Kaptive Web and the locus databases in [our 
papers](#citation).
+For citation info and details about Kaptive, Kaptive Web and the locus 
databases, see [our papers](#citation) below.
 
 
 ## Table of Contents
@@ -91,7 +91,7 @@ kaptive.py -h
 
 #### Other dependencies
 
-Regardless of how you download/install Kaptive, it requires that 
[BLAST+](http://www.ncbi.nlm.nih.gov/books/NBK279690/) is available on the 
command line (specifically the commands `makeblastdb`, `blastn` and `tblastn`). 
BLAST+ can usually be easily installed using a package manager such as 
[Homebrew](http://brew.sh/) (on Mac) or 
[apt-get](https://help.ubuntu.com/community/AptGet/Howto) (on Ubuntu and 
related Linux distributions).
+Regardless of how you download/install Kaptive, it requires that 
[BLAST+](http://www.ncbi.nlm.nih.gov/books/NBK279690/) is available on the 
command line (specifically the commands `makeblastdb`, `blastn` and `tblastn`). 
BLAST+ can usually be easily installed using a package manager such as 
[Homebrew](http://brew.sh/) (on Mac) or 
[apt-get](https://help.ubuntu.com/community/AptGet/Howto) (on Ubuntu and 
related Linux distributions). Some later versions of BLAST+ have been 
associated with sporadic crashes when running tblastn with multiple threads; to 
avoid this problem we recommend running Kaptive with BLAST+ v 2.3.0 or using 
the "--threads 1" option (see below for full command argument details).
 
 
 ## Input files
@@ -308,7 +308,18 @@ WARNING: If you use the variant database please inspect 
your results carefully a
 
 Database versions:
 * Kaptive releases v0.5.1 and below include the original _Klebsiella_ K locus 
databases, as described in [Wyres, K. et al. Microbial Genomics 
(2016).](http://mgen.microbiologyresearch.org/content/journal/mgen/10.1099/mgen.0.000102)
-* Kaptive v0.6.0 includes four novel primary _Klebsiella_ K locus references 
defined on the basis of gene content (KL162-KL165) in this 
[paper.](https://www.biorxiv.org/content/10.1101/557785v1)
+* Kaptive v0.6.0 and above include four novel primary _Klebsiella_ K locus 
references defined on the basis of gene content (KL162-KL165) in this 
[paper.](https://www.biorxiv.org/content/10.1101/557785v1)
+* Kaptive v0.7.1 and above contain updated versions of the KL53 and KL126 loci 
(see table below for details). The updated KL126 locus sequence will be 
described in McDougall, F. et al. 2020. _Klebsiella pneumoniae_ diversity and 
detection of _Klebsiella africana_ in Australian Fruit Bats (_Pteropus 
policephalus_). _In prep._
+* Kaptive v0.7.2 and above include a novel primary _Klebsiella_ K locus 
reference defined on the basis of gene content (KL166), which will be described 
in Li, M. et al. 2020. Characterization of clinically isolated hypermucoviscous 
_Klebsiella pneumoniae_ in Japan. _In prep._
+* Kaptive v0.7.3 and above include four novel primary _Klebsiella_ K locus 
references defined on the basis of gene content (KL167-KL170), which will be 
described in Gorrie, C. et al. 2020. Opportunity and diversity: A year of 
_Klebsiella pneumoniae_ infections in hospital. _In prep._
+
+
+Changes to the _Klebsiella_ K locus primary reference database:
+
+| Locus  | Change | Reason | Date of change | Kaptive version no. |
+| ------------- | ------------- | ------------- | ------------- | 
------------- |
+| KL53  | Annotation update: _wcaJ_ changed to _wbaP_ | Error in original 
annotation | 21 July 2020 | v 0.7.1 | 
+| KL126  | Sequence update: new sequence from isolate FF923 includes _rmlBADC_ 
genes between _gnd_ and _ugd_ | Assembly scaffolding error in original sequence 
from isolate A-003-I-a-1 | 21 July 2020 | v 0.7.1 |
 
 #### _Klebsiella_ O locus database
 
@@ -330,7 +341,7 @@ The _A. baumannii_ OC (lipooligosaccharide outer core) 
locus reference database
 WARNING: These databases have been developed and tested specifically for _A. 
baumannii_ and may not be suitable for screening other _Acinetobacter_ species. 
You can check that your assembly is a true _A. baumannii_ by screening for the 
_oxaAB_ gene e.g. using blastn.
 
  Database versions:
-* Kaptive v0.7.0 and above include the original _A. baumannii_ K and OC locus 
databases, as described in Wyres, KL. et al. _In prep_ 2019.
+* Kaptive v0.7.0 and above include the original _A. baumannii_ K and OC locus 
databases, as described in [Wyres, KL. et al. Microbial Genomics, 
2020.](https://doi.org/10.1099/mgen.0.000339)
 
 
 
@@ -349,6 +360,21 @@ Kaptive uses 'tblastn' to screen for the presence of each 
locus gene with a cove
 
 A small number of the original _Klebsiella_ K locus references are truncated, 
containing only a partial <i>ugd</i> sequence. The reference annotations for 
these loci do not include <i>ugd</i>, so are not identified by the 'tblastn' 
search. Instead <b>Kaptive</b> reports the closest match to the partial 
sequence (if it exceeds the 90% coverage threshold). 
 
+#### Why has the best matching locus changed after I reran my analysis with an 
updated version of the database? ####
+
+The databases are updated as novel loci are discovered and curated. If your 
previous match had a confidence call of 'Low' or 'None' but your new match has 
higher confidence, this indicates that your genome contains a locus that was 
absent in the older version of the database! So nothing to worry about here.
+
+But what if your old match and your new match have 'Good' or better confidence 
levels?
+
+If your old match had 'Perfect' or 'Very High' confidence, please post an 
issue to the issues page, as this may indicate a problem with the new database!
+
+If your old match had 'Good' or 'High' confidence please read on...
+
+Polysaccharide loci are subject to frequent recombinations and rearrangements, 
which generates new variants. As a result, a small number of pairs of loci 
share large regions of homology e.g. the _Klebsiella_ K-locus KL170 is very 
similar to KL101, and in fact seems to be a hybrid of KL101 plus a small region 
from KL106. 
+Kaptive can accurately distinguish the KL101 and KL170 loci when it is working 
with high quality genome assemblies, but this task is much trickier if the 
assembly is fragmented. This means that matches to KL101 that were reported 
using an early version of the K-locus database might be reported as KL170 when 
using a later version of the database.
+However, this should only occur in instances where the K-locus is fragmented 
in the genome assembly and in that case Kaptive will have indicated 'problems' 
with the matches (e.g. '?' indicating fragmented assembly or '-' indicating 
that an expected gene is missing), and the corresponding confidence level will 
be at the lower end of the scale (i.e. 'Good' or 'High', but not 'Very High' or 
'Perfect').
+You may want to try to figure out the correct locus manually, e.g. using 
[Bandage](https://rrwick.github.io/Bandage/) to BLAST the corresponding loci in 
your genome assembly graph. 
+
 
 ## Citation
 
@@ -359,7 +385,7 @@ If you use [Kaptive Web](http://kaptive.holtlab.net/) 
and/or the _Klebsiella_ O
 [Kaptive Web: user-friendly capsule and lipopolysaccharide serotype prediction 
for _Klebsiella_ genomes. Journal of Clinical Microbiology 
(2018).](http://jcm.asm.org/content/56/6/e00197-18)
 
 If you use the _A. baumannii_ K or OC locus database(s) in your research 
please cite this paper:
-Identification of _Acinetobacter baumannii_ loci for capsular polysaccharide 
(KL) and lipooligosaccharide outer core (OCL) synthesis in genome assemblies 
using curated reference databases compatible with Kaptive. Wyres KL, Cahill SM, 
Holt KE, Hall RM and Kenyon JJ. _In preparation_.  
+[Identification of _Acinetobacter baumannii_ loci for capsular polysaccharide 
(KL) and lipooligosaccharide outer core (OCL) synthesis in genome assemblies 
using curated reference databases compatible with Kaptive. Microbial Genomics 
(2020).](https://doi.org/10.1099/mgen.0.000339)  
 Lists of papers describing each of the individual _A. baumannii_ reference 
loci can be found [here](https://github.com/katholt/Kaptive/tree/master/extras).
 
 


=====================================
kaptive.py
=====================================
@@ -52,7 +52,7 @@ import random
 from collections import OrderedDict
 from Bio import SeqIO
 
-__version__ = '0.5.1'
+__version__ = '0.7.3'
 
 
 def main():


=====================================
reference_database/Acinetobacter_baumannii_OC_locus_primary_reference.gbk
=====================================
@@ -1446,7 +1446,7 @@ FEATURES             Location/Qualifiers
                      /note="sequence from NCBI GenBank accession number
                      KF030679 REGION: complement(28675..37977)"
      CDS             1..888
-                     /gene="gtrOC1""
+                     /gene="gtrOC1"
                      /codon_start=1
                      /transl_table=11
                      /product="GtrOC1 glycosyltransferase"
@@ -1749,7 +1749,7 @@ FEATURES             Location/Qualifiers
                      /note="sequence from NCBI WGS accession number
                      AMTB01000038 REGION: 221522..230586"
      CDS             1..888
-                     /gene="gtrOC1""
+                     /gene="gtrOC1"
                      /codon_start=1
                      /transl_table=11
                      /product="GtrOC1 glycosyltransferase"
@@ -2038,7 +2038,7 @@ FEATURES             Location/Qualifiers
                      /note="sequence from NCBI WGS accession number
                      AMFY01000013 REGION: 222496..228777"
      CDS             1..888
-                     /gene="gtrOC1""
+                     /gene="gtrOC1"
                      /codon_start=1
                      /transl_table=11
                      /product="GtrOC1 glycosyltransferase"
@@ -2245,7 +2245,7 @@ FEATURES             Location/Qualifiers
                      /note="sequence from NCBI WGS accession number
                      AMFI01000027 REGION: complement(34336..40843)"
      CDS             1..888
-                     /gene="gtrOC1""
+                     /gene="gtrOC1"
                      /codon_start=1
                      /transl_table=11
                      /product="GtrOC1 glycosyltransferase"


=====================================
reference_database/Klebsiella_k_locus_primary_reference.gbk
=====================================
The diff for this file was not included because it is too large.


View it on GitLab: 
https://salsa.debian.org/med-team/kaptive/-/commit/356ee2d6a6615f6d98011f8c1164404464c427ca
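
Note on the `.gbk` hunks above: each one drops a stray second quote from `/gene="gtrOC1""`. A minimal sketch of how such lines could be flagged in a GenBank flat file is given below. It is not part of Kaptive or of this commit. The function name `find_stray_quotes` and the regular expression are hypothetical, and the check assumes the whole qualifier sits on a single line, so multi-line qualifier values are not covered.

```python
import re

# Hypothetical helper, not part of Kaptive.
# Matches a single-line qualifier whose value ends in a doubled quote,
# e.g.   /gene="gtrOC1""
STRAY_QUOTE = re.compile(r'^\s*/\w+="[^"]*""\s*$')


def find_stray_quotes(lines):
    """Return (line_number, text) pairs for qualifiers ending in a doubled quote."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if STRAY_QUOTE.match(line):
            hits.append((number, line.rstrip("\n")))
    return hits


# Small in-memory example modelled on the hunks in this commit.
sample = [
    '     CDS             1..888\n',
    '                     /gene="gtrOC1""\n',
    '                     /codon_start=1\n',
    '                     /gene="gtrOC1"\n',
]
stray = find_stray_quotes(sample)
```

To scan a real file, the same function can be given an open file handle, for example `find_stray_quotes(open(path))`, where `path` is whichever `.gbk` file is being checked.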
