[med-svn] [sga] annotated tag upstream/0.10.13 created (now dffa83c)

Andreas Tille Mon, 11 Jan 2016 04:37:34 -0800

This is an automated email from the git hooks/post-receive script.

tille pushed a change to annotated tag upstream/0.10.13
in repository sga.


        at  dffa83c   (tag)
   tagging  3cae268067c8985f1d57b3ea8d407e0db5458dd3 (commit)
  replaces  v0.9.4
 tagged by  Andreas Tille
        on  Wed May 28 07:39:13 2014 +0200

- Log -----------------------------------------------------------------
Upstream version 0.10.13

Albert Vilella (8):
      a bit rough, but just to have an idea of what is needed
      adding prefix to walk method for convenience
      Merge github.com:jts/sga
      reverting to original walk
      adding prefix to walk for convenience to the pinball pipeline
      walk with prefix tweak
      deleting INSTALL for now
      fixed typo --component-paths should be --component-walks in cerr

Andreas Tille (1):
      Imported Upstream version 0.10.13

Cornelis Arnout Albers (32):
      Integrated Dindel
      it compiles!
      Integrated DindelRealignWindow into HapgenProcess
      Added VCFFile output
      Makes calls now, did some initial checking and debugging
      Changed calling to ModelSelection. Added EM haplotype frequency estimation. Seems to work.
      Fixed addSNP to candidate haplotypes.
      fixed a couple of bugs. Still debug version but seems to do sensible things.
      tested on 167924_A1 exome and seems to give already nice results.
      Fixed position bug. Fixed output of uncalled variants
      Fixed strand issue.
      Fixed quality score issue for haplotypes mapping with low quality scores to places in the reference. Added ID output in VCF
      Fixed getDistance: it now only uses reads that match the haplotype sequence without mismatch at the position of variant. Also fixed outputAsVCF: it now combines frequencies from haplotypes mapping to the same position.
      Fixed incorrect averaging in outputAsVCF
      fixed silly bug in computation of penalties for haplotype alignments
      Optimized and added INFO tag outputs.
      March 20 debug version
      Added SingleRead as replacement for MatePairs
      Fixed variant frequency.
      Changed FLANKING_SIZE to zero and DINDEL_DEBUG_3->1
      Merge branch 'graph-diff-v4' of /nfs/users/nfs_j/js18/work/git_repository/sga into graph-diff-v4
      Fixed -MAX_INT varqual bug. Made sure flankingHaplotypes is unique. SET MAPPING QUALITY TO 1000 for all candidate alignments
      Fixed mapping bug and -MAX_INT QUAL bug. realignMatePairs can be used to choose mate pair alignment, automatically sets FLANKING_SIZE to 1000.
      Added MultiSample EM and caller. Fixed homopolymer, added AmbiMap filter tag
      Added multisample EM caller.
      Fixed homopolymer error when variant is in last column
      Added genotyping.
      Genotyping in multisample caller seems to work.
      Fixed inf bug in genotyping
      fixing alignment bug
      Merge branch 'master' of /nfs/users/nfs_j/js18/work/git_repository/sga
      Fixed DindelHaplotype::extractVariants() incorrect assertion

David Rio Deiros (2):
      Warning the user about the fact he will have issues merging
      Adding feature in preprocess step to remove adapter from reads.

Jared Simpson (627):
      Added extra information to sga stats --run-lengths
      Capped max run length in --run-lengths option to make the output more readable
      Refactored the BWT Markers into their own file. Also moved accumulation code out of RLBWT into the RLUnit
      Refactored RLUnit into its own class
      Changed name of FULL_COUNT define to RL_FULL_COUNT
      Re-implemented connect pipeline. Cleaned up sga-assemble
      Changed input parameters in sga-pipeline
      Skeleton of fm-merge subprogram
      Added skeleton code for the FMMerge processes
      Added BitVector class for storing large arrays of bits. Stubbed in some functionality of FM-merge.
      Started implementation of FMMergeProcess logic
      Rewrote overlapReadExact to fix the very rare case where a read has a proper prefix/suffix overlap to itself. This case
      Working version of fm-merge. Currently requires a remove duplicate edges operation which is sub optimal.
      Fixed bug in conflictConsensus algorithm where the error correction would not filter out true conflicts if the root base is the 3rd (or 4th) most frequent base but still above the cutoff.
      Implemented interleaved mode for sga preprocess
      Merge branch 'fm-merge'
      Merge branch 'master' of /nfs/users/nfs_j/js18/work/git_repository/sga
      Implemented 1-bit sparse gap array for rmdup/qc
      Added a simple kmer caching scheme to the kmer-based corrector to avoid duplicate lookups in the fm-index.
      Modified k-mer corrector to take quality values into account when determining thresholds for correction.
      Changed default sample rate and gap array size for sga merge
      Implemented quality-aware overlap correction. The quality scores are used to select the cutoffs for the number of times a base needs to be seen to avoid being corrected away.
      Changed error corrector to print out a masked multi-overlap
      Fixed bug where fm-merge would crash if the graph contains a simple cycle
      Revised dead-end trim function to only remove if the branch is less than a minimum length. This makes the trimmer function properly on a graph constructed by fm-merge.
      Fixed very subtle bug in overlap computation when reads have multiple valid overlaps to each other.
      Added some extra comments
      Removed errant print from SGA/index
      Modified sga-rmdup to determine which read to keep based on the index of the reads, not their full id/name.
      Implemented mutex lock on BitVector to allow threads to update the bitvector atomically in fm-merge.
      Removed warning
      Improved the speed of sga-qc by an order of magnitude.
      Implemented more complex variation (bubble) removal algorithm.
      Merge branch 'master' of /nfs/users/nfs_j/js18/work/git_repository/sga
      Implemented writing out variants to a file in fasta format.
      sga-qc: Moved the delete call for the BWTs to occur before the indices are rebuilt to avoid having two copies loaded at once needlessly.
      Removed unnecessary to-do warning from variant smoother
      Fixed bug where variant removal would assert if there was a degenerate bubble
      Added exit condition to SGSearch::findVariantWalks to avoid infinite loops in the case that the graph contains a non-branching cycle
      Implemented new SGSearchTree class for more efficient graph searching. All functionality
      Integrated new search tree code into the variation smoother. The results are subtly
      Modified output files to have a common prefix, which can be specified on the command line with the -o option.
      Integrated SGSearchTree into SGSearch::findWalks
      Updated configure.ac and the README to support use of the BamTools library.
      First implementation of sga connect using a BAM file. This version is functional but could be more efficient.
      Much faster version of BAM-based connect. Now uses substring operations when extracting the fragment instead of copying potentially very large strings constantly.
      Added statistics to sga connect to describe why the connection failed
      Cleaned up the connection code.
      Fixed sga-correct to search for a path with the correct length by subtracting the amount of the fragment that is present in the current contig.
      Added extra output to sga-scaffold
      Added ScaffoldGroup class to handle ordering a set of contigs.
      Added functions to compute the probability that two scaffold links are incorrectly ordered.
      Made the SGSearchTree a generic templated class so that it can also be used for the scaffolding module.
      Continued to make the graph search functions more generic.
      Started to implement searching for walks on a scaffold graph.
      Simplified construction of variation paths in the graph. More work on making the searching code more generic.
      Removed some dead code.
      Separated input of paired end and mate pair libraries for scaffolder.
      Added new function to remove transitive edges from the scaffold graph. Needs more work.
      Experimental layout algorithm for scaffolding
      Cleaned up code
      Removed unused object to remove warning.
      Added function to infer secondary links between nodes in a putative scaffold.
      Implemented connected components algorithm.
      Added new ScaffoldAlgorithms files and refactored some code into these.
      Wrote algorithm to compute a layout of the connected component for a scaffold starting from a terminal vertex. This function is the backbone of the scaffolding algorithm.
      The scaffolder now writes out scaffold statistics at the end of the program.
      Minor update to scaffolding output message formatting.
      First implementation of new scaffolding algorithm.
      Removed some prints
      Integrated Heng Li's stdaln dynamic programming library into ThirdParty. This is used in the variation removal algorithm to set an upper bound on how different two sequences can be and still be removed.
      Changed name of cigar line in variants file to make it clear it's an internally-used field and not for the fasta sequence that is output.
      Wrote compare-and-swap updates for the BitChar/BitVector data structures.
      Rewrote FMMergeProcess to use the compare-and-swap functionality in the bit vector. Removed the locks.
      Fixed scaffold2fasta to search for a path of the correct length (was using end to start distance instead of end to end).
      Wrote function to find and remove cycles in the scaffold graph
      Added status output message to the link validator
      Added filterBAM subprogram to attempt to get rid of bad MP reads.
      Imposed a max indel size on the variant resolution.
      Minor formatting tweaks.
      Implemented SV detection and removal for the scaffolder. Turned off by default.
      Changed scaffold2fasta to write out unplaced scaffolds and use a gap with a minimum length
      Added function to break the scaffold graph at positions that have conflicting distance estimates.
      Removed dot file output from scaffolder
      Updated version to 0.9.5
      Updated OverlapAlgorithm::_processIrreducibleBlocksExact to work in the current overlapping framework. It is now used by default.
      Moved astat.py from sgatools repository to sga/src/bin/sga-astat.py
      Implemented loading contigs from a fasta file for scaffold2fasta
      Cleaned up sga-scaffold to only print link validation messages if -v flag is given.
      Implemented dust filter for low complexity sequences in sga-preprocess as suggested by Albert Vilella.
      Updated README to reference the python modules that the pipeline scripts require.
      Added parameter to specificy the maximum number of bases to correct with the kmer corrector.
      Added parameter to specify the maximum number of bases to correct with the kmer corrector.
      Added warning to the overlap computation for when a substring read is found.
      Added genome size option to sga-astat.py as alternative to performing the bootstrap estimate
      Merge branch 'master' of /nfs/users/nfs_j/js18/work/git_repository/sga
      Changed overlap correction QC. Now requires at least one overlap supporting each base in the read after correction.
      sga-preprocess will now append /1 or /2 to read names in pe-mode if the paired reads have the exact same name.
      Added flag to SGSearch::findWalks to specify whether any walks should be returned if the search was aborted. All uses of findWalks require an exhaustive search to be performed except the utility sga-walk.
      Switched the irreducible block algorithm back to inexact mode.
      [Issue GH-1] Changed shebang in python scripts to use /usr/bin/env so user's environment python is used. Reported and fix suggested by John St. John.
      [Issue GH-3]: sga-preprocess will stop reading the file if there is a fastq record with no sequence or quality values.
      Implemented quality score conversion from phred64 to phred33 for preprocess.
      Moved sga-mergeDriver.pl from sgatools into main repository.
      Wrote help message for sga-mergeDriver.pl
      Cleaned up handling of cycles in fm-merge.
      Added --kmer-distribution function to sga-stats. Re-enable the -x option for sga correct to set the min kmer coverage required.
      Rewrote the CorrectionThresholds to be a proper (singleton) class instead of a namespace.
      Fixed bug in OverlapAlgorithm::_processIrreducibleBlocksExact where the assertion was checking the wrong condition. Added --exact option to overlap to force the use of the exact irreducible algorithm.
      Rewrote the exact-mode irreducible block algorithm to be iterative instead of recursive.
      Made the exact-mode irreducible algorithm the default again for overlap/fm-merge.
      Added --no-overlap and --branch-cutoff options to sga-stats.
      Wrote experimental program sga-cluster to write out the connected components of a graph. Requested by Albert Vilella.
      Refactored KmerDistribution code into its own class.
      First pass at learning the kmer correction threshold.
      Added the contigs filename to the temporary output of bwa aln so the same reads can be mapped to different contigs at the same time without having a filename clash.
      Added -w parameter to sga-walk to allow the exact sequence of the walk to be specified.
      Changed the error correction metrics to use wider integers to avoid wrap arounds for very large data sets.
      Made the kmer corrector the default algorithm to use for sga-correct. Changed the default kmer size to 31.
      Made the temp files of the bwa aln step avoid using relative paths.
      Modified sga-astat.py so that it does not require a bam index file.
      Added exit statement to sga-bam2de.pl when the command line arguments are incorrect.
      added optional -k parameter to sga-bam2de.pl
      Fixed usage message for scaffold.
      Made command line argument to change the minimum walk distance in sga-connect
      Fixed tripped assertion in makeScaffolds for the very rare case that a terminal vertex cannot be found. Fixed stats output to avoid 32-bit integer wraparound.
      Changed help text for --max-distance option to filterBAM
      Rewrote sga-cluster to use the FM-index instead of an asqg file.
      Merge branch 'master' of /nfs/users/nfs_j/js18/work/git_repository/sga
      Fixed warning in OverlapBlock
      Merge branch 'master' of /nfs/users/nfs_j/js18/work/git_repository/sga
      Added new filtering modes to sga-filterBAM. Can now filter out pairs based on error rate, mapping quality and kmer depth.
      Fixed bug in filterBAM where an extra read pair would be erroneously output
      Removed the default seed stride parameter for sga-cluster as it would lead to some overlaps being missed.
      Made sga-cluster emit an error when a substring read is found.
      Bug fix in ScaffoldRecord to avoid outputting duplicate records for singleton scaffolds.
      Changed default gap storage parameter in sga/index
      Changed sga-scaffold arguments. Unified repeat/unique a-stat so any contig that does not meet the unique cutoff is deemed to be a repeat. Added --min-copy-number parameter to discard contigs that have a low (<0.3) estimated copy number.
      Changed sga-cluster so the temp file uses the same prefix as the -o parameter to prevent name clashes.
      Changed default gap array parameter back to 8.
      [Github issue 4] Made the error message for the case where a substring read is found during string graph construction to be more informative.
      Bumped version to 0.9.6
      Made the maximum distance estimate error when resolving gaps a command line parameter
      bumped version number
      Disabled RLBWT validation
      New mode to sga-walk which exhaustively finds all walks through the largest connected component of the input graph. Used in sga-cluster workflow.
      Modified k-mer corrector to only use the forward bwt when looking up k-mer counts. This effectively halves the memory usage of the correction step.
      Merged the duplicate removal and qc checks into a single process: sga-filter.
      Increased version to 0.9.8
      Rewrote the gap array to use compare and swap instructions when updating the base counts. This allows much better concurrency when merging/removing reads from a bwt.
      Added comments to the SparseGapArray
      More comment improvement
      Re-enabled -Wall as bamtools now compiles without warnings.
      Implemented caching of BWTIntervals for all strings of a given length. Currently used in the k-mer corrector.
      Changed BWTIntervalCache::lookup to take in a c-string to avoid an extra copy
      Code cleanup: created a parameter object for setting the error correction options in place of a constructor with many arguments
      Added option to sga-index to suppress constructing the reverse BWT. Modified sga-merge to avoid attempting to merge RBWTs if they don't exist.
      Added a directory with sga examples. Currently holds a script for a c. elegans assembly.
      Incremented version to 0.9.9.
      Added short README to the top level directory.
      Tweaked wording in top-level README
      Updated main README file.
      sga scaffold: allow loading sequences with ambiguity codes, disable the requirement that an a-statistic file is provided.
      Changed wording in a-stat warning
      Rewrote some ScaffoldRecord functions to take in a parameter object
      Factored out the scaffold input sequence container into ScaffoldSequenceCollection so that it does not require using a StringGraph
      Added std::map-based ScaffoldSequenceCollection for scaffolding sequences that do not belong to a graph.
      Handle Ns in Util::complement
      Fixed gcc 4.6 compile warnings
      Created files and framework for implementing bwa-sw algorithm. Not functional or usable.
      Updates to bwa-sw algorithm
      Implemented most of core bwasw algorithm. Output verified to match that of Heng Li's implementation. Does not restrict the number of nodes to track (z-best heuristic) or output the alignments. Lots of debug information in this version, not usable.
      Implemented saving found hits. Still very much development, not for use.
      First implementation of sampled suffix array data structure and gen-ssa subprogram.
      Implemented O(N) algorithm for constructing the sampled suffix array from a BWT. Implemented reading/writing SSA.
      Created new subprogram correct-long to hold bwa-sw porting work. Added generateCIGAR function to LRAlignment
      Macro'd out the bwa compatibility print statements
      Modified to read query sequences from a file.
      Implemented gapped MultiAlignment class
      Implemented function to transform a multiple alignment into a consensus string.
      Implemented cutTail function from bwasw to remove possibly erroneous cells from a stack. Currently it discards too much and removes a lot of valid hits.
      Improved version of cutTail that does not choose the cells to keep based purely on the highest score, but using the fraction of the maximum possible score for the cell. This will not be the final version of cutTail however as it still discards useful hits.
      long read aligner: Added two new methods of cutting the tails of a cell array
      Added --cut argument to sga correct-long to choose cell reduction heuristic
      Added extra output to MultiAlignment::print
      Added extra debugging info to the long read aligner.
      Added function to save the terminal hits in long-read alignment mode
      Removed some debug output from the long read aligner
      Removed a bunch of debug info and refactored some code from the main bwaswAlignment function
      More refactoring of bwaswAlgorithm. Changed saveHits to keep all good hits, instead of just the top 2 for every position.
      Code cleanup in LRAlignment
      Added new LRCorrection module to implement the long-read correction code
      Renamed LRHit data members to have more sensible names
      First pass at extending LRHits to be full-length alignments
      Further refinements of LRHit extension. Not complete
      Turned off hit extension for now
      Added simple function to find new LRHits based on overlapping reads.
      Check sem_init return codes to catch errors when this function is not implemented (on OSX). This is a stop-gap measure until I have time to switch to named semaphores.
      New experimental long read correction algorithm based on threading the read through a de Bruijn graph
      Resurrected bwa-like saveHits function. Removed targetString member from LRCell/LRHit.
      Refactored a bunch of code that uses stdaln into StdAlnTools. Started work on the new graph-based correction code.
      Started string-threading extension code.
      Added core extension functionality to StringThreader
      First implementation of extension dynamic programming code.
      Changed ExtensionDP to use an edit distance scoring system. Fixed off-by-one in makePaddedStrings
      Implemented function in ExtensionDP to print the full alignment.
      Added function to ExtensionDP to calculate the local error rate of the alignment.
      Integrating ExtensionDP into StringThreader. Initialization of alignment for root node complete.
      Added code to calculate the extended alignments for the StringThreaderNodes
      Full extension algorithm complete, including culling leaves once their error rate is too high.
      Added a few more helper functions to the string threading code. Currently, the search explodes when threading through repeats that are slightly
      New culling heuristic for StringThreader
      Implemented writing out the corrected reads to a file.
      Wrapped the long read correction process in a class to use in the SequenceProcessFramework.
      Implemented outputting corrected sequences from the StringThreader. Still a bit hacky.
      tweaks to graph-based correction algorithm
      Merge branch 'master' of [email protected]:jts/sga
      Made the extension termination condition more robust. StringThreader now returns a trimmed alignment to avoid bad-tails of the long reads.
      Added early exit to kmerCorrect when the read sequence is shorter than the kmer length.
      Added read length check to k-mer filter.
      Refactored cluster generation code into ReadCluster class. Added stub subprogram for cluster-extend.
      Implemented extension mode for sga cluster. This required refactoring the SequenceProcessFramework to use a generic generator object.
      Enabled control of the max cluster size
      Fixed formatting of sga cluster help
      Added some defines to sga cluster to hide away the hideous template function calls
      Changed description in sga-cluster boilerplate
      Added missing shortopt for --min-branch-length to sga assemble
      Merge branch 'long-align'
      Increased default minimum branch length in sga assemble to better handle long (150+) reads.
      Cleaned up build. Added a Makefile to install the scripts the sga scaffold pipeline requires (astat, bam2de).
      Deleting deprecated sga-pipeline script
      Added example script for the Illumina MiSeq example data. Includes the scaffolding component.
      Added ability to cluster based on seed sequences that are not present in the FM-index.
      Fixed cluster size computation
      Temporarily disabled the threaded version of multi-key quicksort.
      Added ability to compute overlaps between two disjoint sets of reads.
      Increased version number to 0.9.10
      Refactored sga filter. Moved arguments to QCProcess into a parameter object.
      Added function to compute an interval pair using cached intervals.
      Removed hard-coded threading option to bwa aln
      Added filter for homopolymer sequencing errors and very low complexity sequence
      Cleaned up comments
      Version 0.9.11
      Now sga walk will write out the reads making up the walk string in SAM format if the --sam option is given. This replaces --description-file.
      In sga filter, exit the homopolymer check if the read length is shorter than the k-mer size. The homopolymer and complexity checks are now disabled by default.
      Updated version to 0.9.12
      Cleaned up unused files
      Fixed a bug in the suffix array validation code. The validator assumed that suffixes with the same string were sorted by read name when they are actually sorted by position in the file. Thanks to Tomas Larsson for the bug report and test data.
      In-progress checkin of subprogram to convert a BEETL index file into SGA's format
      Fixed rare bug in the scaffold builder where a contig could be added with the wrong orientation if a unique walk is found between the contig pair with orientation opposite to that of the link.
      Version v0.9.13
      Removed debug code from convert-beetl
      convert-beetl now writes out an .sai file
      Created script to generate a BWT using beetl and convert it to SGA's format.
      Fixed bug where the reverse index would not be used when the overlap method of correction is specified.
      Added human genome assembly instructions and updated c. elegans script with the parameters used in the sga paper.
      Update human assembly instructions.
      Merge branch 'beetl'
      Added new subprogram to extract the set of sequences from a bwt
      Changed the sample rate in bwt2fa to use less memory
      Rewrote the small repeat resolution algorithm to be much faster. New algorithm is slightly more aggressive.
      Track the number of vertices that have been merged into each vertex to properly decide which walks to retain when removing bubbles.
      Version 0.9.14
      Fixed usage message for correct and correct-long
      Changed beetl index to use a named version of sga. Not for release yet
      modified SampledSuffixArray to optionally work over the lexicographic index only (no samples). gen-ssa now avoids loading the read names to save memory.
      Added skeleton of graph-diff program
      Enabled graph-diff, cleaned up help message
      Finished skeleton of graph-diff
      Initial k-mer traversal code implemented
      Made branch code detection in GraphCompare more efficient
      Added code to build the sequence of the bubbles once a differing k-mer has been found.
      Implemented bitvector marking of used k-mers to avoid outputting duplicate variants
      Added better reporting metrics for the success rate of the bubble discovery process.
      GraphCompare now writes out the variants that it finds in fasta format.
      Refactored BubbleBuilder code into its own file.
      More refactoring.
      Refactoring
      Added parameter object to GraphCompare
      Changed GraphCompare status print condition
      Implemented threaded mode for GraphCompare.
      Removed unnecessary assertion when a loop is found when building the target bubble
      Moved from the substring graph traversal algorithm to a more standard SequenceProcessFramework-based kmer traversal.
      Added logic to filter out low-frequency kmers when attempting the process on an uncorrected data set.
      Re-enabled writing out variants to a file.
      Added extra variant kmer marking to avoid double counting the bubble construction failure reasons
      Added coverage output to variants.fa
      Fixed boundary check for ignoring low-coverage edges
      Whitespace change only
      Created program and initial parsing code for var2vcf convertor.
      Reenabled sanity check insertion in var2vcf
      Added proper substring function to DNAString
      Implemented var2vcf to turn variants found by graph-diff into vcf records.
      Added quality filter and fixed VCF coordinate calculation in case where an insertion occurs along with a second variant.
      Added additional sanity checks in var2vcf to allow the processing of a real human genome call set to go through.
      Removed print
      Allowed target portion of the bubble to branch. Controlled with the -y command line parameter.
      Added interval cache to graph-diff to speed up computation.
      Fixed -o/--outfile option to graph-diff
      Sort VCF file by the order of the reference sequences in the input BAM file instead of strict lexicographic ordering.
      Fixed variant file output name.
      Removed dust check TODO message, which was implemented in a previous commit
      Changed variation bubble builder to better support uncorrected sequence graphs.
      Added metagenome assembler program skeleton
      added parallel processing framework to the metagenome assembly subprogram
      Added skeleton of new hapgen program
      Added initial haplotype generation functionality
      Tweaked hapgen debug output
      hapgen: properly handle cases where the anchors cannot be found
      Wrote utility to build a simple multiple alignment from an array of strings
      First pass at extracting the reads mapping to haplotypes in hapgen
      correct-long: use sai file instead of ssa
      Merge branch 'small_repeat_rewrite'
      Fixed bug in small repeat resolution algorithm
      Brand new long read error correction based on haplotype generation code. The algorithm finds kmer anchors on the long reads, then builds putative haplotypes through a de Bruijn graph between them.
      Added skeleton of new sga gapfill program
      First pass at implementing gap filling logic
      Added more informative results stats to gapfill
      gapfill: the gap sequence is now placed into the scaffold and the new scaffolds written. First functional build
      Rewrote processGap/processScaffold to be cleaner and more robust
      When patching a scaffold gap, remove the overlapping input sequence instead of the gap sequence.
      Added descending kmer mode to gap filler and implemented first pass at choosing the assembled sequence which best fits the gap
      Do not correct a read unless two unique anchors can be found
      Changed long read error correction kmer size
      Implemented first pass at metagenomic assembly logic
      sga metagenome: implement compare and swap logic to avoid outputting duplicate contigs when two threads assemble the same sequence simultaneously
      Fixed metagenomics assembly logic when there is an in-branch into a repeat
      Now use BWTIntervalCache when calculating de Bruijn extensions
      Output start time for main beetl processes
      Testing a local coverage based coverage cutoff for the metagenomics prototype
      Added more output statistics to the gapfill module
      Merge branch 'metagenomics' into gapfill
      Write out beetl progress to a file.
      hapgen now extracts the piece of the reference that is being reassembled.
      hapgen: added function to MultiAlignment to construct an MA from local alignments. Added code to hapgen to pull out read pairs.
      Added threading options to scaffold driver scripts
      Merge branch 'gapfill' into hapgen
      Fixed error from merging gapfill branch
      Merge branch 'hapgen'
      Reverted back to the old repeat resolver code
      Made the gap fill start/end kmer sizes command line arguments
      Removed prints from scaffold sv resolver
      Added new filter to filterBAM to aggressively get rid of FR contamination in a mate pair library. Added new output to the Scaffold
      Added strict mode to scaffolder to only keep unambiguous connections in the scaffold graph
      v0.9.15
      Refined version of local coverage based metagenomic assembly
      Better implementation of coverage-cutoff based de Bruijn assembler for 
metagenomics
      v0.9.16
      Added explicit cast to avoid warning on some versions of gcc
      v0.9.17
      Implemented loading reference fm-index for graph-diff
      Stubbed in bwasw alignment of constructed haplotypes to reference
      Further implemented realignment of haplotypes to reference after 
discovery.
      Refactored code into HapgenUtil
      Implemented more helper functions for the hapgen process
      Initial merge and integration of Kees' dindel code
      Set the bamtools link flag in the case that bamtools is installed in a 
standard directory and --with-bamtools is not needed.
      Moved dindel code into a new function in GraphCompare
      Implemented testing variants with dindel separately for the normal/tumour
      Implemented function to get the edges of the de Bruijn graph from the 
FM-index using a single (forward) index. This can be used to cut the memory 
usage of some subprograms in half.
      Removed the reverse FM-index from graph-diff which is no longer needed.
      Revised SampledSuffixArray to use a uint32_t to store the ids of the 
lexicographically sorted reads. This cuts the memory of the data structure in 
half but limits it to 2**32 strings.
      Reformatted some dindel code to fit the style of the codebase
      Added sga-asqg2dot.pl helper script to bin directory
      Converted tabs to spaces in sga-asqg2dot
      Removed some cruft from sga-asqg2dot
      Fixed bug in read pair extraction
      Split tumour/normal calls into separate vcf files.
      Added code to perform a fairly basic selection of the best alignment 
position from a set of possibilities
      Adding functionality to graph-diff to test existing variants passed in 
via a VCF file
      Refactored dindel calling code into DindelUtil
      The previous commits that changed the representation of the lexicographic index 
in the SampledSuffixArray broke binary compatibility with previous files. 
Updated the magic number to catch these old binaries.
      Refactored more of the dindel wrapper code
      Added extra stats reporting to vcf tester
      Added extra information in the vcf testing mode of graph-diff to help 
understand why some variants were not found
      DindelRealign code now outputs to a stream instead of a file. Also, added 
more debug output in VCFTester
      More debugging output in VCFTester
      Removed some dindel assertions and changed the branch logic
      Made new dindel integration code thread safe and removed a bunch of prints
      Changed dindel assertion to a throw; modified the post-assembly walk 
finding algorithm to avoid performing enormous walks in the case that the graph 
has loops
      Changed another assertion to a warning/return code
      Merged Albert Vilella's change which allows adding a suffix to each read 
ID in sga preprocess
      Throw an error when the homopolymer length check fails instead of 
printing a warning
      If the best candidate haplotype is to the reverse strand of the reference 
reverse-complement everything so the variants are on the right strand
      Trying version of code that relies on haplotype builder - kinda hacky
      Reverted back to using variation bubble builder in graph-diff
      Changed sga walk to remove contained reads and lingering transitive edges 
when --component-walks is specified.
      Fixed usage message for sga filter
      Skeleton code for new indexer
      In configure, specify -lbamtools in LIBS instead of LDFLAGS. This 
corrects the library link ordering and fixes the build in the case --as-needed 
is used in ld, as in newer versions of gcc.
      First semi-functional implementation of BCR for testing
      BCR cleanup
      BCR-constructed bwt is now written to disk.
      Started to integrate BCR into index. Made it more efficient by using 
2-bit encoded strings everywhere
      BCR algorithm now writes out the reverse index. Made the algorithm choice 
a command line argument
      Fully integrated BCR with the BWT disk algorithm
      Removed print from BCR
      Fixed bug where the wrong data types were being written/read in the ssa 
files.
      Implemented a number of debug/development functions for graph-diff
      Enabled reference based calling, fixed memory errors
      Was using the wrong base BWT in non-reference mode
      Added new debug mode to graph-diff
      Fixed two performance issues in sga assemble.
      Removed many debugging print statements
      Added a new exception to Dindel to handle the case where the variant 
found lies at the beginning of one of the haplotypes
      Fixed memory stomp in DindelHaplotype constructor
      Merging latest changes to the dindel haplotype model by Kees Albers.
      Temporarily re-enabled some debug prints
      Fixed configure script to properly handle bamtools include/lib paths in 
the case where it was installed with make install
      Bumped version to v0.9.18
      Added debug code for Kees
      Turned off debug mode
      Merge Kees' branch with bug fixes for variants that are not being called
      Added error message when a sequence with a given ID cannot be found in 
the input scaffold/contig collection
      Merge Kees' branch, with a fix so that variants are output with respect 
to the correct reference strand
      Added overdepth filter to avoid running dindel on super deep regions
      Implemented a simple counting-based variant caller for debugging
      Investigating poor variant calling performance on mouse genome data. 
Added a lightweight profiler.
      Revised profiler
      New heuristics to improve the running time when calling variants versus a 
reference genome
      Fixed bug in extractHaplotypeReads where it would incorrectly flag some 
haplotypes as being too deep.
      Merge branch 'graph-diff-v2' of /nfs/users/nfs_c/caa/source/sga_merge 
into graph-diff-v2
      Merging bug fix from Kees
      More debugging code
      More debugging code
      Modified sga-cluster-extend to warn instead of exit when some seed that 
is passed in is a substring of a read.
      graph-diff can now output multiple candidate haplotypes. Added a 
min-depth parameter to avoid traversing low-coverage k-mers.
      Implemented --longest-n parameter for sga-walk --component-walk.
      Integrate code from github.com/jts/misc
      Removed long-unused Algorithm/ErrorCorrect code
      First implementation of new correction algorithm, which allows arbitrary 
overlaps between reads. Not for production use
      Refined the kmer matching portion of the overlap calculation for the new 
corrector.
      Updated 3rd party code with improvements to the overlapper and multiple 
alignment
      Changed parameters in consensus algorithm in new overlapper
      Updated third party code
      Integrated new overlap method which extends an existing alignment. 
Considerably faster than previous method.
      Updated third party code
      New overlap corrector will use the -r/--rounds parameter to iteratively 
correct reads. This can lead to better correction accuracy but decreases 
correction throughput
      Keep leading directories when parsing the reference filename
      sga-rmdup now writes out the number of copies of each sequence in the 
header line of the fasta file.
      Removed unused parameter in sga walk
      Changed the ASQG parser to only warn if the TE tag is not present instead 
of aborting
      Emit an error and exit when a vertex record is truncated.
      Make sure that edge records have the correct number of fields
      Changed warning message when the operating system is OSX
      Merge branch 'new-indexer'
      Added Illumina's notice regarding the rights to the BCR algorithm
      Fixed assertion tripped by short filenames when checking for gzip 
extension
      Filter the abyss-generated insert size histogram to avoid very long 
DistanceEst runtime.
      Added option to sga-filter to remove substring sequences only.
      v0.9.19
      Huge debugging hacks to investigate why we are missing some variants.
      Started to implement coherency-based haplotype builder.
      Merge branch 'master' into graph-diff-v3
      Merge branch 'it-correct' into graph-diff-v3
      First implementation of read-coherent haplotype generation. Way too slow 
so far.
      New "kmer witness" algorithm
      Removed hacked-in hardcoded path
      Removed debug output
      New method of deriving haplotypes from all the reads sharing a new kmer.
      More conservative generation of haplotypes.
      Rewrote method of inferring haplotypes from read coherent kmers
      Read coherent haplotype builder now extends to new variant kmers
      Parameter tweaks
      Merged Kees' latest code.
      Improved version of the haplotype generator. Lots of testing/debug code 
still in this version
      Allow singleton kmers to recruit new reads
      Check if quality string exists before adjusting for removed adapters
      Integrated new multiple alignment code
      Aggressively collapse conflicting bases during haplotype construction. 
This is only temporary.
      First pass at overlap-based haplotype builder. This version is not 
functional
      Extremely crude version of inexact string graph haplotyping code
      Reverted to old overlap code. Not functional
      New version of the variant algorithm that is based on inexact overlaps
      Fixed bug where empty initial haplotypes caused a crash
      Tweaks to RCHB
      First pass of string-graph based haplotype builder. Slow!
      Refactored the big k-mer based overlapper into a new file. Optimized some 
functions in OverlapHaplotypeBuilder
      Abort graph construction if no corrected read contains the initial kmer.
      Temporarily disabled constraint requiring complete construction of 
parallel bubble
      Set up a kmer->vertex map to avoid huge computation inserting reads into 
the graph.
      Stop the graph extension if there are too many tips in the graph.
      various optimizations to improve the running time of the overlap 
constructor
      Changed std::map/std::set into hashmap/hashset
      Save kmer indices instead of actual kmer sequences
      Moved overlap parameter into the parameters object
      New k-mer based haplotype to reference alignment, more restrictive 
haplotype assembly
      Suppress construction of parallel haplotype
      Extension vertices now labelled with the direction of extension
      Separated walk candidates into left/right join positions
      Avoid attempting to build covering paths when there are unambiguous 
chains of join vertices
      Printing changes only.
      Recursively trim tips from the graph
      Min overlap length is now a command line parameter
      Only extend the graph in one direction - this removes the effect of "back 
bubbles" making the graph too complex to resolve
      Revised trimming logic so it does not iteratively trim the whole branch. 
Testing lower correction kmer.
      More restrictive alignment
      Cap the number of differences to the reference genome at 8
      sga-cluster extend mode can now be limited to a maximum number of 
iterations
      Allow the user to define a set of sequences that are used to stop 
extension in sga-cluster
      Enabled the de Bruijn graph based QC check of candidate haplotypes
      Debug code, synching with Kees
      Refined the StringThread correction method
      Implemented a less restrictive check for when the graphs converge
      Re-enabled haplotype QC
      Set a positive default value for the minimum contig length in 
sga-bam2de.pl
      Do not pass -s to DistanceEst twice
      New haplotype QC which counts the number of branches off a haplotype. Not 
used, just for information
      Fixed infinite loop in haplotype QC for short haplotypes
      Merge branch 'graph-diff-v4' of 
/nfs/users/nfs_c/caa/source/sga-graph-diff-v4-copy into graph-diff-v4
      Fixed assertion when a haplotype aligned to the end of a chromosome
      Fixed error in the warning when a kmer threshold cannot be found for 
error correction
      Refactoring the variant calling code into its own directory
      Removed abandoned class
      More refactoring
      Removed dead code
      Renamed VariationBubbleBuilder to VariationBuilderCommon
      Refactored BuilderCommon code into VariationBuilderCommon
      Allow scaffolds to contain full IUPAC ambiguity codes
      Added new flag to SeqReader to avoid changing lower case bases to upper
      Skip all IUPAC codes when finding anchors for gap filling
      Refactored kmer masking code into a separate function
      Refactoring GraphCompare
      Added BWTIndexSet container. Massive refactoring of code to use it.
      Refactored de Bruijn haplotype builder into a new file
      Removed old debug code
      Refactored haplotype QC into its own function
      Moved HapgenUtil
      The algorithm used during haplotype assembly (dbg vs string graph) is a 
command line option
      Cleaned up parameters
      More option cleanup
      Allow sga-deinterleave.pl to read gzipped files.
      --debruijn should not take a parameter
      If one haplotype fails QC, do not attempt to assemble a variant
      Cleaned up prints in OverlapHaplotypeBuilder
      Started to implement multi-sample calling
      Integrated the read groups into DindelUtil code
      Added assertion warning for mate-pair mode
      Minor formatting change
      Clean up assertions so empty BWTs can be written after filtering
      When resolving scaffold gaps over ambiguity codes, the flanking sequence 
of the filled gap may not match that of the scaffold anymore.
      Version v0.9.20
      Fix substring assertion when null strings are passed to calculateDustScore
      Removed debug prints
      Use the new version of BEETL. Rewrote convert-beetl to use far less 
memory.
      sga-beetl-index.pl now converts fastq to fasta
      sga-beetl-index now has a no-convert option
      sga-merge will not merge population indices
      sga-merge uses the full path to the input files so they do not need to be 
in the working directory
      Merge /nfs/users/nfs_c/caa/source/sga-graph-diff-refactor into 
graph-diff-refactor
      Use full path to indices
      Use correct file status for popidx
      Attempt to fix assertion when counting homopolymer lengths
      Tweaked read extraction parameters, cleaned up some debug output
      Fix the way that homopolymer runs are counted
      Lowered default storage level for merging BWTs during sga-index
      Move out of bounds check to inside loop
      Early exit from the overlap function when the input read is shorter than 
min_overlap
      Added subprogram to evaluate how well we can detect mutations with k-mers
      Build parallel haplotypes and reduce the mapping kmer size
      Added homopolymer filter. Dindel code now outputs VCFRecords
      sga-graphdiff now directly outputs the final set of calls.
      Fixed compiler warning
      Made haplotype QC less strict
      Disable homopolymer filter
      Reverted to using a 31-mer for mapping. Lowered MAX_READS.
      Fixed cluster extend --iterations option so it properly extends for N 
rounds, not to N reads
      added new option to scaffold2fasta --write-names, which outputs the names 
of the contigs that make up the scaffold
      Reversed order of contigs when building scaffolds from right-to-left
      Fixed missing ID for singleton scaffolds
      Apply a minimum coverage of 2 during haplotype QC in non-ref mode
      Require at least two occurrences in the base sequence when making 
comparative calls
      temporary hack to fix crash when trying to extract pairs of reads from an 
unpaired index
      scaffold2fasta: write the orientation of the contigs when --write-names 
is specified
      No longer extracts read mates. Cleaned up prints
      Merge branch 'graph-diff-refactor'
      v0.9.3
      Fixed how the name of the BWT file is computed when the fastq file is 
gzipped
      When merging, if input reads are fastq/gzipped, write to the same
      Accept .fq as a fastq file extension when choosing the output name for 
sga-merge
      Fixed unused variable warning
      issue 21: merge fails when filename contains a '.' that is not part of 
the file extension. fixed by more careful handling of gzipped suffixes
      Fixed GCC 4.6 warnings
      Started implementation of quality scores for variant calling pipeline
      Integrated quality scores into the variant calling pipeline
      Set GraphCompare verbosity to the user's requested value
      Integrating Heng Li's ropebwt code
      Wrote a new VCFCollections wrapper to pass sample names to Dindel
      BWT writing now functional in ropebwt
      Merge branch 'master' into quality-scores
      ropebwt: .sai file is now written, reversed index can be constructed
      Merge branch 'ropebwt'
      Ropebwt algorithm now uses the command line threading parameter
      v0.9.31
      Merge branch 'master' of github.com:jts/sga
      Updated sga-index help text
      Update README to credit Heng's ropebwt implementation
      Updated examples to use ropebwt
      Merge branch 'master' of /nfs/users/nfs_c/caa/source/sga-basequals
      Perform semi-global haplotype realignment within dindel
      Merge branch 'quality-scores'
      Using quality scores is now a command line option
      Merge branch 'master' of /nfs/users/nfs_c/caa/source/sga-basequals
      Removed print
      Reverted to global alignment for haplotype-haplotype alignments
      v0.9.32
      When removing transitive edges from the scaffold graph, check the 
orientation of contigs in the layout
      More aggressive cycle detection/removal in --strict mode of sga scaffolder
      Cleanup code and comments
      Merge branch 'master' of github.com:jts/sga
      v0.9.33
      Explicitly construct the .sai file when using ropebwt since you cannot 
get the lexicographic index from ropebwt when read lengths vary
      v0.9.34
      Whitespace changes
      Fixed compilation warnings pointed out by Zhang Feng.
      Add a dependency check to bam2de.pl and set the --mind parameter
      Change default min distance to -99 bases
      Set the --mina option to abyss
      Fix divide-by-zero in sga-astat
      github issue 25: implement writing orphaned pairs to a file during 
preprocess
      github issue 27: added --no-primer-check option to preprocess. Also, 
cleaned up help message.
      github issue 26: Removed references to old --quality-scale parameter
      github issue 14: sga index should exit gracefully when the input file is 
empty.
      When building the FM-index using ropebwt the lexicographic index is built 
using openmp if the compiler supports it.
      Implemented an upper limit on the number of edges we allow a vertex to 
have before giving up on using it in the assembly graph.
      v0.9.35
      Added namespace to fix compile error on OSX

Jason Stajich (1):
      Seems like sga-align could be run with threads so that bwa uses 
multithreading to be faster.  Is there any reason not to do this?

Nathan S. Watson-Haigh (8):
      Just need to specify base name of the FASTA files.
      Merge branch 'master' of git://github.com/jts/sga
      Merge branch 'master' of git://github.com/jts/sga
      Consistently display help when no command arguments are given.
      Added support for .f and .r read pair suffixes found in reads output by 
sff_extract version < 0.3.0.
      Send info about which file is being processed to STDOUT.
      Additional info (algorithm used) sent to STDOUT when using SAIS - this is 
to be consistent with the output when BCR is used.
      Merge remote-tracking branch 'upstream/master'

Shaun Jackman (1):
      ld_set is static inline. Closes #29

jts (8):
      Merge pull request #6 from avilella/master
      Merge pull request #7 from avilella/master
      Merge pull request #11 from hyphaltip/patch-1
      Merge pull request #12 from drio/master
      Merge pull request #19 from mh11/77a4bc7aacd7b838fc3097af515e2f27ffecde3e
      Merge pull request #20 from nathanhaigh/master
      Merge pull request #22 from nathanhaigh/master
      Merge pull request #30 from sjackman/patch-1

mh11 (3):
      Allow to build FWD and REV index separately to improve speed
      Suppress merging of indexes / sequence files
      Enable suppression of index creation also for memory only run + code 
formatting (spaces)

-----------------------------------------------------------------------

No new revisions were added by this update.

-- 
Alioth's /usr/local/bin/git-commit-notice on 
/srv/git.debian.org/git/debian-med/sga.git

_______________________________________________
debian-med-commit mailing list
[email protected]
http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/debian-med-commit
