Author: tille
Date: 2015-02-18 14:38:04 +0000 (Wed, 18 Feb 2015)
New Revision: 18790

Modified:
   trunk/packages/vsearch/trunk/debian/patches/manpage_syntax.patch
Log:
Hmmm, upstream is actively maintaining the man page so this patch does not make much sense without upstream support. Just left the working chunks and dropped the conflicting ones - feel free to enhance.

Modified: trunk/packages/vsearch/trunk/debian/patches/manpage_syntax.patch =================================================================== --- trunk/packages/vsearch/trunk/debian/patches/manpage_syntax.patch 2015-02-17 07:54:41 UTC (rev 18789) +++ trunk/packages/vsearch/trunk/debian/patches/manpage_syntax.patch 2015-02-18 14:38:04 UTC (rev 18790) @@ -4,105 +4,7 @@ --- a/doc/vsearch.1 +++ b/doc/vsearch.1 -@@ -9,58 +9,58 @@ vsearch \(em chimera detection, clusteri - .ad l - Chimera detection: - .RS --\fBvsearch\fR --uchime_denovo \fIfastafile\fR (--chimeras | ----nonchimeras | --uchimealns | --uchimeout) \fIoutputfile\fR -+\fBvsearch\fR \-\-uchime_denovo \fIfastafile\fR (\-\-chimeras | -+\-\-nonchimeras | \-\-uchimealns | \-\-uchimeout) \fIoutputfile\fR - [\fIoptions\fR] - .PP --\fBvsearch\fR --uchime_ref \fIfastafile\fR (--chimeras | --nonchimeras --| --uchimealns | --uchimeout) \fIoutputfile\fR --db \fIfastafile\fR -+\fBvsearch\fR \-\-uchime_ref \fIfastafile\fR (\-\-chimeras | \-\-nonchimeras -+| \-\-uchimealns | \-\-uchimeout) \fIoutputfile\fR \-\-db \fIfastafile\fR - [\fIoptions\fR] - .PP - .RE - Clustering: - .RS --\fBvsearch\fR (--cluster_fast | --cluster_size | --cluster_smallmem) --\fIfastafile\fR (--alnout | --blast6out | --centroids | --clusters | ----msaout | --uc | --userout) \fIoutputfile\fR --id \fIreal\fR -+\fBvsearch\fR (\-\-cluster_fast | \-\-cluster_size | \-\-cluster_smallmem) -+\fIfastafile\fR (\-\-alnout | \-\-blast6out | \-\-centroids | \-\-clusters | -+\-\-msaout | \-\-uc | \-\-userout) \fIoutputfile\fR \-\-id \fIreal\fR - [\fIoptions\fR] - .PP - .RE - Dereplication: - .RS --\fBvsearch\fR --derep_fulllength \fIfastafile\fR (--output | --uc) -+\fBvsearch\fR 
\-\-derep_fulllength \fIfastafile\fR (\-\-output | \-\-uc) - \fIoutputfile\fR [\fIoptions\fR] - .PP - .RE - Masking: - .RS --\fBvsearch\fR --maskfasta \fIfastafile\fR --output \fIoutputfile\fR -+\fBvsearch\fR \-\-maskfasta \fIfastafile\fR \-\-output \fIoutputfile\fR - [\fIoptions\fR] - .PP - .RE - Pairwise alignment: - .RS --\fBvsearch\fR --allpairs_global \fIfastafile\fR (--alnout | ----blast6out | --matched | --notmatched | --uc | --userout) --\fIoutputfile\fR (--acceptall | --id \fIreal\fR) [\fIoptions\fR] -+\fBvsearch\fR \-\-allpairs_global \fIfastafile\fR (\-\-alnout | -+\-\-blast6out | \-\-matched | \-\-notmatched | \-\-uc | \-\-userout) -+\fIoutputfile\fR (\-\-acceptall | \-\-id \fIreal\fR) [\fIoptions\fR] - .PP - .RE - Searching: - .RS --\fBvsearch\fR --usearch_global \fIfastafile\fR --db \fIfastafile\fR --(--alnout | --blast6out | --uc | --userout) \fIoutputfile\fR --id -+\fBvsearch\fR \-\-usearch_global \fIfastafile\fR \-\-db \fIfastafile\fR -+(\-\-alnout | \-\-blast6out | \-\-uc | \-\-userout) \fIoutputfile\fR \-\-id - \fIreal\fR [\fIoptions\fR] - .PP - .RE - Shuffling: - .RS --\fBvsearch\fR --shuffle \fIfastafile\fR --output \fIoutputfile\fR -+\fBvsearch\fR \-\-shuffle \fIfastafile\fR \-\-output \fIoutputfile\fR - [\fIoptions\fR] - .PP - .RE - Sorting: - .RS --\fBvsearch\fR (--sortbylength | --sortbysize) \fIfastafile\fR --output -+\fBvsearch\fR (\-\-sortbylength | \-\-sortbysize) \fIfastafile\fR \-\-output - \fIoutputfile\fR [\fIoptions\fR] - .PP - .RE -@@ -107,10 +107,10 @@ present. All other ascii or non-ascii ch - complained about in a non-blocking warning message. - .PP - \fBvsearch\fR operations are case insensitive, except when soft masking is --activated. For --usearch_global (searching), --cluster_fast and ----cluster_smallmem (clustering), and --maskfasta (masking) commands, -+activated. 
For \-\-usearch_global (searching), \-\-cluster_fast and -+\-\-cluster_smallmem (clustering), and \-\-maskfasta (masking) commands, - the case is important if soft masking is used. Soft masking is --specified with the options "--dbmask soft" (for searching) or "--qmask -+specified with the options "\-\-dbmask soft" (for searching) or "\-\-qmask - soft" (for searching, clustering and masking). When using soft - masking, lower case letters indicate masked symbols, while upper case - letters indicate regular symbols. Masked symbols are never included in -@@ -121,7 +121,7 @@ in result files. - When comparing sequences during chimera detection, dereplication, - searching and clustering, T and U are considered identical, regardless - of their case. If two symbols are non-identical, their alignment will --result in the negative mismatch score (default -4), except if one or -+result in the negative mismatch score (default \-4), except if one or - both of the symbols are ambiguous (RYSWKMDBHVN) in which case the - score is zero. Alignment of two identical ambiguous symbols (e.g. R vs - R) also receives a score of zero. -@@ -138,27 +138,27 @@ searching). We start with general option +@@ -137,27 +137,27 @@ searching). We start with general option General options: .RS .TP 9 @@ -136,7 +38,7 @@ Do not truncate sequence labels at first space, use the full header in output files. .RE -@@ -168,7 +168,7 @@ Chimera detection options: +@@ -167,7 +167,7 @@ Chimera detection options: .PP .RS Chimera detection is based on a scoring function controlled by five @@ -145,9 +47,9 @@ sorted by decreasing abundance (if available), and compared on their \fIplus\fR strand only (case insensitive). .PP -@@ -176,12 +176,12 @@ In \fIde novo\fR mode, input fasta file +@@ -175,12 +175,12 @@ In \fIde novo\fR mode, input fasta file annotations (pattern [;]size=\fIinteger\fR[;] in the fasta - header). The input order influences the chimera detection, we + header). 
The input order influences the chimera detection, so we recommend to sort sequences by decreasing abundance (default of ---derep_fulllength command). If your sequence set needs to be sorted, -please see the --sortbysize command in the sorting section. @@ -160,106 +62,8 @@ +.BI \-\-abskew \0real +When using \-\-uchime_denovo, the abundance skew is used to distinguish in a 3-way alignment which sequence is the chimera and which are the - parents. The assumption is that chimeras appeared later in the PCR + parents. The assumption is that chimeras appear later in the PCR amplification process and are therefore less abundant than their -@@ -189,75 +189,75 @@ parents. The default value is 2.0, which - be at least 2 times more abundant than their chimera. Any positive - value greater than 1.0 can be used. - .TP --.BI --alignwidth\~ "positive integer" --Width of 3-way alignments in --uchimealns output. The default value is -+.BI \-\-alignwidth\~ "positive integer" -+Width of 3-way alignments in \-\-uchimealns output. The default value is - 80. Set to 0 to eliminate wrapping. - .TP --.BI --chimeras \0filename -+.BI \-\-chimeras \0filename - Output chimeric sequences to \fIfilename\fR, in fasta format. Output - order may vary when using multiple threads. - .TP --.BI --db \0filename --When using --uchime_ref, detect chimeras using the fasta-formatted -+.BI \-\-db \0filename -+When using \-\-uchime_ref, detect chimeras using the fasta-formatted - reference sequences contained in \fIfilename\fR. Reference sequences - are assumed to be chimera-free. Chimeras will not be detected if their - parents (or sufficiently close relatives) are not present in the - database. - .TP --.BI --dn \0real -+.BI \-\-dn \0real - No vote pseudo-count (parameter \fIn\fR in the chimera scoring - function) (1.4). - .TP --.BI --mindiffs\~ "positive integer" -+.BI \-\-mindiffs\~ "positive integer" - Minimum number of differences per segment (3). 
- .TP --.BI --mindiv \0real -+.BI \-\-mindiv \0real - Minimum divergence from closest parent (0.8). - .TP --.BI --minh \0real -+.BI \-\-minh \0real - Minimum score (h). Increasing this value tends to reduce the number of - false positives and to decrease sensitivity. Default value is - 0.28. (value ranging from 0.0 to 1.0 included). - .TP --.BI --nonchimeras \0filename -+.BI \-\-nonchimeras \0filename - Output non-chimeric sequences to \fIfilename\fR, in fasta - format. Output order may vary when using multiple threads. - .TP --.B --self --When using --uchime_ref, ignore a reference sequence when its label -+.B \-\-self -+When using \-\-uchime_ref, ignore a reference sequence when its label - matches the label of the query sequence (useful to estimate - false-positive rate in reference sequences). - .TP --.B --selfid --When using --uchime_ref, ignore a reference sequence when its -+.B \-\-selfid -+When using \-\-uchime_ref, ignore a reference sequence when its - nucleotide sequence is strictly identical with the query sequence. - .TP --.BI --threads\~ "positive integer" -+.BI \-\-threads\~ "positive integer" - Number of computation threads to use (1 to 256) with uchime_ref. - The number of threads - should be lesser or equal to the number of available CPU cores. The - default is to launch one thread per available logical core. - .TP --.BI --uchime_denovo \0filename -+.BI \-\-uchime_denovo \0filename - Detect chimeras present in the fasta-formatted \fIfilename\fR, without - external references (i.e. \fIde novo\fR). Automatically sort the - sequences in \fIfilename\fR by decreasing abundance - beforehand. Multithreading is not supported. - .TP --.BI --uchime_ref \0filename -+.BI \-\-uchime_ref \0filename - Detect chimeras present in the fasta-formatted \fIfilename\fR by --comparing them with reference sequences (option --db). Multithreading -+comparing them with reference sequences (option \-\-db). Multithreading - is supported. 
- .TP --.BI --uchimealns \0filename -+.BI \-\-uchimealns \0filename - Write 3-way global alignments (parentA, parentB, chimera) to --\fIfilename\fR using a human-readable format. Use --alignwidth to modify -+\fIfilename\fR using a human-readable format. Use \-\-alignwidth to modify - alignment length. Output order may vary when using multiple threads. - .TP --.BI --uchimeout \0filename -+.BI \-\-uchimeout \0filename - Write chimera detection results to \fIfilename\fR using the uchime - tab-separated format of 18 fields (see the list below). Use ----uchimeout5 to use a format compatible with usearch v5 and earlier -+\-\-uchimeout5 to use a format compatible with usearch v5 and earlier - versions. Rows output order may vary when using multiple threads. - .RS - .RS @@ -272,7 +272,7 @@ A: parent A sequence label. B: parent B sequence label. .IP \n+[step]. @@ -269,238 +73,8 @@ .IP \n+[step]. idQM: percentage of similarity of query (Q) and model (M) constructed as a part of parent A and a part of parent B. -@@ -304,12 +304,12 @@ YN: query is chimeric (Y), or not (N), o - .RE - .RE - .TP --.B --uchimeout5 --When using --uchimeout, write chimera detection results using a --tab-separated format of 17 fields (drop the 5th field of --uchimeout), -+.B \-\-uchimeout5 -+When using \-\-uchimeout, write chimera detection results using a -+tab-separated format of 17 fields (drop the 5th field of \-\-uchimeout), - compatible with usearch version 5 and earlier versions. - .TP --.BI --xn \0real -+.BI \-\-xn \0real - No vote weight (parameter beta) (8.0). - .RE +@@ -502,9 +502,9 @@ Masking options: .PP -@@ -320,53 +320,53 @@ Clustering options: - \fBvsearch\fR implements a single-pass, greedy star-clustering - algorithm, similar to the algorithms implemented in usearch, DNAclust - and sumaclust. Important parameters are the global clustering --threshold (--id) and the pairwise identity definition (--iddef). -+threshold (\-\-id) and the pairwise identity definition (\-\-iddef). 
- .TP 9 --.BI --centroids \0filename -+.BI \-\-centroids \0filename - Output cluster centroid sequences to \fIfilename\fR file, in fasta - format. The centroid is the sequence that seeded the cluster (i.e. the - first sequence of the cluster). - .TP --.BI --cluster_fast \0filename -+.BI \-\-cluster_fast \0filename - Clusterize the fasta sequences in \fIfilename\fR, automatically - perform a sorting by decreasing sequence length beforehand. - .TP --.BI --cluster_size \0filename -+.BI \-\-cluster_size \0filename - Clusterize the fasta sequences in \fIfilename\fR, automatically - perform a sorting by decreasing sequence abundance beforehand. - .TP --.BI --cluster_smallmem \0filename -+.BI \-\-cluster_smallmem \0filename - Clusterize the fasta sequences in \fIfilename\fR without automatically - modifying their order beforehand. Sequence are expected to be sorted --by decreasing sequence length, unless --usersort is used. -+by decreasing sequence length, unless \-\-usersort is used. - .TP --.BI --clusters \0string -+.BI \-\-clusters \0string - Output each cluster to a separate fasta file using the prefix - \fIstring\fR and a ticker (0, 1, 2, etc.) to construct the path and filenames. - .TP --.BI --consout \0filename -+.BI \-\-consout \0filename - Output cluster consensus sequences to \fIfilename\fR. For each - cluster, a multiple alignment is computed, and a consensus sequence is - constructed by taking the majority symbol (nucleotide or gap) from - each column of the alignment. Columns containing a majority of gaps --are skipped, except for terminal gaps. Use --construncate to take -+are skipped, except for terminal gaps. Use \-\-construncate to take - terminal gaps into account (not implemented yet). - .\" .TP --.\" .B --construncate --.\" when using the --consout option to build consensus sequences, do not -+.\" .B \-\-construncate -+.\" when using the \-\-consout option to build consensus sequences, do not - .\" ignore terminal gaps. 
That option skips terminal columns if they - .\" contain a majority of gaps, yielding shorter consensus sequences than --.\" when using --consout alone. -+.\" when using \-\-consout alone. - .TP --.BI --id \0real -+.BI \-\-id \0real - Do not add the target to the cluster if the pairwise identity with the - centroid is lower than \fIreal\fR (value ranging from 0.0 to 1.0 - included). The pairwise identity is defined as the number of (matching - columns) / (alignment length - terminal gaps). That definition can be --modified by --iddef. -+modified by \-\-iddef. - .TP --.BI --iddef\~ "0|1|2|3|4" --Change the pairwise identity definition used in --id. Values accepted -+.BI \-\-iddef\~ "0|1|2|3|4" -+Change the pairwise identity definition used in \-\-id. Values accepted - are: - .RS - .RS -@@ -381,68 +381,68 @@ edit distance excluding terminal gaps (d - Marine Biological Lab definition counting each extended gap as a - single difference. - .IP \n+[step]. --BLAST definition, equivalent to --iddef 2 in a context of global -+BLAST definition, equivalent to \-\-iddef 2 in a context of global - pairwise alignment. - .RE - .RE - .TP --.BI --msaout \0filename -+.BI \-\-msaout \0filename - Output a multiple sequence alignment and a consensus sequence for each - cluster to \fIfilename\fR, in fasta format. The consensus sequence is - constructed by taking the majority symbol (nucleotide or gap) from - each column of the alignment. Columns containing a majority of gaps - are skipped, except for terminal gaps. - .TP --.BI --qmask\~ "none|dust|soft" -+.BI \-\-qmask\~ "none|dust|soft" - Mask simple repeats and low-complexity regions in sequences using the - \fIdust\fR or the \fIsoft\fR algorithms, or do not mask - (\fInone\fR). Warning, when using \fIsoft\fR masking, clustering - becomes case sensitive. The default is to mask using \fIdust\fR. 
- .TP --.B --sizein -+.B \-\-sizein - Take into account the abundance annotations present in the input fasta - file (search for the pattern "[>;]size=\fIinteger\fR[;]" in sequence - headers). - .TP --.B --sizeout -+.B \-\-sizeout - Add abundance annotations to the output fasta files (add the pattern --";size=\fIinteger\fR;" to sequence headers). If --sizein is specified, -+";size=\fIinteger\fR;" to sequence headers). If \-\-sizein is specified, - abundance annotations are reported to output files, and each cluster - centroid receives a new abundance value corresponding to the total --abundance of the amplicons included in the cluster (--centroids --option). If --sizein is not specified, input abundances are set to 1 -+abundance of the amplicons included in the cluster (\-\-centroids -+option). If \-\-sizein is not specified, input abundances are set to 1 - for amplicons, and to the number of amplicons per cluster for - centroids. - .TP --.BI --strand\~ "plus|both" -+.BI \-\-strand\~ "plus|both" - When comparing sequences with the cluster seed, check the \fIplus\fR - strand only (default) or check \fIboth\fR strands. - .TP --.BI --threads\~ "positive integer" -+.BI \-\-threads\~ "positive integer" - Number of computation threads to use (1 to 256). The number of threads - should be less or equal to the number of available CPU cores. The - default is to launch one thread per available logical core. - .TP --.BI --uc \0filename -+.BI \-\-uc \0filename - Output clustering results in \fIfilename\fR using a uclust-like - format. See <http://www.drive5.com/usearch/manual/ucout.html> for a - description of the format. - .TP --.B --usersort --When using --cluster_smallmem, allow any sequence input order, not -+.B \-\-usersort -+When using \-\-cluster_smallmem, allow any sequence input order, not - just a decreasing length ordering. 
- .TP - Most searching options also apply to clustering: - .br ----alnout, --blast6out, --userout, --userfields, --fastapairs, --matched, ----notmatched, --maxaccept, --maxreject, score filtering, gap penalties, masking. (see the Searching section). -+\-\-alnout, \-\-blast6out, \-\-userout, \-\-userfields, \-\-fastapairs, \-\-matched, -+\-\-notmatched, \-\-maxaccept, \-\-maxreject, score filtering, gap penalties, masking. (see the Searching section). - .RE - .PP - .\" ---------------------------------------------------------------------------- - Dereplication options: - .RS - .TP 9 --.BI --derep_fulllength \0filename -+.BI \-\-derep_fulllength \0filename - Merge strictly identical sequences contained in - \fIfilename\fR. Identical sequences are defined as having the same - length and the same string of nucleotides (case insensitive, T and U -@@ -450,46 +450,46 @@ are considered the same). As \fBvsearch\ - \fIfilename\fR twice, \fIfilename\fR must be a real file, not a - stream. - .TP --.BI --maxuniquesize\~ "positive integer" -+.BI \-\-maxuniquesize\~ "positive integer" - Discard sequences with an abundance value greater than \fIinteger\fR. - .TP - .BI --minuniquesize\~ "positive integer" - Discard sequences with an abundance value smaller than \fIinteger\fR. - .TP --.BI --output \0filename -+.BI \-\-output \0filename - Write the dereplicated sequences to \fIfilename\fR, in fasta format - and sorted by decreasing abundance. Identical sequences receive the --header of the first sequence of their group. If --sizeout is used, the -+header of the first sequence of their group. If \-\-sizeout is used, the - number of occurrences (i.e. abundance) of each sequence is indicated - at the end of their fasta header using the pattern - ";size=\fIinteger\fR;". - .TP --.B --sizein -+.B \-\-sizein - Take into account the abundance annotations present in the input fasta - file (search for the pattern "[>;]size=\fIinteger\fR[;]" in sequence - headers). 
- .TP --.B --sizeout -+.B \-\-sizeout - Add abundance annotations to the output fasta file (add the pattern --";size=\fIinteger\fR;" to sequence headers). If --sizein is specified, -+";size=\fIinteger\fR;" to sequence headers). If \-\-sizein is specified, - each unique sequence receives a new abundance value corresponding to - its total abundance (sum of the abundances of its occurrences). If ----sizein is not specified, input abundances are set to 1, and each -+\-\-sizein is not specified, input abundances are set to 1, and each - unique sequence receives a new abundance value corresponding to its - number of occurrences in the input file. - .TP --.BI --strand\~ "plus|both" -+.BI \-\-strand\~ "plus|both" - When searching for strictly identical sequences, check the \fIplus\fR - strand only (default) or check \fIboth\fR strands. - .TP --.BI --topn\~ "positive integer" -+.BI \-\-topn\~ "positive integer" - Output only the top \fIinteger\fR sequences (i.e. the most abundant). - .TP --.BI --uc \0filename -+.BI \-\-uc \0filename - Output dereplication results in \fIfilename\fR using a uclust-like - format. See <http://www.drive5.com/usearch/manual/ucout.html> for a - description of the format. In the context of dereplication, the option ----uc_allhits has no effect. -+\-\-uc_allhits has no effect. - .RE - .PP - .\" ---------------------------------------------------------------------------- -@@ -498,9 +498,9 @@ Masking options: - .PP An input sequence can be composed of lower- or uppercase nucleotides. Lowercase nucleotides are silently set to uppercase -before masking, unless the --qmask soft option is used. Here are the @@ -512,7 +86,7 @@ lower and uppercase nucleotides: .PP .TS -@@ -518,24 +518,24 @@ soft:on:lowercase symbols masked and cha +@@ -522,24 +522,24 @@ soft:on:lowercase symbols masked and cha .TE .PP .TP 9 @@ -542,44 +116,8 @@ +.BI \-\-threads\~ "positive integer" Number of computation threads to use (1 to 256). 
The number of threads should be lesser or equal to the number of available CPU cores. The - default is to launch one thread per available logical core. -@@ -545,26 +545,26 @@ default is to launch one thread per avai - Pairwise alignment options: - .RS - .TP 9 --.BI --allpairs_global \0filename -+.BI \-\-allpairs_global \0filename - Perform optimal global pairwise alignments of all vs. all fasta - sequences contained in \fIfilename\fR. The results of the n * (n-1) / --2 alignments are written to the result files specified with --alnout, ----blast6out, --fastapairs --matched, --notmatched, --uc or --userout --(see Searching section below). Specify either the --acceptall option -+2 alignments are written to the result files specified with \-\-alnout, -+\-\-blast6out, \-\-fastapairs \-\-matched, \-\-notmatched, \-\-uc or \-\-userout -+(see Searching section below). Specify either the \-\-acceptall option - to output all pairwise alignments, or specify an identity level with ----id to discard weak alignments. Most other accept/reject options (see -+\-\-id to discard weak alignments. Most other accept/reject options (see - Searching options below) may also be used. Sequences are aligned on - their \fIplus\fR strand only. This command is multi-threaded. - .TP --.B --acceptall -+.B \-\-acceptall - Write the results of all alignments to output files. This option --overrides all other accept/reject options (e.g. --id). -+overrides all other accept/reject options (e.g. \-\-id). - .TP --.BI --id \0real -+.BI \-\-id \0real - Reject the sequence match if the pairwise identity is lower than - \fIreal\fR (value ranging from 0.0 to 1.0 included). - .TP --.BI --threads\~ "positive integer" -+.BI \-\-threads\~ "positive integer" - Number of computation threads to use (1 to 256). The number of threads - should be lesser or equal to the number of available CPU cores. The - default is to launch one thread per available logical core. 
-@@ -574,17 +574,17 @@ default is to launch one thread per avai + default is to use all available ressources and to launch one thread +@@ -582,17 +582,17 @@ per logical core. Searching options: .RS .TP 9 @@ -602,76 +140,10 @@ query+target+id+alnlen+mism+opens+qlo+qhi+tlo+thi+evalue+bits. A complete list and description is available in the section "Userfields" of this manual. -@@ -628,52 +628,52 @@ query. Nucleotide numbering starts from - there is no alignment. - .IP \n+[step]. - \fIevalue\fR: expectancy-value (not computed for nucleotide --alignments). Always set to -1. -+alignments). Always set to \-1. - .IP \n+[step]. - \fIbits\fR: bit score (not computed for nucleotide - alignments). Always set to 0. - .RE - .RE +@@ -714,12 +714,12 @@ default in usearch, all default scores a + \fBvsearch\fR have been doubled to maintain equivalent penalties and + to produce identical alignments. .TP --.BI --db \0filename --Compare query sequences (specified with --usearch_global) -+.BI \-\-db \0filename -+Compare query sequences (specified with \-\-usearch_global) - to the fasta-formatted target sequences contained in \fIfilename\fR, - using global pairwise alignment. - .TP --.BI --dbmask\~ "none|dust|soft" -+.BI \-\-dbmask\~ "none|dust|soft" - Mask simple repeats and low-complexity regions in target database - sequences using the \fIdust\fR or the \fIsoft\fR algorithms, or do not - mask (\fInone\fR). Warning, when using \fIsoft\fR masking search - commands become case sensitive. The default is to mask using - \fIdust\fR. - .TP --.BI --dbmatched \0filename -+.BI \-\-dbmatched \0filename - Write database target sequences matching at least one query sequence --to \fIfilename\fR, in fasta format. If the option --sizeout is used, -+to \fIfilename\fR, in fasta format. If the option \-\-sizeout is used, - the number of queries that matched each target sequence is indicated - using the pattern ";size=\fIinteger\fR;". 
- .TP --.BI --dbnotmatched \0filename -+.BI \-\-dbnotmatched \0filename - Write database target sequences not matching query sequences to - \fIfilename\fR, in fasta format. - .TP --.BI --fastapairs \0filename -+.BI \-\-fastapairs \0filename - Write pairwise alignments of query and target sequences to - \fIfilename\fR, in fasta format. - .TP --.B --fulldp -+.B \-\-fulldp - Dummy option. To maximize search sensitivity, \fBvsearch\fR uses a - 8-way 16-bit SIMD vectorized full dynamic programming algorithm --(Needleman-Wunsch), whether or not --fulldp is specified. -+(Needleman-Wunsch), whether or not \-\-fulldp is specified. - .TP --.BI --gapext \0string --Set penalties for a gap extension. See --gapopen for a complete -+.BI \-\-gapext \0string -+Set penalties for a gap extension. See \-\-gapopen for a complete - description of the penalty declaration system. The default is to - initialize the six gap extending penalties using a penalty of 2 for - extending internal gaps and a penalty of 1 for extending terminal - gaps, in both query and target sequences (i.e. 2I/1E). - .TP --.BI --gapopen \0string -+.BI \-\-gapopen \0string - Set penalties for a gap opening. A gap opening can occur in six - different contexts: in the query (Q) or in the target (T) sequence, at - the left (L) or right (R) extremity of the sequence, or inside the -@@ -704,12 +704,12 @@ gap penalties. Because the lowest gap pe - in usearch, all default scores and gap penalties in \fBvsearch\fR - have been doubled in order to obtain similar alignments. - .TP -.B --hardmask +.B \-\-hardmask Mask low-complexity regions by replacing them with Ns instead of @@ -683,7 +155,7 @@ Reject the sequence match if the pairwise identity is lower than \fIreal\fR (value ranging from 0.0 to 1.0 included). 
The search process sorts target sequences by decreasing number of \fIk\fR-mers -@@ -719,13 +719,13 @@ also prevent pairwise alignments with we +@@ -729,13 +729,13 @@ also prevent pairwise alignments with we there needs to be at least 6 shared \fIk\fR-mers to start the pairwise alignment, and at least one out of every 16 \fIk\fR-mers from the query needs to match the target. Consequently, using values lower than @@ -701,280 +173,7 @@ are: .RS .RS -@@ -735,40 +735,40 @@ CD-HIT definition using shortest sequenc - .IP \n+[step]. - edit distance. - .IP \n+[step]. --edit distance excluding terminal gaps (default value of --id). -+edit distance excluding terminal gaps (default value of \-\-id). - .IP \n+[step]. - Marine Biological Lab definition counting each extended gap as a - single difference. - .IP \n+[step]. --BLAST definition, equivalent to --iddef 2 in a context of global -+BLAST definition, equivalent to \-\-iddef 2 in a context of global - pairwise alignment. - .RE - .RE - .PP --The option --userfields accepts the fields id0 to id4, in addition to -+The option \-\-userfields accepts the fields id0 to id4, in addition to - the field id, to report the pairwise identity values corresponding to - the different definitions. - .TP --.BI --idprefix\~ "positive integer" -+.BI \-\-idprefix\~ "positive integer" - Reject the target sequence if the first \fIinteger\fR nucleotides do - not match the query sequence. - .TP --.BI --idsuffix\~ "positive integer" -+.BI \-\-idsuffix\~ "positive integer" - Reject the target sequence if the last \fIinteger\fR nucleotides do - not match the query sequence. - .TP --.B --leftjust -+.B \-\-leftjust - Reject the target sequence if the alignment begins with gaps. - .TP --.BI --match\~ "integer" -+.BI \-\-match\~ "integer" - Score assigned to a match (i.e. identical nucleotides) in the pairwise - alignment. The default value is 2. 
- .TP --.BI --matched \0filename -+.BI \-\-matched \0filename - Write query sequences matching database target sequences to - \fIfilename\fR, in fasta format. - .TP --.BI --maxaccepts\~ "positive integer" -+.BI \-\-maxaccepts\~ "positive integer" - Maximum number of hits to accept before stopping the search. The - default value is 1. This option works in pair with maxrejects. The - search process sorts target sequences by decreasing number of -@@ -779,31 +779,31 @@ and the search process stops for that qu - higher value, more hits are accepted. If maxaccepts and maxrejects are - both set to 0, the complete database is searched. - .TP --.BI --maxdiffs\~ "positive integer" -+.BI \-\-maxdiffs\~ "positive integer" - Reject the target sequence if the alignment contains at least - \fIinteger\fR substitutions, insertions or deletions. - .TP --.BI --maxgaps\~ "positive integer" -+.BI \-\-maxgaps\~ "positive integer" - Reject the target sequence if the alignment contains at least - \fIinteger\fR insertions or deletions. - .TP --.BI --maxhits\~ "positive integer" -+.BI \-\-maxhits\~ "positive integer" - Maximum number of hits to show once the search is terminated (hits are - sorted by decreasing identity). Unlimited by default value. \fBIt - applies to alnout, blast6out, uc, userout, fastapairs\fR. - .TP --.BI --maxid \0real -+.BI \-\-maxid \0real - Reject the target sequence if its percentage of identity with the - query is greater than \fIreal\fR. - .TP --.BI --maxqsize\~ "positive integer" -+.BI \-\-maxqsize\~ "positive integer" - Reject query sequences with an abundance greater than - \fIinteger\fR. - .TP --.BI --maxqt \0real -+.BI \-\-maxqt \0real - Reject if the query/target sequence length ratio is greater than \fIreal\fR. - .TP --.BI --maxrejects\~ "positive integer" -+.BI \-\-maxrejects\~ "positive integer" - Maximum number of non-matching target sequences to consider before - stopping the search. The default value is 32. This option works in - pair with maxaccepts. 
The search process sorts target sequences by -@@ -815,138 +815,138 @@ hit). If maxrejects is set to a higher v - are considered. If maxaccepts and maxrejects are both set to 0, the - complete database is searched. - .TP --.BI --maxsizeratio \0real -+.BI \-\-maxsizeratio \0real - Reject if the query/target abundance ratio is greater than - \fIreal\fR. - .TP --.BI --maxsl \0real -+.BI \-\-maxsl \0real - Reject if the shorter/longer sequence length ratio is - greater than \fIreal\fR. - .TP --.BI --maxsubs\~ "positive integer" -+.BI \-\-maxsubs\~ "positive integer" - Reject the target sequence if the alignment contains more than - \fIinteger\fR substitutions. - .TP --.BI --mid \0real -+.BI \-\-mid \0real - Reject the alignment if the percentage of identity is lower than - \fIreal\fR (ignoring all gaps, internal and terminal). - .TP --.BI --mincols\~ "positive integer" -+.BI \-\-mincols\~ "positive integer" - Reject the target sequence if the alignment length is shorter than - \fIinteger\fR. - .TP --.BI --minqt \0real -+.BI \-\-minqt \0real - Reject if the query/target sequence length ratio is lower than - \fIreal\fR. - .TP --.BI --minsizeratio \0real -+.BI \-\-minsizeratio \0real - Reject if the query/target abundance ratio is lower than \fIreal\fR. - .TP --.BI --minsl \0real -+.BI \-\-minsl \0real - Reject if the shorter/longer sequence length ratio is lower than - \fIreal\fR. - .TP --.BI --mintsize\~ "positive integer" -+.BI \-\-mintsize\~ "positive integer" - Reject target sequences with an abundance lower than \fIinteger\fR. - .TP --.BI --mismatch\~ "integer" -+.BI \-\-mismatch\~ "integer" - Score assigned to a mismatch (i.e. different nucleotides) in the --pairwise alignment. The default value is -4. -+pairwise alignment. The default value is \-4. - .TP --.BI --notmatched \0filename -+.BI \-\-notmatched \0filename - Write query sequences not matching database target sequences to - \fIfilename\fR, in fasta format. 
- .TP --.B --output_no_hits --Write both matching and non-matching queries to --alnout, --blast6out, --and --userout output files (--uc and --uc_allhits output files always -+.B \-\-output_no_hits -+Write both matching and non-matching queries to \-\-alnout, \-\-blast6out, -+and \-\-userout output files (\-\-uc and \-\-uc_allhits output files always - feature non-matching queries). Non-matching queries are labelled "No --hits" in --alnout files. -+hits" in \-\-alnout files. - .TP --.BI --qmask\~ "none|dust|soft" -+.BI \-\-qmask\~ "none|dust|soft" - Mask simple repeats and low-complexity regions in query sequences - using the \fIdust\fR or the \fIsoft\fR algorithms, or do not mask - (\fInone\fR). Warning, when using \fIsoft\fR masking search commands - become case sensitive. The default is to mask using \fIdust\fR. - .TP --.BI --query_cov \0real -+.BI \-\-query_cov \0real - Reject if the fraction of the query aligned to the target sequence is - lower than \fIreal\fR. The query coverage is computed as - (matches + mismatches) / query sequence length. Internal or terminal - gaps are not taken into account. - .TP --.B --rightjust -+.B \-\-rightjust - Reject the target sequence if the alignment ends with gaps. - .TP --.BI --rowlen\~ "positive integer" --Width of alignment lines in --alnout output. The default value is -+.BI \-\-rowlen\~ "positive integer" -+Width of alignment lines in \-\-alnout output. The default value is - 64. Set to 0 to eliminate wrapping. - .TP --.B --self -+.B \-\-self - Reject the alignment if the query and target labels are identical. - .TP --.B --selfid -+.B \-\-selfid - Reject the alignment if the query and target sequences are strictly - identical. - .TP --.B --sizeout --Add abundance annotations to the output of the option --dbmatched -+.B \-\-sizeout -+Add abundance annotations to the output of the option \-\-dbmatched - (using the pattern ";size=\fIinteger\fR;"). 
- .TP --.BI --strand\~ "plus|both" -+.BI \-\-strand\~ "plus|both" - When searching for similar sequences, check the \fIplus\fR strand only - (default) or check \fIboth\fR strands. - .TP --.BI --target_cov \0real -+.BI \-\-target_cov \0real - Reject if the fraction of the target sequence aligned to the query - sequence is lower than \fIreal\fR. The target coverage is computed as - (matches + mismatches) / target sequence length. - Internal or terminal gaps are not taken into account. - .TP --.BI --threads\~ "positive integer" -+.BI \-\-threads\~ "positive integer" - Number of computation threads to use (1 to 256). The number of threads - should be lesser or equal to the number of available CPU cores. The - default is to launch one thread per available logical core. - .TP --.B --top_hits_only -+.B \-\-top_hits_only - Output only the hits with the highest percentage of identity with the - query. - .TP --.BI --uc \0filename -+.BI \-\-uc \0filename - Output searching results in \fIfilename\fR using a uclust-like - format. See <http://www.drive5.com/usearch/manual/ucout.html> for a - description of the format. Output order may vary when using multiple - threads. - .TP --.B --uc_allhits --When using the --uc option, show all hits, not just the top hit for -+.B \-\-uc_allhits -+When using the \-\-uc option, show all hits, not just the top hit for - each query. - .TP --.BI --usearch_global \0filename --Compare target sequences (--db) to the fasta-formatted query sequences -+.BI \-\-usearch_global \0filename -+Compare target sequences (\-\-db) to the fasta-formatted query sequences - contained in \fIfilename\fR, using global pairwise alignment. - .TP --.BI --userfields \0string --When using --userout, select and order the fields written to the -+.BI \-\-userfields \0string -+When using \-\-userout, select and order the fields written to the - output file. Fields are separated by "+" (e.g. query+target+id). See - the "Userfields" section for a complete list of fields. 
- .TP --.BI --userout \0filename -+.BI \-\-userout \0filename - Write user-defined tab-separated output to \fIfilename\fR. Select the --fields with the option --userfields. Output order may vary when using --multiple threads. If --userfields is empty or not present, -+fields with the option \-\-userfields. Output order may vary when using -+multiple threads. If \-\-userfields is empty or not present, - \fIfilename\fR is empty. - .TP --.BI --weak_id \0real -+.BI \-\-weak_id \0real - Show hits with percentage of identity of at least \fIreal\fR, without - terminating the search. A normal search stops as soon as enough hits --are found (as defined by --maxaccepts, --maxrejects, and --id). As ----weak_id reports weak hits that are not deduced from --maxaccepts, --high --id values can be used, hence preserving both speed and -+are found (as defined by \-\-maxaccepts, \-\-maxrejects, and \-\-id). As -+\-\-weak_id reports weak hits that are not deduced from \-\-maxaccepts, -+high \-\-id values can be used, hence preserving both speed and - sensitivity. Logically, \fIreal\fR must be smaller than the value --indicated by --id. -+indicated by \-\-id. - .TP --.BI --wordlength\~ "positive integer" -+.BI \-\-wordlength\~ "positive integer" - Length of words (i.e. \fIk\fR-mers) for database indexing. The range - of possible values goes from 3 to 15, but values near 8 are generally - recommended. Longer words may reduce the sensitivity for weak -@@ -963,75 +963,75 @@ more). The default value is 8. +@@ -984,75 +984,75 @@ more). The default value is 8. Shuffling options: .RS .TP 9 @@ -1077,7 +276,7 @@ .RS .TP 9 .B aln -@@ -1052,7 +1052,7 @@ format (Compact Idiosyncratic Gapped Ali +@@ -1073,7 +1073,7 @@ format (Compact Idiosyncratic Gapped Ali (deletion) and I (insertion). Empty field if there is no alignment. .TP .B evalue @@ -1086,7 +285,7 @@ .TP .B exts Number of columns containing a gap extension (zero or positive integer -@@ -1088,7 +1088,7 @@ single difference. 
+@@ -1109,7 +1109,7 @@ single difference. .TP .B id4 BLAST definition of the percentage of identity (real value ranging @@ -1095,7 +294,7 @@ pairwise alignment. .TP .B ids -@@ -1129,7 +1129,7 @@ Internal or terminal gaps are not taken +@@ -1150,7 +1150,7 @@ Internal or terminal gaps are not taken field is set to 0.0 if there is no alignment. .TP .B qframe @@ -1104,7 +303,7 @@ is not computed by \fBvsearch\fR. Always set to +0. .TP .B qhi -@@ -1189,7 +1189,7 @@ Internal or terminal gaps are not taken +@@ -1209,7 +1209,7 @@ Internal or terminal gaps are not taken The field is set to 0.0 if there is no alignment. .TP .B tframe @@ -1113,7 +312,7 @@ is not computed by \fBvsearch\fR. Always set to +0. .TP .B thi -@@ -1240,31 +1240,31 @@ quirks and inconsistencies. We decided n +@@ -1259,31 +1259,31 @@ quirks and inconsistencies. We decided n and for complete transparency, to document here the deliberate changes we made. .PP @@ -1152,9 +351,9 @@ +\fBvsearch\fR extends the \-\-sizein option to dereplication +(\-\-derep_fulllength) and clustering (\-\-cluster_fast). .PP - \fBvsearch\fR treats T and U as identical nucleotides for + \fBvsearch\fR treats T and U as identical nucleotides during dereplication. 
-@@ -1296,8 +1296,8 @@ Cluster with a 97% similarity threshold, +@@ -1333,8 +1333,8 @@ Cluster with a 97% similarity threshold, and write cluster descriptions using a uclust-like format: .PP .RS @@ -1165,7 +364,7 @@ .RE .PP Dereplicate the sequences contained in queries.fas, take into account -@@ -1306,9 +1306,9 @@ to output with the new abundance informa +@@ -1343,9 +1343,9 @@ to output with the new abundance informa with an abundance of 1: .PP .RS @@ -1178,41 +377,7 @@ .RE .PP Mask simple repeats and low complexity regions in the input fasta file -@@ -1316,26 +1316,26 @@ Mask simple repeats and low complexity r - file: - .PP - .RS --\fBvsearch\fR --maskfasta \fIqueries.fas\fR --output --\fIqueries_masked.fas\fR --qmask dust -+\fBvsearch\fR \-\-maskfasta \fIqueries.fas\fR \-\-output -+\fIqueries_masked.fas\fR \-\-qmask dust - .RE - .PP - Sort by decreasing abundance the sequences contained in queries.fas - (using the "size=\fIinteger\fR" information), relabel the sequences --while preserving the abundance information (with --sizeout), keep only -+while preserving the abundance information (with \-\-sizeout), keep only - sequences with an abundance equal to or greater than 2: - .PP - .RS --\fBvsearch\fR --sortbysize \fIqueries.fas\fR --output --\fIqueries_sorted.fas\fR --relabel sampleA_ --sizeout --minsize 2 -+\fBvsearch\fR \-\-sortbysize \fIqueries.fas\fR \-\-output -+\fIqueries_sorted.fas\fR \-\-relabel sampleA_ \-\-sizeout \-\-minsize 2 - .RE - .PP - Align all sequences in a database with each other and output all pairwise - alignments: - .PP - .RS --\fBvsearch\fR --allpairs_global \fIdatabase.fas\fR ----alnout \fIresults.aln\fR --acceptall -+\fBvsearch\fR \-\-allpairs_global \fIdatabase.fas\fR -+\-\-alnout \fIresults.aln\fR \-\-acceptall - .RE - .PP - Search queries in a reference database, with a 80%-similarity -@@ -1343,8 +1343,8 @@ threshold, take terminal gaps into accou +@@ -1362,8 +1362,8 @@ threshold, take terminal gaps into accou similarities: .PP .RS 
@@ -1223,7 +388,7 @@ .RE .PP Search a sequence dataset against itself (ignore self hits), get all -@@ -1352,9 +1352,9 @@ matches with at least 60% identity, and +@@ -1371,9 +1371,9 @@ matches with at least 60% identity, and blast-like tab-separated format: .PP .RS @@ -1236,7 +401,7 @@ .RE .PP Shuffle the input fasta file (change the order of sequences) in a -@@ -1362,8 +1362,8 @@ repeatable fashion (fixed seed), and wri +@@ -1381,8 +1381,8 @@ repeatable fashion (fixed seed), and wri to the output file: .PP .RS @@ -1246,8 +411,8 @@ +\fIqueries_shuffled.fas\fR \-\-seed 13 \-\-fasta_width 0 .RE .PP - .\" -@@ -1440,17 +1440,17 @@ Bug fixes (ssse3/sse41 requirement, memo + Sort by decreasing abundance the sequences contained in queries.fas +@@ -1469,17 +1469,17 @@ Bug fixes (ssse3/sse4.1 requirement, mem Bug fix (now writes help to stdout instead of stderr). .TP .BR v1.0.4\~ "released December 8th, 2014" @@ -1269,7 +434,7 @@ .TP .BR v1.0.8\~ "released January 22nd, 2015" Introduces several changes and bug fixes: -@@ -1459,7 +1459,7 @@ Introduces several changes and bug fixes +@@ -1488,7 +1488,7 @@ Introduces several changes and bug fixes a new linear memory aligner for alignment of sequences longer than 5,000 nucleotides, .IP - @@ -1278,23 +443,7 @@ abundance before clustering, .IP - meaning of userfields qlo, qhi, tlo, thi changed for compatibility -@@ -1468,12 +1468,12 @@ with usearch, - new userfields qilo, qihi, tilo, tihi gives coordinates ignoring - terminal gaps, - .IP - --in --uc output files, a perfect alignment is indicated with a "=" sign, -+in \-\-uc output files, a perfect alignment is indicated with a "=" sign, - .IP - ----cluster_fast will now sort sequences by decreasing length, then by -+\-\-cluster_fast will now sort sequences by decreasing length, then by - decreasing abundance and finally by sequence identifier, - .IP - --default --maxseqlength value set to 50,000 nucleotides, -+default \-\-maxseqlength value set to 50,000 nucleotides, - .IP - - fix 
for bug in alignment in rare cases, - .IP - -@@ -1481,7 +1481,7 @@ fix for lack of detection of under- or o +@@ -1511,7 +1511,7 @@ fix for lack of detection of under- or o .RE .TP .BR v1.0.9\~ "released January 22nd, 2015"
