This is an automated email from the git hooks/post-receive script.

tille pushed a commit to branch master in repository vcftools.

commit de5be31edee1f02434b3d80d404ac5183eacbe16
Author: Andreas Tille <[email protected]>
Date:   Sun Jul 3 22:09:59 2016 +0200

    Use official manpage
---
 debian/manpages        |   1 +
 debian/mans/vcftools.1 | 472 -------------------------------------------------
 2 files changed, 1 insertion(+), 472 deletions(-)

diff --git a/debian/manpages b/debian/manpages
index 4f4649b..5c1104c 100644
--- a/debian/manpages
+++ b/debian/manpages
@@ -1 +1,2 @@
 debian/mans/*.1
+src/cpp/*.1
diff --git a/debian/mans/vcftools.1 b/debian/mans/vcftools.1
deleted file mode 100644
index 84bcae8..0000000
--- a/debian/mans/vcftools.1
+++ /dev/null
@@ -1,472 +0,0 @@
-.TH VCFTOOLS "1" "July 2011" "vcftools 0.1.5" "User Commands"
-.SH NAME
-vcftools \- analyse VCF files
-.SH SYNOPSIS
-.B vcftools \fR[\fIOPTIONS\fR]
-.SH DESCRIPTION
-The vcftools program is run from the command line. The interface is
-inspired by PLINK, and so should be largely familiar to users of that
-package. Commands take the following form:
-
- vcftools \-\-vcf file1.vcf \-\-chr 20 \-\-freq
-
-The above command tells vcftools to read in the file file1.vcf, extract
-sites on chromosome 20, and calculate the allele frequency at each site.
-The resulting allele frequency estimates are stored in the output file,
-out.freq. As in the above example, output from vcftools is mainly sent to
-output files, as opposed to being shown on the screen.
-
-Note that some commands may only be available in the latest version of
-vcftools. To obtain the latest version, you should use SVN to checkout the
-latest code, as described on the home page.
-
-Also note that polyploid genotypes are not currently supported.
-
-.SS Basic Options
-.TP
-\fB\-\-vcf\fR <filename>
-This option defines the VCF file to be processed. The files need to be
-decompressed prior to use with vcftools. vcftools expects files in VCF
-format v4.0, a specification of which can be found here.
-.TP
-\fB\-\-gzvcf\fR <filename>
-This option can be used in place of the \-\-vcf option to read compressed
-(gzipped) VCF files directly. Note that this option can be quite slow when
-used with large files.
-.TP
-\fB\-\-out\fR <prefix>
-This option defines the output filename prefix for all files generated by
-vcftools. For example, if <prefix> is set to output_filename, then all
-output files will be of the form output_filename.*** . If this option is
-omitted, all output files will have the prefix 'out.'.
-
-.SS Site Filter Options
-
-.TP
-\fB\-\-chr\fR <chromosom>
-Only process sites with a chromosome identifier matching <chromosome>
-.TP
-\fB\-\-from\-bp\fR <integer>
-.TP
-\fB\-\-to\-bp\fR <integer>
-These options define the physical range of sites will be processed. Sites
-outside of this range will be excluded. These options can only be used in
-conjunction with \-\-chr.
-.TP
-\fB\-\-snp\fR <string>
-Include SNP(s) with matching ID. This command can be used multiple times
-in order to include more than one SNP.
-.TP
-\fB\-\-snps\fR <filename>
-Include a list of SNPs given in a file. The file should contain a list of
-SNP IDs, with one ID per line.
-.TP
-\fB\-\-exclude\fR <filename>
-Exclude a list of SNPs given in a file. The file should contain a list of
-SNP IDs, with one ID per line.
-.TP
-\fB\-\-positions\fR <filename>
-Include a set of sites on the basis of a list of positions. Each line of
-the input file should contain a (tab-separated) chromosome and position.
-The file should have a header line. Sites not included in the list are
-excluded.
-.TP
-\fB\-\-bed\fR <filename>
-.TP
-\fB\-\-exclude\-bed\fR <filename>
-Include or exclude a set of sites on the basis of a BED file. Only the
-first three columns (chrom, chromStart and chromEnd) are required. The
-BED file should have a header line.
-.TP
-\fB\-\-remove\-filtered\-all\fR
-.TP
-\fB\-\-remove\-filtered\fR <sting>
-.TP
-\fB\-\-keep\-filtered\fR <sting>
-These options are used to filter sites on the basis of their FILTER flag.
-The first option removes all sites with a FILTER flag. The second option
-can be used to exclude sites with a specific filter flag. The third option
-can be used to select sites on the basis of specific filter flags.
-The second and third options can be used multiple times to specify multiple
-FILTERs. The \-\-keep\-filtered option is applied before
-the \-\-remove\-filtered
-option.
-.TP
-\fB\-\-minQ\fR <float>
-Include only sites with Quality above this threshold.
-.TP
-\fB\-\-min\-meanDP\fR <float>
-.TP
-\fB\-\-max\-meanDP\fR <float>
-Include sites with mean Depth within the thresholds defined by these options.
-.TP
-\fB\-\-maf\fR <float>
-.TP
-\fB\-\-max\-maf\fR <float>
-Include only sites with Minor Allele Frequency within the specified range.
-.TP
-\fB\-\-non\-ref\-af\fR <float>
-.TP
-\fB\-\-max\-non\-ref\-af\fR <float>
-Include only sites with Non-Reference Allele Frequency within the specified
-range.
-.TP
-\fB\-\-hue\fR <float>
-Assesses sites for Hardy-Weinberg Equilibrium using an exact test, as
-defined by Wigginton, Cutler and Abecasis (2005). Sites with a p-value
-below the threshold defined by this option are taken to be out of HWE,
-and therefore excluded.
-.TP
-\fB\-\-geno\fR <float>
-Exclude sites on the basis of the proportion of missing data (defined to
-be between 0 and 1).
-.TP
-\fB\-\-min\-alleles\fR <int>
-.TP
-\fB\-\-max\-alleles\fR <int>
-Include only sites with a number of alleles within the specified range.
-For example, to include only bi\-allelic sites, one could use:
-
- vcftools \-\-vcf file1.vcf \-\-min\-alleles 2 \-\-max\-alleles 2
-
-.TP
-\fB\-\-mask\fR <filename>
-.TP
-\fB\-\-invert\-mask\fR <filename>
-.TP
-\fB\-\-mask\-min\fR <filename>
-Include sites on the basis of a FASTA-like file. The provided file contains
-a sequence of integer digits (between 0 and 9) for each position on a
-chromosome that specify if a site at that position should be filtered or not.
-An example mask file would look like:
-
- >1
- 0000011111222...
-
-In this example, sites in the VCF file located within the first 5 bases of
-the start of chromosome 1 would be kept, whereas sites at position 6 onwards
-would be filtered out. The threshold integer that determines if sites are
-filtered or not is set using the \-\-mask\-min option, which defaults to 0.
-The chromosomes contained in the mask file must be sorted in the same order
-as the VCF file. The \-\-mask option is used to specify the mask file to be
-used, whereas the \-\-invert\-mask option can be used to specify a mask file
-that will be inverted before being applied.
-
-.SS Individual Filters
-
-.TP
-\fB\-\-indv\fR <string>
-Specify an individual to be kept in the analysis. This option can be used
-multiple times to specify multiple individuals.
-.TP
-\fB\-\-keep\fR <filename>
-Provide a file containing a list of individuals to include in subsequent a
-nalysis. Each individual ID (as defined in the VCF headerline) should be
-included on a separate line.
-.TP
-\fB\-\-remove\-indv\fR <string>
-Specify an individual to be removed from the analysis. This option can be
-used multiple times to specify multiple individuals. If the \-\-indv option
-is also specified, then the \-\-indv option is executed before
-the \-\-remove\-indv option.
-.TP
-\fB\-\-remove\fR <filename>
-Provide a file containing a list of individuals to exclude in subsequent
-analysis. Each individual ID (as defined in the VCF headerline) should be
-included on a separate line. If both the \-\-keep and the \-\-remove options
-are used, then the \-\-keep option is execute before the \-\-remove option.
-.TP
-\fB\-\-mon\-indv\-meanDP\fR <float>
-.TP
-\fB\-\-max\-indv\-meanDP\fR <float>
-Calculate the mean coverage on a per-individual basis. Only individuals with
-coverage within the range specified by these options are included in
-subsequent analyses.
-.TP
-\fB\-\-mind\fR <float>
-Specify the minimum call rate threshold for each individual.
-.TP
-\fB\-\-phased\fR
-First excludes all individuals having all genotypes unphased, and
-subsequently excludes all sites with unphased genotypes. The remaining data
-therefore consists of phased data only.
-
-.SS Genotype Filters
-.TP
-\fB\-\-remove\-filtered\-geno\-all\fR
-.TP
-\fB\-\-remove\-filtered\-geno\fR <string>
-The first option removes all genotypes with a FILTER flag. The second option
-can be used to exclude genotypes with a specific filter flag.
-.TP
-\fB\-\-minGQ\fR <float>
-Exclude all genotypes with a quality below the threshold specified by
-this option (GQ).
-.TP
-\fB\-\-minDP\fR <float>
-Exclude all genotypes with a sequencing depth below that specified by
-this option (DP)
-
-.SS Output Statistics
-.TP
-\fB\-\-freq\fR
-.TP
-\fB\-\-counts\fR
-.TP
-\fB\-\-freq2\fR
-.TP
-\fB\-\-counts2\fR
-Output per\-site frequency information. The \-\-freq outputs the allele
-frequency in a file with the suffix '.frq'. The \-\-counts option outputs a
-similar file with the suffix '.frq.count', that contains the raw allele
-counts at each site.
-The \-\-freq2 and \-\-count2 options are used to suppress allele information in
-the output file. In this case, the order of the freqs/counts depends on the
-numbering in the VCF file.
-.TP
-\fB\-\-depth\fR
-Generates a file containing the mean depth per individual. This file has
-the suffix '.idepth'.
-.TP
-\fB\-\-site\-depth\fR
-.TP
-\fB\-\-site\-mean\-depth\fR
-Generates a file containing the depth per site. The \-\-site\-depth option
-outputs the depth for each site summed across individuals. This file has
-the suffix '.ldepth'. Likewise, the \-\-site\-mean\-depth outputs the mean
-depth for each site, and the output file has the suffix '.ldepth.mean'.
-.TP
-\fB\-\-geno\-depth\fR
-Generates a (possibly very large) file containing the depth for each
-genotype in the VCF file. Missing entries are given the value \-1. The
-file has the suffix '.gdepth'.
-.TP
-\fB\-\-site\-quality\fR
-Generates a file containing the per\-site SNP quality, as found in the QUAL
-column of the VCF file. This file has the suffix '.lqual'.
-.TP
-\fB\-\-het\fR
-Calculates a measure of heterozygosity on a per\-individual basis.
-Specfically, the inbreeding coefficient, F, is estimated for each
-individual using a method of moments. The resulting file has the suffix '.het'.
-.TP
-\fB\-\-hardy\fR
-Reports a p\-value for each site from a Hardy\-Weinberg Equilibrium test
-(as defined by Wigginton, Cutler and Abecasis (2005)). The resulting file
-(with suffix '.hwe') also contains the Observed numbers of Homozygotes and
-Heterozygotes and the corresponding Expected numbers under HWE.
-.TP
-\fB\-\-missing\fR
-Generates two files reporting the missingness on a per\-individual and
-per\-site basis. The two files have suffixes '.imiss' and '.lmiss'
-respectively.
-.TP
-\fB\-\-hap\-r2\fR
-.TP
-\fB\-\-geno\-r2\fR
-.TP
-\fB\-\-ld\-window\fR <int>
-.TP
-\fB\-\-ld\-window\-bp\fR <int>
-.TP
-\fB\-\-min\-r2\fR <float>
-These options are used to report Linkage Disequilibrium (LD) statistics
-as summarised by the r2 statistic. The \-\-hap\-r2 option informs vcftools
-to output a file reporting the r2 statistic using phased haplotypes. This
-is the traditional measure of LD often reported in the population genetics
-literature. If phased haplotypes are unavailable then the \-\-geno\-r2 option
-may be used, which calculates the squared correlation coefficient between
-genotypes encoded as 0, 1 and 2 to represent the number of non-reference
-alleles in each individual. This is the same as the LD measure reported
-by PLINK. The haplotype version outputs a file with the suffix '.hap.ld',
-whereas the genotype version outputs a file with the suffix '.geno.ld'.
-The haplotype version implies the option \-\-phased.
-
-The \-\-ld\-window option defines the maximum SNP separation for the
-calculation of LD. Likewise, the \-\-ld\-window\-bp option can be used to
-define the maximum physical separation of SNPs included in the LD
-calculation. Finally, the \-\-min\-r2 sets a minimum value for r2 below
-which the LD statistic is not reported.
-.TP
-\fB\-\-SNPdnsity\fR <int>
-Calculates the number and density of SNPs in bins of size defined by this
-option. The resulting output file has the suffix '.snpden'.
-.TP
-\fB\-\-TsTv\fR <int>
-Calculates the Transition / Transversion ratio in bins of size defined by
-this option. The resulting output file has the suffix '.TsTv'. A summary
-is also supplied in a file with the suffix '.TsTv.summary'.
-.TP
-\fB\-\-FILTER\-summary\fR
-Generates a summary of the number of SNPs and Ts/Tv ratio for each FILTER
-category. The output file has the suffix '.FILTER.summary.
-.TP
-\fB\-\-filtered\-sites\fR
-Creates two files listing sites that have been kept or removed after
-filtering. The first file, with suffix '.kept.sites', lists sites kept
-by vcftools after filters have been applied. The second file, with the
-suffix '.removed.sites', list sites removed by the applied filters.
-.TP
-\fB\-\-singletons\fR
-This option will generate a file detailing the location of singletons, and
-the individual they occur in. The file reports both true singletons, and
-private doubletons (i.e. SNPs where the minor allele only occurs in a
-single individual and that individual is homozygotic for that allele).
-The output file has the suffix '.singletons'.
-.TP
-\fB\-\-site\-pi\fR
-.TP
-\fB\-\-window\-pi\fR <int>
-These options are used to estimate levels of nucleotide diversity. The first
-option does this on a per\-site basis, and the output file has the
-suffix '.sites.pi'. The second option calculates the nucleotide diversity in
-windows, with the window size defined in the option argument. Output for
-this option has the suffix '.windowed.pi'. The windowed version requires
-phased data, and hence use of this option implies the \-\-phased option.
-
-.SS Output in Other Formats
-.TP
-\fB\-\-O12\fR
-This option outputs the genotypes as a large matrix. Three files are
-produced. The first, with suffix '.012', contains the genotypes of each
-individual on a separate line. Genotypes are represented as 0, 1 and 2,
-where the number represent that number of non-reference alleles. Missing
-genotypes are represented by \-1. The second file, with suffix '.012.indv'
-details the individuals included in the main file. The third file, with
-suffix '.012.pos' details the site locations included in the main file.
-.TP
-\fB\-\-IMPUTE\fR
-This option outputs phased haplotypes in IMPUTE reference\-panel format. As
-IMPUTE requires phased data, using this option also implies \-\-phased.
-Unphased individuals and genotypes are therefore excluded. Only bi\-allelic
-sites are included in the output. Using this option generates three files.
-The IMPUTE haplotype file has the suffix '.impute.hap', and the IMPUTE
-legend file has the suffix '.impute.hap.legend'. The third file, with
-suffix '.impute.hap.indv', details the individuals included in the
-haplotype file, although this file is not needed by IMPUTE.
-.TP
-\fB\-\-ldhat\fR
-.TP
-\fB\-\-ldhat\-geno\fR
-These options output data in LDhat format. Use of these options also
-require the \-\-chr option to by used. The \-\-ldhat option outputs phased
-data only, and therefore also implies \-\-phased, leading to unphased
-individuals and genotypes being excluded. Alternatively, the \-\-ldhat\-geno
-option treats all of the data as unphased, and therefore outputs LDhat
-files in genotype/unphased format. In either case, two files are generated
-with the suffixes '.ldhat.sites' and '.ldhat.locs', which correspond to the
-LDhat 'sites' and 'locs' input files respectively.
-.TP
-\fB\-\-BEAGLE\-GL\fR
-This option outputs genotype likelihood information for input into the
-BEAGLE program. This option requires the VCF file to contain the FORMAT
-GL tag, which can generally be output by SNP callers such as the GATK.
-Use of this option requires a chromosome to be specified via the
-\-\-chr option. The resulting output file (with the suffix '.BEAGLE.GL')
-contains genotype likelihoods for biallelic sites, and is suitable for
-input into BEAGLE via the 'like=' argument.
-.TP
-\fB\-\-plink\fR
-This option outputs the genotype data in PLINK PED format. Two files are
-generated, with suffixes '.ped' and '.map'. Note that only bi\-allelic loci
-will be output. Further details of these files can be found in the PLINK
-documentation.
-
-Note: This option can be very slow on large datasets. Using the \-\-chr option
-to divide up the dataset is advised.
-.TP
-\fB\-\-plink\-tped\fR
-The \-\-plink option above can be extremely slow on large datasets. An
-alternative that might be considerably quicker is to output in the
-PLINK transposed format. This can be achieved using the \-\-plink\-tped
-option, which produces two files with suffixes '.tped' and '.tfam'.
-.TP
-\fB\-\-recode\fR
-The \-\-recode option is used to generate a VCF file from the input VCF file
-having applied the options specified by the user. The output file has the
-suffix '.recode.vcf'.
-
-By default, the INFO fields are removed from the output file, as the INFO
-values may be invalidated by the recoding (e.g. the total depth may need to
-be recalculated if individuals are removed). This default functionality can
-be overridden by using the \-\-keep\-INFO <string> option, where <string>
-defines the INFO key to keep in the output file. The \-\-keep\-INFO flag can
-be used multiple times. Alternatively, the option \-\-keep\-INFO-all can be
-used to retain all INFO fields.
-
-.SS Miscellaneous
-.TP
-\fB\-\-extract\-FORMAT\-info\fR <string>
-Extract information from the genotype fields in the VCF file relating to a
-specified FORMAT identifier. For example, using the
-option '\-\-extract\-FORMAT\-info GT' would extract the all of the GT
-(i.e. Genotype)
-entries. The resulting output file has the suffix '.<FORMAT_ID>.FORMAT'.
-.TP
-\fB\-\-get\-INFO\fR <string>
-This option is used to extract information from the INFO field in the VCF
-file. The <string> argument specifies the INFO tag to be extracted, and the
-option can be used multiple times in order to extract multiple INFO entries.
-The resulting file, with suffix '.INFO', contains the required INFO
-information in a tab\-separated table. For example, to extract the NS and
-DB flags, one would use the command:
-
- vcftools \-\-vcf file1.vcf \-\-get\-INFO NS \-\-get\-INFO DB
-
-.SS VCF File Comparison Options
-
-The file comparison options are currently in a state of flux and likely buggy.
-If you find a bug, please report it. Note that genotype\-level filters are not
-supported in these options.
-
-.TP
-\fB\-\-diff\fR <filename>
-.TP
-\fB\-\-gzdiff\fR <filename>
-Select a VCF file for comparison with the file specified by the \-\-vcf option.
-Outputs two files describing the sites and individuals common / unique to
-each file. These files have the suffixes '.diff.sites_in_files'
-and '.diff.indv_in_files' respectively. The \-\-gzdiff version can be used to
-read compressed VCF files.
-.TP
-\fB\-\-diff\-site\-discordance\fR
-Used in conjunction with the \-\-diff option to calculate discordance on a
-site by site basis. The resulting output file has the suffix '.diff.sites'.
-.TP
-\fB\-\-diff\-indv\-discordance\fR
-Used in conjunction with the \-\-diff option to calculate discordance on a
-per-individual basis. The resulting output file has the suffix '.diff.indv'.
-.TP
-\fB\-\-diff\-discordance\-matrix\fR
-Used in conjunction with the \-\-diff option to calculate a discordance matrix.
-This option only works with bi\-allelic loci with matching alleles that are
-present in both files. The resulting output file has the
-suffix '.diff.discordance.matrix'.
-.TP
-\fB\-\-diff\-switch\-error\fR
-Used in conjunction with the \-\-diff option to calculate phasing errors
-(specifically 'switch errors'). This option generates two output files
-describing switch errors found between sites, and the average switch error
-per individual. These two files have the suffixes '.diff.switch'
-and '.diff.indv.switch' respectively.
-
-.SS Options still in development
-
-The following options are yet to be finalised, are likely to contain bugs,
-and are likely to change in the future.
-.TP
-\fB\-\-fst\fR <filename>
-.TP
-\fB\-\-gzfst\fR <filename>
-Calculate FST for a pair of VCF files, with the second file being specified
-by this option. FST is currently calculated using the formula described in
-the supplementary material of the Phase I HapMap paper. Currently, only
-pairwise FST calculations are supported, although this will likely change
-in the future. The \-\-gzfst option can be used to read compressed VCF files.
-
-.TP
-\fB\-\-LROH\fR
-Identify Long Runs of Homozygosity.
-.TP
-\fB\-\-relatedness\fR
-Output Individual Relatedness Statistics.

-- 
Alioth's /usr/local/bin/git-commit-notice on /srv/git.debian.org/git/debian-med/vcftools.git

_______________________________________________
debian-med-commit mailing list
[email protected]
http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/debian-med-commit
