Hello,

I checked ape::del.colgapsonly, ips::deleteGaps and ips::deleteEmptyCells. They delete columns containing missing values, but I also need to delete columns containing the base "N" (all columns in which the proportion of Ns exceeds a certain threshold).

Actually, ips::deleteEmptyCells has the option nset = c("-", "n", "?"), so it is supposed to remove columns/rows containing only the given characters. However, if I use it and export the data (ape::write.dna or ape::write.nexus.data), some samples consist only of N characters...

The DNAbin object being processed was originally imported from VCF using vcfR (read.vcfR(file = "my.vcf")) and converted with vcfR2DNAbin(x = myvcf, consensus = TRUE, extract.haps = FALSE, unphased_as_NA = FALSE).

I checked the source code of the above functions, but they seem to only count NAs and then drop the respective columns. And as sequences in DNAbin are stored in a binary format, I'm a bit stuck here... :(

Any idea how to remove columns with a given proportion of "N" in the sequences?

Sincerely,
V.
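A minimal sketch of one possible approach, assuming the alignment is an aligned DNAbin matrix (all sequences the same length) and that the ape package is installed. as.character() decodes ape's bit-level coding into lowercase letters, so the proportion of "n" per column (site) or per row (sample) can be computed with colMeans()/rowMeans() and used to subset the alignment. The helper names drop_N_sites() and drop_N_samples() and the default threshold of 0.5 are hypothetical choices for illustration; they are not functions from ape or ips.

```r
library(ape)

# Hypothetical helper: drop alignment columns (sites) whose proportion of "N"
# exceeds `threshold`. `x` must be an aligned DNAbin object (coerced to a matrix).
drop_N_sites <- function(x, threshold = 0.5) {
  x <- as.matrix(x)
  # as.character() decodes ape's bit-level coding to lowercase letters
  is_n <- as.character(x) == "n"
  prop_n <- colMeans(is_n)                 # proportion of N per site
  x[, prop_n <= threshold, drop = FALSE]
}

# Hypothetical helper: drop samples (rows) whose proportion of "N"
# exceeds `threshold`.
drop_N_samples <- function(x, threshold = 0.5) {
  x <- as.matrix(x)
  prop_n <- rowMeans(as.character(x) == "n")  # proportion of N per sample
  x[prop_n <= threshold, , drop = FALSE]
}

# Toy alignment: 3 samples x 5 sites
toy <- as.DNAbin(matrix(c("a", "n", "c", "g", "n",
                          "a", "n", "c", "n", "n",
                          "a", "c", "c", "g", "n"),
                        nrow = 3, byrow = TRUE,
                        dimnames = list(c("s1", "s2", "s3"), NULL)))

filtered <- drop_N_sites(toy, threshold = 0.5)
as.character(filtered)  # sites 1, 3 and 4 are kept
```

Only "n" is counted here; gaps, "?" and other ambiguity codes (e.g. "r", "y") are left alone. Working directly on the bytes should give the same logical matrix without the character conversion: ape stores "N" as the byte 0xf0, so unclass(x) == as.raw(240) can replace as.character(x) == "n". The filtered object can then be exported with ape::write.dna or ape::write.nexus.data as before.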
-- 
Vojtěch Zeisek
https://trapa.cz/en/

Department of Botany, Faculty of Science
Charles University, Prague, Czech Republic
https://www.natur.cuni.cz/biology/botany/

Institute of Botany, Czech Academy of Sciences
Průhonice, Czech Republic
http://www.ibot.cas.cz/en/
_______________________________________________
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/