Hi,
I am new to Bioconductor. I am working with unannotated viral genomes and
using gene prediction algorithms to find gene-like regions in them.

In ShortRead, I am interested in using the "Filtering input" feature
(quoted below), much like chromosome filtering but for gene filtering
(selecting all reads that map to a specific gene region).
The example below uses cfilt <- chromosomeFilter("chr5.fa").
With the viral gene regions in individual FASTA files, would it work to
define cfilt <- chromosomeFilter("Path/to/viralgene1.fa")
and then viralgene1 <- readAligned(sp, "alignment_data.txt", filter=cfilt)?

I tried to read about srFilter, to customize a filter, but I don't see how
you can add a path to a FASTA file to serve as the filter reference.

Thanks for any advice on how to create a custom filter with FASTA files as
the reference for that filter.
1.2.3 Filtering input
Downstream analysis may often want to use a well-defined subset of reads.
These can be selected with the filter argument of readAligned. There are
built-in filters, for instance to remove all reads containing an N
nucleotide, to select just those reads that map to the genome file chr5.fa,
to select reads on the + strand, or to ‘level the playing field’ by
selecting only a single read for any chromosome, position and strand:
> nfilt <- nFilter()
> cfilt <- chromosomeFilter("chr5.fa")
> sfilt <- strandFilter("+")
> ofilt <- occurrenceFilter(withSread = FALSE)
Here we select only those reads that map to chr5.fa:
> chr5 <- readAligned(sp, "s_2_export.txt", filter = cfilt)


# custom filter: minimum calibrated base call quality >20
goodq <- srFilter(function(x) {
    apply(as(quality(x), "matrix"), 1, min) > 20
}, name="GoodQualityBases")
goodq
aln[goodq(aln)]
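
The custom-filter pattern above could, in principle, be adapted to keep only
reads whose alignment start falls inside one predicted gene. What follows is
a minimal sketch under stated assumptions, not a tested recipe: the reference
name "viral_genome_1" and the coordinates 1200 to 2500 are hypothetical
placeholders, inGene is a helper made up for illustration, and the filter
works on the chromosome name and position recorded in the alignment file
rather than on a path to a FASTA file.

```r
library(ShortRead)

# Hypothetical coordinates of one predicted gene (placeholders, not real data)
gene <- list(chrom = "viral_genome_1", start = 1200L, end = 2500L)

# TRUE where an alignment is on the expected reference and its start
# position lies within the gene; missing values are treated as FALSE
inGene <- function(chrom, pos, gene) {
    ok <- as.character(chrom) == gene$chrom &
        pos >= gene$start & pos <= gene$end
    !is.na(ok) & ok
}

# Wrap the predicate as a ShortRead filter, as in the goodq example
geneFilt <- srFilter(function(x) {
    inGene(chromosome(x), position(x), gene)
}, name = "ViralGene1Filter")
```

Such a filter would then be used in the same way as cfilt, for example
readAligned(sp, "alignment_data.txt", filter = geneFilt), or applied to an
existing object as aln[geneFilt(aln)]. Note that it only tests the start
position of each alignment, so reads that begin just before the gene and
overlap it would be dropped.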

_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
