A few thoughts:

1) look into Shearwater, which is part of the deepSNV package (https://bioconductor.org/packages/release/bioc/html/deepSNV.html);
2) talk to Todd Druley @ WashU, Elli Papaemmanuil @ MSKCC, Ruud & Bob @ Erasmus, the usual suspects;
3) plan to validate w/ ddPCR (at the absolute very least), and be aware that most MRD in leukemia is done by a combination of BCR/TCR + breakpoint PCR (lymphoid/fusion-driven) or DFN flow (myeloid + normal cyto).

I'm not saying that ML-based methods couldn't help. But if you've got a 30x-100x genome (or even 1000x FM1) and are trying to compete with existing standard approaches that can detect molecules at 1e-6, it'll be rough. (At 100x, a variant at 1e-6 yields an expected 100 × 1e-6 = 1e-4 supporting reads.)

An alternative approach (one that has been used repeatedly) is to throw caution to the wind, generate primers for numerous subject-specific somatic variants, and use the ensemble to try to model MRD (speaking of ML). On the one hand, that could give the clinic a "customer for life". On the other hand, it's not conducive to large-scale automation and deployment. As far as I know, it never got much traction, in leukemia or anywhere else. (Consider that most clinical flow labs can detect 1-in-10K to 1-in-a-million cells by flow cytometry.)

Best of luck! (And if you're not already working with UMI-tagged reads, please talk to the people in #2 above. The reason most people won't go below 5% VAF is that you get thwacked by error rates below that level. The reason most NGS-based MRD relies on UMIs is that existing PCR-based methods have 6 logs of sensitivity.)

--t

On Thu, Mar 5, 2020 at 4:08 PM Mulder, R <r.mulde...@umcg.nl> wrote:
> Hi,
>
> I was wondering if anyone could help me with a script and support for the above-mentioned goal.
>
> For this I have several BAM files for which I want to determine the nucleotide count per region of interest. The regions of interest could be several hotspot mutation sites. I would like to get an overview across all the BAM files.
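For the counting step, here is a minimal sketch. It assumes you run `samtools mpileup` restricted to the hotspot positions (for example with `-r` or `-l`) and then parse the read-bases column of its output. The code is in Python rather than R. `count_bases` and `alt_fraction` are hypothetical helper names and are not part of Shearwater/deepSNV or any other package. The sketch treats `.`/`,` as the reference base and skips read-start/end markers, indel annotations and deletion placeholders. It ignores base and mapping qualities, which a real pipeline would need to filter on. In R, `Rsamtools::pileup()` produces comparable per-nucleotide counts.

```python
"""Count A/C/G/T at one pileup column from `samtools mpileup` output (sketch)."""

BASES = "ACGT"


def count_bases(ref_base: str, read_bases: str) -> dict[str, int]:
    """Tally nucleotides in the read-bases column (5th field) of an mpileup line.

    '.' and ',' mean "matches reference" on the forward/reverse strand;
    upper/lower-case letters are mismatches on the forward/reverse strand.
    """
    counts = {b: 0 for b in BASES}
    ref = ref_base.upper()
    i = 0
    while i < len(read_bases):
        c = read_bases[i]
        if c == "^":
            # Start of a read: the next character encodes its mapping quality.
            i += 2
            continue
        if c in "+-":
            # Indel annotation: +<len><seq> or -<len><seq>; skip it entirely.
            j = i + 1
            while j < len(read_bases) and read_bases[j].isdigit():
                j += 1
            i = j + int(read_bases[i + 1:j])
            continue
        if c in ".,":
            if ref in counts:
                counts[ref] += 1
        elif c.upper() in counts:
            counts[c.upper()] += 1
        # '$' (read end), '*'/'#' (deleted base), '>'/'<' (ref skip), 'N': ignored.
        i += 1
    return counts


def alt_fraction(counts: dict[str, int], alt: str) -> float:
    """Fraction of A/C/G/T reads supporting `alt` (0.0 if there is no coverage)."""
    depth = sum(counts.values())
    return counts[alt] / depth if depth else 0.0


if __name__ == "__main__":
    # One mpileup line: chrom, pos, ref base, depth, read bases, base qualities.
    line = "chr1\t1000\tA\t8\t..,,^I.$C+2AGcG\tIIIIIIII"
    chrom, pos, ref, _depth, bases, _quals = line.split("\t")
    counts = count_bases(ref, bases)
    print(chrom, pos, counts, round(alt_fraction(counts, "C"), 3))
    # prints: chr1 1000 {'A': 5, 'C': 2, 'G': 1, 'T': 0} 0.25
```

Running this for every BAM file and every hotspot gives a sample-by-position table of counts.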
> Next I want to use these counts to determine, for any new BAM file, whether the count at a particular genomic position is higher than the allowable range, which could indicate that a mutation is present. For this I would like to use the modified Thompson tau test. I think machine learning could be used for this.
>
> So, why do I want to do all this? Well, normal NGS pipelines only deal with variants at a frequency of 5% or higher. Mutations below this frequency are often missed. To know whether a mutation is present below this level, you have to dive into the alignment and, most often, manually investigate the base calls. I know that this raises some questions regarding call qualities, but our conventional assays have actually confirmed some of these low-level mutations. In addition, NGS can be used to detect low-frequency mutations, which is of great importance for determining measurable residual disease after treatment.
>
> I am new to R and Bioconductor, so I would be very thankful if someone could help me set up a script for this purpose.
>
> Thanks,
>
> Rene Mulder
> Laboratory Medicine
> University Medical Center Groningen
> The Netherlands
>
> _______________________________________________
> Bioc-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
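Below is a minimal sketch of the modified Thompson tau test mentioned above, in a one-step form that only tests the new sample. It is applied to alt-allele fractions rather than raw counts. That choice is an assumption, made because depth differs between BAM files. The procedure pools the reference (normal) samples with the new sample and computes their mean and sample standard deviation s. It flags the new sample if its absolute deviation from the mean exceeds tau·s, where tau = t·(n-1) / (sqrt(n)·sqrt(n-2+t²)) and t is the two-sided Student's t critical value at level alpha with n-2 degrees of freedom. Stdlib Python has no t quantile function, so `t_crit` is passed in; with SciPy it would be `scipy.stats.t.ppf(1 - alpha / 2, n - 2)`. `thompson_tau` and `is_outlier` are hypothetical names. The test assumes roughly normally distributed background values, and it does not get around the error-rate floor discussed in the reply.

```python
"""One-step modified Thompson tau test: new sample vs. a reference panel (sketch)."""
import math
import statistics


def thompson_tau(n: int, t_crit: float) -> float:
    """Rejection threshold tau for n observations.

    t_crit is the two-sided Student's t critical value with n - 2 degrees of
    freedom (e.g. 2.306 for alpha = 0.05 and df = 8).
    """
    return t_crit * (n - 1) / (math.sqrt(n) * math.sqrt(n - 2 + t_crit ** 2))


def is_outlier(reference: list[float], new_value: float, t_crit: float) -> bool:
    """Flag new_value if |new_value - mean| > tau * s over reference + [new_value]."""
    values = list(reference) + [new_value]
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 values (df = n - 2 must be positive)")
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    delta = abs(new_value - mean)
    return delta > thompson_tau(n, t_crit) * s


if __name__ == "__main__":
    # Hypothetical alt-allele fractions (in %) at one hotspot in 9 reference BAMs.
    panel = [0.10, 0.12, 0.11, 0.09, 0.10, 0.11, 0.10, 0.12, 0.09]
    t_df8 = 2.306  # two-sided 5% critical value of Student's t, df = 8 (n = 10)
    print(is_outlier(panel, 0.50, t_df8))  # True
    print(is_outlier(panel, 0.12, t_df8))  # False
```

Because the candidate value is included when computing s, a large outlier also inflates s. This masking effect is one reason the classic procedure removes the most extreme point and then repeats the test.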