Two thoughts:

1) The ChromImpute calls come from individually ensemble-imputed tracks, which together suggest a state N with probability p, such that p(N) = 1 - p(!N); in some cases p may be rather less than 1 for a given span of ~200bp. The uncertainty in state assignments is of interest in its own right, just as it was with chromHMM, but storing it is a bit messy because it is a much larger data structure than the seqname-start-end triples of the segment calls. It is, however, informative about differences between (or within, cf. scATAC) cell types. This is something I probably should have developed further in chromophobe.
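To make the storage point concrete, here is a minimal sketch (plain Python, standard library only) of what keeping the uncertainty alongside the hard call might look like. The state names, the probabilities, and the helper `call_with_uncertainty` are all made up for illustration; none of it is ChromImpute or chromophobe output or API.

```python
import math

# Hypothetical posterior state probabilities for three consecutive 200 bp bins.
# State names and numbers are illustrative only, not real ChromImpute output.
STATES = ["Enh", "TssA", "Quies"]
posteriors = [
    [0.90, 0.05, 0.05],  # confident enhancer call
    [0.50, 0.45, 0.05],  # ambiguous between enhancer and active TSS
    [0.02, 0.03, 0.95],  # confident quiescent call
]


def call_with_uncertainty(probs, states=STATES):
    """Return (called state, probability of that call, Shannon entropy in bits)."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    entropy = -sum(p * math.log2(p) for p in probs if p > 0)
    return states[best], probs[best], entropy


# One (state, p, entropy) triple per bin: the hard call plus what it throws away.
calls = [call_with_uncertainty(p) for p in posteriors]

for start, (state, p, h) in zip(range(0, 600, 200), calls):
    print(f"chr1:{start}-{start + 200}\t{state}\tp={p:.2f}\tH={h:.2f} bits")
```

The first two bins get the same hard call, but the second is close to a coin flip between two states, and that is exactly the information a seqname-start-end segment table drops.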
2) At some point a lot of this question devolves into peak calling, i.e. what is the exemplar distribution for state N as a multivariate Bernoulli (say, H3K27ac:1, H3K4me1:1, H3K4me3:0, H3K27me3:0, DNAm:0, DHS:1 for an active enhancer)? The original and still reasonable motivation for using an HMM or factorial HMM to "discover" underlying states seems to have fallen by the wayside, for better or worse, such that storing the marginal probability that a given span is called "present" or "absent" for each mark might work fine.

The proposed use case is why I started working on chromophobe (https://github.com/ttriche/chromophobe), but as time went by it seemed like I was the only one using it, and (worse) at that point I hadn't begun to automate documentation and test cases. The idea was to store a joint segmentation model along with its segment-wise uncertainties, something that probably benefits from a bigMatrix or other out-of-core backing store for the uncertainties (perhaps a big sparse Matrix would suffice).

A use case that might revive the exercise would be importing all of the ChromImpute tracks and the associated transition/emission matrices, perhaps with the call uncertainties as a second milestone. These sorts of issues show up on a not-irregular basis, disguised as other problems, so it may be worth doing, similar to the multi-assay approach for trying to impute missing assays.

--t

On Wed, Aug 12, 2015 at 2:01 PM, Vincent Carey <[email protected]> wrote:

> It seems to me we may need a class to manage related annotation
> structures.  For example, the chromImpute segmentations of the genome
> defined for various cell types.  I would like to be able to take a region
> of the genome (say a SNP) and ask how the state varies across cell types.
>
> AnnotationHub will provide access to cell-type specific GRanges but there
> is no container that I can think of that would coordinate these as
> analogous to different "samples".
>
> Am I missing something?
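For what it's worth, the query in the quoted message (take a position, ask how the state varies across cell types) can be sketched in a few lines. This is only an illustration in plain Python under assumed inputs: the cell type names, coordinates, state labels, and the helpers `state_at` and `states_across_cell_types` are hypothetical, and a real container would presumably sit on top of GRanges rather than tuples.

```python
from bisect import bisect_right

# Hypothetical segmentations: for each cell type, sorted non-overlapping
# (start, end, state) tuples on one chromosome, half-open [start, end).
segmentations = {
    "cell_type_A": [(0, 400, "Quies"), (400, 1000, "Enh"), (1000, 1600, "TssA")],
    "cell_type_B": [(0, 800, "Quies"), (800, 1600, "ReprPC")],
    "cell_type_C": [(0, 600, "Enh"), (600, 1600, "Quies")],
}


def state_at(segments, pos):
    """State of the segment covering pos, or None if pos is not covered."""
    starts = [start for start, _, _ in segments]
    i = bisect_right(starts, pos) - 1  # last segment starting at or before pos
    if i >= 0 and segments[i][0] <= pos < segments[i][1]:
        return segments[i][2]
    return None


def states_across_cell_types(segs, pos):
    """Treat cell types like samples: one state call per cell type at pos."""
    return {cell_type: state_at(s, pos) for cell_type, s in segs.items()}


# A SNP at position 500 on this chromosome.
snp_states = states_across_cell_types(segmentations, 500)
print(snp_states)
```

The segment boundaries differ between cell types, which is why a plain list of per-cell-type ranges does not line up as a samples-by-features matrix without some coordinating layer like this.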
_______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
