On Mar 29, 2011, at 4:01 AM, Ketil Malde wrote:

> 
> I was thinking that we might want to keep e.g. Blast results' original
> offsets (which I believe are 1-based), so that you don't need to convert
> in order to do non-transforming operations (e.g. select a subset of
> results).  Using a different type would be good, since it would catch
> inadvertent mixing of conceptually different values.
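Ketil's idea of a separate type for tool-native 1-based offsets could look roughly like the sketch below. `Pos1`, `Pos0`, `toPos0`, `fromPos0` and `startingAfter` are hypothetical names for illustration, not part of seqloc. The point is that conversion is explicit (one `pred` or `succ`), and a non-transforming operation such as selecting a subset works on the 1-based values directly.

```haskell
import Data.Int (Int64)

-- | Hypothetical 1-based offset, as reported by tools such as Blast.
newtype Pos1 = Pos1 Int64 deriving (Eq, Ord, Show)

-- | Hypothetical 0-based offset, the convention a library interface would use.
newtype Pos0 = Pos0 Int64 deriving (Eq, Ord, Show)

-- | Explicit conversions: the type checker rejects a Pos1 where a Pos0 is
--   expected, so the two conventions cannot be mixed by accident.
toPos0 :: Pos1 -> Pos0
toPos0 (Pos1 i) = Pos0 (pred i)

fromPos0 :: Pos0 -> Pos1
fromPos0 (Pos0 i) = Pos1 (succ i)

-- | Selecting a subset needs no conversion: compare 1-based values directly.
startingAfter :: Pos1 -> [Pos1] -> [Pos1]
startingAfter lo = filter (>= lo)
```

For example, `toPos0 (Pos1 1)` gives `Pos0 0`, while `startingAfter (Pos1 5)` filters a list of 1-based starts without ever leaving the 1-based convention.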

That sounds reasonable to me, though the time taken by one extra succ or pred 
is pretty small relative to the overall time required to convert between string 
and machine representation of numbers. However, that's a decision that's really 
independent of seqloc, and I think it makes sense to keep a single 0-based 
interface for seqloc. A Blast alignment library might have a data type with a 
qstart and a qend field containing 1-based Int64 indices and provide an 
accessor that handled any coordinate conversion needed to produce a seqloc 
location. That's what I do for GTF annotations (and for BED annotations, whose
half-open coordinate scheme, a 0-based start with an exclusive end, isn't
captured perfectly by either 0-based or 1-based indexing).
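A minimal sketch of the "convert at the accessor" approach, assuming a hypothetical 0-based `Loc` given as a start offset plus a length (a stand-in, not seqloc's actual type). `BlastHit` keeps the tool's 1-based, inclusive `qstart`/`qend` and assumes a forward-strand hit (`qstart <= qend`); `gtfLoc` and `bedLoc` are hypothetical helpers showing how the GTF (1-based, inclusive) and BED (0-based start, exclusive end) conventions map to the same location.

```haskell
import Data.Int (Int64)

-- | Hypothetical 0-based location: offset of the first base and number of
--   bases covered. An assumed stand-in, not seqloc's real representation.
data Loc = Loc { locStart :: !Int64, locLength :: !Int64 }
  deriving (Eq, Show)

-- | Hypothetical Blast hit that keeps the tool's own 1-based, inclusive
--   coordinates. Assumes a forward-strand hit, i.e. qstart <= qend.
data BlastHit = BlastHit { qstart :: !Int64, qend :: !Int64 }
  deriving (Eq, Show)

-- | Accessor that performs the coordinate conversion once, at the boundary.
blastQueryLoc :: BlastHit -> Loc
blastQueryLoc h = Loc (qstart h - 1) (qend h - qstart h + 1)

-- | GTF: 1-based, both ends inclusive.
gtfLoc :: Int64 -> Int64 -> Loc
gtfLoc start end = Loc (start - 1) (end - start + 1)

-- | BED: 0-based start, exclusive end (half-open), so neither a plain 0-based
--   nor a plain 1-based reading describes both fields.
bedLoc :: Int64 -> Int64 -> Loc
bedLoc start end = Loc start (end - start)
```

With these definitions, the GTF interval `101..200` and the BED interval `100 200` both become `Loc 100 100`, the same 100 bases.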

> Thus my call for a general Alignment class or data type: converting
> to (or accessing through) Alignment should convert to standard choices
> and make things comparable, including alignments from different tools.

There is a lot of heterogeneity in alignments produced by different tools: how
they handle gaps, whether they're designed for mRNA-to-genome alignments, local
versus global alignment, whether they score similarity in protein sequence, &c.
I don't expect to find myself abstracting over different kinds of Alignments in 
tools or libraries that I write, though I'm happy to provide support for an 
Alignment typeclass in the seqloc library and in my samtools wrapper.
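One hedged reading of what a minimal Alignment typeclass might offer: a single method reporting where an alignment lies on its target in the shared 0-based convention, deliberately leaving out gaps, scores and other tool-specific detail. Everything here (`Alignment`, `targetLoc`, `overlaps`, `BlastHit`, `HalfOpenHit`, `Loc`) is a hypothetical illustration, not an existing seqloc or samtools-wrapper API. The `overlaps` helper assumes both alignments are on the same target sequence, and `BlastHit` assumes a plus-strand hit (`sstart <= send`).

```haskell
import Data.Int (Int64)

-- | Hypothetical 0-based location: start offset and number of bases.
data Loc = Loc { locStart :: !Int64, locLength :: !Int64 }
  deriving (Eq, Show)

-- | Hypothetical class: each alignment type reports its target span in the
--   shared 0-based convention. Gap structure, scores, etc. are left out.
class Alignment a where
  targetLoc :: a -> Loc

-- | Written once against the class, usable for any pair of instances.
--   Assumes both alignments lie on the same target sequence.
overlaps :: (Alignment a, Alignment b) => a -> b -> Bool
overlaps x y = s1 < e2 && s2 < e1
  where
    Loc s1 l1 = targetLoc x
    Loc s2 l2 = targetLoc y
    e1 = s1 + l1 -- exclusive end of x
    e2 = s2 + l2 -- exclusive end of y

-- | Hypothetical Blast-style hit: 1-based, inclusive subject coordinates,
--   plus strand assumed (sstart <= send).
data BlastHit = BlastHit { sstart :: !Int64, send :: !Int64 } deriving Show

instance Alignment BlastHit where
  targetLoc h = Loc (sstart h - 1) (send h - sstart h + 1)

-- | Hypothetical hit from a tool that already reports 0-based, half-open spans.
data HalfOpenHit = HalfOpenHit { hoStart :: !Int64, hoEnd :: !Int64 } deriving Show

instance Alignment HalfOpenHit where
  targetLoc h = Loc (hoStart h) (hoEnd h - hoStart h)
```

Because both instances normalize to `Loc`, a Blast hit covering bases 1..10 and a half-open hit `9 20` are recognized as overlapping at 0-based base 9, while `10 20` is merely adjacent.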

For the moment, my inclination is to update the seqloc library to use Int64 
indices and leave it otherwise unchanged, and to release the GTF & BED library 
at the same time.

Best,
--Nick


_______________________________________________
Biohaskell mailing list
Biohaskell@biohaskell.org
http://malde.org/cgi-bin/mailman/listinfo/biohaskell
