Hi, I want to share a few issues with the behavior of the tools for working
with MAF alignments in the "Fetch Alignments" section in Galaxy.
*The first issue* is with the "Stitch MAF blocks given a set of
genomic intervals" tool. For a given set of genomic intervals I use it
to obtain, in my case, the human and the corresponding mouse sequences from
the 17-way MULTIZ alignment. The problem is that when there is an
insertion in the mouse sequence, the human sequence, which is the reference
(the intervals are for the human genome), does not contain any
gaps. Gaps are only ever present in the mouse sequence. So wherever mouse
has an insertion relative to human, that portion of the alignment is simply
not fetched.
*The second issue* is related to the first. To work around
the problem described above I used the "Extract MAF blocks given a set of
genomic intervals" tool.
Here I get a set of MAF blocks that overlap my intervals, and because one
interval can overlap more than one block, the number of resulting
blocks is of course higher than the initial number of intervals. Now I can
use the "Join MAF blocks by Species" tool to join some of these smaller
blocks to get the full overlap of my intervals.
However, this tool is sensitive to the order of the MAF blocks: the
blocks shown below will be joined into one block 28 nt long, but if their
order is reversed, they won't be joined.
s hg18.chr10 101141372 14 + 135374737 CTGCCTTCCCTTCC
s mm8.chr19 43548455 10 + 61321190 CGGCCCTTCA----

s hg18.chr10 101141386 14 + 135374737 ATCTCTTCACCCCT
s mm8.chr19 43548465 12 + 61321190 --CCCTTCACCCCT
The required order also depends on the strand: the blocks have to be in
ascending order for the '+' strand and in descending order for the '-'
strand.
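
As a workaround, the extracted blocks could be sorted before joining. The
following is only a minimal Python sketch of that idea, not how the Galaxy
tool works internally. It assumes each block is a list of MAF "s" lines with
the reference (human) line first; the function names parse_s_line and
order_blocks_for_joining are hypothetical.

```python
def parse_s_line(line):
    """Split a MAF "s" line into its fields (src, start, size, strand, srcSize, text)."""
    _, src, start, size, strand, src_size, text = line.split()
    return {
        "src": src,
        "start": int(start),
        "size": int(size),
        "strand": strand,
        "src_size": int(src_size),
        "text": text,
    }


def order_blocks_for_joining(blocks, strand="+"):
    """Sort blocks by the start of their first (reference) "s" line.

    Ascending for '+', descending for '-', following the behavior
    observed above (an assumption, not a documented requirement).
    """
    def ref_start(block):
        return parse_s_line(block[0])["start"]

    return sorted(blocks, key=ref_start, reverse=(strand == "-"))


# The two example blocks from above, deliberately given in the "wrong" order.
blocks = [
    ["s hg18.chr10 101141386 14 + 135374737 ATCTCTTCACCCCT",
     "s mm8.chr19 43548465 12 + 61321190 --CCCTTCACCCCT"],
    ["s hg18.chr10 101141372 14 + 135374737 CTGCCTTCCCTTCC",
     "s mm8.chr19 43548455 10 + 61321190 CGGCCCTTCA----"],
]
for block in order_blocks_for_joining(blocks, strand="+"):
    print("\n".join(block))
    print()
```

With strand="+" the block starting at 101141372 comes first, which is the
order in which the two blocks were joined for me.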
*The third issue* is with the "Reverse Complement a MAF file" tool. The
reverse-complemented sequence that I get is correct,
but the coordinates change: the tool starts
counting from the end of the chromosome, not from the start.
So if I want to relate my genomic intervals to the MAF blocks
that I reverse-complemented, I can't do it directly, because the coordinates
in the blocks have changed. To do that I need to take the length of the
chromosome, subtract the start position from it, then subtract the block
length, and only then do I get the same
coordinate as in my set of intervals. I think it would be better if the tool
kept the original coordinates.
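
The conversion I do by hand can be written as a small Python sketch. The
function name flip_start is hypothetical; the arithmetic is just the one
described above (chromosome length minus start minus block length), using
the hg18.chr10 values from my example block.

```python
def flip_start(start, size, src_size):
    """Convert a block start between '+' and '-' strand coordinates.

    start:    start position as written in the MAF "s" line
    size:     number of non-gap bases in the block
    src_size: total length of the chromosome
    """
    return src_size - start - size


chrom_len = 135374737  # hg18.chr10 length from the "s" line
plus_start = 101141372
size = 14

minus_start = flip_start(plus_start, size, chrom_len)
print(minus_start)                               # 34233351
print(flip_start(minus_start, size, chrom_len))  # 101141372
```

The same function works in both directions, because applying it twice gives
back the original start.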