Hi,
Note: apologies in advance if this message gets duplicated. My first
attempt had not posted after a day, and I figured it may have been
blocked because of my PGP signature attachment.
First, I just wanted to say thanks for the mailing list and to thank
everyone for their work on the source tree. It's a great resource
that I use almost daily! I've browsed the list for quite some time,
but I have recently run across some strangeness in the behavior of
gfClient relative to blat. The strangeness is likely of my own
doing, but I figured I would email to see whether that is indeed the case.
I'm working from gfClient/Server (v.34x4) and blat (v. 34x4) compiled
from CVS. The problem I'm running into deals with alignments starting
in repeat regions (versus alignments extending over repeats). Here
are my gfServer start parameters:
/Users/bcf/bin/i386/gfserver start 127.0.0.1 8888 /Users/bcf/Data/test/SoftMask/*.softmask.2bit -mask
where *.softmask.2bit was created from a FASTA file of soft-masked
sequences (from repeatmasker | `maskOutFa -soft`) using faToTwoBit.
These targets also contain the query sequence I am demonstrating
with. I am running gfServer because I have a large number of queries,
and I would prefer to avoid reindexing the 2bit file with every call
to blat.
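For anyone who wants to confirm which bases are soft-masked before building a 2bit file, here is a minimal Python sketch. It assumes that soft-masking means repeat bases are written in lowercase, which is what `maskOutFa -soft` produces. The function name and the sample sequence are hypothetical and are not taken from my data.

```python
import re


def soft_masked_intervals(seq):
    """Return 0-indexed, half-open (start, end) intervals of lowercase runs.

    Assumption: lowercase letters (including a lowercase 'n') mark
    soft-masked bases; uppercase letters are unmasked.
    """
    return [(m.start(), m.end()) for m in re.finditer(r"[a-z]+", seq)]


# Hypothetical sequence for illustration only (not from tmp.fa).
example = "ACGTacgtnnACGT"
print(soft_masked_intervals(example))
```

The intervals are half-open, so an interval (4, 10) covers positions 4 through 9.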
My query with gfClient is:
/Users/bcf/bin/i386/gfclient -t=DNA -q=DNA -minScore=0 -minIdentity=0 -out=psl 127.0.0.1 8888 / ~/tmp/tmp.fa stdout
where tmp.fa is a single, soft-masked sequence in FASTA format.
tmp.fa has a soft-masked repeat region extending from position 76 to
158 (0-indexed). The (truncated) gfClient output is:
match mis-  rep.  N's Q gap Q gap T gap T gap strand Q    Q    Q     Q   T    T    T     T   block blockSizes qStarts tStarts
      match match     count bases count bases        name size start end name size start end count
---------------------------------------------------------------------------------------------------------------------------------------------------------------
100 2 0 1 0 0 2 2 - FX5ZTWB02D5UGZ 179 76 179 FX5ZTWB02D3DFI 250 30 135 3 18,4,81, 0,18,22, 30,49,54,
99 2 0 1 1 1 3 5 - FX5ZTWB02D5UGZ 179 76 179 FX5ZTWB02DA9YF 222 102 209 4 18,4,68,12, 0,18,22,91, 102,121,126,197,
94 2 0 0 1 7 1 9 - FX5ZTWB02D5UGZ 179 76 179 FX5ZTWB02DAKZ3 297 23 128 2 12,84, 0,19, 23,44,
100 2 0 1 0 0 4 12 - FX5ZTWB02D5UGZ 179 76 179 FX5ZTWB02DBSW8 222 102 217 5 18,4,17,51,13, 0,18,22,39,90, 102,121,126,144,204,
55 1 0 0 0 0 0 0 - FX5ZTWB02D5UGZ 179 76 132 FX5ZTWB02DBYD5 226 39 95 1 56, 47, 39,
67 1 0 0 0 0 0 0 - FX5ZTWB02D5UGZ 179 76 144 FX5ZTWB02DJ4YU 231 96 164 1 68, 35, 96,
100 2 0 1 0 0 2 2 - FX5ZTWB02D5UGZ 179 76 179 FX5ZTWB02DJB25 170 29 134 3 18,4,81, 0,18,22, 29,48,53,
100 2 0 1 0 0 2 2 - FX5ZTWB02D5UGZ 179 76 179 FX5ZTWB02DVMEF 168 29 134 3 18,4,81, 0,18,22, 29,48,53,
79 0 0 0 1 3 3 19 - FX5ZTWB02D5UGZ 179 76 158 FX5ZTWB02DWVVC 241 64 162 4 13,28,15,23, 21,34,65,80, 64,94,123,139,
94 2 0 0 1 7 1 9 - FX5ZTWB02D5UGZ 179 76 179 FX5ZTWB02EGVMB 247 23 128 2 12,84, 0,19, 23,44,
100 2 0 1 0 0 2 2 - FX5ZTWB02D5UGZ 179 76 179 FX5ZTWB02EHBES 338 39 144 3 18,4,81, 0,18,22, 39,58,63,
44 0 0 0 0 0 1 1 - FX5ZTWB02D5UGZ 179 76 120 FX5ZTWB02EOC38 213 66 111 2 28,16, 59,87, 66,95,
100 2 0 1 0 0 2 2 - FX5ZTWB02D5UGZ 179 76 179 FX5ZTWB02ETWES 202 29 134 3 18,4,81, 0,18,22, 29,48,53,
100 2 0 1 0 0 1 1 - FX5ZTWB02D5UGZ 179 76 179 FX5ZTWB02EZOW2 208 19 123 2 18,85, 0,18, 19,38,
A blat run of the form:
blat /Users/bcf/Data/test/SoftMask/*.softmask.clean.2bit tmp.fa -mask=lower stdout
returns (full output):
match mis-  rep.  N's Q gap Q gap T gap T gap strand Q    Q    Q     Q   T    T    T     T   block blockSizes qStarts tStarts
      match match     count bases count bases        name size start end name size start end count
---------------------------------------------------------------------------------------------------------------------------------------------------------------
31 2 80 0 0 0 1 24 + FX5ZTWB02D5UGZ 179 45 158 FX5ZTWB02EZZ23 182 35 172 2 31,82, 45,76, 35,90,
32 0 0 0 0 0 0 0 + FX5ZTWB02D5UGZ 179 45 77 FX5ZTWB02EMMWO 294 35 67 1 32, 45, 35,
32 0 0 0 0 0 0 0 + FX5ZTWB02D5UGZ 179 45 77 FX5ZTWB02EM5LP 153 35 67 1 32, 45, 35,
32 0 33 0 1 1 1 23 + FX5ZTWB02D5UGZ 179 45 111 FX5ZTWB02ELBHJ 161 34 122 2 32,33, 45,78, 34,89,
32 0 0 0 0 0 0 0 + FX5ZTWB02D5UGZ 179 45 77 FX5ZTWB02EKORL 159 36 68 1 32, 45, 36,
32 0 0 0 0 0 0 0 + FX5ZTWB02D5UGZ 179 45 77 FX5ZTWB02EJB29 138 35 67 1 32, 45, 35,
66 2 0 0 1 8 2 26 + FX5ZTWB02D5UGZ 179 0 76 FX5ZTWB02EJ0PM 301 0 94 3 11,26,31, 0,11,45, 0,12,63,
68 1 0 0 1 8 1 24 + FX5ZTWB02D5UGZ 179 0 77 FX5ZTWB02EICDX 381 229 322 2 37,32, 0,45, 229,290,
66 2 0 0 1 8 2 25 + FX5ZTWB02D5UGZ 179 0 76 FX5ZTWB02EH3VT 247 0 93 3 11,26,31, 0,11,45, 0,12,62,
62 1 0 0 2 13 2 35 + FX5ZTWB02D5UGZ 179 0 76 FX5ZTWB02EGNY4 328 0 98 3 24,8,31, 0,29,45, 0,32,67,
43 0 0 0 1 8 1 24 + FX5ZTWB02D5UGZ 179 26 77 FX5ZTWB02EG2T5 198 0 67 2 11,32, 26,45, 0,35,
43 0 0 0 1 8 1 24 + FX5ZTWB02D5UGZ 179 26 77 FX5ZTWB02ECX8Z 224 0 67 2 11,32, 26,45, 0,35,
32 0 0 0 0 0 0 0 + FX5ZTWB02D5UGZ 179 45 77 FX5ZTWB02EC10O 167 35 67 1 32, 45, 35,
43 0 0 0 1 8 1 24 + FX5ZTWB02D5UGZ 179 26 77 FX5ZTWB02EBHWR 212 0 67 2 11,32, 26,45, 0,35,
32 0 0 0 0 0 0 0 + FX5ZTWB02D5UGZ 179 45 77 FX5ZTWB02DYAKJ 141 35 67 1 32, 45, 35,
32 0 0 0 0 0 0 0 + FX5ZTWB02D5UGZ 179 45 77 FX5ZTWB02DUG6S 181 35 67 1 32, 45, 35,
32 0 0 0 0 0 0 0 + FX5ZTWB02D5UGZ 179 45 77 FX5ZTWB02DTP6B 182 35 67 1 32, 45, 35,
43 0 0 0 1 8 1 24 + FX5ZTWB02D5UGZ 179 26 77 FX5ZTWB02DSULL 245 0 67 2 11,32, 26,45, 0,35,
44 0 37 0 1 8 2 55 + FX5ZTWB02D5UGZ 179 26 115 FX5ZTWB02DPKMK 206 0 136 3 11,32,38, 26,45,77, 0,35,98,
32 0 0 0 0 0 0 0 + FX5ZTWB02D5UGZ 179 45 77 FX5ZTWB02DPI46 179 35 67 1 32, 45, 35,
43 0 0 0 1 8 1 24 + FX5ZTWB02D5UGZ 179 26 77 FX5ZTWB02DNZB3 290 0 67 2 11,32, 26,45, 0,35,
43 0 0 0 1 8 1 24 + FX5ZTWB02D5UGZ 179 26 77 FX5ZTWB02DMW8R 211 0 67 2 11,32, 26,45, 0,35,
40 0 0 0 1 8 2 27 + FX5ZTWB02D5UGZ 179 26 74 FX5ZTWB02DMQ4E 240 0 67 3 11,5,24, 26,45,50, 0,37,43,
32 0 0 0 0 0 0 0 + FX5ZTWB02D5UGZ 179 45 77 FX5ZTWB02DKKJB 175 35 67 1 32, 45, 35,
32 0 0 0 0 0 0 0 + FX5ZTWB02D5UGZ 179 45 77 FX5ZTWB02DI8TE 158 35 67 1 32, 45, 35,
43 0 0 0 1 8 1 24 + FX5ZTWB02D5UGZ 179 26 77 FX5ZTWB02DGCU6 275 0 67 2 11,32, 26,45, 0,35,
43 0 0 0 1 8 1 24 + FX5ZTWB02D5UGZ 179 26 77 FX5ZTWB02DB9V0 286 0 67 2 11,32, 26,45, 0,35,
43 2 78 0 2 9 2 47 + FX5ZTWB02D5UGZ 179 26 158 FX5ZTWB02D92EW 204 0 170 3 11,32,80, 26,45,78, 0,35,90,
43 0 0 0 1 8 1 24 + FX5ZTWB02D5UGZ 179 26 77 FX5ZTWB02D8YAV 238 0 67 2 11,32, 26,45, 0,35,
43 0 0 0 1 8 1 24 + FX5ZTWB02D5UGZ 179 26 77 FX5ZTWB02D8RCP 216 0 67 2 11,32, 26,45, 0,35,
95 0 82 1 1 1 2 4 + FX5ZTWB02D5UGZ 179 0 179 FX5ZTWB02D887O 221 0 182 3 11,149,18, 0,11,161, 0,12,164,
43 0 0 0 1 8 1 24 + FX5ZTWB02D5UGZ 179 26 77 FX5ZTWB02D83KC 250 0 67 2 11,32, 26,45, 0,35,
96 0 82 1 0 0 0 0 + FX5ZTWB02D5UGZ 179 0 179 FX5ZTWB02D5UGZ 179 0 179 1 179, 0, 0,
43 0 0 0 1 8 1 24 + FX5ZTWB02D5UGZ 179 26 77 FX5ZTWB02D5NGS 270 0 67 2 11,32, 26,45, 0,35,
43 0 0 0 1 8 1 24 + FX5ZTWB02D5UGZ 179 26 77 FX5ZTWB02D35EE 194 0 67 2 11,32, 26,45, 0,35,
43 0 0 0 1 8 1 24 + FX5ZTWB02D5UGZ 179 26 77 FX5ZTWB02D1GE9 269 0 67 2 11,32, 26,45, 0,35,
43 0 0 0 1 8 1 24 + FX5ZTWB02D5UGZ 179 26 77 FX5ZTWB02D1EQS 198 0 67 2 11,32, 26,45, 0,35,
43 0 0 0 1 8 1 24 + FX5ZTWB02D5UGZ 179 26 77 FX5ZTWB02D0168 201 0 67 2 11,32, 26,45, 0,35,
30 0 0 0 0 0 0 0 + FX5ZTWB02D5UGZ 179 47 77 FX5ZTWB02C9X15 154 39 69 1 30, 47, 39,
32 0 0 0 0 0 0 0 + FX5ZTWB02D5UGZ 179 45 77 FX5ZTWB02C8WKA 194 35 67 1 32, 45, 35,
43 0 0 0 1 8 1 24 + FX5ZTWB02D5UGZ 179 26 77 FX5ZTWB02C7D4Y 222 0 67 2 11,32, 26,45, 0,35,
43 0 0 0 1 8 1 24 + FX5ZTWB02D5UGZ 179 26 77 FX5ZTWB02C6UVQ 263 0 67 2 11,32, 26,45, 0,35,
66 2 0 0 1 8 2 25 + FX5ZTWB02D5UGZ 179 0 76 FX5ZTWB02C5OY2 249 0 93 3 11,26,31, 0,11,45, 0,12,62,
66 2 0 0 1 8 1 24 + FX5ZTWB02D5UGZ 179 0 76 FX5ZTWB02C1LC0 179 0 92 2 37,31, 0,45, 0,61,
It looks like blat is treating the masking correctly in alignments:
there are no alignments starting in the repeat region (76-158) of the
Query or the Targets. Alignments that begin in unmasked regions and
extend across masked regions are treated as expected (i.e., the
self-to-self alignment, Q=FX5ZTWB02D5UGZ to T=FX5ZTWB02D5UGZ, extends
through the masked region).
Conversely, in the truncated gfClient output, several of the
alignments listed have a `Q start` to `Q end` within 76-158, which is
unexpected given that gfServer was started with the `-mask` flag and
the soft-masked 2bit input file. After double-checking the associated
Target sequences (and their reverse complements) for masked bases, it
appears that alignments are being started in repeat-masked regions of
these targets.
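The comparison above can be checked mechanically. Below is a minimal Python sketch, assuming each PSL record sits on a single whitespace-separated line with the standard 21 columns, and assuming the masked query region is the half-open interval [76, 158). The helper names are hypothetical. The two sample rows are copied from the gfClient and blat outputs in this message. Note that the test only looks at the reported `Q start`; it says nothing about where the aligner actually placed its seed.

```python
# Field names follow the standard 21-column PSL layout.
PSL_FIELDS = [
    "matches", "misMatches", "repMatches", "nCount",
    "qNumInsert", "qBaseInsert", "tNumInsert", "tBaseInsert",
    "strand", "qName", "qSize", "qStart", "qEnd",
    "tName", "tSize", "tStart", "tEnd",
    "blockCount", "blockSizes", "qStarts", "tStarts",
]
TEXT_FIELDS = {"strand", "qName", "tName"}
LIST_FIELDS = {"blockSizes", "qStarts", "tStarts"}


def parse_psl_row(line):
    """Parse one PSL data line into a dict with typed values."""
    values = line.split()
    if len(values) != len(PSL_FIELDS):
        raise ValueError("expected 21 PSL columns, got %d" % len(values))
    row = {}
    for name, value in zip(PSL_FIELDS, values):
        if name in TEXT_FIELDS:
            row[name] = value
        elif name in LIST_FIELDS:
            # Comma-separated lists carry a trailing comma in PSL.
            row[name] = [int(x) for x in value.split(",") if x]
        else:
            row[name] = int(value)
    return row


def q_start_in_mask(row, mask_start=76, mask_end=158):
    """True if the reported query start lies in [mask_start, mask_end).

    qStart in PSL is given in forward-strand query coordinates,
    so no strand conversion is applied here.
    """
    return mask_start <= row["qStart"] < mask_end


gfclient_row = ("55 1 0 0 0 0 0 0 - FX5ZTWB02D5UGZ 179 76 132 "
                "FX5ZTWB02DBYD5 226 39 95 1 56, 47, 39,")
blat_row = ("32 0 0 0 0 0 0 0 + FX5ZTWB02D5UGZ 179 45 77 "
            "FX5ZTWB02EMMWO 294 35 67 1 32, 45, 35,")

for label, line in (("gfClient", gfclient_row), ("blat", blat_row)):
    row = parse_psl_row(line)
    print(label, row["tName"], row["qStart"], row["qEnd"],
          q_start_in_mask(row))
```

For these two rows the check reports True for the gfClient record (Q start 76) and False for the blat record (Q start 45).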
I noticed the gfServer help indicated that the mask option is to be
used with nib files, but I assumed since 2bit files were also a valid
input option (and can be composed of multiple fastas, which I need),
the `-mask` option applied, as well. So, the discrepancy in the
output from blat versus gfClient is what has me confused. Again, I
suspect that I've got something wrong here, that my interpretation of
the expected behavior is incorrect, or that the help is indeed correct
that gfServer masking is nib only, but I can't quite put my finger on
the problem.
Thanks for your time,
brant
************************************************
Brant C. Faircloth
Dept. of Ecology and Evolutionary Biology
621 Charles E. Young Drive South
University of California
Los Angeles, CA 90095 USA
rooms: LSS 4304 and 4315
email: [email protected]
lab: +1.310.206.2270
office: +1.310.206.3083
mobile: +1.706.201.6110
************************************************