Hi, Brant!
Looking at the source code for gfServer,
it seems to ignore the -mask option for untranslated use.
Even for translated use, it is probably just using
the option as a way to keep the speed up; it still
does not provide the kind of masking control
that standalone blat provides.
So it does appear that you will need to use standalone blat
if you want to use masking options. Many users running
untranslated searches don't even bother with explicit masking.
Some "masking" still occurs implicitly, because overused
tiles are ignored and therefore cannot
initiate a hit.
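As a rough illustration of that last point, here is a toy Python sketch of an index that drops overused tiles. It is not BLAT's actual code: the function names, the tile size and the occurrence threshold are all made up for the example.

```python
def build_tile_index(target, tile_size=11, max_occurrences=3):
    """Index non-overlapping tiles of `target`, dropping overused ones.

    tile_size and max_occurrences are illustrative values only.
    """
    positions = {}
    for start in range(0, len(target) - tile_size + 1, tile_size):
        tile = target[start:start + tile_size].upper()
        positions.setdefault(tile, []).append(start)
    # A tile seen more often than the threshold is left out of the index,
    # so it can never seed a hit. This acts like implicit masking.
    return {t: p for t, p in positions.items() if len(p) <= max_occurrences}


def seed_hits(index, query, tile_size=11):
    """Return (query_pos, target_pos) pairs for query tiles found in the index."""
    hits = []
    for q in range(len(query) - tile_size + 1):
        tile = query[q:q + tile_size].upper()
        for t in index.get(tile, ()):
            hits.append((q, t))
    return hits
```

With a highly repetitive target, the repeated tile simply never appears in the index, so only the less common tiles can start an alignment.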
Of course, you could hard-mask your target database I suppose,
but that's probably unnecessary.
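If you did want to try hard-masking, a minimal Python sketch might look like the following. It assumes the soft-masked bases are the lowercase ones and replaces each of them with N; the function name is just for illustration.

```python
def hard_mask_fasta(lines):
    """Yield FASTA lines with lowercase (soft-masked) bases replaced by N."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # Header lines are passed through unchanged.
            yield line
        else:
            yield "".join("N" if base.islower() else base for base in line)
```

You would feed it the lines of the soft-masked fasta file, write the result to a new file, and convert that file to .2bit as before.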
It was unclear to me why you said that
you needed gfServer instead of standalone blat
to run many queries. Standalone blat can take
an enormous set of queries in a single run,
so the database is indexed only once for the whole set.
Just make a list. From the blat usage message:
blat database query [-ooc=11.ooc] output.psl
[...]
where:
database and query are each either a .fa, .nib or .2bit file,
or a list of these files, one file name per line.
[...]
You can stick all your queries in one huge .2bit or .fa,
or you can simply create a file that lists all the
sequence-files that you want to query:
somefile1.2bit
somefile2.2bit
someotherfile.fa
someotherfile2.fa
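For example, a small Python sketch along these lines could write the list file and build the blat command line. The helper names and the file names passed in are hypothetical; -mask=lower is the option from your own blat run.

```python
from pathlib import Path


def write_query_list(sequence_files, list_path):
    """Write one sequence-file name per line and return the list file path."""
    Path(list_path).write_text("".join(f"{name}\n" for name in sequence_files))
    return str(list_path)


def blat_command(database, query, output, mask=None):
    """Build the argument list for: blat database query [options] output.psl"""
    cmd = ["blat", database, query]
    if mask is not None:
        cmd.append(f"-mask={mask}")
    cmd.append(output)
    return cmd
```

Passing the returned list to subprocess.run(cmd, check=True) would then run blat once over every listed file, assuming blat is on your PATH.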
At UCSC, we mainly use gfServer with hgBlat for online interactive
queries, and we frequently use standalone blat with no masking options.
Some steps of our automated GenBank process do use standalone
blat with masking options.
Thanks for pointing out another quirky difference between blat and
gfServer/gfClient.
-Galt
On Tue, 14 Jul 2009, Brant Faircloth wrote:
> Hi,
>
> note: apologies in advance if this gets duplicated. It didn't post
> after a day, and I figured it may have been blocked due to my pgp sig
> attachment.
>
> First, I just wanted to say thanks for the mailing list and to thank
> everyone for their work on the source tree - it's a great resource
> that I use almost daily! I've browsed the list for quite some time,
> but have recently run across some strangeness in the behavior of
> gfClient relative to blat. Likely, the strangeness is of my own
> doing, but I figured I might email to see if that, indeed, was the case.
>
> I'm working from gfClient/Server (v.34x4) and blat (v. 34x4) compiled
> from CVS. The problem I'm running into deals with alignments starting
> in repeat regions (versus alignments extending over repeats). Here
> are my gfServer start parameters:
>
> /Users/bcf/bin/i386/gfserver start 127.0.0.1 8888 /Users/bcf/Data/test/SoftMask/*.softmask.2bit -mask
>
> where *.softmask.2bit was created from a fasta file of soft-masked
> sequences (from repeatmasker | `maskOutFa -soft`) using faToTwoBit.
> These targets also contain the query sequence I am demonstrating
> with. I am running gfServer because the number of queries for what I
> am attempting is large, and I would prefer to avoid reindexing the
> 2bit file with every call to blat.
>
> my query with gfClient is:
>
> /Users/bcf/bin/i386/gfclient -t=DNA -q=DNA -minScore=0 -minIdentity=0 -out=psl 127.0.0.1 8888 / ~/tmp/tmp.fa stdout
>
> where tmp.fa is a single, soft-masked sequence in fasta format.
> tmp.fa has a soft-masked repeat region, extending from position 76-158
> (0-indexed). The (truncated) gfClient output is:
>
> columns: match, mis-match, rep. match, N's, Q gap count, Q gap bases, T gap count, T gap bases, strand, Q name, Q size, Q start, Q end, T name, T size, T start, T end, block count, blockSizes, qStarts, tStarts
> ------------------------------------------------------------
> 100 2 0 1 0 0 2 2 - FX5ZTWB02D5UGZ 179 76 179 FX5ZTWB02D3DFI 250 30 135 3 18,4,81, 0,18,22, 30,49,54,
> 99 2 0 1 1 1 3 5 - FX5ZTWB02D5UGZ 179 76 179 FX5ZTWB02DA9YF 222 102 209 4 18,4,68,12, 0,18,22,91, 102,121,126,197,
> 94 2 0 0 1 7 1 9 - FX5ZTWB02D5UGZ 179 76 179 FX5ZTWB02DAKZ3 297 23 128 2 12,84, 0,19, 23,44,
> 100 2 0 1 0 0 4 12 - FX5ZTWB02D5UGZ 179 76 179 FX5ZTWB02DBSW8 222 102 217 5 18,4,17,51,13, 0,18,22,39,90, 102,121,126,144,204,
> 55 1 0 0 0 0 0 0 - FX5ZTWB02D5UGZ 179 76 132 FX5ZTWB02DBYD5 226 39 95 1 56, 47, 39,
> 67 1 0 0 0 0 0 0 - FX5ZTWB02D5UGZ 179 76 144 FX5ZTWB02DJ4YU 231 96 164 1 68, 35, 96,
> 100 2 0 1 0 0 2 2 - FX5ZTWB02D5UGZ 179 76 179 FX5ZTWB02DJB25 170 29 134 3 18,4,81, 0,18,22, 29,48,53,
> 100 2 0 1 0 0 2 2 - FX5ZTWB02D5UGZ 179 76 179 FX5ZTWB02DVMEF 168 29 134 3 18,4,81, 0,18,22, 29,48,53,
> 79 0 0 0 1 3 3 19 - FX5ZTWB02D5UGZ 179 76 158 FX5ZTWB02DWVVC 241 64 162 4 13,28,15,23, 21,34,65,80, 64,94,123,139,
> 94 2 0 0 1 7 1 9 - FX5ZTWB02D5UGZ 179 76 179 FX5ZTWB02EGVMB 247 23 128 2 12,84, 0,19, 23,44,
> 100 2 0 1 0 0 2 2 - FX5ZTWB02D5UGZ 179 76 179 FX5ZTWB02EHBES 338 39 144 3 18,4,81, 0,18,22, 39,58,63,
> 44 0 0 0 0 0 1 1 - FX5ZTWB02D5UGZ 179 76 120 FX5ZTWB02EOC38 213 66 111 2 28,16, 59,87, 66,95,
> 100 2 0 1 0 0 2 2 - FX5ZTWB02D5UGZ 179 76 179 FX5ZTWB02ETWES 202 29 134 3 18,4,81, 0,18,22, 29,48,53,
> 100 2 0 1 0 0 1 1 - FX5ZTWB02D5UGZ 179 76 179 FX5ZTWB02EZOW2 208 19 123 2 18,85, 0,18, 19,38,
>
> A blat run of the form:
>
> blat /Users/bcf/Data/test/SoftMask/*.softmask.clean.2bit tmp.fa -mask=lower stdout
>
> returns (full output):
>
> columns: match, mis-match, rep. match, N's, Q gap count, Q gap bases, T gap count, T gap bases, strand, Q name, Q size, Q start, Q end, T name, T size, T start, T end, block count, blockSizes, qStarts, tStarts
> ------------------------------------------------------------
> 31 2 80 0 0 0 1 24 + FX5ZTWB02D5UGZ 179 45 158 FX5ZTWB02EZZ23 182 35 172 2 31,82, 45,76, 35,90,
> 32 0 0 0 0 0 0 0 + FX5ZTWB02D5UGZ 179 45 77 FX5ZTWB02EMMWO 294 35 67 1 32, 45, 35,
> 32 0 0 0 0 0 0 0 + FX5ZTWB02D5UGZ 179 45 77 FX5ZTWB02EM5LP 153 35 67 1 32, 45, 35,
> 32 0 33 0 1 1 1 23 + FX5ZTWB02D5UGZ 179 45 111 FX5ZTWB02ELBHJ 161 34 122 2 32,33, 45,78, 34,89,
> 32 0 0 0 0 0 0 0 + FX5ZTWB02D5UGZ 179 45 77 FX5ZTWB02EKORL 159 36 68 1 32, 45, 36,
> 32 0 0 0 0 0 0 0 + FX5ZTWB02D5UGZ 179 45 77 FX5ZTWB02EJB29 138 35 67 1 32, 45, 35,
> 66 2 0 0 1 8 2 26 + FX5ZTWB02D5UGZ 179 0 76 FX5ZTWB02EJ0PM 301 0 94 3 11,26,31, 0,11,45, 0,12,63,
> 68 1 0 0 1 8 1 24 + FX5ZTWB02D5UGZ 179 0 77 FX5ZTWB02EICDX 381 229 322 2 37,32, 0,45, 229,290,
> 66 2 0 0 1 8 2 25 + FX5ZTWB02D5UGZ 179 0 76 FX5ZTWB02EH3VT 247 0 93 3 11,26,31, 0,11,45, 0,12,62,
> 62 1 0 0 2 13 2 35 + FX5ZTWB02D5UGZ 179 0 76 FX5ZTWB02EGNY4 328 0 98 3 24,8,31, 0,29,45, 0,32,67,
> 43 0 0 0 1 8 1 24 + FX5ZTWB02D5UGZ 179 26 77 FX5ZTWB02EG2T5 198 0 67 2 11,32, 26,45, 0,35,
> 43 0 0 0 1 8 1 24 + FX5ZTWB02D5UGZ 179 26 77 FX5ZTWB02ECX8Z 224 0 67 2 11,32, 26,45, 0,35,
> 32 0 0 0 0 0 0 0 + FX5ZTWB02D5UGZ 179 45 77 FX5ZTWB02EC10O 167 35 67 1 32, 45, 35,
> 43 0 0 0 1 8 1 24 + FX5ZTWB02D5UGZ 179 26 77 FX5ZTWB02EBHWR 212 0 67 2 11,32, 26,45, 0,35,
> 32 0 0 0 0 0 0 0 + FX5ZTWB02D5UGZ 179 45 77 FX5ZTWB02DYAKJ 141 35 67 1 32, 45, 35,
> 32 0 0 0 0 0 0 0 + FX5ZTWB02D5UGZ 179 45 77 FX5ZTWB02DUG6S 181 35 67 1 32, 45, 35,
> 32 0 0 0 0 0 0 0 + FX5ZTWB02D5UGZ 179 45 77 FX5ZTWB02DTP6B 182 35 67 1 32, 45, 35,
> 43 0 0 0 1 8 1 24 + FX5ZTWB02D5UGZ 179 26 77 FX5ZTWB02DSULL 245 0 67 2 11,32, 26,45, 0,35,
> 44 0 37 0 1 8 2 55 + FX5ZTWB02D5UGZ 179 26 115 FX5ZTWB02DPKMK 206 0 136 3 11,32,38, 26,45,77, 0,35,98,
> 32 0 0 0 0 0 0 0 + FX5ZTWB02D5UGZ 179 45 77 FX5ZTWB02DPI46 179 35 67 1 32, 45, 35,
> 43 0 0 0 1 8 1 24 + FX5ZTWB02D5UGZ 179 26 77 FX5ZTWB02DNZB3 290 0 67 2 11,32, 26,45, 0,35,
> 43 0 0 0 1 8 1 24 + FX5ZTWB02D5UGZ 179 26 77 FX5ZTWB02DMW8R 211 0 67 2 11,32, 26,45, 0,35,
> 40 0 0 0 1 8 2 27 + FX5ZTWB02D5UGZ 179 26 74 FX5ZTWB02DMQ4E 240 0 67 3 11,5,24, 26,45,50, 0,37,43,
> 32 0 0 0 0 0 0 0 + FX5ZTWB02D5UGZ 179 45 77 FX5ZTWB02DKKJB 175 35 67 1 32, 45, 35,
> 32 0 0 0 0 0 0 0 + FX5ZTWB02D5UGZ 179 45 77 FX5ZTWB02DI8TE 158 35 67 1 32, 45, 35,
> 43 0 0 0 1 8 1 24 + FX5ZTWB02D5UGZ 179 26 77 FX5ZTWB02DGCU6 275 0 67 2 11,32, 26,45, 0,35,
> 43 0 0 0 1 8 1 24 + FX5ZTWB02D5UGZ 179 26 77 FX5ZTWB02DB9V0 286 0 67 2 11,32, 26,45, 0,35,
> 43 2 78 0 2 9 2 47 + FX5ZTWB02D5UGZ 179 26 158 FX5ZTWB02D92EW 204 0 170 3 11,32,80, 26,45,78, 0,35,90,
> 43 0 0 0 1 8 1 24 + FX5ZTWB02D5UGZ 179 26 77 FX5ZTWB02D8YAV 238 0 67 2 11,32, 26,45, 0,35,
> 43 0 0 0 1 8 1 24 + FX5ZTWB02D5UGZ 179 26 77 FX5ZTWB02D8RCP 216 0 67 2 11,32, 26,45, 0,35,
> 95 0 82 1 1 1 2 4 + FX5ZTWB02D5UGZ 179 0 179 FX5ZTWB02D887O 221 0 182 3 11,149,18, 0,11,161, 0,12,164,
> 43 0 0 0 1 8 1 24 + FX5ZTWB02D5UGZ 179 26 77 FX5ZTWB02D83KC 250 0 67 2 11,32, 26,45, 0,35,
> 96 0 82 1 0 0 0 0 + FX5ZTWB02D5UGZ 179 0 179 FX5ZTWB02D5UGZ 179 0 179 1 179, 0, 0,
> 43 0 0 0 1 8 1 24 + FX5ZTWB02D5UGZ 179 26 77 FX5ZTWB02D5NGS 270 0 67 2 11,32, 26,45, 0,35,
> 43 0 0 0 1 8 1 24 + FX5ZTWB02D5UGZ 179 26 77 FX5ZTWB02D35EE 194 0 67 2 11,32, 26,45, 0,35,
> 43 0 0 0 1 8 1 24 + FX5ZTWB02D5UGZ 179 26 77 FX5ZTWB02D1GE9 269 0 67 2 11,32, 26,45, 0,35,
> 43 0 0 0 1 8 1 24 + FX5ZTWB02D5UGZ 179 26 77 FX5ZTWB02D1EQS 198 0 67 2 11,32, 26,45, 0,35,
> 43 0 0 0 1 8 1 24 + FX5ZTWB02D5UGZ 179 26 77 FX5ZTWB02D0168 201 0 67 2 11,32, 26,45, 0,35,
> 30 0 0 0 0 0 0 0 + FX5ZTWB02D5UGZ 179 47 77 FX5ZTWB02C9X15 154 39 69 1 30, 47, 39,
> 32 0 0 0 0 0 0 0 + FX5ZTWB02D5UGZ 179 45 77 FX5ZTWB02C8WKA 194 35 67 1 32, 45, 35,
> 43 0 0 0 1 8 1 24 + FX5ZTWB02D5UGZ 179 26 77 FX5ZTWB02C7D4Y 222 0 67 2 11,32, 26,45, 0,35,
> 43 0 0 0 1 8 1 24 + FX5ZTWB02D5UGZ 179 26 77 FX5ZTWB02C6UVQ 263 0 67 2 11,32, 26,45, 0,35,
> 66 2 0 0 1 8 2 25 + FX5ZTWB02D5UGZ 179 0 76 FX5ZTWB02C5OY2 249 0 93 3 11,26,31, 0,11,45, 0,12,62,
> 66 2 0 0 1 8 1 24 + FX5ZTWB02D5UGZ 179 0 76 FX5ZTWB02C1LC0 179 0 92 2 37,31, 0,45, 0,61,
>
>
> It looks like blat is treating the masking correctly in alignments -
> there are no alignments starting in the repeat region (76-158) of the
> Query or the Targets. Alignments across masked regions that begin in
> unmasked regions are treated as expected (i.e., the self-to-self (Q=
> FX5ZTWB02D5UGZ to T= FX5ZTWB02D5UGZ) alignment extends through the
> masked region).
>
> Conversely, in the truncated gfClient output, several of the
> alignments listed have a `Q start` to `Q end` within 76-158, which is
> an unexpected result given the use of the `-mask` flag to start an
> instance of gfServer with the soft-masked, 2bit input file. After
> double-checking the associated Target sequence (and reverse complement
> of the Target) for masked bases, it appears alignments are started in
> repeat-masked regions of these targets.
>
> I noticed the gfServer help indicated that the mask option is to be
> used with nib files, but I assumed since 2bit files were also a valid
> input option (and can be composed of multiple fastas, which I need),
> the `-mask` option applied, as well. So, the discrepancy in the
> output from blat versus gfClient is what has me confused. Again, I
> suspect that I've got something wrong here, that my interpretation of
> the expected behavior is incorrect, or that the help is indeed correct
> that gfServer masking is nib only, but I can't quite put my finger on
> the problem.
>
> Thanks for your time,
> brant
>
> ************************************************
> Brant C. Faircloth
> Dept. of Ecology and Evolutionary Biology
> 621 Charles E. Young Drive South
> University of California
> Los Angeles, CA 90095 USA
>
> rooms: LSS 4304 and 4315
> email: [email protected]
> lab: +1.310.206.2270
> office: +1.310.206.3083
> mobile: +1.706.201.6110
> ************************************************
>
> < * )
> (_ \\
> _ ||
>
> _______________________________________________
> Genome maillist - [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome
>