Hi,
I am trying to find something like the PROSITE rules database that may
include more conserved domains.
That is, I've got a bunch of short peptides and I want to determine whether
any of them have functional significance. I would imagine that function
prediction servers have such a database, but probably not in downloadable
form. In particular, I took about 3000 short sequences that have something
to do with cell cycle arrest (eutilsnew is my own script, but you get the
idea):
 
eutilsnew -protein -v -out stuff '"cell cycle" arrest'
$progpath/file_parsing -fastas stuff  stuff_fasta

I have a way to get the most frequently occurring short strings. In this
case, I got some interesting hits (and also found that "M" occurs at the
start quite often, which adds some confidence that the code is running
properly):

$progpath/string_test -fastas stuff_fasta -status -conserved | grep '[A-Z]' |
  sort -g -r -k 2 > cca_roots
$ head cca_roots
   M 2321
PENL 565
   L 545
FENL 461
YENL 458
   F 456
   W 455
WENL 454
  MS 425
RSPS 396
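
For readers without string_test, here is a rough sketch of how a tally of frequent short strings could be produced. It is not the string_test program and does not reproduce its -conserved logic (which is not described here); it simply counts every substring of a fixed length k in a set of FASTA records. The names read_fasta and count_kmers, and the toy records, are made up for illustration.

```python
from collections import Counter


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def count_kmers(sequences, k):
    """Count every overlapping substring of length k across all sequences."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


# Toy input (hypothetical records, not real data).
toy_fasta = [">a", "MPENLA", ">b", "MFENL", "PENL"]
toy_seqs = [seq for _, seq in read_fasta(toy_fasta)]
for kmer, n in count_kmers(toy_seqs, 4).most_common(3):
    print(f"{kmer:>4} {n}")
```

To run this on a real file, pass an open file handle to read_fasta in place of the toy list.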


In any case, I wanted to see whether the regular expression [PFYW]ENL means
anything. First, I got a control group (I only fetched the first 1500
sequences, using ctrl-c to "select" the first few):

eutilsnew -v -protein -out some_hydo "hydroxylase"


$ head hydro_roots
   M 1418
GDAA 312
GAGL 308
DAAH 299
AGLL 283
GLLS 266
IGLA 263
PVAG 258
LLSS 253
AGQG 253
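
One way to make the comparison with the control group concrete is to compute, for each set, the fraction of sequences containing [PFYW]ENL. The sketch below assumes the sequences are already loaded as plain strings; the function name fraction_with_motif and the toy sequences are hypothetical, and no statistical test is attempted.

```python
import re

# The motif from the text: P, F, Y, or W followed by ENL.
MOTIF = re.compile(r"[PFYW]ENL")


def fraction_with_motif(sequences, pattern=MOTIF):
    """Return the fraction of sequences with at least one motif match."""
    sequences = list(sequences)
    if not sequences:
        return 0.0
    hits = sum(1 for seq in sequences if pattern.search(seq))
    return hits / len(sequences)


# Toy sequences (hypothetical, for illustration only).
cell_cycle = ["MSPENLKK", "MAWENLQ", "MKKLLS"]
control = ["MGDAAHL", "MAGLLSS"]

print("cell cycle set:", fraction_with_motif(cell_cycle))
print("control set:   ", fraction_with_motif(control))
```

A large difference between the two fractions would suggest the motif is specific to the first set rather than a general feature of protein sequences.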


The PROSITE rule list that I have shows some "ENL" candidates explicitly
(none of which include P, F, Y, or W as a leading amino acid) and maybe more
that are more cryptic:

$ grep ENL /cygdrive/c/mydocs/scripts/cc/affx/prosite_rules
P.{2}[LIVMF]{2}[LIVMS].[GDN].{3}[DENL].{3}[LIVM].E.{4}[GNQKRH][LIVM][AP]>rule|216|PEPDTIDE Prosite RIBOSOMAL_S2_2
K[LIVMF]DG[LIVMAS][SAG].{4}Y.{2}[GRD].[LF].{4}[ST]RG[DN]G.{2}G[DE][DENL]>rule|832|PEPDTIDE Prosite DNA_LIGASE_N1
CC[SHYN].{0,1}[PRG][RPATV]C[ARMFTNHG].{0,4}[QWHDGENLFYVP][RIVYLGSDW]C>rule|1104 |PEPDTIDE Prosite ALPHA_CONOTOXIN
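
Note that in all three hits the letters E, N, L appear inside a bracketed class such as [DENL], so grep is matching the text of the rule rather than an ENL motif. A possibly more direct check is to apply each rule as a regular expression to the peptides themselves. The sketch below assumes every line of the rules file has the form REGEX>rule|ID|TYPE Prosite NAME, as in the output above; parse_rule and matching_rules are hypothetical helper names.

```python
import re

# Rule lines copied from the grep output above (one rule per line).
RULE_LINES = [
    r"P.{2}[LIVMF]{2}[LIVMS].[GDN].{3}[DENL].{3}[LIVM].E.{4}[GNQKRH][LIVM][AP]>rule|216|PEPDTIDE Prosite RIBOSOMAL_S2_2",
    r"K[LIVMF]DG[LIVMAS][SAG].{4}Y.{2}[GRD].[LF].{4}[ST]RG[DN]G.{2}G[DE][DENL]>rule|832|PEPDTIDE Prosite DNA_LIGASE_N1",
    r"CC[SHYN].{0,1}[PRG][RPATV]C[ARMFTNHG].{0,4}[QWHDGENLFYVP][RIVYLGSDW]C>rule|1104 |PEPDTIDE Prosite ALPHA_CONOTOXIN",
]


def parse_rule(line):
    """Split one rule line into (name, compiled regex).

    Assumes the regex is everything before the last '>rule|' and the
    rule name is the last whitespace-separated token on the line.
    """
    pattern, _, meta = line.strip().rpartition(">rule|")
    return meta.split()[-1], re.compile(pattern)


def matching_rules(sequence, rules):
    """Return the names of all rules whose regex matches the sequence."""
    return [name for name, regex in rules if regex.search(sequence)]


rules = [parse_rule(line) for line in RULE_LINES]

# A four-residue peptide is far shorter than any of these patterns,
# so it cannot match; a longer made-up sequence is shown for contrast.
print(matching_rules("PENL", rules))
print(matching_rules("KLDGLSAAAAYAAGALAAAASRGDGAAGDE", rules))
```

Because these patterns span dozens of residues, very short peptides will rarely match a full rule; this kind of scan is more informative on the full-length sequences the peptides came from.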

but that is all I have to go on. I took a quick look at NCBI CDART and
related Pfam resources but couldn't figure out how to download anything
useful. I couldn't immediately get BLAST to return any hits on "ENL", and
I'm not sure which parameters I'd need to tweak to search on short
sequences.

Thanks.




Mike Marchywka

