Hi Jean,

I copied this reply to the list, as it covers some poorly documented features and some suggestions for the future.

> It's great to know it can be done! I do have further questions. So in the
> pattern file that has no name and contains two lines, you said it's going to
> default to pattern 1. Does that means that without the '>', everything will
> be concatenated and treated as one pattern?

Yes. We did include a -pformat qualifier to set the format of the pattern file, so we can extend it in future to have one pattern per line.

> Actually I should ask what's the difference between
>
> >pat2 <mismatch=1>
> cg(2)c(3)taac
> cctagc(3)ta
>
> and
>
> >pat2 <mismatch=1>
> cg(2)c(3)taaccctagc(3)ta

They are the same: pattern lines are simply joined together until the next pattern header (for example >pat3) is found.

> also what's the difference between a file containing
>
> >pat2 <mismatch=1>
> cg(2)c(3)taac
> cctagc(3)ta
>
> with a file containing
>
> cg(2)c(3)taac
> cctagc(3)ta

The first allows one mismatch in matching the pattern.

These patterns work with the HHTETRA entry we use for the example in the program manual (accession number L46634):

>HHTETRA L46634.1 Human herpesvirus 7 (clone ED132'1.2) telomeric repeat region.
aagcttaaactgaggtcacacacgactttaattacggcaacgcaacagctgtaagctgca
ggaaagatacgatcgtaagcaaatgtagtcctacaatcaagcgaggttgtagacgttacc
tacaatgaactacacctctaagcataacctgtcgggcacagtgagacacgcagccgtaaa
ttcaaaactcaacccaaaccgaagtctaagtctcaccctaatcgtaacagtaaccctaca
actctaatcctagtccgtaaccgtaaccccaatcctagcccttagccctaaccctagccc
taaccctagctctaaccttagctctaactctgaccctaggcctaaccctaagcctaaccc
taaccgtagctctaagtttaaccctaaccctaaccctaaccatgaccctgaccctaaccc
tagggctgcggccctaaccctagccctaaccctaaccctaatcctaatcctagccctaac
cctagggctgcggccctaaccctagccctaaccctaaccctaaccctagggctgcggccc
taaccctaaccctagggctgcggcccgaaccctaaccctaaccctaaccctaaccctagg
gctgcggccctaaccctaaccctagggctgcggccctaaccctaaccctagggctgcggc
ccgaaccctaaccctaaccctaaccctagggctgcggccctaaccctaaccctagggctg
cggccctaaccctaaccctaactctagggctgcggccctaaccctaaccctaaccctaac
cctagggctgcggcccgaaccctagccctaaccctaaccctgaccctgaccctaacccta
accctaaccctaaccctaaccctaaccctaaccctaaccctaaccctaaccctaacccta
accctaaccctaaccctaaccctaaccccgcccccactggcagccaatgtcttgtaatgc
cttcaaggcactttttctgcgagccgcgcgcagcactcagtgaaaaacaagtttgtgcac
gagaaagacgctgccaaaccgcagctgcagcatgaaggctgagtgcacaattttggcttt
agtcccataaaggcgcggcttcccgtagagtagaaaaccgcagcgcggcgcacagagcga
aggcagcggctttcagactgtttgccaagcgcagtctgcatcttaccaatgatgatcgca
agcaagaaaaatgttctttcttagcatatgcgtggttaatcctgttgtggtcatcactaa
gttttcaagctt

> Also could you explain how to use -pname and -pmismatch?
> I don't understand this part at all :-P Thank you very much!

Ah ... they are associated qualifiers (like -sformat, -sbegin and -send for sequences, -osformat for sequence output, -aformat for alignments and -rformat for reports). They only show up if you use -help -verbose to see the help.

This caused some problems for fuzznuc users with release 4.0.0, as they replace the behaviour of the previous version, which had a -mismatch option and only read one pattern.

-pmismatch sets a default number of mismatches for all patterns (which you can override within the pattern file).

-pname sets a pattern name for the output (something that was missing before).
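
To make the pattern file rules above concrete, here is a rough Python sketch. It is not the EMBOSS code, only an illustration of the behaviour described in this message: pattern lines are joined until the next ">" header, a <mismatch=N> in the header overrides the default, and a file with no header is read as a single pattern. The names parse_pattern_file and Pattern, and the exact default name "pattern1" for an unnamed pattern, are assumptions made for the example.

```python
import re
from typing import List, NamedTuple


class Pattern(NamedTuple):
    name: str
    pattern: str
    mismatch: int


# Header line such as ">pat2 <mismatch=1>"; the name and the mismatch are both optional.
HEADER = re.compile(r"^>\s*(?P<name>[^\s<]+)?\s*(?:<mismatch=(?P<mismatch>\d+)>)?\s*$")


def parse_pattern_file(text: str, default_mismatch: int = 0,
                       default_name: str = "pattern") -> List[Pattern]:
    """Parse "fasta"-style pattern file text (illustrative sketch only).

    default_mismatch plays the role of -pmismatch: it applies to every
    pattern whose header does not set its own <mismatch=N>.
    """
    records = []  # each entry: [name or None, mismatch, list of pattern lines]
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            match = HEADER.match(line)
            if match is None:
                raise ValueError(f"unrecognised pattern header: {line!r}")
            mismatch = match.group("mismatch")
            records.append([
                match.group("name"),
                int(mismatch) if mismatch is not None else default_mismatch,
                [],
            ])
        else:
            if not records:
                # No header seen yet: start an unnamed pattern with the defaults.
                records.append([None, default_mismatch, []])
            records[-1][2].append(line)

    # Join the lines of each pattern; number unnamed patterns (assumed naming).
    return [
        Pattern(name or f"{default_name}{index}", "".join(parts), mismatch)
        for index, (name, mismatch, parts) in enumerate(records, start=1)
    ]


if __name__ == "__main__":
    two_lines = ">pat2 <mismatch=1>\ncg(2)c(3)taac\ncctagc(3)ta\n"
    one_line = ">pat2 <mismatch=1>\ncg(2)c(3)taaccctagc(3)ta\n"
    print(parse_pattern_file(two_lines) == parse_pattern_file(one_line))
    print(parse_pattern_file("cg(2)c(3)taac\ncctagc(3)ta\n", default_mismatch=2))
```

In this sketch the two-line and one-line versions of pat2 parse to the same pattern, and the headerless file picks up whatever default mismatch is passed in.
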

Oops, we have a bug ... the name is being ignored in fuzznuc. It will be fixed in 4.1.0.

-pformat sets the pattern file format. So far this is ignored, so we have not documented pattern file format names. I think a file with one line for each pattern, with numbering 1, 2, 3 added to the pattern name, would be useful. We could call the formats "simple" (one line per pattern) and "fasta" (the current format with names).

Oops, another bug. Using a bad pattern file name is not being caught. Fixed in 4.1.0.

We also added files of regular expressions used by dreg and preg, so you can also use them for pattern searches (it depends on whether you prefer prosite-style patterns or regular expressions; I find the prosite-style patterns for fuzznuc much easier). We can use the same file formats for them.

I have to check the original pattern file code from Henrikki Almusa to see whether we lost anything in the naming and formats.

Hope that helps,

Peter
_______________________________________________
EMBOSS mailing list
http://lists.open-bio.org/mailman/listinfo/emboss
