[EMBOSS] EMBOSS 6.0.0 released

ajb Tue, 15 Jul 2008 10:55:04 -0700

EMBOSS 6.0.0 is now available from:

  ftp://emboss.open-bio.org/pub/EMBOSS/EMBOSS-6.0.0.tar.gz


The associated EMBASSY packages are in the same directory. Note that,
as usual, these are specific to the main package so versions downloaded
for a previous release will not work with 6.0.0.

Changes in 6.0.0 include new applications, improvement of existing
applications, library API consistency changes, bugfixes etc. Most are
described in the relevant section of the ChangeLog which is reproduced
below.


mEMBOSS-6.0.0 is available from:

  ftp://emboss.open-bio.org/pub/EMBOSS/windows/mEMBOSS-6.0.0-setup.exe

mEMBOSS contains all the EMBOSS changes plus improvements and bugfixes
for the GUI (Jemboss). Also, this release of mEMBOSS contains the C runtime
library files; these had to be installed separately in previous
versions.

Alan





Version 6.0.0
        New application aligncopy reads a set of aligned sequences and
        prints a report in one of the standard alignment formats that can
        accept the same number of sequences. Pairwise alignment formats
        can only be used if the input has exactly two sequences.

        New application aligncopypair reads a set of aligned sequences and
        prints a report for each pair of aligned sequences in one of the
        standard alignment formats.

        New application featreport reads a sequence and a feature table,
        and writes a report in any of the standard report formats.

        New application featcopy reads and writes a feature table to
        convert feature formats.

        New applications maskambignuc and maskambigprot replace ambiguity
        characters in nucleotide sequences with 'N' and in protein
        sequences with 'X'.
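
        The masking step can be pictured with a short Python sketch. It
        is an illustration only, not the EMBOSS source. The function
        names are hypothetical, and the sets of characters treated as
        ambiguity codes (IUPAC codes other than 'N' and 'X') are an
        assumption made for this example.

```python
# Illustrative sketch only; not the EMBOSS implementation.
# Assumption: "ambiguity characters" are the IUPAC codes listed here.
NUC_AMBIGUITY = "RYSWKMBDHV"   # nucleotide ambiguity codes other than N
PROT_AMBIGUITY = "BZJ"         # protein ambiguity codes other than X


def _mask_table(codes, mask):
    """Map every code, in upper and lower case, to the mask character."""
    table = str.maketrans(codes, mask * len(codes))
    table.update(str.maketrans(codes.lower(), mask.lower() * len(codes)))
    return table


def mask_ambig_nuc(seq):
    """Replace nucleotide ambiguity codes with 'N' (or 'n')."""
    return seq.translate(_mask_table(NUC_AMBIGUITY, "N"))


def mask_ambig_prot(seq):
    """Replace protein ambiguity codes with 'X' (or 'x')."""
    return seq.translate(_mask_table(PROT_AMBIGUITY, "X"))
```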

        New application consambig reports an alignment consensus sequence
        using ambiguity characters. The intended use cases are sequencing
        reads and SNP reporting.
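
        As a rough illustration of a consensus built from ambiguity
        characters, the Python sketch below maps the set of bases seen
        in each alignment column to its IUPAC code. The function name is
        hypothetical, and the handling of gaps and of non-ACGT input
        characters (both ignored here) is an assumption, not a
        description of consambig itself.

```python
# Illustrative sketch only; not the consambig implementation.
# IUPAC nucleotide codes, keyed by the set of bases each one represents.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def ambiguity_consensus(aligned):
    """Return one IUPAC code per column of equal-length aligned sequences.

    Assumption: characters other than A, C, G, T (including gaps) are
    ignored; a column with no bases at all is reported as '-'.
    """
    consensus = []
    for column in zip(*aligned):
        bases = frozenset(c.upper() for c in column if c.upper() in "ACGT")
        consensus.append(IUPAC.get(bases, "-"))
    return "".join(consensus)
```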

        New application sizeseq sorts sequences in ascending or descending
        order of length. This is a port of the application seqsort from
        the domsearch EMBASSY package.
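
        Sorting by length is simple to sketch in Python. The function
        name and the (name, sequence) record layout below are
        assumptions for illustration and do not reflect how sizeseq
        stores sequences.

```python
# Illustrative sketch only; not the sizeseq implementation.
def size_sort(records, descending=False):
    """Sort (name, sequence) pairs by sequence length.

    The sort is stable, so records of equal length keep their input order.
    """
    return sorted(records, key=lambda rec: len(rec[1]), reverse=descending)
```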

        New application skipredundant uses pairwise sequence matches to
        exclude sequences that are similar from an input set. This is a
        modified version of the application seqnr from the domsearch
        EMBASSY package.

        New applications provide utility functions for former GCG users:
        nohtml removes HTML tags, notab replaces tabs with spaces,
        nospace removes all whitespace from a file, skipspace removes
        extra whitespace from a file.
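
        The four text utilities can be approximated in a few lines of
        Python. These are sketches under stated assumptions, not the
        EMBOSS code: tabs are expanded to 8-column tab stops, "all
        whitespace" is taken to mean spaces and tabs with line breaks
        kept, and "extra whitespace" is taken to mean runs of spaces and
        tabs collapsed to a single space.

```python
# Illustrative sketches only; not the EMBOSS implementations.
import re


def nohtml(text):
    """Remove anything that looks like an HTML tag."""
    return re.sub(r"<[^>]*>", "", text)


def notab(text, tabsize=8):
    """Replace tabs with spaces (assumption: 8-column tab stops)."""
    return text.expandtabs(tabsize)


def nospace(text):
    """Remove spaces and tabs (assumption: line breaks are kept)."""
    return re.sub(r"[ \t]+", "", text)


def skipspace(text):
    """Collapse each run of spaces and tabs to one space (assumption)."""
    return re.sub(r"[ \t]+", " ", text)
```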

        Older EMBOSS applications can now generate a warning message
        stating that they are marked as 'obsolete' with an explanation and
        an indication of alternative programs in EMBOSS or in an EMBASSY
        package. This warning can be turned off by defining environment
        variable EMBOSS_WARNOBSOLETE with a value of "N" or by defining
        the same variable in the emboss.default or ~/.embossrc files. We
        will begin to mark applications as 'obsolete' in future releases.

        A new EMBASSY package "myembossdemo" contains the demonstration
        applications demoalign, demofeatures, demolist, demoreport,
        demosequence, demostring, demostringnew and demotable that
        illustrate how to use EMBOSS data types in your own
        applications. The myembossdemo package allows novice developers to
        try simple EMBOSS programming. The myemboss package is available
        for adding your own applications. The demo applications are no
        longer distributed with the main EMBOSS package. They were not
        installed and were only built with the "make check" option.

        Application short descriptions have been revised. The maximum
        length of application one-line descriptions is increased from 60
        to 70 characters, so the descriptions are easier to write. Output
        from wossname can now be 90 characters wide. Interfaces that use
        the description in menus may need to allow some extra space.

        Function names in ajfile.c have been standardised. Old names are
        still accepted but are marked as "deprecated" and will generate
        warnings with the gcc compiler (see ajstr below). Other compilers
        will see no difference. New source files ajfiledata.c and
        ajfileio.c have been added. The buffered file data structures are
        renamed internally to be more consistent (AjPFileBuff to AjPFilebuff).

        notseq was unable to search for IDs containing '|' characters,
        even though it uses string matching (not regular expressions).
        These characters are valid in NCBI-style FASTA files if read with
        the "pearson" format, which accepts the whole ID string without
        parsing.

        The sequence alignment code has been updated. Sequence alignments
        with low gap penalties failed to allow two gaps (one in each
        sequence) without a match in between. The embAlign functions are
        now simplified. Scores are returned by the PathCalc functions. The
        Walk functions that walk through the path and return the aligned
        sequences are faster and need fewer parameters. Profile alignments
        occasionally duplicated residues in the sequence around gap
        positions. Fast alignments around a limited width include
        additional residues at each end and require an offset rather than
        separate start positions. The offset is the difference between the
        two start positions used in 5.0.0 and earlier releases.

        Eprimer3 citations are corrected in the help text (from the ACD
        file) and in the documentation. The citation errors were traced to
        the original primer3_core documentation which has now been
        corrected.

        Wordmatch could confuse overlapping matches. It occasionally
        extended the wrong match and missed a corresponding new match.

        Seqmatchall results were correct with the default output
        format which reports match positions, but gave incorrect results
        with some other local alignment formats that include the sequence.
        Seqmatchall now stores alignments in the same way as other local
        alignment applications, and the alignment internals are corrected
        to ensure other applications will not have the same problem.

        Emma was officially supporting clustalw 1.83. Issues with clustalw
        2.0 are now resolved and this version is supported if clustalw2 is
        installed. Emma executes an application called clustalw (not
        clustalw2) so version 2.0 must be installed under this name or an
        environment variable EMBOSS_CLUSTALW needs to be defined to point
        to the executable clustalw2 file.

        Sequence format "selex" allows invalid sequence data files to be
        accepted as input. Selex format is still available but is no
        longer included in the formats that can be automatically
        detected. When reading selex format data, users need to put
        "-sformat selex" on the command line, or specify "selex::" at the
        front of the USA. See the HMMER (old version EMBASSY package)
        documentation for examples. HMMERNEW (recommended) examples use
        Stockholm format and so are unchanged.

        Program dbxfasta now defaults to a filename of "*.fasta".
        The previous default "*.dat" is not commonly used for FASTA format
        databases.

        Program msbar block mutations were 1 longer than the specified
        block, and msbar could crash if the block size was fixed (minimum
        and maximum block sizes the same). This off-by-one error is now
        corrected.

        In GenBank output format, multiple line KEYWORD sections were not
        formatted correctly.

        ACD list and select values (the menus that appear in the user
        prompt) can now have ACD variables. Although useful for local
        application development these are not used in EMBOSS distributed
        ACD files because the variables are difficult for web and GUI
        interfaces to resolve when presenting the menu text.

        List and Table internal data structures are now cached so that
        creating and deleting temporary lists and tables is more efficient.

        In emboss.default database definitions the filename and exclude
        values can be delimited by spaces, commas or semicolons. Previous
        releases used only spaces. Parsing is now consistent with the
        fields definition which allowed all the above characters.
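
        The delimiter rule amounts to splitting on any run of spaces,
        commas or semicolons. The Python sketch below illustrates this;
        the function name is hypothetical and the EMBOSS parser itself
        is written in C and may differ in detail.

```python
# Illustrative sketch only; not the EMBOSS parser.
import re


def split_names(value):
    """Split a filename or exclude value on spaces, commas or semicolons."""
    return [name for name in re.split(r"[ ,;]+", value.strip()) if name]
```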

        Protein sequences with pyrrolysine ('O') had 'O' converted to a
        gap because this was a gap character in early versions of
        Phylip. This was patched in 5.0.0 to allow 'O' in UniProt release
        13. The gap character is upper case only, so 'o' was correctly
        read as pyrrolysine.

        Wordfinder used the same descriptions for two pairs of qualifiers.
        The descriptions are changed to make their meaning clear in
        commandline help and in web interfaces.

        New function ajTimeDiff returns the difference in seconds between
        two time values.

        Profiling tests showed that file reading and string handling can
        be made faster. String handling called functions many levels
        deep. Making this code inline and using macro versions improved
        performance for applications (e.g. database indexing) that use
        many string calls. File input requires each input line to be
        copied. Using copy-by-reference (ajStrAssignRef) often makes this
        more efficient. Existing macros now test for undefined strings:
        MAJSTRGETLEN, MAJSTRGETPTR, MAJSTRGETRES and MAJSTRGETUSE. New
        macros are added for string handling: MAJSTRDEL,
        MAJSTRGETUNIQUESTR, MAJSTRCMPC and MAJSTRCMPS.

        Memory management includes new macros AJCRESIZE0 and AJRESIZE0,
        which provide resize functions that guarantee new memory is set to
        zero. The functions must be given the original allocated size.

        Using the GNU C run-time library, calls to mcheck and mprobe are
        available to test for memory corruption by examining the bytes
        before and after an address allocated by malloc. This can be
        turned on for any application, including Unix commands, with the
        environment variable MALLOC_CHECK_ which has values 0, 1, 2 or
        3. 1 writes to standard error when a problem is found, 2 aborts
        the program, 3 does both and 0 ignores errors. No recompilation
        is needed for this simple method. EMBOSS now has a ./configure
        option --enable-mprobe which enables two new
        functions. ajMemProbe, passed an address from malloc (AJNEW0,
        AJCNEW0, etc.) tests the bytes before and after and reports any
        errors. The advantage of using ajMemProbe rather than mprobe is
        that a macro MAJMEMPROBE also reports the file and line number
        where it was called. To avoid large numbers of messages (when
        code has problems) a limit can be set with ajMemCheckSetLimit
        after which the program will exit. Note that enable-mprobe is
        incompatible with using valgrind to test for memory leaks - as
        mprobe and mcheck have to look at illegal bytes before and after
        allocated memory blocks. Memory checking is turned on by a call to
        mcheck, passing the function ajMemCheck, in ajnam.c before the
        first memory allocation. If any program calls malloc before
        calling embInit or embInitP this call will fail and issue a
        warning (if compiled with --enable-mprobe). A special call
        ajStrProbe tests any string with mprobe. Special calls ajListProbe
        and ajListProbeData test lists and their contents. For more
        details see http://www.gnu.org/software/libc/manual/

        Protein sequences from the Staden package were read as nucleotide
        because they were missing information on the ID line to identify
        EMBL or SWISSPROT format. The sequences are now tested and
        correctly typed.

        Wordcount now accepts protein sequences as input. Previous
        releases only allowed nucleotide sequences.

        Wordfinder options had the same information prompt. These have
        been changed from "limit" to "minimum" and "maximum" to make their
        function clear.

        Prompting for values from the user now includes a test for
        standard input in use as an input file. If standard input is open,
        the default response is accepted and a message is written to the
        user. This is to avoid problems with command lines that use
        "stdin" as an input and do not include -auto.

        The acdpretty utility can now preserve comments in ACD files.
        Comments are maintained in blocks with blank lines before and
        after. Inline comments are started in column 50 unless they are
        exceptionally long. Comments themselves have white space cleaned
        up but otherwise are not reformatted.

        A new function ajAcdGetValueDefault is added to return the default
        value of an ACD qualifier. This can be combined with
        ajAcdIsUserdefined in wrappers to test for values changed by the
        user.

        Infile qualifiers in ACD have a new attribute "trydefault" which
        allows the default filename to fail. Any filename provided by the
        user has to exist. This was added to support the behaviour of the
        MIRA EMBASSY package. To allow an infile to fail the attribute
        "nullok" also must be set to "Y".

        Applications which produce an output file or graphics often
        created an empty output file when the plot was selected.
        The ACD files have been corrected to only create the file if it
        will be written to. Applications changed are charge, dan,
        freak, hmoment, iep and tcode.

        Whichdb only writes to its output file if -get is false.
        With -get it creates sequences. The outfile is no longer created
        when whichdb is in -get mode.

        String functions are corrected so that "Case" in the name always
        means case-insensitive and works by converting to upper case. Some
        functions were defined the wrong way, with "Case" for the
        case-sensitive form.
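
        The naming convention can be shown with a pair of comparison
        functions in Python. The names match and match_case are
        hypothetical stand-ins and are not the EMBOSS C function names.

```python
# Illustrative sketch only; function names are hypothetical.
def match(a, b):
    """Plain form: exact, case-sensitive comparison."""
    return a == b


def match_case(a, b):
    """'Case' form: case-insensitive, comparing upper-cased copies."""
    return a.upper() == b.upper()
```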

        GFF3 format is now the default feature output.

        A new function ajFeatIsCds identifies protein coding nucleotide
        features (CDS) using the SO identifier. A new function
        ajFeattagIsNote identifies feature tags that are for the default
        feature tag.

        Protein features now use the new Sequence Ontology terms defined
        by BioSapiens. These are not yet accepted by GFF3 validators. The
        new SO identifiers are added to protein feature definitions and
        used internally.

        Feature format definitions (the Efeatures and Etags files)
        now allow #include references to other files. This allows a
        standard EMBL and Swissprot feature table definition to be
        included by the internal and GFF definitions. Redefinitions are
        allowed using + and - prefixes to add and remove tags for existing
        feature types.

        GFF3 format feature (and report) output is added.

        A new application "density" has been added. This reports the
        A+C+G+T and AT+GC densities of nucleic acid sequences within
        an adjustable sliding window. Plots of A+C+G+T or AT+GC are
        optionally produced.
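
        A sliding-window density calculation might look like the Python
        sketch below. It is not the code of the density application. The
        function name, the step of one position, the 1-based window
        start and the definition of density as a fraction of the window
        length are all assumptions for illustration.

```python
# Illustrative sketch only; not the density implementation.
def window_densities(seq, window):
    """Return (start, densities) for each window position.

    Densities are fractions of the window length for A, C, G and T,
    plus the combined AT and GC fractions. Starts are 1-based.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    seq = seq.upper()
    results = []
    for start in range(len(seq) - window + 1):
        chunk = seq[start:start + window]
        dens = {base: chunk.count(base) / window for base in "ACGT"}
        dens["AT"] = dens["A"] + dens["T"]
        dens["GC"] = dens["G"] + dens["C"]
        results.append((start + 1, dens))
    return results
```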

        Molecular weight programs (e.g. digest, mowse) now have a
        -mono switch to allow use of monoisotopic weights.
        By default, average molecular weights are used.

        The Eamino.dat format has changed. Molecular weight information
        has been removed and put in its own Emolwt.dat file. This latter
        now allows specification of average and monoisotopic weights. Values
        for hydrogen and oxygen are specified as well as the amino acid weights.
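
        The difference between average and monoisotopic weights can be
        sketched in Python as below. The masses are approximate textbook
        values for two residues only and are not read from Emolwt.dat;
        the function name and table layout are assumptions. One water
        (two hydrogens and one oxygen) is added for the free termini.

```python
# Illustrative sketch only; values are approximate and not from Emolwt.dat.
# Each entry is (average mass, monoisotopic mass) in daltons.
RESIDUE_MASS = {
    "G": (57.0519, 57.02146),   # glycine residue
    "A": (71.0788, 71.03711),   # alanine residue
}
HYDROGEN = (1.00794, 1.007825)
OXYGEN = (15.9994, 15.994915)


def peptide_mass(seq, mono=False):
    """Sum residue masses and add one water for the peptide termini."""
    i = 1 if mono else 0
    residues = sum(RESIDUE_MASS[aa][i] for aa in seq.upper())
    return residues + 2 * HYDROGEN[i] + OXYGEN[i]
```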

        The library representation of amino acid property information
        has been changed. The EmbPropTable global table has been
        removed and replaced with EmbPPropAmino and EmbPPropMolwt objects.

        Pepcoil now produces a report (replacing a text output) in "motif"
        format. The default is changed to not report non coiled-coil
        regions as they are hard to distinguish in this format.

        The "motif" report format is extended to allow two score positions
        marked with "*" and "+" and labelled internally as "pos" and
        "pos2". No application uses pos2 (it was added for pepcoil, but
        both score maximum positions are always the same).

        A new function ajAcdIsUserdefined allows wrappers to test which
        qualifiers have values changed by the user so that they can use
        shorter command lines to launch the wrapped application.

        jaspscan application added. Scans sequences for transcription
        factors using the JASPAR matrices.

        jaspextract application added to move the JASPAR matrices into the
        EMBOSS data area subdirectories.

        Alignment format "trace", used to display internal data content, is
        renamed to "debug" to be consistent with other formats. A "debug"
        format is added for feature output.

        Application documentation has been updated to remove obsolete
        references to EMBL database identifiers. These are replaced with
        the correct accession numbers.

        Two new entries have been added to the "tembl" test EMBL database
        for use in the QA tests.

        Report output now checks the sequence and feature table type. If
        the sequence is not a valid protein, protein-only formats (pir,
        swiss) will fail with an error message. Similarly, if the sequence
        is not a valid nucleotide sequence then nucleotide-only formats
        (embl, genbank) will fail with an error message.
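
        The type check can be summarised by the Python sketch below,
        which uses only the four formats named above. The function name
        and the use of an exception are assumptions for illustration;
        EMBOSS reports the failure with an error message.

```python
# Illustrative sketch only; not the EMBOSS report code.
PROTEIN_ONLY = {"pir", "swiss"}
NUCLEOTIDE_ONLY = {"embl", "genbank"}


def check_report_format(fmt, seq_type):
    """Raise ValueError if fmt cannot hold a sequence of seq_type.

    seq_type is "protein" or "nucleotide". Formats in neither set are
    accepted for both types.
    """
    fmt = fmt.lower()
    if fmt in PROTEIN_ONLY and seq_type != "protein":
        raise ValueError(f"format '{fmt}' requires a protein sequence")
    if fmt in NUCLEOTIDE_ONLY and seq_type != "nucleotide":
        raise ValueError(f"format '{fmt}' requires a nucleotide sequence")
```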

        Garnier now uses the correct SwissProt and internal feature keys
        for protein secondary structure. The results will appear much
        better for example as a swissprot feature table. This required
        rewriting of the internals by recoding the secondary structure
        features with a "garnier" tag replacing the previous "helix",
        "sheet", "turns" and "coil" tags. The default output is
        unchanged. The results in other report formats will be changed.

        Silent no longer reports the "Dir" column. This is replaced by the
        new "Strand" column which reports "+" for a forward feature and
        "-" for a reverse feature.

        The following programs have changed default report output, with
        the strand included for nucleotide sequences: equicktandem,
        etandem, fuzznuc, fuzztran, recoder, restrict, silent, tcode,
        twofeat. The strand column can be removed with the new commandline
        associated qualifier -norstrandshow.

        Reports for nucleotide sequences have confusing ways to represent
        the start and end positions for features on the complementary
        strand. A strand column has been added to these reports,
        controlled by a new -rstrandshow qualifier and attribute. By
        default the strand is shown for all nucleotide reports (see a list
        of changed program outputs above). The start position is always
        lower than the end position for features on the complementary
        strand indicating the region that should be reversed. In past
        releases the seqtable report format (fuzznuc, dreg, dan)
        confusingly reversed start and end positions to indicate the
        unreported strand. For all report formats (nametable, table) the
        start and end positions are now consistent with nucleotide feature
        formats (gff, embl, genbank).
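
        Scripts that parsed the older reversed positions may need a
        small conversion. The Python sketch below, with a hypothetical
        function name, turns a start/end pair in the old style (start
        greater than end for the complementary strand) into the new
        convention of start lower than end plus an explicit strand. It
        assumes the old output encoded the strand only by the order of
        the two positions.

```python
# Illustrative sketch only; assumes reversed positions mean the '-' strand.
def report_position(start, end):
    """Return (start, end, strand) with start <= end."""
    if start <= end:
        return start, end, "+"
    return end, start, "-"
```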

        Reports from dreg incorrectly reported sequences reversed with the
        -sreverse qualifier.

        Report headers now include the text "(Reversed)" when the input
        sequence(s) are reverse complemented.

        Phylogenetic trees in newick format are now parsed into internal
        trees and converted back for use by Phylip. This allows us to
        read other tree formats and pass them to Phylip (e.g. Nexus).

        Some ACD data types did not allow the input to be NULL because
        extra tests were carried out on the results. These are all cleaned
        up and tested so that they can safely be set to nullok and missing
        in local applications.

        New sequence reading formats for PDB files. By default the ATOM
        records are used (format "pdb"). An alternative format "pdbseq"
        will read the SEQRES records which give the original sequence. The
        ATOM records give the sequence determined from the structure.

        Improved the help text for the -stdout and -filter options to
        explain that output files are written to standard output. Some users
        expected graphics output (from plplot) to be controlled.


