Hi all,

  The combination of crowd-funding and contract work for me, plus methods
and software development by Mahendra Awale, has resulted in a new version of mmpdb.

More specifically, version 3.0 beta 1 is available on GitHub at:

  https://github.com/adalke/mmpdb/tree/v3-dev

The CHANGELOG summary is at the bottom of this email. For many people the 
biggest improvements are probably the support for large data sets and the 
switch to a more human-understandable SMARTS/"pseudo-SMILES" for the 
environment fingerprints.

The documentation is available through the program, starting with 'mmpdb 
--help'.

Try it out, kick the tires, and let me know what fell off!

Cheers,

                                Andrew
                                da...@dalkescientific.com


A large number of changes to merge three different development tracks
and add new features.

The "fragments" file format has been replaced with a SQLite-based
"fragdb" file format. This makes it much easier to develop tools to
work on fragment data sets instead of processing a JSON-Lines file.
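
Because a fragdb file is an ordinary SQLite database, it can be
inspected with Python's standard sqlite3 module. The sketch below is
only an illustration and is not part of mmpdb: the helper name
list_tables and the file name "example.fragdb" are made up, and no
particular fragdb schema is assumed, since it only asks SQLite which
tables exist and how many rows each one holds.

```python
import sqlite3
from pathlib import Path


def list_tables(path):
    """Return {table_name: row_count} for every table in a SQLite file.

    Hypothetical helper for illustration; it makes no assumption about
    the fragdb schema and works on any SQLite database.
    """
    # Open read-only so that inspecting the file cannot modify it.
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' ORDER BY name"
            )
        ]
        return {
            name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
            for name in names
        }
    finally:
        conn.close()


# Example use, with a made-up file name:
#   for table, count in list_tables("example.fragdb").items():
#       print(table, count)
```

The same approach should work from the sqlite3 command-line shell or
any other tool that can read SQLite files.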

New functionality to create an MMP data set in a distributed compute
environment. Some of the features are:

- split a SMILES file into a set of smaller SMILES files
- the default "fragment" file output is now based on the input name
- fragment files can be re-partitioned by constant fragments:
    - the "fragdb_constants" command generates fragment information
    - the "fragdb_partition" command creates re-partitioned fragdb files
- the default "index" file output is now based on the input name
- there are tools to merge fragdb and mmpdb files into one

As a result, mmpdb can now handle significantly larger data sets.
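
The first step in the list above, splitting a SMILES file into smaller
files, can be sketched in a few lines of plain Python. This is not the
mmpdb implementation: the function name split_smiles_file, the
lines_per_file parameter, and the "input.0000.smi" naming scheme are
all assumptions made for the example. It only shows the idea of
producing independent chunks that separate workers could fragment.

```python
from pathlib import Path


def split_smiles_file(source, lines_per_file=1000):
    """Write consecutive chunks of `source` to numbered sibling files.

    Hypothetical helper for illustration. Returns the list of output
    paths, in order.
    """
    source = Path(source)
    outputs = []
    chunk = []

    def flush():
        # Assumed naming scheme: input.smi -> input.0000.smi, input.0001.smi, ...
        out = source.with_name(
            f"{source.stem}.{len(outputs):04d}{source.suffix}"
        )
        out.write_text("".join(chunk))
        outputs.append(out)
        chunk.clear()

    with source.open() as infile:
        for line in infile:
            if not line.strip():
                continue  # skip blank lines
            # Make sure every record ends with a newline.
            chunk.append(line if line.endswith("\n") else line + "\n")
            if len(chunk) == lines_per_file:
                flush()
    if chunk:
        flush()  # write the final, possibly shorter, chunk
    return outputs
```

Each output file keeps the input's records in their original order, so
concatenating the chunks gives back the non-blank lines of the input.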

Added support for Postgres for direct index database creation. (The
new distributed compute tools require SQLite.)

Added a new "generate" command to apply 1-cut transforms to a
structure, using MMP rules as a playbook.

Replaced the SHA256-based Morgan fingerprint signature with a
canonical SMARTS representing the Morgan fingerprint environment. The
SMARTS is difficult to understand or depict, so a "pseudo" SMILES is
also included, which can be parsed by RDKit (if sanitize is disabled)
and drawn. The new environment fingerprint also includes the SMARTS of
its parent, that is, the SMARTS with a smaller radius.

Switched to 'click' for command-line parsing, removed the vendored
version of the peewee ORM, and switched to a modern "pyproject.toml"
project configuration with a setup.cfg which declares its dependencies.

_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
