Hi all,

The combination of crowd-funding and contract work for me, and methods + software development by Mahendra Awale, has resulted in a new version of mmpdb.
More specifically, version 3.0 beta 1 is available on GitHub at:

  https://github.com/adalke/mmpdb/tree/v3-dev

The CHANGELOG summary is at the bottom of the email.

For many people the biggest improvements are probably the support for
large data sets and the switch to a more human-understandable
SMARTS/"pseudo-SMILES" for the environment fingerprints.

The documentation is available through the program, starting with
'mmpdb --help'.

Try it out, kick the tires, and let me know what fell off!

Cheers,

Andrew
da...@dalkescientific.com


A large number of changes to merge three different development tracks
and add new features.

The "fragments" file format has been replaced with a SQLite-based
"fragdb" file format. This makes it much easier to develop tools to
work on fragment data sets instead of processing a JSON-Lines file.

New functionality to create an MMP data set in a distributed compute
environment. Some of the features are:

- split a SMILES file into a set of smaller SMILES files
- the default "fragment" file output is now based on the input name
- fragment files can be re-partitioned by constant fragments:
  - the "fragdb_constants" command generates fragment information
  - the "fragdb_partition" command creates re-partitioned fragdb files
- the default "index" file output is now based on the input name
- there are tools to merge fragdb and mmpdb files into one

As a result, mmpdb can now handle significantly larger data sets.

Added support for Postgres for direct index database creation. (The
new distributed compute tools require SQLite.)

Added a new "generate" command to apply 1-cut transforms to a
structure, using MMP rules as a playbook.

Replaced the SHA256-based Morgan fingerprint signature with a
canonical SMARTS representing the Morgan fingerprint environment. The
SMARTS is difficult to understand or depict, so there is also a
"pseudo" SMILES that can be parsed by RDKit (if sanitize is disabled)
and drawn.
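
To illustrate why a readable SMARTS is easier to work with than a hash-based signature, here is a minimal Python sketch. It is not mmpdb's code: the two SMARTS strings are made-up placeholders, and hashing the SMARTS text with SHA256 is only an assumption standing in for whatever the older signature actually hashed.

```python
import hashlib

# Made-up placeholder environment SMARTS, for illustration only.
# They are not taken from an actual mmpdb database.
ENV_SMARTS_RADIUS_0 = "[#0;X1;H0;+0;!R:1]"
ENV_SMARTS_RADIUS_1 = "[#0;X1;H0;+0;!R:1]-[#6;X4;H3;+0;!R]"


def opaque_signature(text: str) -> str:
    """Return the SHA256 hex digest of a string (stand-in for a hash-based signature)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


sig0 = opaque_signature(ENV_SMARTS_RADIUS_0)
sig1 = opaque_signature(ENV_SMARTS_RADIUS_1)

# Each digest is a fixed-length hex string; nothing about the
# underlying environment can be read back from it.
print(sig0)
print(sig1)

# The SMARTS strings, by contrast, can be read and compared directly.
# For these placeholders the smaller-radius pattern is a literal prefix
# of the larger one.
print(ENV_SMARTS_RADIUS_1.startswith(ENV_SMARTS_RADIUS_0))
```

The point is only the contrast between the two representations; the real environment SMARTS in an mmpdb database may look different from these placeholders.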

The new environment fingerprint also includes the SMARTS of its
parent, that is, the SMARTS with a smaller radius.

Switched to 'click' for command-line parsing, removed the vendored
version of the peewee ORM, and switched to a modern "pyproject.toml"
project configuration with a setup.cfg which declares its
dependencies.
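
Since the "fragdb" format described above is SQLite-based, such a file can be inspected with standard SQLite tooling. Here is a minimal sketch using Python's built-in sqlite3 module. It only lists table names from SQLite's own sqlite_master catalog, so it assumes nothing about the fragdb schema. The function name list_tables and the file name in the usage note are hypothetical.

```python
import sqlite3
from pathlib import Path


def list_tables(db_path: str) -> list[str]:
    """Return the names of all tables in a SQLite database file, sorted by name."""
    # Build a file: URI and open it read-only, so inspecting a file cannot
    # modify it and a mistyped path raises an error instead of silently
    # creating a new empty database.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

For example, list_tables("example.fragdb") would return the table names found in a file of that (hypothetical) name. The actual table and column layout should be taken from the mmpdb documentation.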