Okay,

I'm going to presume you want to search the data, i.e. to retrieve similar
compounds or substructures. If not, then just store the hexadecimal
fingerprint.
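
For the "just store it" case, here is a minimal sketch of turning a
java.util.BitSet (what asBitSet() returns) into a hex string. The class name
FingerprintHex, the method toHex and the numBits parameter are made up for
illustration and are not part of the CDK. The byte order is simply whatever
BitSet.toByteArray() gives (little-endian), which is an assumption you would
need to keep consistent on the reading side.

```java
import java.util.BitSet;

public class FingerprintHex {

    // Encode a fingerprint as lowercase hex, two characters per byte.
    // Byte order follows BitSet.toByteArray(): bit 0 is the lowest bit of
    // the first byte. Missing trailing zero bytes are padded back in so
    // that all fingerprints of the same length give equally long strings.
    // Bits at or beyond numBits (rounded up to a whole byte) are dropped.
    static String toHex(BitSet bits, int numBits) {
        byte[] raw = bits.toByteArray();
        int numBytes = (numBits + 7) / 8;
        StringBuilder sb = new StringBuilder(numBytes * 2);
        for (int i = 0; i < numBytes; i++) {
            int b = i < raw.length ? raw[i] & 0xFF : 0;
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        BitSet bits = new BitSet(16);
        bits.set(0);
        bits.set(3);
        bits.set(9);
        System.out.println(toHex(bits, 16)); // prints 0902
    }
}
```

With a real fingerprint you would pass the fingerprinter's length as numBits
(for example 1024 for the default path fingerprint shown further down).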

It's not impossible to do searching in MongoDB, see a talk from Matt Swain
<https://matt-swain.com/blog/2014-06-03-chemical-similarity-search-in-mongodb>
and my follow-ups:
http://efficientbits.blogspot.com/2014/11/memory-mapped-fingerprint-index-part-i.html
http://efficientbits.blogspot.com/2014/12/memory-mapped-fingerprint-index-part-ii.html

However, my view (as I make clear in those blog posts) is that MongoDB is the
wrong technology for this. That said, you could convert the binary
fingerprint to a vector of set-bit indices. In fact, *toString* works well:

System.out.println(new Fingerprinter().getBitFingerprint(mol).asBitSet().toString());


{43, 46, 51, 60, 65, 70, 72, 86, 95, 99, 111, 114, 123, 128, 144, 157, 158,
161, 166, 174, 185, 188, 204, 213, 222, 253, 271, 275, 278, 311, 315, 320,
335, 364, 371, 379, 390, 409, 446, 449, 463, 486, 498, 520, 523, 535, 540,
565, 574, 586, 588, 611, 628, 632, 637, 647, 649, 655, 667, 725, 742, 756,
770, 793, 845, 859, 865, 918, 951, 954, 959, 1015}

You could then use and/or queries to find fingerprint subsets, or compute
Tanimotos, etc.
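
To make the subset and Tanimoto part concrete, here is a rough sketch on the
Java side using plain java.util.BitSet. The class FingerprintCompare and its
methods are hypothetical names for illustration, and the index lists in main
are a made-up toy example, not real fingerprints. Tanimoto here is the usual
|A and B| / |A or B| on set bits, and the subset test is the standard
substructure screen (every query bit must also be set in the target).

```java
import java.util.BitSet;

public class FingerprintCompare {

    // Rebuild a BitSet from a list of set-bit indices, e.g. the integers
    // stored in the database.
    static BitSet fromIndices(int... setBits) {
        BitSet bs = new BitSet();
        for (int i : setBits) {
            bs.set(i);
        }
        return bs;
    }

    // Tanimoto coefficient: bits in common divided by bits set in either.
    // Two empty fingerprints are scored 0.0 here (a convention, not a rule).
    static double tanimoto(BitSet a, BitSet b) {
        BitSet common = (BitSet) a.clone();
        common.and(b);
        BitSet either = (BitSet) a.clone();
        either.or(b);
        int union = either.cardinality();
        return union == 0 ? 0.0 : (double) common.cardinality() / union;
    }

    // Substructure screen: true if every bit of query is also set in target.
    static boolean isSubset(BitSet query, BitSet target) {
        BitSet missing = (BitSet) query.clone();
        missing.andNot(target);
        return missing.isEmpty();
    }

    public static void main(String[] args) {
        BitSet a = fromIndices(43, 46, 51, 60);
        BitSet b = fromIndices(43, 51, 60, 65, 70);
        System.out.println(tanimoto(a, b));                  // prints 0.5
        System.out.println(isSubset(fromIndices(43, 51), a)); // prints true
    }
}
```

The same two operations are what the and/or queries would have to express on
the database side. The CDK also has its own Tanimoto class in
org.openscience.cdk.similarity if you would rather not hand-roll it.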

John

On Mon, 24 Feb 2020 at 13:44, Maria Sorokina <maria.ssorok...@gmail.com>
wrote:

> I see the problem.
>
> Well, originally, I wanted to check what the raw fingerprints look like.
> I am storing all the data (and the fingerprints) in MongoDB, and I am still
> not sure whether, if I save the BitFingerprints directly in there (which is
> possible when the field has an Object type), they will be parseable by
> the mongo engine as fingerprints (without retrieving them to be read with
> CDK). So this is why I wanted to check the raw fingerprints, as they should
> be in a more JSON-friendly format, and the mongo engine would be able to read
> those integers and strings for further similarity search.
>
> Kind regards,
> Maria
>
>
> Dr. Maria Sorokina
> Steinbeck Research Group
> Analytical Chemistry - Cheminformatics and Chemometrics
> Friedrich-Schiller-University Jena, Germany
> http://cheminf.uni-jena.de
>
> On 21 Feb 2020, at 19:31, John Mayfield <john.wilkinson...@gmail.com>
> wrote:
>
> Okay looking at it the Substructure fingerprint would be easy to adapt...
> but it's not hard to just count the substructures. Utility code like that
> is difficult to justify, every line is more to maintain.
>
> The other problem is I don't like the fingerprint APIs so it's a toss-up
> between using effort to implement something I (or hopefully someone else)
> will ultimately rewrite in future. "Deprecated on arrival" I believe Egon
> has said before.
>
> On Fri, 21 Feb 2020 at 18:25, John Mayfield <john.wilkinson...@gmail.com>
> wrote:
>
>> What do you think the "raw" fingerprint is? Why would you expect it for
>> the Substructure one?
>>
>> On Fri, 21 Feb 2020 at 09:47, Maria Sorokina <maria.ssorok...@gmail.com>
>> wrote:
>>
>>> I tried in total 7 fingerprinters (PubChem, Substructure, MACCS,
>>> KlekotaRoth, Circular, ShortestPath and Hybridization) and none worked. For
>>> some, I’m not surprised, but I was really expecting to have the raw
>>> fingerprints for the Substructure one
>>>
>>>
>>> Dr. Maria Sorokina
>>> Steinbeck Research Group
>>> Analytical Chemistry - Cheminformatics and Chemometrics
>>> Friedrich-Schiller-University Jena, Germany
>>> http://cheminf.uni-jena.de
>>>
>>> On 21 Feb 2020, at 10:39, John Mayfield <john.wilkinson...@gmail.com>
>>> wrote:
>>>
>>> ... I do have some patches for an updated fingerprint API stack that
>>> would also add this in to more places. Essentially it was added to the
>>> public API but only implemented in a few places and left as a "ToDo"
>>> elsewhere. Might be something for the hack-a-thon.
>>>
>>> I should say PubChem fingerprints are binary in nature though, so you
>>> would probably never want the RAW version. *getBitFingerprint()* is
>>> always implemented.
>>>
>>> John
>>>
>>> On Fri, 21 Feb 2020 at 09:34, John Mayfield <john.wilkinson...@gmail.com>
>>> wrote:
>>>
>>>> Hi Maria,
>>>>
>>>> Not all fingerprints support the "RAW" and Count options.
>>>>
>>>> John
>>>>
>>>> On Fri, 21 Feb 2020 at 09:31, Maria Sorokina <maria.ssorok...@gmail.com>
>>>> wrote:
>>>>
>>>>> Dear community,
>>>>>
>>>>> It is decidedly substructure search and fingerprinting period of the
>>>>> year!
>>>>>
>>>>> I want to create (to store) raw fingerprints of a range of different
>>>>> fingerprint types for a large number of complex molecules (natural
>>>>> products).
>>>>>
>>>>> For example this:
>>>>>
>>>>> PubchemFingerprinter pubchemFingerprinter =
>>>>>     new PubchemFingerprinter(SilentChemObjectBuilder.getInstance());
>>>>>
>>>>> System.out.println(pubchemFingerprinter.getRawFingerprint(myAtomContainer));
>>>>>
>>>>> For all my molecules I am getting an "UnsupportedOperationException",
>>>>> which according to the documentation only means that the
>>>>> fingerprinter cannot produce the raw fingerprint.
>>>>> I am using the latest (2.3) version of the CDK.
>>>>> Can anybody help me with this issue?
>>>>>
>>>>>
>>>>> Kind regards,
>>>>> Maria
>>>>>
>>>>>
>>>>> Dr. Maria Sorokina
>>>>> Steinbeck Research Group
>>>>> Analytical Chemistry - Cheminformatics and Chemometrics
>>>>> Friedrich-Schiller-University Jena, Germany
>>>>> http://cheminf.uni-jena.de
>>>>>
>>>>> _______________________________________________
>>>>> Cdk-user mailing list
>>>>> Cdk-user@lists.sourceforge.net
>>>>> https://lists.sourceforge.net/lists/listinfo/cdk-user
>>>>>
>>>>
>>>
>