Hi Rajarshi, are you referring to the maximum index I use to set all bits in the BitSet or to the general usage of BitSets returned by the CDK Fingerprinters?
When I set all bits up to the index determined by the getSize() method of the corresponding fingerprinter, cardinality() of the BitSet returns 166 for MACCS and 881 for Pubchem. I guess you have answered questions (1) and (2).

When I create a BitSet manually with 166 as the constructor argument, size() returns 192. So the answer to question (3) seems to be "the first m ones", with m the value returned by the getSize() method of the fingerprinter.

It would be very nice to have things like that documented in the Java documentation for these classes, because remarks like:

-----------
Molecule molecule = new Molecule();
PubchemFingerprinter fprinter = new PubchemFingerprinter();
BitSet fingerprint = fprinter.getFingerprint(molecule);
fingerprint.size(); // returns 881
fingerprint.length(); // returns the highest set bit
-----------

given as an example in the documentation of the PubchemFingerprinter class, are misleading in some cases.

Thank you for your help.

Kind regards
Volker

--
Volker Hähnke, Dipl.-Bioinf.
Johann Wolfgang Goethe-University Frankfurt
Chair for Chem- and Bioinformatics
Beilstein-Endowed Chair for Cheminformatics
Siesmayerstr. 70
60323 Frankfurt am Main, Germany

On 19.07.2010 at 13:58, Rajarshi Guha wrote:

> My understanding is that the value returned by size is implementation
> dependent. The fact that it is larger than the expected size is due to the
> fact that the 'extra' bits are present to support the dynamic expansion of
> the object. You should use the getSize() method of the IFingerprinter object
> to get the expected length of the fingerprint
>
> On Jul 19, 2010, at 7:46 AM, Volker Hähnke wrote:
>
>> Hi Rajarshi,
>>
>> I looped over all bits in the BitSet returned from the Fingerprinter using
>> .size() as maximum value and invoked the .set(int bitIndex) method for every
>> position.
>>
>> Calling .cardinality() for the BitSet after this process returned 192 for
>> MACCS and 896 for the Pubchem fingerprint.
>>
>> Kind regards
>> Volker
>>
>> On 19.07.2010 at 13:32, Rajarshi Guha wrote:
>>
>>> If you set all the bits in say the MACCS fp BitSet to 1 and then call
>>> cardinality(), what do you get?
>>>
>>> On Jul 19, 2010, at 7:21 AM, Volker Hähnke wrote:
>>>
>>>> Hey folks,
>>>>
>>>> three short questions about the fingerprints implemented in the CDK
>>>> (version 1.3.0).
>>>>
>>>> The length of the BitSet created using the MACCS (Pubchem) Fingerprinter
>>>> should be 166 (881). This is backed up by the corresponding getSize()
>>>> method: It returns 166 (881) as length of the fingerprint the
>>>> Fingerprinter calculates.
>>>>
>>>> But: Calculating fingerprints with these two Fingerprinters and invoking
>>>> the size() method of the created BitSets returns 192 for MACCS and 896 for
>>>> the Pubchem fingerprint, which is in conflict with intuition and
>>>> documentation of the Fingerprinter classes.
>>>>
>>>> This behavior raises the following questions:
>>>> 1) What happens here?
>>>> 2) Where do the additional bits come from?
>>>> 3) Which of the available bits are the "good" ones?
>>>>
>>>> Thanks in advance for your help.
>>>>
>>>> Kind regards
>>>> Volker
>>>>
>>>> _______________________________________________
>>>> Cdk-user mailing list
>>>> [email protected]
>>>> https://lists.sourceforge.net/lists/listinfo/cdk-user
>>>
>>> ----------------------------------------------------
>>> Rajarshi Guha | NIH Chemical Genomics Center
>>> http://www.rguha.net | http://ncgc.nih.gov
>>> ----------------------------------------------------
>>> Science kind of takes the fun out of the portent business.
>>> -Hobbes
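
The difference between size(), length() and cardinality() discussed in this thread can be reproduced with plain java.util.BitSet, without the CDK. The following is only a minimal sketch: the class name BitSetSizeDemo and the helper fillFirst are made up for illustration, and the values 166 and 881 are simply the getSize() results reported above for the MACCS and Pubchem fingerprinters. Because the value of size() is implementation dependent, the sketch prints it rather than relying on it.

```java
import java.util.BitSet;

// Hypothetical demo class; it uses only java.util.BitSet, not the CDK.
class BitSetSizeDemo {

    // Returns a BitSet in which exactly the first nBits positions
    // (indices 0 .. nBits-1) are set.
    static BitSet fillFirst(int nBits) {
        BitSet bits = new BitSet(nBits);
        bits.set(0, nBits); // fromIndex inclusive, toIndex exclusive
        return bits;
    }

    public static void main(String[] args) {
        // 166 and 881 are the getSize() values quoted in the thread.
        for (int nBits : new int[] {166, 881}) {
            BitSet bits = fillFirst(nBits);
            System.out.println("requested bits: " + nBits);
            // Allocated storage in bits; implementation dependent.
            System.out.println("  size()        = " + bits.size());
            // Index of the highest set bit plus one.
            System.out.println("  length()      = " + bits.length());
            // Number of bits that are set.
            System.out.println("  cardinality() = " + bits.cardinality());
        }
    }
}
```

On a JVM that allocates BitSet storage in 64-bit words, size() should come out as the next multiple of 64 (which would match the 192 and 896 reported above), while length() and cardinality() stay at the requested number of bits. Note also that length() is the index of the highest set bit plus one, not the index itself.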

