Apologies - I see you have the correct reference and my memory betrays me.

On Thu, 5 Sep 2019, 06:55 Noel O'Boyle, <baoille...@gmail.com> wrote:
> Regarding Tanimoto, credit should be given to the original source,
> whatever language it's written in. From memory, la distribution des flores
> dans la zone alpine, I believe.
>
> On Wed, 4 Sep 2019, 20:21 Andrew Dalke, <da...@dalkescientific.com> wrote:
>
>> Here's another example of how it's important to know the clear goal of
>> collecting such a list.
>>
>> One of the entries someone added to the spreadsheet is:
>> Tanimoto, Taffee T. (17 Nov 1958).
>> "An Elementary Mathematical theory of Classification and Prediction".
>> Internal IBM Technical Report. 1957 (8?).
>>
>> I'm going to argue that it's not useful.
>>
>>
>> This is likely present because it's the reason why we call Tanimoto
>> "Tanimoto".
>>
>> However, I don't think it's worthwhile to point to that citation. Here's
>> the history as I know it:
>>
>> The first paper to really use the Tanimoto for similarity search was:
>>
>> Willett, P.; Winterman, V.; Bawden, D. Implementation of Nearest-Neighbor
>> Searching in an Online Chemical Structure Search System. Journal of
>> Chemical Information and Computer Sciences 1986, 26 (1), 36–41.
>> https://doi.org/10.1021/ci00049a008.
>>
>> Others quickly picked up on it, because 1) it was easy to do - Willett
>> told me that one of the first external implementations took an afternoon to
>> implement, and 2) bitstrings were already present because everyone already
>> had pre-computed MACCS keys.
>>
>> The choice of Tanimoto was based on a comparison of several different
>> schemes, in:
>>
>> Willett, P.; Winterman, V. A Comparison of Some Measures for the
>> Determination of Inter-Molecular Structural Similarity Measures of
>> Inter-Molecular Structural Similarity. Quant. Struct.-Act. Relat. 1986, 5
>> (1), 18–25. https://doi.org/10.1002/qsar.19860050105.
>>
>> (That's the one which should properly be quoted as demonstrating that the
>> Tanimoto was at least as effective as the others, and easiest to implement,
>> so was chosen.
>> The two papers were jointly published, and reference each
>> other "in press".)
>>
>> However, you'll notice that neither paper cites Tanimoto. Instead, they
>> cite earlier work by Adamson and Bush. These are:
>>
>> Adamson, G. W.; Bush, J. A. A Method for the Automatic Classification of
>> Chemical Structures. Information Storage and Retrieval 1973, 9 (10),
>> 561–568. https://doi.org/10.1016/0020-0271(73)90059-4.
>>
>> Adamson, G. W.; Bush, J. A. A Comparison of the Performance of Some
>> Similarity and Dissimilarity Measures in the Automatic Classification of
>> Chemical Structures. Journal of Chemical Information and Computer Sciences
>> 1975, 15 (1), 55–58. https://doi.org/10.1021/ci60001a016.
>>
>> The 1975 paper cites David J. Rogers, Taffee T. Tanimoto, A Computer
>> Program for Classifying Plants, Science, 21 Oct 1960 1115-1118.
>> https://science.sciencemag.org/content/132/3434/1115
>>
>> More specifically, it's on p56 of the paper, starting on the last
>> sentence of the first column, going to the top of the second column:
>>
>> "Several coefficients have been proposed based on this criterion,
>> [8-10,14-16] and some of these were used in the classification of the
>> anesthetics."
>>
>> The 1973 paper neither cites Tanimoto nor uses a Tanimoto similarity.
>>
>> So it appears that Adamson et al. investigated bitstrings using other
>> comparison methods, while Willett et al. were the first to investigate the
>> Tanimoto.
>>
>> For several years after Willett et al. there are few/no citations to
>> Tanimoto (1958) or to Rogers and Tanimoto (1960). As an example of one of
>> the indirect citations, see:
>>
>> Grethe, G.; Moock, T. E. Similarity Searching in REACCS. A New Tool for
>> the Synthetic Chemist. J. Chem. Inf. Comput. Sci. 1990, 30 (4), 511–520.
>> https://doi.org/10.1021/ci00068a025.
>>
>> where the Tanimoto is citation (9): "Ref 1; p54", where reference 1 from
>> the same paper is: Willett, P.
>> Similarity and Clustering in Chemical
>> Information Systems; Research Studies Press: Letchworth, Hertfordshire,
>> England, 1987
>>
>>
>> Now, go back to the citation that's currently in the spreadsheet:
>> Tanimoto, Taffee T. (17 Nov 1958).
>> "An Elementary Mathematical theory of Classification and Prediction".
>> Internal IBM Technical Report. 1957 (8?).
>>
>> What does "*Internal* IBM Technical Report" mean?!
>>
>> Wikipedia used to describe this as "unavailable". I pointed out that it
>> is available through WorldCat, and I got a copy from SUB Göttingen:
>>
>> https://en.wikipedia.org/w/index.php?title=Jaccard_index&diff=704793261&oldid=688763411
>>
>> It's at http://dalkescientific.com/tanimoto.pdf for the really curious.
>>
>> I can't figure out why anyone would refer a student to 1) an internal
>> publication, where 2) it's so hard to get, especially given 3) the actual
>> cheminformatics literature references a 1960 Science publication which is a
>> further refinement of the internal report.
>>
>> My guess is that it's one of those citations that everyone passes around,
>> but which no one has actually read.
>>
>> (If Tanimoto's 1958 internal report is a good citation, then I have a
>> copy of the internal National Bureau of Standards publication by Ray and
>> Kirsch from 1956, which predates their widely-cited 1957 Science
>> publication:
>>
>> Ray, Louis and Russell A. Kirsch. The Use of Automatic Data Processing
>> Systems in the Retrieval of Technical Information; National Bureau of
>> Standards Report 5115, 1956
>>
>> I had to get that from a used book dealer.)
>>
>>
>> But wait, I'm not done yet.
>>
>> The Tanimoto we use is the same as the Jaccard similarity, so perhaps we
>> should point students to that instead?
>>
>> The citation is:
>>
>> Jaccard, Paul. "Étude comparative de la distribution florale dans une
>> portion des Alpes et des Jura." Bull Soc Vaudoise Sci Nat 37 (1901):
>> 547-579.
>>
>> However, are students supposed to know French to read it? Or should we
>> point to the English translation at:
>>
>> THE DISTRIBUTION OF THE FLORA IN THE ALPINE ZONE.
>> Paul Jaccard
>> First published: February 1912
>> https://doi.org/10.1111/j.1469-8137.1912.tb05611.x
>>
>> https://nph.onlinelibrary.wiley.com/doi/abs/10.1111/j.1469-8137.1912.tb05611.x
>>
>> (The cheminformatics literature has at least one paper which cites the
>> original French, and at least one paper which cites the English
>> translation.)
>>
>> In any case, there's really no connection between those papers and
>> cheminformatics, other than for those interested in tracing the concept.
>>
>> That's why I think the Willett et al. paper(s) are all that a student
>> really needs to read for the relevant history, while someone like me would
>> like to read/document the more complete history.
>>
>>
>> Andrew
>> da...@dalkescientific.com
>>
>>
>> _______________________________________________
>> Blueobelisk-discuss mailing list
>> Blueobelisk-discuss@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/blueobelisk-discuss
>>
>