Regarding Tanimoto, credit should be given to the original source, whatever
language it's written in. From memory, I believe it is "la distribution des
flores dans la zone alpine".

On Wed, 4 Sep 2019, 20:21 Andrew Dalke, <da...@dalkescientific.com> wrote:

> Here's another example of how it's important to know the clear goal of
> collecting such a list.
>
> One of the entries someone added to the spreadsheet is:
>   Tanimoto, Taffee T. (17 Nov 1958).
>   "An Elementary Mathematical theory of Classification and Prediction".
>   Internal IBM Technical Report. 1957 (8?).
>
> I'm going to argue that it's not useful.
>
>
> This is likely present because it's the reason why we call Tanimoto
> "Tanimoto".
>
> However, I don't think it's worthwhile to point to that citation. Here's
> the history as I know it:
>
> The first paper to really use the Tanimoto for similarity search was:
>
> Willett, P.; Winterman, V.; Bawden, D. Implementation of Nearest-Neighbor
> Searching in an Online Chemical Structure Search System. Journal of
> Chemical Information and Computer Sciences 1986, 26 (1), 36–41.
> https://doi.org/10.1021/ci00049a008.
>
> Others quickly picked up on it, because 1) it was easy to do - Willett
> told me that one of the first external implementations took an afternoon to
> write, and 2) bitstrings were already present because everyone already
> had pre-computed MACCS keys.
>
> The choice of Tanimoto was based on a comparison of several different
> schemes, in:
>
> Willett, P.; Winterman, V. A Comparison of Some Measures for the
> Determination of Inter-Molecular Structural Similarity Measures of
> Inter-Molecular Structural Similarity. Quant. Struct.-Act. Relat. 1986, 5
> (1), 18–25. https://doi.org/10.1002/qsar.19860050105.
>
> (That's the one which should properly be cited as demonstrating that the
> Tanimoto was at least as effective as the others, and easiest to implement,
> so was chosen. The two papers were jointly published, and reference each
> other "in press".)
>
> However, you'll notice that neither paper cites Tanimoto. Instead, they
> cite earlier work by Adamson and Bush. These are:
>
> Adamson, G. W.; Bush, J. A. A Method for the Automatic Classification of
> Chemical Structures. Information Storage and Retrieval 1973, 9 (10),
> 561–568. https://doi.org/10.1016/0020-0271(73)90059-4.
>
> Adamson, G. W.; Bush, J. A. A Comparison of the Performance of Some
> Similarity and Dissimilarity Measures in the Automatic Classification of
> Chemical Structures. Journal of Chemical Information and Computer Sciences
> 1975, 15 (1), 55–58. https://doi.org/10.1021/ci60001a016.
>
> The 1975 paper cites David J. Rogers, Taffee T. Tanimoto, A Computer
> Program for Classifying Plants, Science, 21 Oct 1960 1115-1118.
> https://science.sciencemag.org/content/132/3434/1115
>
> More specifically, it's on p56 of the paper, starting on the last sentence
> of the first column, going to the top of the second column:
>
>    "Several coefficients have been proposed based on this criterion,
> [8-10,14-16] and some of these were used in the classification of the
> anesthetics."
>
> The 1973 paper neither cites Tanimoto nor uses a Tanimoto similarity.
>
> So it appears that Adamson et al. investigated bitstrings using other
> comparison methods, while Willett et al. were the first to investigate the
> Tanimoto.
>
> For several years after Willett et al. there were few or no citations to
> Tanimoto (1958) or to Rogers and Tanimoto (1960). As an example of one of
> the indirect citations, see:
>
> Grethe, G.; Moock, T. E. Similarity Searching in REACCS. A New Tool for
> the Synthetic Chemist. J. Chem. Inf. Comput. Sci. 1990, 30 (4), 511–520.
> https://doi.org/10.1021/ci00068a025.
>
> where the Tanimoto is citation (9): "Ref 1; p54", where reference 1 from
> the same paper is: Willett, P. Similarity and Clustering in Chemical
> Information Systems; Research Studies Press: Letchworth, Hertfordshire,
> England, 1987
>
>
> Now, go back to the citation that's currently in the spreadsheet:
>   Tanimoto, Taffee T. (17 Nov 1958).
>   "An Elementary Mathematical theory of Classification and Prediction".
>   Internal IBM Technical Report. 1957 (8?).
>
> What does "*Internal* IBM Technical Report" mean?!
>
> Wikipedia used to describe this as "unavailable". I pointed out that it is
> available through WorldCat, and I got a copy from SUB Göttingen:
>
> https://en.wikipedia.org/w/index.php?title=Jaccard_index&diff=704793261&oldid=688763411
>
> It's at http://dalkescientific.com/tanimoto.pdf for the really curious.
>
> I can't figure out why anyone would refer a student to 1) an internal
> publication, where 2) it's so hard to get, especially given 3) the actual
> cheminformatics literature references a 1960 Science publication which is a
> further refinement of the internal report.
>
> My guess is that it's one of those citations that everyone passes around,
> but which no one has actually read.
>
> (If Tanimoto's 1958 internal report is a good citation, then I have a copy
> of the internal National Bureau of Standards publication by Ray and Kirsch
> from 1956, which predates their widely-cited 1957 Science publication:
>
> Ray, Louis and Russell A. Kirsch. The Use of Automatic Data Processing
> Systems in the Retrieval of Technical Information; National Bureau of
> Standards Report 5115, 1956
>
> I had to get that from a used book dealer.)
>
>
> But wait, I'm not done yet.
>
> The Tanimoto we use is the same as the Jaccard similarity, so perhaps we
> should point students to that instead?
>
> The citation is:
>
>   Jaccard, Paul. "Étude comparative de la distribution florale dans une
> portion des Alpes et des Jura." Bull Soc Vaudoise Sci Nat 37 (1901):
> 547-579.
>
> However, are students supposed to know French to read it? Or should we
> point to the English translation at:
>
> THE DISTRIBUTION OF THE FLORA IN THE ALPINE ZONE.
> Paul Jaccard
> First published: February 1912
> https://doi.org/10.1111/j.1469-8137.1912.tb05611.x
>
> https://nph.onlinelibrary.wiley.com/doi/abs/10.1111/j.1469-8137.1912.tb05611.x
>
> (The cheminformatics literature has at least one paper which cites the
> original French, and at least one paper which cites the English
> translation.)
>
> In any case, there's really no connection between those papers and
> cheminformatics, other than for those interested in tracing the concept.
>
> That's why I think the Willett et al. paper(s) are all that a student
> really needs to read for the relevant history, while someone like me would
> like to read and document the more complete history.
>
>
>                                 Andrew
>                                 da...@dalkescientific.com
>
>
>
>
> _______________________________________________
> Blueobelisk-discuss mailing list
> Blueobelisk-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/blueobelisk-discuss
>