On Wed, May 17, 2017 at 2:41 AM, John Mayfield <john.wilkinson...@gmail.com>
wrote:

> I think I agree but need to draw out the digraph to convince myself. The
> whole reason for 1b was to fix this case (I believe originally from WDI
> hashcode paper IIRC):
>
> CC(C(CCC1CC1)(CCC1CC1)CCC1CC1)C12CCC(CC1)CC2
>
> I think splitting ties when they're the same is undesirable but worse is
> naming two different things the same. As I said you can fix the first one
> with a different and better algorithm.
>

I'm missing this reference. A better algorithm than what? Or do you mean
in general: if you get a null result, at least you are only missing
something? In either case, I think, you need a better algorithm. :)


> For the second you have these examples which are different but get the
> same R/S labels:
>
> C[C@H]1[C@@H](C)[C@@H](C)[C@@H](C)[C@H](C)[C@H](C)[C@H](C)[C@@H]1C
> C[C@H]1[C@H](C)[C@@H](C)[C@H](C)[C@H](C)[C@H](C)[C@@H](C)[C@H]1C
>

Oh, that is very cool. So you think this is a failure of Rule 4b in the
IUPAC rules? Very impressive. I don't think Jmol is making any mistake
here, do you?
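
For anyone comparing the two strings by eye, here is a small sketch (Python, standard library only) that lists where the written chirality tags differ. It assumes that both SMILES are written with the same atom order, as they appear to be here, so the tags can be compared position by position. The function names are made up for this illustration. It does not show that the two molecules are distinct once ring symmetry is taken into account; that needs a real toolkit.

```python
import re

SMILES_A = "C[C@H]1[C@@H](C)[C@@H](C)[C@@H](C)[C@H](C)[C@H](C)[C@H](C)[C@@H]1C"
SMILES_B = "C[C@H]1[C@H](C)[C@@H](C)[C@H](C)[C@H](C)[C@H](C)[C@@H](C)[C@H]1C"


def chirality_tags(smiles):
    """Return the '@' / '@@' tags in the order they are written."""
    return re.findall(r"@@?", smiles)


def differing_positions(a, b):
    """0-based indices of stereocentres whose written tags differ."""
    pairs = zip(chirality_tags(a), chirality_tags(b))
    return [i for i, (x, y) in enumerate(pairs) if x != y]


print(differing_positions(SMILES_A, SMILES_B))
```

Four of the eight written tags differ between the two strings, even though the two structures reportedly receive the same R/S labels.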


> Correct, the CDK is rudimentary and only handles simple cases - although in
> practice that is most cases :-). But as I've said multiple times the
> Centres (https://github.com/johnmay/centres) one is a more complete
> implementation which does 1-5 and Aux descriptions. It's a little difficult
> to follow as I wrote it toolkit independent so downstream users plug in to
> certain interfaces saying how to access atomic number and connected atoms
> etc. The parts are spread out a bit but you can see how the rules are
> configured here:
> https://github.com/johnmay/centres/blob/develop/cdk/src/main/java/uk/ac/ebi/centres/cdk/CDKPerceptor.java#L78-L106
>
>
Thanks for that link. You will find that when you implement 1a fully, you
will need to pull it apart from 1b, applying Rule 1a exhaustively before
1b. Otherwise the ranking goes wrong. I'm pretty sure the same is true of
4a, 4b, and 4c. I guess that's obvious to you; it took me a while to catch
on to that. Is Centres doing that? 4a and 4b are both in PairRule, right?
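
To make the difference concrete, here is a toy sketch (Python), not taken from Jmol, CDK, or Centres. It assumes each branch has already been flattened into a list of nodes in the order they are compared. `Node`, `compare_exhaustive`, and `compare_interleaved` are hypothetical names, and `rule_1b` is a simplified reading of Rule 1b that only compares two duplicated atoms by the root distance of the atoms they duplicate.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    atomic_number: float                 # 0 would be a phantom atom
    duplicate: bool = False
    root_distance: Optional[int] = None  # duplicates only: distance of the original atom from the root


def rule_1a(a, b):
    """Higher atomic number ranks first: >0 if a wins, <0 if b wins, 0 if tied."""
    return (a.atomic_number > b.atomic_number) - (a.atomic_number < b.atomic_number)


def rule_1b(a, b):
    """Simplified: of two duplicates, the one whose original is nearer the root wins."""
    if not (a.duplicate and b.duplicate):
        return 0
    return (a.root_distance < b.root_distance) - (a.root_distance > b.root_distance)


def compare_exhaustive(branch_a, branch_b, rules):
    """Apply each rule over the whole branch before trying the next rule."""
    for rule in rules:
        for a, b in zip(branch_a, branch_b):
            result = rule(a, b)
            if result:
                return result
    return 0


def compare_interleaved(branch_a, branch_b, rules):
    """Apply every rule at each node before moving on (the problematic order)."""
    for a, b in zip(branch_a, branch_b):
        for rule in rules:
            result = rule(a, b)
            if result:
                return result
    return 0


# Hypothetical branches: a duplicated C first, then an O or an F further out.
branch_a = [Node(6, duplicate=True, root_distance=1), Node(8)]
branch_b = [Node(6, duplicate=True, root_distance=2), Node(9)]

rules = [rule_1a, rule_1b]
print(compare_exhaustive(branch_a, branch_b, rules))   # branch_b wins on 1a (F > O)
print(compare_interleaved(branch_a, branch_b, rules))  # branch_a wins on 1b at the first node
```

In this made-up case the interleaved version lets 1b decide at the first node, before 1a has seen the O/F difference further out, so the two orderings disagree.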

I think it's interesting that the Kekule consideration introduces the
situation that a duplicated atom can break a tie either in its own sphere
(due to its mass or its root distance) or in its substituents' sphere (due
to its massless phantom atoms). So the idea of a "simplified digraph" that
hides the phantom atoms and doesn't indicate duplicated atom mass or root
distance is difficult to interpret -- you have to remember to apply the
atom mass exhaustively first (possibly moving its priority *above* its
corresponding nonduplicated atom), then root distance, then, in the next
sphere, its phantom atom masses. Very tricky.

>
> I never spent enough time on it to do the fractional bond orders, I was
> considering doing it for this ACS talk but we'll see - it's a lot of effort
> for very little gain.
>

It was only about 50 lines, actually, at least because I only handled the
important cases (6-membered rings). Feel free to use it, of course. You
will need it anyway for the 1b correction, or at least you will need some
sort of Kekule check for that. Maybe you already have that somewhere
else....
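
A minimal sketch of the Kekule averaging idea for a single six-membered aromatic ring. This is not the roughly 50 lines of Jmol code mentioned above. It assumes the usual reading that a ring atom's duplicated atom takes the mean atomic number of the two ring neighbours it is double-bonded to across the two Kekule structures. The function name is hypothetical.

```python
from fractions import Fraction


def kekule_duplicate_atomic_number(ring_neighbour_atomic_numbers):
    """Mean atomic number of the two ring neighbours (one per Kekule structure)."""
    left, right = ring_neighbour_atomic_numbers
    return Fraction(left + right, 2)


# Pyridine C2: its ring neighbours are N (7) and C (6).
c2_duplicate = kekule_duplicate_atomic_number((7, 6))
print(c2_duplicate)      # 13/2
print(c2_duplicate > 6)  # True: ranks above an ordinary carbon under Rule 1a
```

Using `Fraction` keeps the averaged value exact, which avoids floating-point ties when the value is later compared under Rule 1a.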

I really wish they had restricted Rule 1b to ring-type duplicated atoms
only. Alas!


> The validation of centres was done in my thesis but there are a handful of
> examples here:
> https://github.com/johnmay/centres/tree/develop/cdk/src/test/resources/uk/ac/ebi/centres/cdk
> Again, I was planning on putting together a comprehensive validation set
> for the talk.
>
>
Great. I will incorporate those into my test suite. Are the target
designations in the files?
_______________________________________________
Blueobelisk-discuss mailing list
Blueobelisk-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/blueobelisk-discuss
