First, the thing I always have to say:
According to the spec for mol blocks, aromatic bond orders (bond type 4)
are only supposed to be used for queries.

Given the number of bogus mol files out there in the wild, the RDKit does
actually still read these:

In [49]: print(mb)

     RDKit          2D

  6  6  0  0  0  0  0  0  0  0999 V2000
    1.5000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.7500   -1.2990    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.7500   -1.2990    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -1.5000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.7500    1.2990    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.7500    1.2990    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  4  0
  2  3  4  0
  3  4  4  0
  4  5  4  0
  5  6  4  0
  6  1  4  0
M  END


In [50]: nm = Chem.MolFromMolBlock(mb)

In [51]: Chem.MolToSmiles(nm)
Out[51]: 'c1ccccc1'


It sounds like the problem you are having is analogous to this one:

In [55]: print(mb)

     RDKit

  5  5  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  4  0
  2  3  4  0
  3  4  4  0
  4  5  4  0
  5  1  4  0
M  END


In [56]: nm = Chem.MolFromMolBlock(mb)
[04:56:04] Can't kekulize mol.  Unkekulized atoms: 0 1 2 3 4


This is the same problem that the RDKit has processing the (bogus) SMILES
'c1cccn1' for pyrrole: without an H specified on the nitrogen, the ring
cannot be kekulized. The same goes for the (again bogus) SMILES for
tetrazole that you provide.
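
A minimal sketch of the SMILES side of this, assuming a standard RDKit
install; nothing in it is specific to your files. It only contrasts the
underspecified pyrrole SMILES with the one that states the H on the nitrogen:

```python
from rdkit import Chem

# Aromatic nitrogen with no H stated: the double bonds cannot be placed,
# so parsing fails and MolFromSmiles returns None (an error is also logged).
bad = Chem.MolFromSmiles('c1cccn1')
print(bad is None)

# The same ring with the H written on the nitrogen parses without complaint.
good = Chem.MolFromSmiles('c1ccc[nH]1')
print(good.GetNumAtoms())
```
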
There is no code in the RDKit that tries to guess what the user means with
these underspecified molecules.
This has been discussed on the mailing list in the past, and the cookbook
has some links to those threads (but, strangely, no code):
http://www.rdkit.org/docs/Cookbook.html#cleaning-up-heterocycles
That's probably a good place to start.
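
For the five-membered ring mol block above, one possible manual cleanup is
sketched here. It assumes you already know which nitrogen should carry the H
(atom index 4, the only N in that example), and `add_h_to_aromatic_n` is a
hypothetical helper name, not an RDKit function. The idea is to read the block
without sanitization, state the H explicitly, and then sanitize by hand:

```python
from rdkit import Chem

# The failing example: four C and one N, every bond marked as type 4.
mb = """
     RDKit

  5  5  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  4  0
  2  3  4  0
  3  4  4  0
  4  5  4  0
  5  1  4  0
M  END
"""


def add_h_to_aromatic_n(mol, atom_idx):
    """Hypothetical helper: put one explicit H on the chosen N, then sanitize."""
    atom = mol.GetAtomWithIdx(atom_idx)
    if atom.GetAtomicNum() != 7:
        raise ValueError("atom %d is not a nitrogen" % atom_idx)
    # Same state that '[nH]' produces when parsing SMILES.
    atom.SetNumExplicitHs(1)
    atom.SetNoImplicit(True)
    # Raises an exception if the ring still cannot be kekulized.
    Chem.SanitizeMol(mol)
    return mol


# Skip sanitization on input so the underspecified ring survives parsing.
raw = Chem.MolFromMolBlock(mb, sanitize=False)
fixed = add_h_to_aromatic_n(raw, 4)
print(Chem.MolToSmiles(fixed))
```

For rings with several nitrogens, such as the tetrazole, you would have to
pick the nitrogen yourself (or try each in turn and keep the ones that
sanitize); the RDKit will not make that choice for you.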

-greg






On Thu, Dec 8, 2016 at 5:36 PM, Brian Cole <col...@gmail.com> wrote:

> Any advice on getting RDKit to read in SDF files that use bond order '4'
> to mark bonds as aromatic and don't have explicit hydrogen? For example,
> imagine two fused heterocycles where the hydrogen isn't really known. I
> have SDF files that just mark the bond orders as '4', aromatic, and don't
> even try to specify which tautomer it wants to represent.
>
> Does this enter the same category as OpenBabel considering c1nnnn1 to be
> tetrazole and not specifying where the hydrogen is?
>
> Any tips for getting RDKit to input these structures and clean them up?
>
> Thanks,
> Brian
>
> ------------------------------------------------------------
> ------------------
> Developer Access Program for Intel Xeon Phi Processors
> Access to Intel Xeon Phi processor-based developer platforms.
> With one year of Intel Parallel Studio XE.
> Training and support from Colfax.
> Order your platform today.http://sdm.link/xeonphi
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
>