On Dec 17, 2016, at 1:45 AM, Milinda Samaraweera wrote:
> However at the end of each tag header I noticed there is a number (bolded):
> 
> ...
> >  <Name_IUPAC_CAS>  (1) 
> N1-(2-ethylbutyl)hexane-1,3,6-triamine
> ...
> What is this number, and how do you avoid printing it when SDWriter is 
> used? This number is not found in standard SD files.

Many programs do not generate a term in parentheses, although it is allowed by 
the connection table specification as a way to designate an "external registry 
number".

The ctfile.pdf I have from 2011 says:

   • Note: The > sign is a reserved character. A field name cannot contain
     hyphen (-), period (.), less than (<), greater than (>), equal sign (=),
     percent sign (%) or blank space ( ). Field names must begin with an alpha
     character and can contain alpha and numeric characters after that,
     including underscore.

     Optional information for the data header includes:
        • The compound’s external and internal registry numbers.
          External registry numbers must be enclosed in parentheses.

        • Any combination of information

   The following are examples of valid data headers:
      > <MELTING_POINT>
      > 55     (MD-08974)     <BOILING_POINT>   DT12
      > DT12   55
      > (MD-0894)   <BOILING_POINT>   FROM ARCHIVES

As you have discovered, RDKit's SDWriter stores the output record number of 
each molecule (counting from 1) in this field.
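If you need to read such headers back, the parenthesized term can be split off 
from the field name. The following is only a minimal sketch based on the spec 
excerpt above: parse_data_header is a hypothetical helper, and it assumes each 
header holds at most one <field name> and at most one parenthesized external 
registry number. Internal registry numbers (like DT12 in the examples) are not 
extracted, since the excerpt doesn't say how to tell them apart from other text.

```python
import re
from typing import Optional, Tuple

# Field names are enclosed in angle brackets; external registry numbers
# are enclosed in parentheses (per the CTfile excerpt above).
_FIELD_RE = re.compile(r"<([^<>]+)>")
_EXT_REG_RE = re.compile(r"\(([^()]+)\)")


def parse_data_header(line: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (field_name, external_registry_number) for an SD data header.

    Either element is None when it is absent from the header line.
    Hypothetical helper; assumes the header line starts with '>'.
    """
    if not line.startswith(">"):
        raise ValueError("not an SD data header: %r" % line)
    body = line[1:]
    field = _FIELD_RE.search(body)
    ext_reg = _EXT_REG_RE.search(body)
    return (field.group(1) if field else None,
            ext_reg.group(1) if ext_reg else None)


# The header from the original question: the "(1)" is RDKit's record number.
print(parse_data_header(">  <Name_IUPAC_CAS>  (1) "))
```

The same function handles the spec examples, e.g. the header 
"> 55     (MD-08974)     <BOILING_POINT>   DT12" yields the field name 
BOILING_POINT and the registry number MD-08974.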

I see no way to disable that through the API. The two options I can suggest for 
now are:

1) Implement your own writer using MolToMolBlock() to generate the connection 
table text and your own code to enumerate through the properties. The result 
looks something like:

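from rdkit import Chem

def mol_to_sd_block(mol):
    # Connection table text (ends with the "M  END" line)
    block = Chem.MolToMolBlock(mol)
    lines = [block]
    # One data item per (non-private) property, with no "(n)" after the name
    for name in mol.GetPropNames():
        lines.append("> <%s>\n%s\n\n" % (name, mol.GetProp(name)))
    lines.append("$$$$\n")  # record terminator
    return "".join(lines)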
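To replace SDWriter for a whole file, the same idea can be wrapped in a loop 
that writes one record per molecule. The sketch below keeps RDKit out of it so 
it runs on its own: write_sd_records is a hypothetical name, and each record is 
assumed to be a (molblock, properties) pair, where in practice the molblock 
would come from Chem.MolToMolBlock(mol) and the properties from 
mol.GetPropNames() and mol.GetProp().

```python
import io
from typing import Iterable, Mapping, TextIO, Tuple


def write_sd_records(records: Iterable[Tuple[str, Mapping[str, str]]],
                     out: TextIO) -> int:
    """Write (molblock, properties) pairs as SD records; return the count.

    Hypothetical helper. Data headers are written as '>  <name>' with no
    '(n)' record number after the field name.
    """
    count = 0
    for molblock, props in records:
        out.write(molblock)
        if not molblock.endswith("\n"):
            out.write("\n")  # data items must start on their own line
        for name, value in props.items():
            out.write(">  <%s>\n%s\n\n" % (name, value))
        out.write("$$$$\n")  # record terminator
        count += 1
    return count


buf = io.StringIO()
write_sd_records([("mol1\nM  END\n",
                   {"Name_IUPAC_CAS": "N1-(2-ethylbutyl)hexane-1,3,6-triamine"})],
                 buf)
print(buf.getvalue())
```

Writing to any file-like object (an open file, or io.StringIO as here) keeps 
the function easy to test without touching the disk.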


2) Maintain your own fork in which you've deleted line 87 or so of 
Code/GraphMol/FileParsers/SDWriter.cpp, where it says:

  if (d_molid >= 0) (*dp_ostream) << "(" << d_molid + 1 << ") ";
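To see what deleting that line changes in the output, here is a small Python 
model of the header line. It is not RDKit's code: it assumes the text written 
just before the C++ line above is ">  <name>  " and that a newline follows it, 
which matches the output quoted at the top of this thread. data_header is a 
hypothetical name, and passing molid=None stands in for the deleted line.

```python
from typing import Optional


def data_header(name: str, molid: Optional[int] = None) -> str:
    """Model (not RDKit code) of an SD data header line as SDWriter emits it.

    molid mirrors the 0-based d_molid; None models the fork with the
    record-number line removed.
    """
    header = ">  <%s>  " % name  # assumed prefix, as seen in the quoted output
    if molid is not None and molid >= 0:
        header += "(%d) " % (molid + 1)  # 1-based record number, as in the C++
    return header + "\n"


print(repr(data_header("Name_IUPAC_CAS", 0)))  # stock behaviour
print(repr(data_header("Name_IUPAC_CAS")))     # with the line deleted
```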


Cheers,


                                Andrew
                                da...@dalkescientific.com



_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss