Re: [Rdkit-discuss] Extracting SMILES from text

Andrew Dalke Sat, 03 Dec 2016 10:30:06 -0800

On Dec 3, 2016, at 3:02 PM, Brian Kelley wrote:
> If I had to pick, I would just use the normal MolFromSmiles, if you don't 
> expect many actual smiles strings in your corpus, it's plenty fast.


I couldn't tell from your timings what you used to decide whether something 
was a SMILES candidate.

Was it word splitting, or was it my regex? Would it detect the SMILES in my 
examples:

   The combination of phenol (c1ccccc1O) and ....
   The SMILES for phenol is c1ccccc1O.
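
A minimal sketch of the word-splitting approach, to show why those two examples
are awkward: the candidate arrives wrapped in parentheses in one and followed by
a period in the other. Everything named here (clean_token, looks_like_smiles,
find_smiles) is hypothetical and written for illustration. The token grammar is
a loose organic-subset check, not the regex from earlier in the thread and not a
real SMILES parser. With RDKit installed, one would more likely pass a validator
built on Chem.MolFromSmiles, which returns None when parsing fails.

```python
import re

# Loose organic-subset SMILES tokens (an assumption made for this sketch).
_TOKEN = re.compile(
    r"(?P<atom>Cl|Br|[BCNOPSFI]|[bcnops]|\[[^\[\]]+\])"
    r"|(?P<ring>%\d\d|\d)"
    r"|(?P<open>\()"
    r"|(?P<close>\))"
    r"|(?P<bond>[-=#$:/\\.])"
)


def looks_like_smiles(text):
    """Crude syntactic check: known tokens, balanced branches, paired rings."""
    pos, depth, atoms = 0, 0, 0
    open_rings = set()
    while pos < len(text):
        m = _TOKEN.match(text, pos)
        if m is None:
            return False
        kind = m.lastgroup
        if kind == "atom":
            atoms += 1
        elif kind == "ring":
            open_rings ^= {m.group()}  # first sighting opens, second closes
        elif kind == "open":
            depth += 1
        elif kind == "close":
            depth -= 1
            if depth < 0:
                return False
        pos = m.end()
    return atoms > 0 and depth == 0 and not open_rings


def clean_token(word):
    """Strip sentence punctuation and one pair of wrapping parentheses."""
    word = word.rstrip(".,;:!?")
    if word.startswith("(") and word.endswith(")"):
        word = word[1:-1]
    return word


def find_smiles(text, validate=looks_like_smiles):
    """Split on whitespace, clean each word, keep the ones that validate.

    With RDKit available, a stricter validator could be passed instead, e.g.
    validate=lambda s: Chem.MolFromSmiles(s) is not None
    """
    hits = []
    for word in text.split():
        token = clean_token(word)
        if token and validate(token):
            hits.append(token)
    return hits


print(find_smiles("The combination of phenol (c1ccccc1O) and ...."))
print(find_smiles("The SMILES for phenol is c1ccccc1O."))
```

Both calls print ['c1ccccc1O']. The same check also accepts ordinary words such
as "I" or "NO", which is where precision starts to matter.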

Precision and recall can, of course, be more important than performance.
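
To make that concrete, here is a small sketch of how an extractor could be
scored against hand-labelled text. The function name and the example sets are
hypothetical, and comparing plain sets ignores duplicates and string positions,
which a real benchmark would probably want to track.

```python
def precision_recall(predicted, expected):
    """Return (precision, recall) for two collections of SMILES strings."""
    predicted, expected = set(predicted), set(expected)
    true_pos = len(predicted & expected)
    # Precision: what fraction of the reported strings were really SMILES.
    precision = true_pos / len(predicted) if predicted else 0.0
    # Recall: what fraction of the real SMILES were reported.
    recall = true_pos / len(expected) if expected else 0.0
    return precision, recall


# Made-up example: the extractor reports three strings, one of them correct,
# and misses one of the two labelled SMILES.
found = ["c1ccccc1O", "NO", "I"]
labelled = ["c1ccccc1O", "CCO"]
precision, recall = precision_recall(found, labelled)
print(precision, recall)
```

Here precision is 1/3 and recall is 1/2.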

Anyone want to take on the boring job of developing a corpus and putting 
together a benchmark? It certainly isn't going to be me. :) Or perhaps it 
already exists?



                                Andrew
                                [email protected]


