On Fri, Jun 25, 2021 at 8:00 PM Oliver Stueker <oliver.stue...@mun.ca> wrote:
>
> And Peter:
> A big thumbs up for Python. It would really be great to port the
> CML-validator and later even the JUMBO-converters to Python. It was a wise
> decision to encode the "rules" in XML so they can be re-used when
> re-implementing the tools in a different language. Maybe someday I'll find
> time to contribute to those efforts.
>

JUMBO-Converters tackles the impossible task of interpreting FORTRAN output
without the source code. It can therefore only be a rough approximation. Most
of the tools are not too bad - they output banners and whitespace lines.
Gaussian is utterly appalling. There are many different dialects due to
different authors. However, it's possible to catch much of the output and its
context. I write them as a nested lookahead parser, relying on
end-of-sections. I would now rewrite in Python + a parser like ANTLR, which
Lezan and I used for NLP of phrases in ChemicalTagger.

We probably need to have more heuristics for blocks of numbers and relate
them to presumed atom counts, etc. I'm particularly interested in extracting
the chemical properties rather than the atoms (where people with the source
code have written sub-parsers). But I don't yet know who would be interested.
My main problem was finding anyone to use the results. I'm more interested in
parsing diagrams from PNG, e.g. reaction schemes.

P.

> Cheers,
> Oliver
>
> Oliver Stueker
> Research Consultant, ACENET
> A Compute Canada Regional Partner
>
>
> On Fri, Jun 25, 2021 at 3:37 PM Noel O'Boyle <baoille...@gmail.com> wrote:
>
>> Just to note that importing from Bitbucket is handled by GitHub. Also
>> that GitHub pages can serve HTML by turning off Jekyll (a process which I'd
>> recommend).
>>
>> - Noel
>>
>> On Fri, 25 Jun 2021, 16:19 Peter Murray-Rust, <pm...@cam.ac.uk> wrote:
>>
>>> Oliver and colleagues did a really great job in porting CML code.
>>>
>>> It is valuable and will continue to be. Wikidata+CML adds a lot of
>>> potential to semantic chemistry.
>>>
>>> We are still continuing to work on (a) plant science and (b) battery
>>> materials, both of which will need a semantic framework.
>>>
>>> I have moved from Java to Python and it would be relatively easy to
>>> migrate CML to Python. XML has a considerable overhead in Java which can be
>>> bypassed in Python. So we have an XML paper-reader (AMI) and it would be
>>> fun to add chemistry. But it needs users to help drive it. A circular
>>> problem but one which gets continually easier to solve.
>>>
>>> P.
>>>
>>> On Fri, Jun 25, 2021 at 3:23 PM Egon Willighagen <egon.willigha...@gmail.com> wrote:
>>>
>>>> On Fri, Jun 25, 2021 at 3:25 PM Oliver Stueker <oliver.stue...@mun.ca> wrote:
>>>>
>>>>> Back between Christmas 2019 and the early days of 2020, Mark and I
>>>>> made good progress in enabling CI (Travis) on many of the projects.
>>>>> At some point I had to stop because I needed to get access to publish
>>>>> modules on Maven Central, which took a few days. Then 2020 happened.
>>>>
>>>> Indeed.
>>>>
>>>>> I haven't touched those repos in 18 months now
>>>>
>>>> Yes, I saw that... time has passed by quickly... it's all such a blur
>>>>
>>>>> and I don't know when I'll take the time to continue working on them.
>>>>
>>>> This is what we have the Dr. Who model for :)
>>>>
>>>>> I still have the plan to get the jumbo-converters and all their
>>>>> dependencies set up for CI and published to Maven Central, but I don't
>>>>> have a timeline, as for me this is just a hobby-project.
>>>>
>>>> Understood! (same here)
>>>>
>>>>> At some point I'd even like to work on porting the www.xml-cml.org
>>>>> website to use Jekyll and be hosted from a GitHub-repo, so that we can
>>>>> expand the dictionaries with Pull-requests.
>>>>
>>>> Yes, that sounds good.
>>>>
>>>>> But if you need CMLXOM or other parts to be published in Maven
>>>>> Central, I'd say go ahead.
>>>>
>>>> Okay, going ahead and will use org.blueobelisk as Maven groupId
>>>>
>>>> Egon
>>>>
>>>>> Cheers,
>>>>> Oliver
>>>>>
>>>>> Oliver Stueker
>>>>> Research Consultant, ACENET
>>>>> 709.864.3021 | www.ace-net.ca | @computeatlantic
>>>>>
>>>>> A Compute Canada Regional Partner
>>>>>
>>>>>
>>>>> On Fri, Jun 25, 2021 at 3:46 AM Egon Willighagen <egon.willigha...@gmail.com> wrote:
>>>>>
>>>>>> Hi Oliver, Mark, all
>>>>>>
>>>>>> On Fri, Dec 13, 2019 at 9:01 PM Oliver Stueker <oliver.stue...@mun.ca> wrote:
>>>>>>
>>>>>>> We would like to get your feedback on a number of points:
>>>>>>>
>>>>>>> - Do you also think that https://github.com/BlueObelisk would be
>>>>>>> a good home for the WWMM and CML repositories? The names wwmm and
>>>>>>> cml are already taken on GitHub.com.
>>>>>>
>>>>>> I think it worked out quite well.
>>>>>>
>>>>>>> - What are your thoughts?
>>>>>>
>>>>>> What are people's current plans to make modules available on Maven
>>>>>> Central? For example, I just discovered that CMLXOM needs updates and
>>>>>> needs to be uploaded to Maven Central.
>>>>>>
>>>>>> - Has anyone already started this?
>>>>>> - Should we use io.github.blueobelisk as (new) group ID?
>>>>>>
>>>>>> I have a Sonatype account and can set something up, but want to make
>>>>>> sure no one else is already planning this.
>>>>>>
>>>>>> Egon
>>>>>>
>>>>>> --
>>>>>> This year I am stepping down as co-Editor-in-Chief of the Journal of
>>>>>> Cheminformatics, because of a conflict of interest with Springer Nature.
>>>>>> See https://twitter.com/egonwillighagen/status/1403299501947899907
>>>>>>
>>>>>> -----
>>>>>> E.L. Willighagen
>>>>>> Department of Bioinformatics - BiGCaT
>>>>>> Maastricht University (http://www.bigcat.unimaas.nl/)
>>>>>> Twitter/Mastodon: @egonwillighagen
>>>>>> <https://twitter.com/egonwillighagen> / @egonw
>>>>>> <https://scholar.social/@egonw>
>>>>>> Homepage: http://egonw.github.com/
>>>>>> Blog: http://chem-bla-ics.blogspot.com/
>>>>>> PubList: https://www.zotero.org/egonw
>>>>>> ORCID: 0000-0001-7542-0286 <http://orcid.org/0000-0001-7542-0286>
>>>>>> ImpactStory: https://impactstory.org/u/egonwillighagen
>>>
>>> --
>>> "I always retain copyright in my papers, and nothing in any contract I
>>> sign with any publisher will override that fact. You should do the same".
>>>
>>> Peter Murray-Rust
>>> Reader Emeritus in Molecular Informatics
>>> Yusuf Hamied Department of Chemistry
>>> University of Cambridge
>>> CB2 1EW, UK
>>> +44-1223-336432
>>> _______________________________________________
>>> Blueobelisk-discuss mailing list
>>> Blueobelisk-discuss@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/blueobelisk-discuss

--
"I always retain copyright in my papers, and nothing in any contract I
sign with any publisher will override that fact. You should do the same".

Peter Murray-Rust
Reader Emeritus in Molecular Informatics
Yusuf Hamied Department of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-336432