Excellent!! Thank you again. I'll try to set up the branch this weekend.
Karl

On Thu, Mar 15, 2018 at 10:52 AM, Olivier Tavard <[email protected]> wrote:

> Hi Karl,
>
> Sure thing, I created a ticket:
> https://issues.apache.org/jira/projects/CONNECTORS/issues/CONNECTORS-1500
> with the code attached.
> No specific libraries are used, just the JSOUP library that is already in
> the MCF core project.
>
> Best regards,
>
> Olivier
>
> > On March 15, 2018 at 11:51, Karl Wright <[email protected]> wrote:
> >
> > Hi Olivier,
> >
> > Thank you very much for your contribution!
> >
> > To have a legal trail, I usually prefer the following approach --
> >
> > (1) Create a ticket
> > (2) Attach a diff to the ticket
> >
> > We'll then integrate the diff into a branch, and then finally into trunk.
> >
> > Can you also let us know what kinds of dependent jars the contribution
> > has? We'd need to know about not only direct dependencies, but also any
> > downstream dependencies that may be incompatible with the Apache License.
> > Usually we can figure this out, but it saves time to know in advance if
> > there are LGPL dependencies (for instance).
> >
> > Karl
> >
> > On Thu, Mar 15, 2018 at 6:35 AM, Olivier Tavard <[email protected]> wrote:
> >
> >> Hello MCF community,
> >>
> >> I developed a transformation connector based on Jsoup. The goal of this
> >> code is simply to choose an encompassing tag in an HTML document for
> >> text extraction. Inside this tag, the connector lets you remove the
> >> subparts that you do not want: for example, all the tags corresponding
> >> to declared types or specific attribute tag names.
> >> I would like to know if it could interest you. The code is under the
> >> Apache License 2.0, and I integrated it in our enterprise search
> >> solution (Datafari).
> >> This morning I integrated the code into a fork of the MCF project on
> >> GitHub. Obviously it needs some work, including code refactoring,
> >> renaming classes, and unit tests, which I will be able to do if you are
> >> interested in the code.
> >> The code is here:
> >> https://github.com/otavard/manifoldcf/tree/htmlextractorconnector
> >> <https://github.com/otavard/manifoldcf/commits/htmlextractorconnector>
> >> And the documentation is here:
> >> https://datafari.atlassian.net/wiki/spaces/DATAFARI/pages/237240321/HTML+Extractor+Transformation+connector
> >>
> >> Best regards,
> >>
> >> Olivier TAVARD
