Hi Olivier,

This was actually already committed. But it was committed under the name html-extractor connector rather than "datafari", a name that didn't mean anything to me.
Any changes you want to make should therefore be supplied as a diff against the html-extractor connector. Sorry for the confusion!!

Karl

On Fri, May 4, 2018 at 4:28 PM Karl Wright <[email protected]> wrote:

> Yes, please do update the patch. I'm sorry I did not get to this; many
> other things intruded. I created the branch but did not apply the original
> patch onto it, so please supply a whole new patch.
>
> Karl
>
>
> On Fri, May 4, 2018 at 11:28 AM Olivier Tavard <[email protected]> wrote:
>
>> Hi,
>>
>> I wanted to know if the code is still of interest to the MCF community.
>> I have updated it since the initial release, so please tell me if I need to
>> submit a new patch to the issue already created:
>> https://issues.apache.org/jira/projects/CONNECTORS/issues/CONNECTORS-1500
>>
>> Thanks,
>> Best regards,
>>
>> Olivier TAVARD
>>
>>
>> > On 15 March 2018 at 15:58, Karl Wright <[email protected]> wrote:
>> >
>> > Excellent!!
>> >
>> > Thank you again. I'll try to set up the branch this weekend.
>> >
>> > Karl
>> >
>> >
>> > On Thu, Mar 15, 2018 at 10:52 AM, Olivier Tavard <[email protected]> wrote:
>> >
>> >> Hi Karl,
>> >>
>> >> Sure thing, I created a ticket:
>> >> https://issues.apache.org/jira/projects/CONNECTORS/issues/CONNECTORS-1500
>> >> with the code attached.
>> >> No specific libraries are used, just the JSOUP library that is already in
>> >> the MCF core project.
>> >>
>> >> Best regards,
>> >>
>> >> Olivier
>> >>
>> >>
>> >>> On 15 March 2018 at 11:51, Karl Wright <[email protected]> wrote:
>> >>>
>> >>> Hi Olivier,
>> >>>
>> >>> Thank you very much for your contribution!
>> >>>
>> >>> To have a legal trail, I usually prefer the following approach:
>> >>>
>> >>> (1) Create a ticket
>> >>> (2) Attach a diff to the ticket
>> >>>
>> >>> We'll then integrate the diff into a branch, and then finally into trunk.
>> >>>
>> >>> Can you also let us know what kinds of dependent jars the contribution
>> >>> has? We'd need to know about not only direct dependencies, but also any
>> >>> downstream dependencies that may be incompatible with the Apache License.
>> >>> Usually we can figure this out but it saves time to know in advance if
>> >>> there are LGPL dependencies (for instance).
>> >>>
>> >>> Karl
>> >>>
>> >>>
>> >>> On Thu, Mar 15, 2018 at 6:35 AM, Olivier Tavard <[email protected]> wrote:
>> >>>
>> >>>> Hello MCF community,
>> >>>>
>> >>>> I developed a transformation connector based on Jsoup. The goal of this
>> >>>> code is simply to choose an encompassing tag in an HTML document for text
>> >>>> extraction. Inside this tag, the connector allows you to remove
>> >>>> subparts that you do not want: all the tags corresponding to declared
>> >>>> types or specific attribute tag names, for example.
>> >>>> I would like to know if it could interest you. The code is under the
>> >>>> Apache V2 licence and I integrated it in our enterprise search solution
>> >>>> (Datafari).
>> >>>> This morning I integrated the code in a fork of the MCF project on GitHub.
>> >>>> Obviously it needs some work, including code refactoring, renaming classes,
>> >>>> and unit tests, which I will be able to do if you are interested in the code.
>> >>>> The code is here:
>> >>>> https://github.com/otavard/manifoldcf/tree/htmlextractorconnector
>> >>>> <https://github.com/otavard/manifoldcf/commits/htmlextractorconnector>
>> >>>> And the documentation here:
>> >>>> https://datafari.atlassian.net/wiki/spaces/DATAFARI/pages/237240321/HTML+Extractor+Transformation+connector
>> >>>>
>> >>>> Best regards,
>> >>>>
>> >>>> Olivier TAVARD
