Hi, I wanted to know whether the code is still of interest to the MCF community. I have updated it since the initial release, so please tell me if I should submit a new patch to the existing issue: https://issues.apache.org/jira/projects/CONNECTORS/issues/CONNECTORS-1500
Thanks,
Best regards,
Olivier TAVARD

> On 15 March 2018 at 15:58, Karl Wright <[email protected]> wrote:
>
> Excellent!!
>
> Thank you again. I'll try to set up the branch this weekend.
>
> Karl
>
>
> On Thu, Mar 15, 2018 at 10:52 AM, Olivier Tavard <[email protected]> wrote:
>
>> Hi Karl,
>>
>> Sure thing, I created a ticket: https://issues.apache.org/jira/projects/CONNECTORS/issues/CONNECTORS-1500 with the code attached.
>> No specific libraries are used, just the Jsoup library, which is already in the MCF core project.
>>
>> Best regards,
>>
>> Olivier
>>
>>
>>> On 15 March 2018 at 11:51, Karl Wright <[email protected]> wrote:
>>>
>>> Hi Olivier,
>>>
>>> Thank you very much for your contribution!
>>>
>>> To have a legal trail, I usually prefer the following approach --
>>>
>>> (1) Create a ticket
>>> (2) Attach a diff to the ticket
>>>
>>> We'll then integrate the diff into a branch, and then finally into trunk.
>>>
>>> Can you also let us know what kinds of dependent jars the contribution has? We'd need to know about not only direct dependencies, but also any downstream dependencies that may be incompatible with the Apache License. Usually we can figure this out but it saves time to know in advance if there are LGPL dependencies (for instance).
>>>
>>> Karl
>>>
>>>
>>> On Thu, Mar 15, 2018 at 6:35 AM, Olivier Tavard <[email protected]> wrote:
>>>
>>>> Hello MCF community,
>>>>
>>>> I developed a transformation connector based on Jsoup. The goal of this code is simply to choose an encompassing tag in an HTML document for text extraction. Inside this tag, the connector lets you remove subparts that you do not want: all the tags matching declared types or specific attribute names, for example.
>>>> I would like to know if it could interest you. The code is under the Apache License 2.0, and I have integrated it into our enterprise search solution (Datafari).
>>>> This morning I integrated the code into a fork of the MCF project on GitHub.
>>>> Obviously it needs some work, including code refactoring, renaming classes, and unit tests, which I will be able to do if you are interested in the code.
>>>> The code is here: https://github.com/otavard/manifoldcf/tree/htmlextractorconnector <https://github.com/otavard/manifoldcf/commits/htmlextractorconnector>
>>>> And the documentation is here: https://datafari.atlassian.net/wiki/spaces/DATAFARI/pages/237240321/HTML+Extractor+Transformation+connector
>>>>
>>>> Best regards,
>>>>
>>>> Olivier TAVARD
