Hi,

I wanted to know whether the code is still of interest to the MCF community.
I have updated it since the initial release, so please tell me if I need to
submit a new patch to the existing issue:
https://issues.apache.org/jira/projects/CONNECTORS/issues/CONNECTORS-1500

Thanks,
Best regards,

Olivier TAVARD


> On 15 March 2018 at 15:58, Karl Wright <[email protected]> wrote:
> 
> Excellent!!
> 
> Thank you again.  I'll try to set up the branch this weekend.
> 
> Karl
> 
> 
> On Thu, Mar 15, 2018 at 10:52 AM, Olivier Tavard <[email protected]> wrote:
> 
>> Hi Karl,
>> 
>> Sure thing, I created a ticket:
>> https://issues.apache.org/jira/projects/CONNECTORS/issues/CONNECTORS-1500
>> with the code attached.
>> No additional libraries are used, just the Jsoup library, which is already
>> in the MCF core project.
>> 
>> Best regards,
>> 
>> Olivier
>> 
>> 
>>> On 15 March 2018 at 11:51, Karl Wright <[email protected]> wrote:
>>> 
>>> Hi Olivier,
>>> 
>>> Thank you very much for your contribution!
>>> 
>>> To have a legal trail, I usually prefer the following approach --
>>> 
>>> (1) Create a ticket
>>> (2) Attach a diff to the ticket
>>> 
>>> We'll then integrate the diff into a branch, and then finally into trunk.
>>> 
>>> Can you also let us know what kinds of dependent jars the contribution
>>> has?  We'd need to know about not only direct dependencies, but also any
>>> downstream dependencies that may be incompatible with the Apache License.
>>> Usually we can figure this out but it saves time to know in advance if
>>> there are LGPL dependencies (for instance).
>>> 
>>> Karl
>>> 
>>> 
>>> On Thu, Mar 15, 2018 at 6:35 AM, Olivier Tavard <[email protected]> wrote:
>>> 
>>>> Hello MCF community,
>>>> 
>>>> I developed a transformation connector based on Jsoup. The goal of this
>>>> code is simply to choose an encompassing tag in an HTML document for text
>>>> extraction. Inside this tag, the connector lets you remove subparts
>>>> that you do not want: for example, all the tags matching declared types
>>>> or specific attribute names.
>>>> I would like to know whether this could be of interest to you. The code
>>>> is under the Apache License, Version 2.0, and I integrated it into our
>>>> enterprise search solution (Datafari).
>>>> This morning I integrated the code into a fork of the MCF project on
>>>> GitHub. Obviously it needs some work, including code refactoring,
>>>> renaming classes, and unit tests, which I will be able to do if you are
>>>> interested in the code.
>>>> The code is here:
>>>> https://github.com/otavard/manifoldcf/tree/htmlextractorconnector
>>>> <https://github.com/otavard/manifoldcf/commits/htmlextractorconnector>
>>>> And the documentation is here:
>>>> https://datafari.atlassian.net/wiki/spaces/DATAFARI/pages/237240321/HTML+Extractor+Transformation+connector
>>>> 
>>>> Best regards,
>>>> 
>>>> Olivier TAVARD
>>>> 
>>>> 
>>>> 
>> 
>> 
