[ https://issues.apache.org/jira/browse/CONNECTORS-1557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16694406#comment-16694406 ]
Karl Wright commented on CONNECTORS-1557:
-----------------------------------------
The best way to deliver the code is as a patch attachment to a ticket like this.
I hope that the transformer you wrote is consistent with the other transformers
that ManifoldCF provides, e.g. the HTML Extractor and the Metadata Adjuster.
Generally we are not fond of transformers that take on more than one basic
step of what could be structured as a multi-part transformation. From your
description it sounds like you've basically extended the HTML Extractor and
added functionality to it similar to what the Metadata Adjuster does. If
that's true, it might be better to contribute only the CSS-selector extraction
as an extension to the HTML Extractor, and let the Metadata Adjuster handle
the field mappings.
Please let me know how you want to proceed.
> HTML Tag extractor
> ------------------
>
> Key: CONNECTORS-1557
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1557
> Project: ManifoldCF
> Issue Type: New Feature
> Reporter: Donald Van den Driessche
> Priority: Major
>
> I wrote an HTML Tag extractor, based on the HTML Extractor.
> I needed to extract specific HTML tags and transfer them to their own field
> in my output repository.
> Input
> * Englobing tag (CSS selector)
> * Blacklist (CSS selector)
> * Fieldmapping (CSS selector)
> * Strip HTML
> Process
> * Retrieve Englobing tag
> * Remove blacklist
> * Map each CSS selector in Fieldmapping to its field (as an array if there
> are multiple matches) and strip HTML (if requested)
> * Englobing tag minus blacklist: strip HTML (if requested) and return it as
> the output (content)
> How can I best deliver the source code?
>
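The Process steps in the issue description above can be sketched roughly as
follows. This is only an illustration, not the reporter's code: the names
Node, TreeBuilder and extract are hypothetical, the selector support is an
assumption limited to a bare tag name, ".class" or "#id", and the field
mappings are assumed to be applied inside the englobing tag after the
blacklist has been removed. A real ManifoldCF transformer would be written in
Java and would need a full CSS selector engine.

```python
from html.parser import HTMLParser

VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class Node:
    """Hypothetical minimal DOM node; children are Node objects or text."""

    def __init__(self, tag, attrs, parent=None):
        self.tag, self.attrs, self.parent = tag, dict(attrs), parent
        self.children = []

    def matches(self, selector):
        # Assumption: only "tag", ".class" and "#id" selectors are supported.
        if selector.startswith("#"):
            return self.attrs.get("id") == selector[1:]
        if selector.startswith("."):
            return selector[1:] in (self.attrs.get("class") or "").split()
        return self.tag == selector

    def select(self, selector):
        # Depth-first search of all descendants, in document order.
        found = []
        for child in self.children:
            if isinstance(child, Node):
                if child.matches(selector):
                    found.append(child)
                found.extend(child.select(selector))
        return found

    def text(self):
        # "Strip HTML": keep text only, with whitespace collapsed.
        parts = [c if isinstance(c, str) else c.text() for c in self.children]
        return " ".join(" ".join(parts).split())

    def html(self):
        # Simplified serialization: attributes are dropped, text is not
        # re-escaped.
        inner = "".join(c if isinstance(c, str) else c.html()
                        for c in self.children)
        return f"<{self.tag}>{inner}</{self.tag}>"


class TreeBuilder(HTMLParser):
    """Builds a Node tree from the parser's start/end/data events."""

    def __init__(self):
        super().__init__()
        self.root = Node("root", [])
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag; ignore stray ends.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        self.current.children.append(data)


def extract(html, englobing, blacklist, field_mapping, strip_html=True):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    # 1. Retrieve the englobing tag (first match only, by assumption).
    regions = builder.root.select(englobing)
    if not regions:
        return {"content": "", "fields": {}}
    region = regions[0]
    # 2. Remove everything matched by a blacklist selector.
    for selector in blacklist:
        for node in region.select(selector):
            node.parent.children.remove(node)
    render = Node.text if strip_html else Node.html
    # 3. Map selectors to fields; multiple matches become a list.
    fields = {}
    for field, selector in field_mapping.items():
        hits = [render(n) for n in region.select(selector)]
        if hits:
            fields[field] = hits if len(hits) > 1 else hits[0]
    # 4. Return the englobing tag minus the blacklist as the content.
    return {"content": render(region), "fields": fields}
```

A call such as extract(page, "#main", [".ad"], {"title": "h1"}) would return
a dict with a "content" string and a "fields" dict. Keeping step 3 separate
from the rest also makes it straightforward to leave the field mapping to a
different pipeline stage, as suggested in the comment above.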