[
https://issues.apache.org/jira/browse/CONNECTORS-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16057881#comment-16057881
]
Karl Wright commented on CONNECTORS-1433:
-----------------------------------------
I've never been clear on whether the ES connector is using the mapper
attachment correctly or not. The content is binary (not text) and ES doesn't
do its own Tika extraction of the binary, so I can see why this might be
difficult. But an assumed ability to convert directly to text isn't going to
work either because we do primarily output binary content.
The big question is: what is a better way to view this problem?
(1) If ES can only accept *text* output, then we should reject all content that
isn't text, and we should *not* convert to base64. That would force people
generally to use the Tika transformer with the ES output connector.
(2) If the mapper attachment can do some kinds of conversions, and it can
convert base64 back to characters, then we can leave things as they are.
Please advise.
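To make the two options concrete, here is a rough Java sketch of how they differ on the output side. All of it is hypothetical: the class and method names are invented for illustration and are not ManifoldCF or Elasticsearch APIs. It also assumes that a MIME type beginning with "text/" identifies text content and that such content is UTF-8 encoded.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Locale;
import java.util.Optional;

// Hypothetical sketch only; none of these names exist in ManifoldCF or Elasticsearch.
class EsPayloadSketch {

    // Assumption: a MIME type starting with "text/" marks text content.
    static boolean isText(String mimeType) {
        return mimeType != null
            && mimeType.toLowerCase(Locale.ROOT).startsWith("text/");
    }

    // Option (1): accept text only. Non-text content is rejected (empty result),
    // which would push users toward running the Tika transformer first.
    // Assumption: text content is UTF-8 encoded.
    static Optional<String> textOnly(String mimeType, byte[] content) {
        if (!isText(mimeType)) {
            return Optional.empty();
        }
        return Optional.of(new String(content, StandardCharsets.UTF_8));
    }

    // Option (2): leave things as they are and base64-encode whatever arrives,
    // relying on the receiving side to decode and extract it.
    static String base64Always(byte[] content) {
        return Base64.getEncoder().encodeToString(content);
    }

    // What the receiving side would have to do to get characters back.
    // This only yields readable text if the original bytes were text.
    static String decodeToText(String base64) {
        return new String(Base64.getDecoder().decode(base64), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] body = "hello".getBytes(StandardCharsets.UTF_8);
        System.out.println(textOnly("text/plain", body));       // text passes through
        System.out.println(textOnly("application/pdf", body));  // binary is rejected
        System.out.println(base64Always(body));                 // base64 form of the bytes
    }
}
```

Under option (1) the rejection happens before anything is sent, so no base64 is ever produced. Under option (2) the decode step is someone else's problem, and it only gives usable text when the original content was text to begin with.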
> Add CLI options to pipeline modules, e.g. allow Tika to export TEXT, not
> BASE64
> -------------------------------------------------------------------------------
>
> Key: CONNECTORS-1433
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1433
> Project: ManifoldCF
> Issue Type: Wish
> Components: Tika extractor
> Reporter: Steph van Schalkwyk
> Assignee: Karl Wright
>
> Would love to have Tika spout TEXT, not BASE64.