[ https://issues.apache.org/jira/browse/CONNECTORS-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16059970#comment-16059970 ]
Karl Wright edited comment on CONNECTORS-1433 at 6/22/17 9:54 PM:
------------------------------------------------------------------

In this scenario below, what would the equivalent of the "my_data" field name be in the ES connector?

PUT my_index/my_type/my_id?pipeline=attachment
{code}
{
  "*+my_data+*": "e1xydGYxXGFuc2kNCkxvcmVtIGlwc3VtIGRvbG9yIHNpdCBhbWV0DQpccGFyIH0="
}

GET my_index/my_type/my_id
{
  "found": true,
  "_index": "my_index",
  "_type": "my_type",
  "_id": "my_id",
  "_version": 1,
  "_source": {
    "*+my_data+*": "e1xydGYxXGFuc2kNCkxvcmVtIGlwc3VtIGRvbG9yIHNpdCBhbWV0DQpccGFyIH0=",
    "attachment": {
      "content_type": "application/rtf",
      "language": "ro",
      "content": "Lorem ipsum dolor sit amet",
      "content_length": 28
    }
  }
}
{code}

When it PUTs the document, what is the field name?
I have this from ElasticSearchIndex.java (line 202):
{code}
// Since ES 1.0
pw.print(" \"_content\" : \"");
Base64 base64 = new Base64();
base64.encodeStream(inputStream, pw);
pw.print("\"}");
{code}
so I assumed it was the _content field, but that doesn't work in the pipeline.
I'll investigate further.


was (Author: svanschalkwyk):
In this scenario below, what would the equivalent of the "my_data" field name be in the ES connector?

PUT my_index/my_type/my_id?pipeline=attachment
{
  "*+my_data+*": "e1xydGYxXGFuc2kNCkxvcmVtIGlwc3VtIGRvbG9yIHNpdCBhbWV0DQpccGFyIH0="
}

GET my_index/my_type/my_id
{
  "found": true,
  "_index": "my_index",
  "_type": "my_type",
  "_id": "my_id",
  "_version": 1,
  "_source": {
    "*+my_data+*": "e1xydGYxXGFuc2kNCkxvcmVtIGlwc3VtIGRvbG9yIHNpdCBhbWV0DQpccGFyIH0=",
    "attachment": {
      "content_type": "application/rtf",
      "language": "ro",
      "content": "Lorem ipsum dolor sit amet",
      "content_length": 28
    }
  }
}

When it PUTs the document, what is the field name?
I have this from ElasticSearchIndex.java (line 202):

// Since ES 1.0
pw.print(" \"_content\" : \"");
Base64 base64 = new Base64();
base64.encodeStream(inputStream, pw);
pw.print("\"}");

so I assumed it was the _content field, but that doesn't work in the pipeline.
I'll investigate further.

> Add CLI options to pipeline modules, e.g. allow Tika to export TEXT, not BASE64
> -------------------------------------------------------------------------------
>
>                 Key: CONNECTORS-1433
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1433
>             Project: ManifoldCF
>          Issue Type: Wish
>          Components: Tika extractor
>            Reporter: Steph van Schalkwyk
>            Assignee: Karl Wright
>         Attachments: CONNECTORS-1433.patch, image.png, image.png
>
>
> Would love to have Tika spout TEXT, not BASE64.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)