I should also mention that I'm trying to use the RabbitMQ river, which is why I'm converting files into Base64 to begin with.
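For context, here is a minimal sketch of the encoding step. It rests on an assumption that is not confirmed anywhere in this thread, namely that line breaks in the encoded output matter: java.util.Base64.getMimeEncoder() wraps its output with CRLF after every 76 characters, whereas Base64.getEncoder() emits one unbroken string. The class name Base64Sketch and the 100-byte dummy array are placeholders for illustration only.

```java
import java.util.Base64;

class Base64Sketch {
    // Basic encoder: no line separators are ever inserted.
    static String encodeNoLineBreaks(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    // MIME encoder: output is split into lines of at most 76 characters,
    // separated by "\r\n".
    static String encodeMime(byte[] data) {
        return Base64.getMimeEncoder().encodeToString(data);
    }

    public static void main(String[] args) {
        // Placeholder for Files.readAllBytes(file.toPath()); 100 bytes is
        // enough to push the MIME output past one 76-character line.
        byte[] data = new byte[100];

        String plain = encodeNoLineBreaks(data);
        String mime = encodeMime(data);

        System.out.println("plain has CRLF: " + plain.contains("\r\n"));
        System.out.println("mime has CRLF:  " + mime.contains("\r\n"));
    }
}
```

If the line breaks do turn out to be the problem, encodeToString also has the side benefit of returning a String directly, whereas encode() returns a byte[].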
Thanks again!

On Wednesday, November 19, 2014 2:22:23 PM UTC-5, [email protected] wrote:
>
> Bump on this. I've tried using the built-in Java Base64 encoding:
> Base64.getMimeEncoder().encode(Files.readAllBytes(file.toPath()));
>
> And using Jackson's ObjectMapper as follows:
> String base64 = mapper.writeValueAsString(Files.readAllBytes(file.toPath()));
> base64 = base64.substring(1, base64.length() - 1);
>
> Can anyone help me out and point out where I'm going wrong?
>
> On Sunday, May 12, 2013 3:59:54 PM UTC-4, Massimiliano Perantoni wrote:
>>
>> Hi,
>> I installed elasticsearch flawlessly and started developing a mail
>> indexing solution.
>> The main setup went flawlessly too; I even installed the plugin for
>> Tika document text extraction.
>> After that I wrote some simple beans that write emails into the system
>> after parsing them with JavaMail.
>> When it comes to indexing attachments (doc, pdf, docx, OpenDocument,
>> etc.), several mails were indexed correctly, but others were not.
>> I had some problems submitting the base64-encoded documents straight
>> from the email, so I chose to decode the contents and re-encode them,
>> just to be sure I wrote everything correctly.
>> When I create the JSON file (attached to this email), I can even
>> recreate the decoded document, which is readable, so the payload I pass
>> to elasticsearch looks valid.
>>
>> Here are the versions:
>>
>> elasticsearch version 0.90.0
>> elasticsearch-mapper-attachments 1.7.0
>>
>> See the attached JSON as a test document.
>> Here's the mapping I used:
>> curl -XGET 'http://localhost:9200/anagrafiche/email/_mapping?pretty=true'
>> {
>>   "email" : {
>>     "properties" : {
>>       "addTimestamp" : {
>>         "type" : "string"
>>       },
>>       "answered" : {
>>         "type" : "boolean"
>>       },
>>       "attacheddocument" : {
>>         "type" : "attachment",
>>         "path" : "full",
>>         "fields" : {
>>           "attacheddocument" : {
>>             "type" : "string"
>>           },
>>           "author" : {
>>             "type" : "string"
>>           },
>>           "title" : {
>>             "type" : "string"
>>           },
>>           "name" : {
>>             "type" : "string"
>>           },
>>           "date" : {
>>             "type" : "date",
>>             "format" : "dateOptionalTime"
>>           },
>>           "keywords" : {
>>             "type" : "string"
>>           },
>>           "content_type" : {
>>             "type" : "string"
>>           }
>>         }
>>       },
>>       "cgateId" : {
>>         "type" : "string"
>>       },
>>       "contents" : {
>>         "type" : "string"
>>       },
>>       "date" : {
>>         "type" : "date",
>>         "format" : "dateOptionalTime",
>>         "include_in_all" : true
>>       },
>>       "filePath" : {
>>         "type" : "string"
>>       },
>>       "from" : {
>>         "properties" : {
>>           "address" : {
>>             "type" : "string"
>>           },
>>           "encodedPersonal" : {
>>             "type" : "string",
>>             "include_in_all" : true
>>           }
>>         }
>>       },
>>       "hasattachments" : {
>>         "type" : "boolean"
>>       },
>>       "numlines" : {
>>         "type" : "long"
>>       },
>>       "recipient" : {
>>         "properties" : {
>>           "address" : {
>>             "type" : "string"
>>           },
>>           "encodedPersonal" : {
>>             "type" : "string",
>>             "include_in_all" : true
>>           }
>>         }
>>       },
>>       "seen" : {
>>         "type" : "boolean"
>>       },
>>       "subject" : {
>>         "type" : "string"
>>       }
>>     }
>>   }
>> }
>>
>> Here's the output of the indexing attempt:
>> [maxper@max ~]$ curl -XPOST 'http://localhost:9200/anagrafiche/email/' -d @testindex.json
>> {"error":"MapperParsingException[failed to parse]; nested: JsonParseException[Failed to decode VALUE_STRING as base64 (MIME-NO-LINEFEEDS): Unexpected padding character ('=') as character #3 of 4-char base64 unit: padding only legal as 3rd or 4th character\n at [Source: [B@45387c9d; line: 1, column: 32804]]; ","status":400}
>> [maxper@max ~]$
>>
>> Just to be clear, I really can index some documents, so the mapping
>> should be correct.
>>
>> I hope someone can help me :)
>> Thanks, Massimiliano
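
The JsonParseException quoted above complains about a '=' padding character in the wrong position within a 4-character base64 unit. One way that can happen (an assumption, not something established in this thread) is when separately encoded chunks are concatenated, which leaves padding in the middle of the payload. A rough pre-flight check could look like the sketch below; PaddingCheck and firstMisplacedPadding are hypothetical names, and the check assumes the string contains no whitespace or line breaks.

```java
import java.util.Base64;

class PaddingCheck {
    // Returns the index of the first '=' if any non-'=' character follows it,
    // or -1 if padding is absent or appears only at the very end.
    static int firstMisplacedPadding(String b64) {
        int firstPad = b64.indexOf('=');
        if (firstPad < 0) {
            return -1; // no padding at all
        }
        for (int i = firstPad; i < b64.length(); i++) {
            if (b64.charAt(i) != '=') {
                return firstPad; // data continues after padding
            }
        }
        return -1; // padding only at the end
    }

    public static void main(String[] args) {
        // Two independently encoded chunks (illustrative data only).
        String a = Base64.getEncoder().encodeToString(new byte[]{1});       // "AQ=="
        String b = Base64.getEncoder().encodeToString(new byte[]{1, 2, 3}); // "AQID"

        System.out.println(firstMisplacedPadding(a));     // -1
        System.out.println(firstMisplacedPadding(a + b)); // 2
    }
}
```

Running a check like this on the attachment field before posting the JSON would at least show whether the payload has padding somewhere other than its end.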
