Bump on this. I've tried using the built-in Java Base64 encoding:

    Base64.getMimeEncoder().encode(Files.readAllBytes(file.toPath()));

and Jackson's ObjectMapper, as follows:

    String base64 = mapper.writeValueAsString(Files.readAllBytes(file.toPath()));
    base64 = base64.substring(1, base64.length() - 1);

Can anyone help out and point me to where I'm going wrong?

On Sunday, May 12, 2013 3:59:54 PM UTC-4, Massimiliano Perantoni wrote:
>
> Hi,
> I installed elasticsearch flawlessly and started developing a mail
> indexing solution.
> Dealing with the main setup everything went flawlessly, I even installed
> the plugin for tika document text extraction.
> After that I wrote some simple beans to write in the system some emails
> after parsing using java mail.
> When it comes to index attachments (docs, pdfs, docx, open documents, etc
> etc), several mails got indexed correctly, some others no.
> I had some problems in putting direct base64 encoded documents from the
> email, even because when it comes to encoding, I preferred to decode the
> contents and reencode it, just to be sure I wrote everything correctly.
> When I create the json file (attached to the email), I succeed even in
> creating the decoded document which is readable and the payload I pass to
> elasticsearch is working.
> Here are the versions:
>
> elasticsearch version 0.90.0
> elasticsearch-mapper-attachments 1.7.0
>
> See attached json as test document
> Here's the mapping I used
> curl -XGET 'http://localhost:9200/anagrafiche/email/_mapping?pretty=true'
> {
>   "email" : {
>     "properties" : {
>       "addTimestamp" : {
>         "type" : "string"
>       },
>       "answered" : {
>         "type" : "boolean"
>       },
>       "attacheddocument" : {
>         "type" : "attachment",
>         "path" : "full",
>         "fields" : {
>           "attacheddocument" : {
>             "type" : "string"
>           },
>           "author" : {
>             "type" : "string"
>           },
>           "title" : {
>             "type" : "string"
>           },
>           "name" : {
>             "type" : "string"
>           },
>           "date" : {
>             "type" : "date",
>             "format" : "dateOptionalTime"
>           },
>           "keywords" : {
>             "type" : "string"
>           },
>           "content_type" : {
>             "type" : "string"
>           }
>         }
>       },
>       "cgateId" : {
>         "type" : "string"
>       },
>       "contents" : {
>         "type" : "string"
>       },
>       "date" : {
>         "type" : "date",
>         "format" : "dateOptionalTime",
>         "include_in_all" : true
>       },
>       "filePath" : {
>         "type" : "string"
>       },
>       "from" : {
>         "properties" : {
>           "address" : {
>             "type" : "string"
>           },
>           "encodedPersonal" : {
>             "type" : "string",
>             "include_in_all" : true
>           }
>         }
>       },
>       "hasattachments" : {
>         "type" : "boolean"
>       },
>       "numlines" : {
>         "type" : "long"
>       },
>       "recipient" : {
>         "properties" : {
>           "address" : {
>             "type" : "string"
>           },
>           "encodedPersonal" : {
>             "type" : "string",
>             "include_in_all" : true
>           }
>         }
>       },
>       "seen" : {
>         "type" : "boolean"
>       },
>       "subject" : {
>         "type" : "string"
>       }
>     }
>   }
> }
>
> Here's the output of the indexing command attempt
> [maxper@max ~]$ curl -XPOST 'http://localhost:9200/anagrafiche/email/'
> -d @testindex.json
> {"error":"MapperParsingException[failed to parse]; nested:
> JsonParseException[Failed to decode VALUE_STRING as base64
> (MIME-NO-LINEFEEDS): Unexpected padding character ('=') as character #3 of
> 4-char base64 unit: padding only legal as 3rd or 4th character\n at
> [Source: [B@45387c9d; line: 1, column: 32804]]; ","status":400}
> [maxper@max ~]$
>
> Just to be clear, I really can index some documents, so the mapping should
> be correct.
>
> I hope someone may help me :)
> Thanks, Massimiliano
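
A note on the first attempt above, offered as something to check rather than a confirmed diagnosis: Base64.getMimeEncoder() wraps its output at 76 characters with CRLF separators, while Base64.getEncoder() produces one unbroken line. The "MIME-NO-LINEFEEDS" in the quoted error suggests the decoder on the other side expects base64 without line breaks, so it may be worth looking at whether the encoded string contains any. The sketch below compares the two encoders using only the JDK (Java 8+). The class and method names are made up for illustration, and the byte array is a stand-in for Files.readAllBytes(file.toPath()).

```java
import java.util.Base64;

// Hypothetical helper class, not part of any library mentioned in this thread.
class Base64LineBreakCheck {

    /** MIME encoder: output is split into lines of at most 76 characters, separated by CRLF. */
    public static String encodeMime(byte[] data) {
        return Base64.getMimeEncoder().encodeToString(data);
    }

    /** Basic encoder: output is a single line with no separators. */
    public static String encodeBasic(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    public static void main(String[] args) {
        // Assumption: 100 zero bytes stand in for the attachment contents.
        // 100 bytes encode to 136 base64 characters, which is longer than one 76-character MIME line.
        byte[] data = new byte[100];

        String mime = encodeMime(data);
        String basic = encodeBasic(data);

        System.out.println("MIME output contains CRLF:  " + mime.contains("\r\n"));
        System.out.println("Basic output contains CRLF: " + basic.contains("\r\n"));

        // Apart from the line separators, the two encodings carry the same characters.
        System.out.println("Equal after removing CRLF:  " + mime.replace("\r\n", "").equals(basic));
    }
}
```

If the MIME output is what ends up in the JSON document, switching to Base64.getEncoder().encodeToString(...) is a small change to try. Whether that resolves the padding error quoted above is not something this thread confirms.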
