Re: Problems indexing attachments using attachment mapping

raymond . giorgi Wed, 19 Nov 2014 11:23:22 -0800

Bump on this. I've tried using the built-in Java Base64 encoder:
Base64.getMimeEncoder().encode(Files.readAllBytes(file.toPath()));



And using Jackson's ObjectMapper as follows:
String base64 = mapper.writeValueAsString(Files.readAllBytes(file.toPath()));
// strip the surrounding JSON quotes
base64 = base64.substring(1, base64.length() - 1);


Can anyone help out and point out where I'm going wrong?
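
One thing that may be worth comparing: the error quoted below names the MIME-NO-LINEFEEDS variant, and as far as I know Base64.getMimeEncoder() inserts a CRLF after every 76 output characters, while Base64.getEncoder() emits one unbroken line. Here is a minimal sketch of that difference. The class name Base64Variants and the 100-byte placeholder array (standing in for the file contents) are made up for illustration, and I have not confirmed that this is what causes the failure.

```java
import java.util.Base64;

public class Base64Variants {
    // Basic encoder: produces a single line with no line separators.
    static String encodeBasic(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    // MIME encoder: inserts "\r\n" after every 76 output characters.
    static String encodeMime(byte[] data) {
        return Base64.getMimeEncoder().encodeToString(data);
    }

    public static void main(String[] args) {
        // Hypothetical payload; in practice this would be the file's bytes.
        byte[] data = new byte[100];
        String basic = encodeBasic(data);
        String mime = encodeMime(data);
        System.out.println("basic has line breaks: " + basic.contains("\r\n"));
        System.out.println("mime has line breaks:  " + mime.contains("\r\n"));
    }
}
```

If the line breaks do turn out to matter, swapping getMimeEncoder() for getEncoder() in the first attempt above would be the smallest change to try.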

On Sunday, May 12, 2013 3:59:54 PM UTC-4, Massimiliano Perantoni wrote:
>
> Hi,
> I installed elasticsearch flawlessly and started developing a mail 
> indexing solution.
> The main setup went smoothly, and I also installed the plugin for Tika
> document text extraction.
> After that I wrote some simple beans to store emails in the system after
> parsing them with JavaMail.
> When it comes to indexing attachments (doc, pdf, docx, OpenDocument files,
> etc.), several mails were indexed correctly, but some others were not.
> I had some problems submitting the base64-encoded documents directly from
> the email, so I preferred to decode the contents and re-encode them, just
> to be sure I wrote everything correctly.
> When I create the JSON file (attached to this email), I can even recreate
> the decoded document from it, which is readable, so the payload I pass to
> elasticsearch looks valid.
> Here are the versions:
>
> elasticsearch version 0.90.0
> elasticsearch-mapper-attachments 1.7.0
>
> See attached json as test document
> Here's the mapping I used
> curl -XGET 'http://localhost:9200/anagrafiche/email/_mapping?pretty=true'
> {
>   "email" : {
>     "properties" : {
>       "addTimestamp" : {
>         "type" : "string"
>       },
>       "answered" : {
>         "type" : "boolean"
>       },
>       "attacheddocument" : {
>         "type" : "attachment",
>         "path" : "full",
>         "fields" : {
>           "attacheddocument" : {
>             "type" : "string"
>           },
>           "author" : {
>             "type" : "string"
>           },
>           "title" : {
>             "type" : "string"
>           },
>           "name" : {
>             "type" : "string"
>           },
>           "date" : {
>             "type" : "date",
>             "format" : "dateOptionalTime"
>           },
>           "keywords" : {
>             "type" : "string"
>           },
>           "content_type" : {
>             "type" : "string"
>           }
>         }
>       },
>       "cgateId" : {
>         "type" : "string"
>       },
>       "contents" : {
>         "type" : "string"
>       },
>       "date" : {
>         "type" : "date",
>         "format" : "dateOptionalTime",
>         "include_in_all" : true
>       },
>       "filePath" : {
>         "type" : "string"
>       },
>       "from" : {
>         "properties" : {
>           "address" : {
>             "type" : "string"
>           },
>           "encodedPersonal" : {
>             "type" : "string",
>             "include_in_all" : true
>           }
>         }
>       },
>       "hasattachments" : {
>         "type" : "boolean"
>       },
>       "numlines" : {
>         "type" : "long"
>       },
>       "recipient" : {
>         "properties" : {
>           "address" : {
>             "type" : "string"
>           },
>           "encodedPersonal" : {
>             "type" : "string",
>             "include_in_all" : true
>           }
>         }
>       },
>       "seen" : {
>         "type" : "boolean"
>       },
>       "subject" : {
>         "type" : "string"
>       }
>     }
>   }
> }
>
> Here's the output of the indexing command attempt
> [maxper@max ~]$ curl -XPOST 'http://localhost:9200/anagrafiche/email/' 
> -d  @testindex.json 
> {"error":"MapperParsingException[failed to parse]; nested: 
> JsonParseException[Failed to decode VALUE_STRING as base64 
> (MIME-NO-LINEFEEDS): Unexpected padding character ('=') as character #3 of 
> 4-char base64 unit: padding only legal as 3rd or 4th character\n at 
> [Source: [B@45387c9d; line: 1, column: 32804]]; ","status":400}
> [maxper@max ~]$ 
>
> Just to be clear, I really can index some documents, so the mapping should 
> be correct.
>
> I hope someone may help me :)
> Thanks, Massimiliano
>
