As I told you on Stack Overflow, you need to Base64-decode your content.

For example, what you sent decodes correctly to a PDF
(tested with http://www.motobit.com/util/base64-decoder-encoder.asp):

%PDF-1.5
%µµµµ
1 0 obj
<</Type/Catalog/Pages 2 0 R/Lang(en-US) >>
endobj
2 0 obj
<</Type/Pages/Count 1/Kids[ 3 0 R] >>
endobj
3 0 obj
<</Type/Page/Parent 2 0 R/Resources<</Font<</F1 5 0 R/F2 7 0 
R>>/ProcSet[/PDF/Text/ImageB/ImageC/ImageI] >>/MediaBox[ 0 0 595.32 841.92] 
/Contents 4 0 R/Group<</Type/Group/S/Transparency/CS/DeviceRGB>>/Tabs/S>>
endobj
4 0 obj

…
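
The same check can be done locally instead of with the online decoder. This is only a minimal sketch: the helper name decode_base64_prefix and the constant SAMPLE_PREFIX are made up for illustration, and the constant holds just the first characters of the value returned by the search. Since the pasted value was cut off, the helper drops an incomplete trailing 4-character group before decoding.

```python
import base64

# First characters of the value returned by the _search query in this thread
# (hypothetical constant name; the full value is much longer).
SAMPLE_PREFIX = "JVBERi0xLjUNCiW1tbW1DQoxIDAgb2JqDQo8"


def decode_base64_prefix(text: str) -> bytes:
    """Decode Base64 text, tolerating a copy that was cut off mid-stream."""
    # Remove line breaks and spaces introduced by mail clients.
    compact = "".join(text.split())
    # Base64 works in 4-character groups; drop an incomplete trailing group.
    usable = len(compact) - (len(compact) % 4)
    return base64.b64decode(compact[:usable])


decoded = decode_base64_prefix(SAMPLE_PREFIX)
print(decoded[:8])                   # b'%PDF-1.5'
print(decoded.startswith(b"%PDF-"))  # True
```

Note that the decoded bytes are the PDF file itself, not its plain text, so the sentences from sample.pdf will not be readable in the decoded output without a PDF text extractor.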



-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr


On 19 March 2014 at 17:21:54, sAs59 ([email protected]) wrote:

Yes, I have also posted this question on Stack Overflow.
I tried to decode my content with a Base64 decoder, but the result was not
my actual content.
The content of sample.pdf:

The world is changing, and changing priorities for the development of
society. Use of information technology accelerates this process.
Information treated as a commodity, and its role as a commodity increases,
while the value of information depends on the timing and the cost of its
treatment. The growth of complexity and amount of information makes the
question of finding new approaches to access it as the use of traditional
technology results in longer and the cost of developing software tools to
access information. Existing systems provide search information sources of
the same type, while ignoring others. For example, systems such as Yandex,
Rambler, Google, Yahoo provide information search in the databases of
keywords corresponding to certain HTML - pages, the rest of the same
information (audio, video and other data other than HTML - pages) located
on servers in the Internet remains unaddressed. This determines the
relevance of the work to develop more efficient methods for constructing
systems of access to distributed heterogeneous information.

It's the only file in my files collection, and I used the following query:
http://localhost:9200/mongoindex/_search?pretty=true

As a result, I got the following content:

JVBERi0xLjUNCiW1tbW1DQoxIDAgb2JqDQo8PC9UeXBlL0NhdGFsb2cvUGFnZXMgMiAwIFIvTGFuZyhlbi1VUykgPj4NCmVuZG9iag0KMiAwIG9iag0KPDwvVHlwZS9QYWdlcy9Db3VudCAxL0tpZHNbIDMgMCBSXSA+Pg0KZW5kb2JqDQozIDAgb2JqDQo8PC9UeXBlL1BhZ2UvUGFyZW50IDIgMCBSL1Jlc291cmNlczw8L0ZvbnQ8PC9GMSA1IDAgUi9GMiA3IDAgUj4+L1Byb2NTZXRbL1BERi9UZXh0L0ltYWdlQi9JbWFnZUMvSW1hZ2VJXSA+Pi9NZWRpYUJveFsgMCAwIDU5NS4zMiA4NDEuOTJdIC9Db250ZW50cyA0IDAgUi9Hcm91cDw8L1R5cGUvR3JvdXAvUy9UcmFuc3BhcmVuY3kvQ1MvRGV2aWNlUkdCPj4vVGFicy9TPj4NCmVuZG9iag0KNCAwIG9iag0KPDwvRmlsdGVyL0ZsYXRlRGVjb2RlL0xlbmd0aCAxNTQ5Pj4NCnN0cmVhbQ0KeJytWEtv20YQvhvwf9CRBOKNllwuyWuAOE3RXgr3ECQ90BIlC5Ytl5Ts9N93ZnZnd5aRIhUoAjjc9zevb2b04e766v2tnmmj5mZ2t7q+0rM5/NOzplIapuq6UJWd3T1dX81na/zz6frqa3aXm+whvymzfpbfaJO9wXiH4yFvsi1+LPEPLFbZBr/GMFzAOZN1sO8ZDq1xfgMfYfAObyyyDif4EnhhAQfoRXcSPtawwZ2kAd//QjA2EhAh2NN2hAwzY7x4tUM8Awws7ZFiLWHcv+J6T3L5O1/g4ylvs/4Zl/bxLlpd0VUjb14glJ42/ZPrMlNOvj9hdmQ4s/yv2d2v11cfwRwf7n40QlWq1pARvsY3SltF+Vdwi5P3Kdd11u2dXuHPjrfQEUMy9mCE0s3ichBsHVD6zd1i0W9Rq4hy6OjsiELjeiUUtmFx/EFU0UCaRcP1Iy8rWi+yz2her/oEsTkC2Juxd+8v+X3r3Eg+212gzLJSxnplLggB2pJA7ILnepdhdbw7caeuClWmV556XldW2SLd25EDhVCpo6eOcc6pcctuCTNSahy6/y6VRbxEal4MfZfX0RlHjsKqorgO9pUQkkiB8SuMO9pwQBu5yeCq9N6FnmrOGlBb1TZeg0tGTWHfO9IQrqjdW89xnCCnMb9PeLxeTGAUM4+UsyR0OJXcAuMFSeUf3k9eX7lvVMFekqFHM6Bjk+ngfcEpFCu14FpybnxtzSFB17/lxuNhdOHVgIzFIzU5Q373/Jl4BkrrnfKsIWzbqJIN0eU3jXC8A6uRdQFkFVDV5r8SV+210z0SGTMF1SY1BIz/hvEhceW9l3PHkNxGiWbFGWj5QzpyGwhH/xYOd+xxxHMsQheSlMwxHuQuDpFVKeKM3LTxZjyn9bpVLSudFaFNMUlepVfDKGeCzDhwvIqSBKknHHwcR6FL1bQpji3nQL63d6ow55IMAh+cPQ9+k6Q/XN4IP8DxVtqSLusH//SkaDiiE0eSo3DMVC9wJlDKK38kiV8QxHlbWa1M4ZU0Ok14n99z5HYDYrPWO4l/aSuUAKEjHQiG4EDgP05tIhFETr8gqpBdbJl9hPF3NvsofSCtq3yCILtFkuvpegHUVV+k5FdW1tJnBGu9N3YDwn+Ip/6/5GBNEUulkQ9SJAw+6LzZbSPMXk8yGk7Qzs6RctjDKnDphmrVOkmTgV79GxunRQfeiTblF11bQkm3Y8z0A1mWKkUghltI6e6kh0HU3U35nMGctxM+OAaWDqTlV7rxAjUXRpWs5i+5kZUMRgx5FAJq6+wPh/Q+Ih1opck+IcgdG2kdSgy/TPc+yKjA+ZZry8TDjqMti0LpKkVLhpnPhc/hVUf8LWYLeDNUR6nvtloEnRsnfgRj0glVrt09dz5preX2BWY4p/p5pZrSC/NIF4GxC9GCLRNaiAXAMHDWeZERtUxJ
jY9JyoFL+iH3/t9xR/UcVn+BvVim/P7bT1ynLKwydYr/5tTeUpPh5N6T95ZG1ZO9JGC3Du0KelSZmsZC3oklAkqyQypoQinV+z4uYQErSfZI+3I5U1VNrcrK4/2WdYe8jIl4J+In5dGdz/jgk9xDc8JrW08jLOXgXR0POoCd8LV0X1tE/nHB7LZeZtqqRd0nIp00rTWq0unek/faVpnJvWxaVwea7FvupQxZetFxfg29Iskcy4gQ0ENI84OMxjYpOmAYdXU+QCvboD86tJ99OY+k/gwW7kNvUM7BAZ3vhOgbxeLBu1G3ZBMPsY3uuWqjHqFofJNg0h9bgP4iJTsMssfBhdDMl5OeBiLQBYij5BMiG90qPRH5lUkC0S/4rtgIFcHTxFNvIRWgIR6FHgQP/Vzt1VwV9bQtdMbd+iTywmmuqESvQgWYLmzWr6ARoF9s/OnnSA7iUB8CZcdhOYo9sYbBCU2NuYnmpY7vwL+MiGpLFATFRVm8EC0VvoM6h85izKMH4PSeScmfWUY3ET+u0Mx97srwffSx8zwGVF1zI+Bs6lxtl/vyMWiy901ybExmudYnytbm2E9CaoLl/W0x03r622VRtmrepsikFP8C8sKIqQ0KZW5kc3RyZWFtDQplbmRvYmoNCjUgMCBvYmoNCjw8L1R5cGUvRm9udC9TdWJ0eXBlL1RydWVUeXBlL05hbWUvRjEvQmFzZUZvbnQvVGltZXMjMjBOZXcjMjBSb21hbi9FbmNvZGluZy9XaW5BbnNpRW5jb2RpbmcvRm9udERlc2NyaXB0b3IgNiAwIFIvRmlyc3RDaGFyIDMyL0xhc3RDaGFyIDEyMS9XaWR0aHMgMTAgMCBSPj4NCmVuZG9iag0KNiAwIG9iag0KPDwvVHlwZS9Gb250RGVzY3JpcHRvci9Gb250TmFtZS9UaW1lcyMyME5ldyMyMFJvbWFuL0ZsYWdzIDMyL0l0YWxpY0FuZ2xlIDAvQXNjZW50IDg5MS9EZXNjZW50IC0yMTYvQ2FwSGVpZ2h0IDY5My9BdmdXaWR0aCA0MDEvTWF4V2lkdGggMjYxNC9Gb250V2VpZ2h0IDQwMC9YSGVpZ2h0IDI1MC9MZWFkaW5nIDQyL1N0ZW1WIDQwL0ZvbnRCQm94WyAtNTY4IC0yMTYgMjA0NiA2OTNdID4+DQplbmRvYmoNCjcgMCBvYmoNCjw8L1R5cGUvRm9udC9TdWJ0eXBlL1RydWVUeXBlL05hbWUvRjIvQmFzZUZvbnQvQUJDREVFK0NhbGlicmkvRW5jb2RpbmcvV2luQW5zaUVuY29kaW5nL0ZvbnREZXNjcmlwdG9yIDggMCBSL0ZpcnN0Q2hhciAzMi9MYXN0Q2hhciAzMi9XaWR0aHMgMTEgMCBSPj4NCmVuZG9iag0KOCAwIG9iag0KPDwvVHlwZS9Gb250RGVzY3JpcHRvci9Gb250TmFtZS9BQkNERUUrQ2FsaWJyaS9GbGFncyAzMi9JdGFsaWNBbmdsZSAwL0FzY2VudCA3NTAvRGVzY2VudCAtMjUwL0NhcEhlaWdodCA3NTAvQXZnV2lkdGggNTIxL01heFdpZHRoIDE3NDMvRm9udFdlaWdodCA0MDAvWEhlaWdodCAyNTAvU3RlbVYgNTIvRm9udEJCb3hbIC01MDMgLTI1MCAxMjQwIDc1MF0gL0ZvbnRGaWxlMiAxMiAwIFI+Pg0KZW5kb2JqDQo5IDAgb2JqDQo8PC9Qcm9kdWNlcihjb252ZXJ0b25saW5lZnJlZS5jb20pL0NyZWF0b3IoY29udmVydG9ubGluZWZyZWUuY29tKS9DcmVhdGlvbkRhdGUoRDoyMDE0MDMxNjEyMTQxMykgL01vZERhdGUoRDoyMDE0MDMxNjEyMTQxMykgPj4NCmVuZG9iag0KMTAgMCBvYmoNClsgMjUwIDAgMCAwIDAgMCAwIDAgMzMzIDMzMyAwIDAgMjUwIDMzMyAy
NTAgMCAwIDAgMCAwIDAgMCAwIDAgMCAwIDAgMCAwIDAgMCAwIDAgMCAwIDAgMCA2MTEgNTU2IDcyMiA3MjIgMzMzIDAgMCA2MTEgODg5IDAgMCAwIDAgNjY3IDAgNjExIDcyMiAwIDAgMCA3MjIgMCAwIDAgMCAwIDAgMCA0NDQgNTAwIDQ0NCA1MDAgNDQ0IDMzMyA1MDAgNTAwIDI3OCAwIDUwMCAyNzggNzc4IDUwMCA1MDAgNTAwIDUwMCAzMzMgMzg5IDI3OCA1MDAgNTAwIDcyMiA1MDAgNTAwXSANCmVuZG9iag0KMTEgMCBvYmoNClsgMjI2XSANCmVuZG9
  

This is only a small part that I've copied.
Maybe the problem is in the mapping?

P.S. Sorry for my bad English.



--  
View this message in context: 
http://elasticsearch-users.115913.n3.nabble.com/searching-pdf-files-by-content-with-Mongodb-river-tp4051989p4052267.html
  
Sent from the ElasticSearch Users mailing list archive at Nabble.com.  

--  
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.  
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].  
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/1395246111167-4052267.post%40n3.nabble.com.
  
For more options, visit https://groups.google.com/d/optout.  
