I'm using Go, so I can index the attachment in a separate goroutine. FSRiver looks cool: with it I would just upload files to /some/path, and FSRiver would index them automatically every 15 minutes or so, doing the hard work of separating text from binary using Tika. Is that correct? I think I may just use that.
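For reference, a minimal sketch of the goroutine approach in Go: text is extracted away from the request path, and only the extracted text goes into the JSON document. All names here (`Attachment`, `IndexDoc`, `extractText`, `buildDoc`, `processAll`) and the `filename`/`content` fields are hypothetical, and `extractText` is only a stand-in that assumes plain-text input. A real setup would call an extractor such as Tika at that point and then send the resulting body to Elasticsearch.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
	"sync"
)

// Attachment is a hypothetical upload: a name plus the raw file bytes.
type Attachment struct {
	Name string
	Data []byte
}

// IndexDoc is the JSON document that would be sent to Elasticsearch.
// The field names are illustrative, not a required schema.
type IndexDoc struct {
	Filename string `json:"filename"`
	Content  string `json:"content"`
}

// extractText stands in for a real extractor such as Tika.
// Assumption: the input is already plain text, so it is only trimmed.
func extractText(data []byte) (string, error) {
	return strings.TrimSpace(string(data)), nil
}

// buildDoc extracts the text and marshals only that text (not the
// Base64-encoded binary) into the JSON body for an index request.
func buildDoc(a Attachment) ([]byte, error) {
	text, err := extractText(a.Data)
	if err != nil {
		return nil, fmt.Errorf("extract %s: %w", a.Name, err)
	}
	return json.Marshal(IndexDoc{Filename: a.Name, Content: text})
}

// processAll runs buildDoc for each attachment in its own goroutine
// and returns the JSON bodies and errors in input order.
func processAll(atts []Attachment) ([][]byte, []error) {
	docs := make([][]byte, len(atts))
	errs := make([]error, len(atts))
	var wg sync.WaitGroup
	for i, a := range atts {
		wg.Add(1)
		go func(i int, a Attachment) {
			defer wg.Done()
			docs[i], errs[i] = buildDoc(a)
		}(i, a)
	}
	wg.Wait()
	return docs, errs
}

func main() {
	docs, errs := processAll([]Attachment{
		{Name: "notes.txt", Data: []byte("  hello from the attachment \n")},
	})
	for i, d := range docs {
		if errs[i] != nil {
			fmt.Println("error:", errs[i])
			continue
		}
		// In a real setup this body would be sent to Elasticsearch.
		fmt.Println(string(d))
	}
}
```

Each goroutine writes only to its own slice index, so no extra locking is needed beyond the `sync.WaitGroup`.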
On Fri, Feb 21, 2014 at 4:39 PM, David Pilato <[email protected]> wrote:

> You can index attachment content in the same document. That's really fine.
> I would only recommend extracting document content and metadata in another
> process. Then, in that process, generate the JSON document with all needed
> fields.
>
> This is basically what I did in FSRiver. I removed the mapper attachment
> as it was not flexible enough for my use case.
> Imagine that you send a big PDF file (100 MB) which contains mostly
> pictures and 10 KB of text. Instead of sending the full 100 MB document
> encoded in Base64, you can extract the text and send only the text over the
> wire. (Your network bandwidth will say thank you :-) )
>
> https://github.com/dadoonet/fsriver/blob/master/src/main/java/fr/pilato/elasticsearch/river/fs/river/FsRiver.java#L687
>
> I don't remember which programming language you are using. Can you use
> Tika from it?
>
> --
> *David Pilato* | *Technical Advocate* | *Elasticsearch.com*
> @dadoonet <https://twitter.com/dadoonet> | @elasticsearchfr <https://twitter.com/elasticsearchfr>
>
> On February 21, 2014 at 16:31:51, Patrick Aljord ([email protected]) wrote:
>
> Hi David :) Thanks for the quick reply.
>
> So, the best way to do this would be to index the attachment in another
> document, in another process? Or could it be in the same document as an
> attachment, but always in a different process? Also, is there another way
> to index files than with the mapper attachment?
>
> On Friday, February 21, 2014 4:12:09 PM UTC+1, David Pilato wrote:
>>
>> Hi Patrick! :-)
>>
>> You cannot update an existing field with a new specification for this
>> field. You need to either add a new field, create a new type (with the new
>> mapping) or create a new index.
>>
>> In addition to this, if you have existing documents, you'll probably
>> need to reindex them.
>>
>> Note that although the mapper attachment is cool to start with
>> elasticsearch, I'd prefer to do text extraction in another process rather
>> than at index time.
>>
>> On February 21, 2014 at 16:09:00, Patrick Aljord ([email protected]) wrote:
>>
>> Hey all,
>>
>> I'm trying to map a field to have type attachment. This works on new
>> indices but not on existing ones.
>> Is there a way to do this on existing indices? Here is the gist of it:
>>
>> https://gist.github.com/patcito/281143ee4f440171c875
>>
>> Thanks in advance,
>>
>> Pat
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/elasticsearch/52be5155-1d34-47e4-9484-284c976c49c2%40googlegroups.com.
>> For more options, visit https://groups.google.com/groups/opt_out.
