Ok. I think that's a fair comment and is likely the route I will take. Thank you both for your help.
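For anyone reading this later, a minimal sketch of what the standalone-script route might look like, assuming the memory cost comes from loading the whole XML file at once. It streams the file with the standard library's `iterparse`, trims each field, and yields one action dict per record. The `record_tag` and `index` names and the sample feed are hypothetical, and the `_index`/`_source` dict shape is chosen on the assumption that the actions will be handed to something like `elasticsearch.helpers.bulk`; older clusters may also expect a `_type` key.

```python
import io
import xml.etree.ElementTree as ET


def iter_actions(xml_source, record_tag="item", index="jobs"):
    """Yield one bulk-style action dict per <record_tag> element.

    record_tag and index are hypothetical; adjust them to the real feed.
    """
    # iterparse reads the file incrementally instead of building the full tree first.
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag != record_tag:
            continue
        # Trim whitespace from each child field and skip empty ones.
        doc = {}
        for child in elem:
            text = (child.text or "").strip()
            if text:
                doc[child.tag] = text
        # Language detection or geocoding with third-party libraries would go here.
        yield {"_index": index, "_source": doc}
        # Drop the record's children so processed content does not pile up in memory.
        elem.clear()


if __name__ == "__main__":
    # Tiny in-memory feed standing in for the file fetched from the FTP server.
    sample = io.BytesIO(
        b"<feed><item><title>  Hello  </title><city>Oslo</city></item>"
        b"<item><title>World</title><city> </city></item></feed>"
    )
    for action in iter_actions(sample):
        print(action)
```

A script like this could be started from a cron job once a day, with the generator passed to whatever bulk-indexing call the client library provides, so Elasticsearch itself never has to parse the XML.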
> On 4 Apr 2015, at 20:33, Norberto Meijome <num...@gmail.com> wrote:
>
> Hi,
> My gut feel is don't add this to the ES setup itself. Horses for courses -
> have your script (Python +1) running somewhere taking care of the processing,
> dealing with issues on the FTP side, etc. Let ES do its thing... especially if
> the XML parsing will take so much memory and you need external services.
>
> The script can be run, managed, and designed in many ways, from a simple cron
> job to a Celery task or a service under Chronos/Mesos, or a consumer getting
> messages and publishing to ES (though if you have 1 large XML to process
> once a day I wouldn't go with a consumer...)
>
> Good luck,
> B
>
>> On 03/04/2015 3:47 pm, "Employ" <m...@employ.com> wrote:
>> Thank you for the reply. I do need to do work on the data before importing,
>> such as language detection and geocoding using third-party libraries, and I
>> feel that while Logstash may be great for getting me some of the way, it
>> won't be able to get me all the way.
>>
>> A custom plugin may be my only option in that regard, but is it really going
>> to provide me any benefits over something like scrapy? Any feedback would be
>> appreciated.
>>
>> Sent from my iPhone
>>
>>> On 3 Apr 2015, at 00:44, Mark Walkom <markwal...@gmail.com> wrote:
>>>
>>> You can do data transformation on the fly, yes.
>>>
>>> Language detection can't be done in LS that I know of, but you can
>>> definitely trim things.
>>>
>>>> On 3 April 2015 at 13:16, Employ <m...@employ.com> wrote:
>>>> Thank you for the reply. I've seen that mentioned, but does it have the
>>>> capability to modify the XML content before it is imported? For example,
>>>> adding the ability to do language detection and trimming via custom
>>>> scripts?
>>>>
>>>>> On 2 Apr 2015, at 19:44, Mark Walkom <markwal...@gmail.com> wrote:
>>>>>
>>>>> Logstash can handle XML, it has a filter specifically for it -
>>>>> http://www.elastic.co/guide/en/logstash/current/plugins-filters-xml.html
>>>>>
>>>>>> On 3 April 2015 at 09:33, James <m...@employ.com> wrote:
>>>>>> Hi,
>>>>>>
>>>>>> Currently I am using scrapy to parse an XML file from an FTP server into
>>>>>> elasticsearch. It works but seems quite a heavyweight solution and it
>>>>>> uses a lot of memory too.
>>>>>>
>>>>>> I am wondering if I am better off writing a plugin for ES instead.
>>>>>>
>>>>>> I have some questions:
>>>>>>
>>>>>> A) It seems writing it in Python (since I'm a Python guy) as a push
>>>>>> plugin rather than a pull river makes sense, unless anyone has a reason
>>>>>> why pull is better?
>>>>>>
>>>>>> B) For simple importing (and slight modification such as trimming,
>>>>>> language check etc.), is an ES plugin likely to be a better solution for
>>>>>> importing fairly large XML files, or should I just leave scrapy to do it
>>>>>> as it is doing at the moment?
>>>>>>
>>>>>> Any help and advice would be appreciated as I start on this journey.
>>>>>>
>>>>>> James
>>>>>>
>>>>>> --
>>>>>> You received this message because you are subscribed to the Google
>>>>>> Groups "elasticsearch" group.
>>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>>> an email to elasticsearch+unsubscr...@googlegroups.com.
>>>>>> To view this discussion on the web visit
>>>>>> https://groups.google.com/d/msgid/elasticsearch/610e7f9b-3d23-44a9-b8f3-07deb262dd54%40googlegroups.com.
>>>>>> For more options, visit https://groups.google.com/d/optout.