Ok. I think that's a fair comment and is likely the route I will take. Thank you 
both for your help.
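
For anyone finding this thread later, here is a minimal sketch of the standalone-script route, not a definitive implementation. It assumes the feed is a flat list of records; the tag name `item`, the index name `jobs`, and the helper `iter_actions` are all made up for illustration. It streams the XML with the standard library's `iterparse` so the whole file is not loaded at once, trims each field, and yields one bulk-style action per record.

```python
import io
import xml.etree.ElementTree as ET


def iter_actions(xml_source, record_tag="item", index="jobs"):
    """Stream records from a large XML file as bulk-style action dicts.

    record_tag and index are hypothetical; adjust them to the real feed.
    """
    # iterparse hands back each element as its closing tag is read, so the
    # document is processed incrementally rather than parsed in one go.
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag != record_tag:
            continue
        # Trim surrounding whitespace from every child field of the record.
        doc = {child.tag: (child.text or "").strip() for child in elem}
        # Language detection or geocoding with third-party libraries
        # would slot in here, before the document is emitted.
        yield {"_index": index, "_source": doc}
        # Release the record's children once it has been handled.
        elem.clear()


# Tiny in-memory feed standing in for the file fetched from the ftp server.
sample = io.BytesIO(
    b"<feed><item><title>  Hello  </title><city> Leeds </city></item>"
    b"<item><title>Bye</title><city/></item></feed>"
)
actions = list(iter_actions(sample))
```

In a real run the generator could be passed straight to `elasticsearch.helpers.bulk(client, iter_actions(path))` from the official Python client instead of being collected into a list, and the script scheduled from cron as suggested below.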

> On 4 Apr 2015, at 20:33, Norberto Meijome <num...@gmail.com> wrote:
> 
> Hi, 
> My gut feel is don't add this to the ES setup itself. Horses for courses - 
> have your script (Python +1) running somewhere taking care of the processing, 
> dealing with issues on the ftp side, etc. Let ES do its thing... especially if 
> the XML parsing will take so much memory and you need external services.
> 
> The script can be run, managed and designed in many ways, from a simple cronjob 
> to a celery task or a service under chronos/mesos, or a consumer getting 
> messages and publishing to ES (though if you have 1 large XML to process 
> once a day I wouldn't go with a consumer...)
> 
> Good luck,
> B
> 
>> On 03/04/2015 3:47 pm, "Employ" <m...@employ.com> wrote:
>> Thank you for the reply. I do need to do work on the data before importing, 
>> such as language detection and geocoding using third-party libraries, and I 
>> feel that while Logstash may be great for getting me some of the way, it won't 
>> be able to get me all the way.
>> 
>> A custom plugin may be my only option in that regard, but is it really going 
>> to provide me any benefits over something like scrapy? Any feedback would be 
>> appreciated.
>> 
>>> On 3 Apr 2015, at 00:44, Mark Walkom <markwal...@gmail.com> wrote:
>>> 
>>> You can do data transformation on the fly, yes.
>>> 
>>> Language detection can't be done in LS as far as I know, but you can 
>>> definitely trim things.
>>> 
>>>> On 3 April 2015 at 13:16, Employ <m...@employ.com> wrote:
>>>> Thank you for the reply. I've seen that mentioned but does it have the 
>>>> capability to modify the XML content before it is imported? For example, 
>>>> adding the ability to do language detection and trimming via custom 
>>>> scripts?
>>>> 
>>>>> On 2 Apr 2015, at 19:44, Mark Walkom <markwal...@gmail.com> wrote:
>>>>> 
>>>>> Logstash can handle XML; it has a filter specifically for it: 
>>>>> http://www.elastic.co/guide/en/logstash/current/plugins-filters-xml.html
>>>>> 
>>>>>> On 3 April 2015 at 09:33, James <m...@employ.com> wrote:
>>>>>> Hi,
>>>>>> 
>>>>>> Currently I am using scrapy to parse an XML file from an ftp server into 
>>>>>> elasticsearch. It works but seems quite a heavyweight solution and it 
>>>>>> uses a lot of memory too.
>>>>>> 
>>>>>> I am wondering if I am better off writing a plugin for ES instead.
>>>>>> 
>>>>>> I have some questions:
>>>>>> 
>>>>>> A) It seems writing it in Python (since I'm a python guy) as a push 
>>>>>> plugin rather than a pull river makes sense, unless anyone has a reason 
>>>>>> why pull is better?
>>>>>> 
>>>>>> B) For simple importing (and slight modification such as trimming, 
>>>>>> language check etc.), is an ES plugin likely to be a better solution for 
>>>>>> importing fairly large XML files, or should I just leave scrapy to do it 
>>>>>> as it is doing at the moment?
>>>>>> 
>>>>>> Any help and advice would be appreciated as I start on this journey.
>>>>>> 
>>>>>> James
>>>>>> 
>>>>>> --
>>>>>> You received this message because you are subscribed to the Google 
>>>>>> Groups "elasticsearch" group.
>>>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>>>> an email to elasticsearch+unsubscr...@googlegroups.com.
>>>>>> To view this discussion on the web visit 
>>>>>> https://groups.google.com/d/msgid/elasticsearch/610e7f9b-3d23-44a9-b8f3-07deb262dd54%40googlegroups.com.
>>>>>> For more options, visit https://groups.google.com/d/optout.

