Thank you for the reply. I do need to do work on the data before importing, such 
as language detection and geocoding using third-party libraries, and I feel that 
while Logstash may be great for getting me some of the way, it won't be able to 
get me all the way.

A custom plugin may be my only option in that regard, but is it really going to 
provide me any benefits over something like Scrapy? Any feedback would be 
appreciated.
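
For reference, the kind of pre-processing described above (trimming fields and 
running language detection before import) might look roughly like the sketch 
below. It is only an illustration under assumptions: the `item` and 
`description` tag names are hypothetical, and `detect_language` stands in for 
whichever third-party library is actually used. It streams the file with the 
standard library's `iterparse` so that a large XML file is not held in memory 
all at once.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, Iterator, Optional


def iter_records(
    source,
    record_tag: str = "item",  # hypothetical record element name
    detect_language: Optional[Callable[[str], str]] = None,
) -> Iterator[Dict[str, str]]:
    """Yield one trimmed dict per record element, streaming the XML input."""
    # "end" events fire once an element (and all its children) is fully parsed.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != record_tag:
            continue
        # Trim surrounding whitespace on every direct child field.
        doc = {child.tag: (child.text or "").strip() for child in elem}
        if detect_language is not None:
            # "description" is an assumed field name; the callable is a
            # placeholder for a real language-detection library.
            doc["language"] = detect_language(doc.get("description", ""))
        yield doc
        # Drop the record's children and text so memory use stays low.
        elem.clear()
```

Each yielded dictionary could then be passed to whatever indexing client is in 
use; `source` can be a file path or an open binary file object.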

> On 3 Apr 2015, at 00:44, Mark Walkom <markwal...@gmail.com> wrote:
> 
> You can do data transformation on the fly, yes.
> 
> Language detection can't be done in LS that I know of, but you can definitely 
> trim things.
> 
>> On 3 April 2015 at 13:16, Employ <m...@employ.com> wrote:
>> Thank you for the reply. I've seen that mentioned but does it have the 
>> capability to modify the XML content before it is imported? For example, 
>> adding the ability to do language detection and trimming via custom scripts?
>> 
>>> On 2 Apr 2015, at 19:44, Mark Walkom <markwal...@gmail.com> wrote:
>>> 
>>> Logstash can handle XML, it has a filter specifically for it - 
>>> http://www.elastic.co/guide/en/logstash/current/plugins-filters-xml.html
>>> 
>>>> On 3 April 2015 at 09:33, James <m...@employ.com> wrote:
>>>> Hi,
>>>> 
>>>> Currently I am using Scrapy to parse an XML file from an FTP server into 
>>>> Elasticsearch. It works but seems quite a heavyweight solution, and it 
>>>> uses a lot of memory too.
>>>> 
>>>> I am wondering if I am better off writing a plugin for ES instead.
>>>> 
>>>> I have some questions:
>>>> 
>>>> A) It seems writing it in Python (since I'm a Python guy) as a push plugin 
>>>> rather than a pull river makes sense, unless anyone has a reason why pull 
>>>> is better?
>>>> 
>>>> B) For simple importing (and slight modification such as trimming, 
>>>> language checks, etc.), is an ES plugin likely to be a better solution 
>>>> for importing fairly large XML files, or should I just leave Scrapy to 
>>>> do it as it is doing at the moment?
>>>> 
>>>> Any help and advice would be appreciated as I start on this journey.
>>>> 
>>>> James
>>>> 
>>>> --
>>>> You received this message because you are subscribed to the Google Groups 
>>>> "elasticsearch" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send an 
>>>> email to elasticsearch+unsubscr...@googlegroups.com.
>>>> To view this discussion on the web visit 
>>>> https://groups.google.com/d/msgid/elasticsearch/610e7f9b-3d23-44a9-b8f3-07deb262dd54%40googlegroups.com.
>>>> For more options, visit https://groups.google.com/d/optout.
>>> 
>>> -- 
>>> You received this message because you are subscribed to a topic in the 
>>> Google Groups "elasticsearch" group.
>>> To unsubscribe from this topic, visit 
>>> https://groups.google.com/d/topic/elasticsearch/L9uzIGfT7Gs/unsubscribe.
>>> To unsubscribe from this group and all its topics, send an email to 
>>> elasticsearch+unsubscr...@googlegroups.com.
>>> To view this discussion on the web visit 
>>> https://groups.google.com/d/msgid/elasticsearch/CAEYi1X8TLso3YjNLpqHoR5r87nr6Li2Ng53AjHwwNzE1j9FJeA%40mail.gmail.com.
>>> For more options, visit https://groups.google.com/d/optout.
