Hi Loïc, the gatherer plugin is still at a very early stage (pre-alpha) and not ready for use, because I work on it in my spare time.
https://github.com/jprante/elasticsearch-gatherer

Jörg

On Mon, Mar 31, 2014 at 4:09 PM, loïc moriamé <[email protected]> wrote:

> Hi Jörg!
> Do you have any update on your über plugin?
>
> I'm really interested, because I want to connect my MsSQL DB to
> ElasticSearch.
> I can't modify the software, but I want to have a "near real time"
> integration between my MsSQL DB and ES.
>
> So I hope I can use your work.
>
> On Thursday, December 26, 2013 at 13:37:36 UTC+1, Jörg Prante wrote:
>
>> Rivers were once introduced for demo purposes, to quickly load some data
>> into ES and make showcases from twitter or wikipedia data.
>>
>> The Elasticsearch team is now in favor of Logstash.
>>
>> I started this gatherer plugin for my use cases where I am not able to use
>> Logstash. I have very complex streams, e.g. ISO 2709 record formats with
>> some hundred custom transformations in the data, that I reduce to primitive
>> key/value streams and RDF triples. Also I plan to build RDF feeds for
>> semantic web/linked data platforms, where ES is the search engine.
>>
>> The gatherer "uber" plugin should work like this:
>>
>> - it can be installed on one or more nodes and provides a common bulk
>> indexing framework
>>
>> - a gatherer plugin registers in the cluster state (on node level)
>>
>> - there are standard capabilities, but a gatherer plugin capability can
>> be extended in a live cluster by submitting code for inputs, codecs, and
>> filters, picked up by a custom class loader (for example, JDBC, and a
>> driver jar, and tabular key/value output)
>>
>> - a gatherer plugin is idling, and accepts jobs in the form of JSON commands
>> (defining the selection of inputs, codecs, and filters), for example, an
>> SQL command
>>
>> - if a gatherer is told to distribute the jobs fairly and is too busy
>> (active job queue length), it forwards them to other gatherers (other
>> methods are crontab-like scheduling), and the results of the jobs (ok,
>> failed, retry) are registered also in the cluster state (maybe an internal
>> index is better because there can be tens of thousands such jobs)
>>
>> - a client can ask for the state of all the gatherers and all the job
>> results
>>
>> - all jobs can be partitioned and processed in parallel for maximum
>> throughput
>>
>> - the gatherer also creates metrics/statistics of the jobs successfully
>> done
>>
>> Another thing I find important is to enable scripting for processing the
>> data streams (JSR 223 scripting, especially Groovy, Jython, JRuby,
>> Rhino/Nashorn)
>>
>> Right now there is no repo, I plan to kickstart the repo in early 2014.
>>
>> Jörg
>>
>> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/39bf0bff-b57b-4865-8d19-a062d9a85544%40googlegroups.com
> For more options, visit https://groups.google.com/d/optout.
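
For readers following the thread, the job handling described in the quoted message (a gatherer accepts JSON job commands, forwards them to another gatherer when its active job queue is too long, and registers ok/failed results) could be sketched roughly as below. This is only an illustration in Python, not code from the plugin, which is pre-alpha. The class `Gatherer`, the `max_queue` threshold, the JSON fields `id`, `input` and `sql`, and the plain dict standing in for the cluster state are all assumptions made up for this sketch.

```python
import json
from collections import deque


class Gatherer:
    """Hypothetical stand-in for one gatherer node (not the plugin's real API)."""

    def __init__(self, name, max_queue=2):
        self.name = name
        self.max_queue = max_queue  # assumed "too busy" threshold
        self.queue = deque()        # active job queue
        self.peers = []             # other gatherers in the cluster

    def submit(self, command, forwarded=False):
        """Accept a JSON job command, or forward it once if this node is too busy."""
        job = json.loads(command)
        if not forwarded and len(self.queue) >= self.max_queue and self.peers:
            # Pick the peer with the shortest active job queue.
            target = min(self.peers, key=lambda g: len(g.queue))
            if len(target.queue) < len(self.queue):
                return target.submit(command, forwarded=True)
        self.queue.append(job)
        return self.name

    def run_all(self, results):
        """Process queued jobs and register a status per job id."""
        while self.queue:
            job = self.queue.popleft()
            # A real gatherer would run its inputs, codecs and filters here.
            status = "ok" if job.get("input") else "failed"
            results[job["id"]] = {"gatherer": self.name, "status": status}


# Two gatherers that know about each other.
a, b = Gatherer("node-a"), Gatherer("node-b")
a.peers, b.peers = [b], [a]

# Assumed job format: an SQL command for a JDBC input.
commands = [
    json.dumps({"id": i, "input": "jdbc", "sql": "select * from orders"})
    for i in range(4)
]
placed = [a.submit(c) for c in commands]

results = {}  # stands in for the cluster state or an internal index
for gatherer in (a, b):
    gatherer.run_all(results)
```

With all four jobs sent to `node-a`, the first two stay there and the next two are forwarded to `node-b`, since `node-a` has reached its assumed threshold of two queued jobs. A job is forwarded at most once here, to avoid two busy nodes bouncing it back and forth; how the actual plugin would handle that is not stated in the thread.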
