Hi Jörg! Do you have any update on your über plugin? I'm really interested, because I want to connect my MSSQL database to Elasticsearch. I can't modify the software, but I want "near real time" integration between my MSSQL DB and ES.
So I hope I can use your work.

On Thursday, December 26, 2013, 13:37:36 UTC+1, Jörg Prante wrote:
>
> Rivers were once introduced for demo purposes, to load some data into ES quickly and build showcases from Twitter or Wikipedia data.
>
> The Elasticsearch team is now in favor of Logstash.
>
> I started this gatherer plugin for my use cases, where I am not able to use Logstash. I have very complex streams, e.g. ISO 2709 record formats with some hundred custom transformations in the data, which I reduce to primitive key/value streams and RDF triples. I also plan to build RDF feeds for semantic web / linked data platforms where ES is the search engine.
>
> The gatherer "uber" plugin should work like this:
>
> - it can be installed on one or more nodes and provides a common bulk indexing framework
>
> - a gatherer plugin registers in the cluster state (at node level)
>
> - there are standard capabilities, but a gatherer plugin's capabilities can be extended in a live cluster by submitting code for inputs, codecs, and filters, picked up by a custom class loader (for example, JDBC with a driver jar and tabular key/value output)
>
> - a gatherer plugin idles and accepts jobs in the form of JSON commands (defining the selection of inputs, codecs, and filters), for example an SQL command
>
> - if a gatherer is told to distribute jobs fairly and is too busy (active job queue length), it forwards them to other gatherers (other methods are crontab-like scheduling), and the results of the jobs (ok, failed, retry) are also registered in the cluster state (maybe an internal index is better, because there can be tens of thousands of such jobs)
>
> - a client can ask for the state of all gatherers and all job results
>
> - all jobs can be partitioned and processed in parallel for maximum throughput
>
> - the gatherer also creates metrics/statistics for successfully completed jobs
>
> Another thing I find important is to enable scripting for processing the data streams (JSR 223 scripting, especially Groovy, Jython, JRuby, Rhino/Nashorn).
>
> Right now there is no repo; I plan to kickstart the repo in early 2014.
>
> Jörg
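To make sure I understand the "jobs as JSON commands" idea for my MSSQL case: I imagine a job definition looking something like the sketch below. All field names here are my own guesses for illustration, not something from your post.

```json
{
  "type": "jdbc",
  "jdbc": {
    "url": "jdbc:sqlserver://localhost:1433;databaseName=mydb",
    "user": "es_reader",
    "sql": "select id, name, updated_at from products where updated_at > ?",
    "schedule": "0 */5 * * * ?"
  },
  "codec": "tabular-keyvalue",
  "index": "products"
}
```

If the gatherer picks up the JDBC driver jar via its class loader and runs this on a schedule against an `updated_at` column, that would cover my "near real time" requirement without touching the source application.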
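On the JSR 223 point: even before the plugin exists, the shape of such a per-record script filter can be sketched in plain Java. This is only a sketch under my own assumptions (the class and method names are mine); it applies a script to one field value if a JavaScript engine happens to be on the classpath, and falls back to plain Java otherwise so it runs on any JVM.

```java
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;

public class ScriptFilter {

    // Hypothetical per-record transformation: uppercase one field value.
    // If a JSR 223 JavaScript engine (Rhino, Nashorn, ...) is available,
    // the transformation is expressed as a script; otherwise we fall back
    // to plain Java with the same effect.
    static String transform(String value) {
        ScriptEngine js = new ScriptEngineManager().getEngineByName("javascript");
        if (js != null) {
            try {
                js.put("v", value);
                return String.valueOf(js.eval("v.toUpperCase()"));
            } catch (ScriptException e) {
                // script failed; fall through to the plain-Java fallback
            }
        }
        return value.toUpperCase();
    }

    public static void main(String[] args) {
        System.out.println(transform("mssql record")); // MSSQL RECORD
    }
}
```

Swapping in Groovy, Jython, or JRuby would only change the engine name passed to `getEngineByName` plus the jars on the classpath, which is exactly what makes JSR 223 attractive for extending a live cluster.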
