Just adding to what Wolfgang said: the BlobDeserializer and BlobHandler are going to be moved to flume-core soon (https://issues.apache.org/jira/browse/FLUME-2226 and https://issues.apache.org/jira/browse/FLUME-2227).

Thanks,
Hari

On Monday, November 11, 2013 at 11:54 AM, Wolfgang Hoschek wrote:

> Hi Otis,
>
> You bring up a lot of very good points here, indeed. I'll try to answer as
> best as I can...
>
> In the early days this Flume Sink started out as being very Solr specific.
> Over time I have made it more generic and reduced the dependency on Solr more
> and more, and at this point, there is in fact no dependency on Solr in the
> code left anymore (except in some tests that straddle the boundary between
> unit tests and integration tests). So in effect it wouldn't be technically
> wrong to refer to this as a Morphline Sink. The name is just a reflection of
> an evolutionary journey through history, and for retaining backwards compat.
>
> You could easily use this sink to extract, transform and load data into ES
> (or any other app or database or storage system) without pulling in any Solr
> related jar. To do so you'd write a loadElasticSearch morphline command in a
> separate morphline maven module, and use that command instead of the loadSolr
> command in your morphline config files. The new loadElasticSearch command
> would convert a morphline record to a data structure appropriate for ES, e.g.
> ES JSON/Smile, and send that to ES. That's all there is to it, really.
>
> A morphline record is essentially a hash table where the keys are strings and
> the values are a list of arbitrary Java objects. Those Java objects are
> typically Strings and Integers, but they can also be InputStreams or byte[]
> BLOBs, Avro objects, etc. This data model corresponds exactly to the features
> of the Lucene data model. It can also be seen as a superset of the Flume
> event data model - the Flume body is a byte[] value in the morphline
> _attachment_body field. The data model also maps well to the relational
> model. It can also be used for hierarchical data considering that the values
> in a morphline record field can be Avro, JSON, XML, protobufs, or any other
> custom complex data structure.
>
> Wolfgang.
>
> On Nov 10, 2013, at 4:42 PM, Otis Gospodnetic wrote:
>
> > Hello,
> >
> > One more "proactive" question.
> >
> > Isn't all code under the .... solr/morphline package not really about
> > Morphline *Solr* Sink, but really more about *Morphline* Sink?
> > In other words, if where Morphline actually outputs is dictated by the
> > Morphline command in Morphline config (e.g. loadSolr()), then as far
> > as Flume is concerned, isn't that really just *Morphline* Sink?
> >
> > For example, if I wanted to get Flume to pass events through Morphline
> > and have Morphline output to Elasticsearch, I wouldn't really want to
> > add a whole new Elasticsearch Morphline Sink. I should really just be
> > able to use the existing (misnamed?) Morphline Solr Sink and just
> > point it to a Morphline config that has loadElasticsearch() instead of
> > loadSolr().
> >
> > (please ignore the fact Morphline doesn't actually have
> > loadElasticsearch() yet - I think this is a Morphline issue, not a
> > Flume issue)
> >
> > Is the above correct?
> >
> > Thanks,
> > Otis
> > --
> > Performance Monitoring * Log Analytics * Search Analytics
> > Solr & Elasticsearch Support * http://sematext.com/
> >
> >
> > On Sun, Nov 10, 2013 at 7:29 PM, Otis Gospodnetic
> > <[email protected]> wrote:
> > > Hello,
> > >
> > > Warning: I've got a Flume NG and Morphlines newbie status
> > >
> > > I was looking at Morphline Solr Sink to see how one could write an
> > > equivalent Morphline Elasticsearch Sink, but after looking at the
> > > code, I'm a bit confused. Here are my Qs:
> > >
> > > 1) interface MorphlineHandler mentions Solr in N places, but it
> > > doesn't seem to be Solr-specific. Couldn't one reuse this interface
> > > for a Morphline ES Sink?
> > >
> > > 2) In general, couldn't/shouldn't a few classes from the
> > > org.apache.flume.sink.solr.morphline package really live outside
> > > anything Solr-specific? e.g. org.apache.flume.sink.morphline for
> > > those that are Morphline-specific?
> > >
> > > 3) Similarly, BlobDeserializer and BlobHandler don't seem to be even
> > > Morphline-specific. Shouldn't they be elsewhere?
> > >
> > > 4) I was expecting to see SolrJ (Solr Java client library) being used
> > > in MorphlineHandlerImpl or MorphlineSolrSink to send events to Solr,
> > > but there is no trace of SolrJ there. How exactly does this load
> > > Flume events into Solr then?
> > > Ooooh, is that because when using this sink one is supposed to provide
> > > a Morphline config and this config has a hard-coded loadSolr()
> > > command?
> > >
> > > 5) Would it make sense to refactor any of the current Morphline Solr
> > > Sink code to make it easier to add things like a Morphline
> > > Elasticsearch Sink? If so, any guidance you could provide would be
> > > very helpful.
> > >
> > > Thanks,
> > > Otis
> > > --
> > > Performance Monitoring * Log Analytics * Search Analytics
> > > Solr & Elasticsearch Support * http://sematext.com/
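
For readers following the thread, here is a rough sketch of the record data model Wolfgang describes above (string keys, each mapped to a list of arbitrary Java objects, with the Flume body stored as a byte[] under _attachment_body), plus the kind of conversion a loadElasticSearch command would need to do before sending anything to ES. It uses only the JDK and deliberately does not touch the real Morphline or Elasticsearch APIs. The class name RecordSketch and the methods fromFlumeEvent, put and toDocument are made up for illustration, and copying Flume headers into record fields and decoding byte[] values as UTF-8 are assumptions of this sketch, not something stated in the thread.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a morphline record: String keys -> list of arbitrary objects.
class RecordSketch {

    // Field name mentioned in the thread for the Flume event body.
    static final String ATTACHMENT_BODY = "_attachment_body";

    // Append a value to a field; a field can hold any number of values.
    static void put(Map<String, List<Object>> record, String key, Object value) {
        record.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    // Assumption: each Flume header becomes a single-valued field,
    // and the body is stored as a byte[] under _attachment_body.
    static Map<String, List<Object>> fromFlumeEvent(Map<String, String> headers, byte[] body) {
        Map<String, List<Object>> record = new LinkedHashMap<>();
        for (Map.Entry<String, String> header : headers.entrySet()) {
            put(record, header.getKey(), header.getValue());
        }
        put(record, ATTACHMENT_BODY, body);
        return record;
    }

    // Hypothetical conversion to a JSON-like document: byte[] values are decoded
    // as UTF-8 text, single-valued fields are unwrapped, multi-valued fields stay lists.
    static Map<String, Object> toDocument(Map<String, List<Object>> record) {
        Map<String, Object> doc = new LinkedHashMap<>();
        for (Map.Entry<String, List<Object>> field : record.entrySet()) {
            List<Object> values = new ArrayList<>();
            for (Object value : field.getValue()) {
                if (value instanceof byte[]) {
                    values.add(new String((byte[]) value, StandardCharsets.UTF_8));
                } else {
                    values.add(value);
                }
            }
            doc.put(field.getKey(), values.size() == 1 ? values.get(0) : values);
        }
        return doc;
    }

    public static void main(String[] args) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("host", "web01");

        Map<String, List<Object>> record =
                fromFlumeEvent(headers, "hello".getBytes(StandardCharsets.UTF_8));
        put(record, "tags", "a");
        put(record, "tags", "b");

        // A real command would serialize this document and send it to ES.
        System.out.println(toDocument(record));
    }
}
```

A real loadElasticSearch command would live in its own morphline module, as Wolfgang suggests, and would replace the println with an actual ES client call; the sketch only shows the record-to-document step.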
