Just adding to what Wolfgang said, the BlobDeserializer and BlobHandler are 
going to be moved to flume-core soon 
(https://issues.apache.org/jira/browse/FLUME-2226 and 
https://issues.apache.org/jira/browse/FLUME-2227).


Thanks,
Hari


On Monday, November 11, 2013 at 11:54 AM, Wolfgang Hoschek wrote:

> Hi Otis,
> 
> You bring up a lot of very good points here, indeed. I'll try to answer as 
> best as I can...
> 
> In the early days this Flume Sink started out as being very Solr specific. 
> Over time I have made it more generic and reduced the dependency on Solr more 
> and more, and at this point, there is in fact no dependency on Solr in the 
> code left anymore (except in some tests that straddle the boundary between 
> unit tests and integration tests). So in effect it wouldn't be technically 
> wrong to refer to this as a Morphline Sink. The name simply reflects how the 
> sink evolved, and is kept for backwards compatibility.
> 
> You could easily use this sink to extract, transform and load data into ES 
> (or any other app or database or storage system) without pulling in any Solr 
> related jar. To do so you'd write a loadElasticSearch morphline command in a 
> separate morphline maven module, and use that command instead of the loadSolr 
> command in your morphline config files. The new loadElasticSearch command 
> would convert a morphline record to a data structure appropriate for ES, e.g. 
> ES JSON/Smile, and send that to ES. That's all there is to it, really.
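
To make the conversion step above concrete, here is a minimal sketch in plain Java of turning record fields into an ES-style JSON document. It is only an illustration under stated assumptions: the class EsJsonSketch and its toJson method are hypothetical names, the record is modeled as a plain Map rather than the real morphline record class, and nothing is actually sent to ES. A real loadElasticSearch command would plug into the morphline command API and use an ES client instead.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: not part of Morphlines, Flume or Elasticsearch.
public class EsJsonSketch {

    // Renders the fields as one JSON object. Every field becomes a JSON array,
    // mirroring the "list of values per key" shape of a morphline record.
    public static String toJson(Map<String, List<Object>> fields) {
        StringBuilder out = new StringBuilder("{");
        boolean firstField = true;
        for (Map.Entry<String, List<Object>> field : fields.entrySet()) {
            if (!firstField) out.append(',');
            firstField = false;
            out.append(quote(field.getKey())).append(":[");
            boolean firstValue = true;
            for (Object value : field.getValue()) {
                if (!firstValue) out.append(',');
                firstValue = false;
                if (value instanceof Integer || value instanceof Long || value instanceof Boolean) {
                    out.append(value); // emitted as a bare JSON literal
                } else {
                    out.append(quote(String.valueOf(value))); // everything else as a string
                }
            }
            out.append(']');
        }
        return out.append('}').toString();
    }

    // Wraps a string in double quotes and escapes characters JSON does not allow raw.
    private static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    public static void main(String[] args) {
        Map<String, List<Object>> fields = new LinkedHashMap<>();
        fields.put("id", Arrays.<Object>asList("42"));
        fields.put("hits", Arrays.<Object>asList(3));
        System.out.println(toJson(fields));
    }
}
```

Binary values such as byte[] or InputStream would need their own handling (for example Base64) before they could go into a JSON document; the sketch simply falls back to String.valueOf for them.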
> 
> A morphline record is essentially a hash table where the keys are strings and 
> the values are a list of arbitrary Java objects. Those Java objects are 
> typically Strings and Integers, but they can also be InputStreams or byte[] 
> BLOBs, Avro objects, etc. This data model corresponds exactly to the features 
> of the Lucene data model. It can also be seen as a superset of the Flume 
> event data model - the Flume body is a byte[] value in the morphline 
> _attachment_body field. The data model also maps well to the relational 
> model. It also can be used for hierarchical data considering that the values 
> in a morphline record field can be Avro, JSON, XML, protobufs, or any other 
> custom complex data structure.
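
The data model described above can be sketched in a few lines of plain Java. This is a stand-in for illustration, not the real morphline record class: RecordSketch and fromEvent are hypothetical names, and copying each Flume header into a record field of the same name is an assumption made for the example. Only the _attachment_body field name comes from the description above.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a morphline record: NOT the real Morphlines class.
public class RecordSketch {

    // Field name -> list of arbitrary Java objects (Strings, Integers, byte[], ...).
    private final Map<String, List<Object>> fields = new LinkedHashMap<>();

    // Appends a value to the field, creating the field on first use.
    public void put(String key, Object value) {
        fields.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    // Returns all values of the field, or an empty list if the field is absent.
    public List<Object> get(String key) {
        return fields.getOrDefault(key, Collections.emptyList());
    }

    // Maps a Flume-style event onto a record: headers become fields (an assumption
    // of this sketch) and the body is stored as byte[] under _attachment_body.
    public static RecordSketch fromEvent(Map<String, String> headers, byte[] body) {
        RecordSketch record = new RecordSketch();
        for (Map.Entry<String, String> header : headers.entrySet()) {
            record.put(header.getKey(), header.getValue());
        }
        record.put("_attachment_body", body);
        return record;
    }

    public static void main(String[] args) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("host", "web01");
        RecordSketch record = fromEvent(headers, "hello".getBytes(StandardCharsets.UTF_8));
        record.put("tags", "a");
        record.put("tags", "b"); // a field can hold several values
        System.out.println(record.get("host"));
        System.out.println(record.get("tags"));
    }
}
```

Because values are plain Objects, a field can just as well hold an Avro object or a parsed JSON tree, which is what makes the model usable for hierarchical data.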
> 
> Wolfgang.
> 
> On Nov 10, 2013, at 4:42 PM, Otis Gospodnetic wrote:
> 
> > Hello,
> > 
> > One more "proactive" question.
> > 
> > Isn't all code under the .... solr/morphline package not really about
> > Morphline *Solr* Sink, but really more about *Morphline* Sink?
> > In other words, if where Morphline actually outputs is dictated by the
> > Morphline command in Morphline config (e.g. loadSolr()), then as far
> > as Flume is concerned, isn't that really just *Morphline* Sink?
> > 
> > For example, if I wanted to get Flume to pass events through Morphline
> > and have Morphline output to Elasticsearch, I wouldn't really want to
> > add a whole new Elasticsearch Morphline Sink. I should really just be
> > able to use the existing (misnamed?) Morphline Solr Sink and just
> > point it to a Morphline config that has loadElasticsearch() instead of
> > loadSolr().
> > 
> > (please ignore the fact Morphline doesn't actually have
> > loadElasticsearch() yet - I think this is a Morphline issue, not a
> > Flume issue)
> > 
> > Is the above correct?
> > 
> > Thanks,
> > Otis
> > --
> > Performance Monitoring * Log Analytics * Search Analytics
> > Solr & Elasticsearch Support * http://sematext.com/
> > 
> > 
> > On Sun, Nov 10, 2013 at 7:29 PM, Otis Gospodnetic
> > <[email protected] (mailto:[email protected])> wrote:
> > > Hello,
> > > 
> > > Warning: I'm a Flume NG and Morphlines newbie.
> > > 
> > > I was looking at Morphline Solr Sink to see how one could write an
> > > equivalent Morphline Elasticsearch Sink, but after looking at the
> > > code, I'm a bit confused. Here are my Qs:
> > > 
> > > 1) interface MorphlineHandler mentions Solr in N places, but it
> > > doesn't seem to be Solr-specific. Couldn't one reuse this interface
> > > for a Morphline ES Sink?
> > > 
> > > 2) In general, couldn't/shouldn't a few classes from the
> > > org.apache.flume.sink.solr.morphline package really live outside
> > > anything Solr-specific? e.g. in org.apache.flume.sink.morphline for
> > > those that are Morphline-specific?
> > > 
> > > 3) Similarly, BlobDeserializer and BlobHandler don't seem to be even
> > > Morphline-specific. Shouldn't they be elsewhere?
> > > 
> > > 4) I was expecting to see SolrJ (Solr Java client library) being used
> > > in MorphlineHandlerImpl or MorphlineSolrSink to send events to Solr,
> > > but there is no trace of SolrJ there. How exactly does this load
> > > Flume events into Solr then?
> > > Ooooh, is that because when using this sink one is supposed to provide
> > > a Morphline config and this config has a hard-coded loadSolr()
> > > command?
> > > 
> > > 5) Would it make sense to refactor any of the current Morphline Solr
> > > Sink code to make it easier to add things like a Morphline Elasticsearch
> > > Sink? If so, any guidance you could provide would be very helpful.
> > > 
> > > Thanks,
> > > Otis
> > > --
> > > Performance Monitoring * Log Analytics * Search Analytics
> > > Solr & Elasticsearch Support * http://sematext.com/
> > > 
> > 
> > 
> 
> 
> 

