James,
Don't you think that spawning DIH2.0 as a separate war is a priority?


On Wed, Jun 11, 2014 at 6:39 PM, Dyer, James <[email protected]>
wrote:

> Alexandre,
>
> I think that writing a new entity processor for DIH is a much less risky
> thing to commit than, say, SOLR-4799.  Entity Processors work as plug-ins
> and they aren't likely to break anything else.  So a Morphline
> EntityProcessor is much more likely to be evaluated and committed.
>
> But like anything else, you're going to need to explain what the need is
> and what this new e.p. buys the user community.  There need to be unit
> tests, etc.
>
> Besides this, if you can show how a morphline e.p. can be a step towards
> migrating away from DIH entirely, then that would be a plus.  Perhaps
> create a new solr example along the lines of the dih solr example that
> demonstrates to users this new way forward.  This would go a long way in
> convincing the community we have a viable alternative to dih.
>
> James Dyer
> Ingram Content Group
> (615) 213-4311
>
>
> -----Original Message-----
> From: Alexandre Rafalovitch [mailto:[email protected]]
> Sent: Tuesday, June 10, 2014 9:55 PM
> To: [email protected]
> Subject: Re: Adding Morphline support to DIH - worth the effort?
>
> Ripples in the pond again. Spreading and dying. Understandable, but
> still somewhat annoying.
>
> So, what would be the minimal viable next step to move this
> conversation forward? Something for 4.11 as opposed to 5.0?
>
> Does anyone with commit status have a feeling for what minimal
> deliverable they would put their own weight behind?
>
> Regards,
>    Alex.
> Personal website: http://www.outerthoughts.com/
> Current project: http://www.solr-start.com/ - Accelerating your Solr
> proficiency
>
>
> On Mon, Jun 9, 2014 at 10:50 AM, [email protected]
> <[email protected]> wrote:
> >> One of the ideas over DIH discussed earlier is making it standalone.
> >
> > Yeah; my beef with the DIH is that it’s tied to Solr.  But I’d rather see
> > something other than the DIH outside Solr; it’s not worthy IMO.  Why have
> > something Solr specific even?  A great pipeline shouldn’t tie itself to any
> > end-point.  There are a variety of solutions out there that I tried.  There
> > are the big 3 open-source ETLs (Kettle, Clover, Talend) and they aren’t
> > quite ideal in one way or another.  And Spring-Integration.  And some
> > half-baked data pipelines like OpenPipe & Open Pipeline.  I never got around
> > to taking a good look at Findwise’s open-sourced Hydra but I learned enough
> > to know to my surprise it was configured in code versus a config file (like
> > all the others) and that's a big turn-off to me.  Today I read through most
> > of the Morphlines docs and a few choice source files and I’m
> > super-impressed.  But as you note it’s missing a lot of other stuff.  I
> > think something great could be built using it as a core piece.
> >
> > ~ David Smiley
> > Freelance Apache Lucene/Solr Search Consultant/Developer
> > http://www.linkedin.com/in/davidwsmiley
> >
> >
> > On Sun, Jun 8, 2014 at 5:51 PM, Mikhail Khludnev
> > <[email protected]> wrote:
> >>
> >> Jack,
> >> I found your considerations quite reasonable.
> >> One of the ideas over DIH discussed earlier is making it standalone. So,
> >> if we start from a simple Morphline UI, we can do this extraction. Then
> >> such an externalized ETL will work better with SolrCloud than DIH does now.
> >> Presumably we can reuse DIH's JDBC data sources as a source for Morphline
> >> records.
> >> The open questions with this approach are:
> >> - joins/caching - seem possible with Morphlines, but there is no such
> >> command yet
> >> - delta import - a scenario we must not forget to handle
> >> - threads (completely outside Morphline's concerns)
> >> - distributed processing - it would be great if we could partition the
> >> datasource, e.g. the way Sqoop does
> >> ... what else?
> >>
> >>
> >> On Sun, Jun 8, 2014 at 6:54 PM, Jack Krupansky <[email protected]>
> >> wrote:
> >>>
> >>> I've avoided DIH like the plague since it really doesn't fit well in
> >>> Solr, so I'm still baffled as to why you think we need to use DIH as the
> >>> foundation for a Solr Morphlines project. That shouldn't stop you, but
> >>> what's the big impediment to taking a clean slate approach to Morphlines -
> >>> learn what we can from DIH, but do a fresh, clean "Solr 5.0" implementation
> >>> that is not burdened from the get-go with all of DIH's baggage?
> >>>
> >>> Configuring DIH is one of its main problems, so blending Morphlines
> >>> config into DIH config would seem to just make Morphlines less attractive
> >>> than it actually is when viewed by itself.
> >>>
> >>> You might also consider how ManifoldCF (another Apache project) would
> >>> integrate with DIH and Morphlines as well. I mean, the core use case is ETL
> >>> from external data sources. And how all of this relates to Apache Flume as
> >>> well.
> >>>
> >>> But back to the original, still unanswered, question: Why use DIH as the
> >>> starting point for integrating Morphlines with Solr - unless the goal is to
> >>> make Morphlines unpalatable and less approachable than even DIH itself?!
> >>>
> >>> Another question: What does Elasticsearch have in this area (besides
> >>> "rivers")? Are they headed in the Morphlines direction as well?
> >>>
> >>>
> >>> -- Jack Krupansky
> >>>
> >>> -----Original Message----- From: Alexandre Rafalovitch
> >>> Sent: Sunday, June 8, 2014 10:16 AM
> >>>
> >>> To: [email protected]
> >>> Subject: Re: Adding Morphline support to DIH - worth the effort?
> >>>
> >>> I see DIH as something that offers a quick way to get things done, as
> >>> long as they fit into DIH's couple of basic scenarios. Going even a
> >>> little beyond hits bugs, bad documentation, inconsistencies and lack
> >>> of ongoing support (e.g. SOLR-4383).
> >>>
> >>> So, if it works for you - great. If it does not - too bad, use SolrJ.
> >>> And given what I observe, I believe the next round of improvements
> >>> might be easier to achieve by moving to a different open-source pipe
> >>> project than trying to keep reinventing and bandaging one of our own.
> >>> Go where the strongest community is, etc.
> >>>
> >>> Morphline can be seen as a replacement for DIH's EntityProcessors and
> >>> Transformers (Flume adds other bits). The reasons I think it is worth
> >>> looking at are as follows:
> >>> 1) DIH is not really being maintained or further improved. So, the
> >>> list of EPs and Transformers stays the same and does not account for new
> >>> requests (which we see periodically on the mailing list); even the new
> >>> implementations get stuck in JIRA (see the JIRA in the original email)
> >>> 2) It's not terribly well documented either, so people are always
> >>> struggling to understand how the entity is actually generated and what
> >>> happens when things go wrong
> >>> 3) We are already bundling Morphline jars with Solr. But we are NOT
> >>> using them in any way useful to a non-Hadoop Solr user. Which raises the
> >>> question of why we added them (one answer, I guess: because we don't
> >>> have a module system).
> >>> 4) Morphlines have more primitives than DIH and the available list keeps
> >>> growing
> >>> 5) What separate module for Solr? We have no discovery method for
> >>> modules. Writing one for general consumption is like trying to sing in
> >>> a vacuum - the problem is a lot bigger than with an individual offering.
> >>>
> >>> In terms of implementation, I think it would take defining a custom
> >>> MorphlineEntityProcessor which basically plugs into DIH's current
> >>> DataSources. So, one could use, for example, DIH's SqlDataSource to get a
> >>> list of files and then hand off to Morphline's black box to parse
> >>> those files into records (e.g. multiline records), augment them, etc.
> >>> Then, at the end, this gets handed back to DIH to finish it up. I
> >>> think this would work even with nested entities and transformers. The
> >>> Admin UI should also work.
> >>>
> >>> Eventually, I think we need a harder discussion about DIH, so this
> >>> partial handover could be a way to test the waters.
> >>>
> >>> Does this make more sense?
> >>>
> >>> Regards,
> >>>   Alex.
> >>> Personal website: http://www.outerthoughts.com/
> >>> Current project: http://www.solr-start.com/ - Accelerating your Solr
> >>> proficiency
> >>>
> >>>
> >>> On Sun, Jun 8, 2014 at 8:41 PM, Jack Krupansky <[email protected]>
> >>> wrote:
> >>>>
> >>>> It sounds more like an alternative to DIH rather than an incremental
> >>>> add-on to DIH. I mean, isn't Morphline really just "a DIH for Hadoop"?
> >>>>
> >>>> So, back to Shalin's question, which specific (please detail!) use cases
> >>>> of DIH are enhanced by Morphline?
> >>>>
> >>>> Maybe it would help if you simply elaborate what benefits would accrue
> >>>> to adding Morphline to DIH - as opposed to creating a separate module for
> >>>> Solr. I suppose it depends on whether you consider DIH a solid foundation
> >>>> or a weak link in Solr that desperately needs firming up.
> >>>>
> >>>> -- Jack Krupansky
> >>>>
> >>>> -----Original Message----- From: Alexandre Rafalovitch
> >>>> Sent: Sunday, June 8, 2014 1:40 AM
> >>>> To: [email protected]
> >>>> Subject: Re: Adding Morphline support to DIH - worth the effort?
> >>>>
> >>>>
> >>>> Well, it's the same core scenario as DIH supports (apart from actual
> >>>> data sources), but actively supported and developed by a company with
> >>>> a lot more investment in it. For the primitives supported, see
> >>>>
> >>>>
> >>>> http://cloudera.github.io/cdk/docs/current/cdk-morphlines/morphlinesReferenceGuide.html
> >>>>
> >>>> We don't bundle ALL of these with Solr, but I think we do bundle core,
> >>>> solr-core and solr-cell packages, which is a good number and range of
> >>>> functionality (e.g. readMultiLine).
> >>>>
> >>>> Regards,
> >>>>   Alex.
> >>>> Personal website: http://www.outerthoughts.com/
> >>>> Current project: http://www.solr-start.com/ - Accelerating your Solr
> >>>> proficiency
> >>>>
> >>>>
> >>>> On Sun, Jun 8, 2014 at 12:23 PM, Shalin Shekhar Mangar
> >>>> <[email protected]> wrote:
> >>>>>
> >>>>>
> >>>>> I do not know much about morphlines but I'd like to know what use-cases
> >>>>> would be possible/easier/faster with such an integration?
> >>>>>
> >>>>>
> >>>>> On Sun, Jun 8, 2014 at 10:32 AM, Alexandre Rafalovitch
> >>>>> <[email protected]>
> >>>>> wrote:
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> Hello,
> >>>>>>
> >>>>>> I had a preliminary look around and it might be possible to plug
> >>>>>> Morphline (already shipped with Solr) into DIH by creating a bridging
> >>>>>> EntityProcessor.
> >>>>>>
> >>>>>> Two questions:
> >>>>>> 1) Do people see value in it?
> >>>>>> 2) DIH is not well supported, so any addition seems to be a bit stuck
> >>>>>> in a "rickety bridge, don't rock" discussion (e.g. SOLR-4799). I don't
> >>>>>> want to suddenly be responsible for fixing the bridge before adding a
> >>>>>> standalone piece of code. So, if I write the code, how many general
> >>>>>> DIH externalities would I also have to address (e.g. lack of tests,
> >>>>>> etc)?
> >>>>>>
> >>>>>> Regards,
> >>>>>>    Alex.
> >>>>>> P.s. Morphline could also be integrated in update request processor
> >>>>>> chain. So, that could be an alternative project.
> >>>>>>
> >>>>>> Personal website: http://www.outerthoughts.com/
> >>>>>> Current project: http://www.solr-start.com/ - Accelerating your Solr
> >>>>>> proficiency
> >>>>>>
> >>>>>>
> >>>>>> ---------------------------------------------------------------------
> >>>>>> To unsubscribe, e-mail: [email protected]
> >>>>>> For additional commands, e-mail: [email protected]
> >>>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>> Regards,
> >>>>> Shalin Shekhar Mangar.
> >>>>
> >>>>
> >>>>
> >>>
> >>
> >>
> >>
> >> --
> >> Sincerely yours
> >> Mikhail Khludnev
> >> Principal Engineer,
> >> Grid Dynamics
> >>
> >>
> >
>


-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
 <[email protected]>
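
For readers following the thread: the bridging EntityProcessor idea discussed above (pull rows from an existing data source, hand each row to a Morphline-style chain of commands, return the resulting records as rows) could look roughly like the sketch below. It is written in Python purely to illustrate the data flow. It does not use the real DIH or Morphlines APIs, and every name in it (BridgingEntityProcessor, run_chain, split_lines, add_length, collect_rows) is hypothetical. Returning None once the source is exhausted is also an assumption of this sketch.

```python
from typing import Callable, Dict, Iterable, List, Optional

# A record/row is a plain field map in this sketch.
Record = Dict[str, object]
# Hypothetical stand-in for a morphline command: one record in, zero or more out.
Command = Callable[[Record], Iterable[Record]]


def run_chain(record: Record, commands: List[Command]) -> List[Record]:
    """Pass one record through each command in order, fanning out as needed."""
    records = [record]
    for command in commands:
        records = [out for rec in records for out in command(rec)]
    return records


class BridgingEntityProcessor:
    """Hypothetical bridge: reads source rows, returns transformed rows one at a time."""

    def __init__(self, source_rows: Iterable[Record], commands: List[Command]):
        self._source = iter(source_rows)
        self._commands = commands
        self._pending: List[Record] = []

    def next_row(self) -> Optional[Record]:
        """Return the next transformed row, or None when the source is exhausted."""
        while not self._pending:
            row = next(self._source, None)
            if row is None:
                return None
            # One source row may yield several records, or none at all.
            self._pending = run_chain(row, self._commands)
        return self._pending.pop(0)


def split_lines(record: Record) -> Iterable[Record]:
    """Example command: emit one record per non-empty line of the 'text' field."""
    for line in str(record["text"]).splitlines():
        if line.strip():
            yield {"source": record["source"], "line": line.strip()}


def add_length(record: Record) -> Iterable[Record]:
    """Example command: augment a record with the length of its 'line' field."""
    yield {**record, "length": len(str(record["line"]))}


def collect_rows(processor: BridgingEntityProcessor) -> List[Record]:
    """Drain the processor the way a caller would, until it signals the end."""
    rows = []
    row = processor.next_row()
    while row is not None:
        rows.append(row)
        row = processor.next_row()
    return rows
```

In this sketch the caller only ever sees next_row(), so the command chain stays a black box behind the processor, which is the shape of the handover described in the thread. A source row that produces no records is simply skipped.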
