I see DIH as something that offers a quick way to get things done, as long as they fit into DIH's couple of basic scenarios. Going even a little beyond those runs into bugs, bad documentation, inconsistencies and a lack of ongoing support (e.g. SOLR-4383).
So, if it works for you - great. If it does not - too bad, use SolrJ.

And given what I observe, I believe the next round of improvements might be easier to achieve by moving to a different open-source pipe project than by continuing to reinvent and bandage one of our own. Go where the strongest community is, etc.

Morphline can be seen as a replacement for DIH's EntityProcessors and Transformers (Flume adds other bits). The reasons I think it is worth looking at are as follows:
1) DIH is not really being maintained or further improved. So, the list of EPs and Transformers stays the same and does not account for new requests (which we see periodically on the mailing list); even the new implementations get stuck in JIRA (see the JIRA in the original email).
2) It's not terribly well documented either, so people are always struggling to understand how the entity is actually generated and what happens when things go wrong.
3) We are already bundling Morphline jars with Solr. But we are NOT using them in any way useful to a non-Hadoop Solr user. Which raises the question of why we added them (one answer, I guess: because we don't have a module system).
4) Morphlines have more primitives than DIH and the available list keeps growing.
5) What separate module for Solr? We have no discovery method for modules. Writing one for general consumption is like trying to sing in a vacuum - the problem is a lot bigger than with an individual offering.

In terms of implementation, I think it would take defining a custom MorphlineEntityProcessor which basically plugs into DIH's current DataSources. So, one could use, for example, DIH SqlDataSource to get a list of files and then hand off to Morphline's black box to parse those files into records (e.g. multiline records), augment them, etc. Then, at the end, this gets handed back to DIH to finish it up. I think this would work even with nested entities and transformers.
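To make the hand-off described above a bit more concrete, here is a minimal sketch of the flow, written in Python for brevity (a real bridge would be a Java EntityProcessor). Everything in it is hypothetical: `read_multi_line`, `add_length`, `run_pipeline` and `morphline_entity_rows` are illustration-only names, not DIH or Morphline APIs, and the in-memory `source_rows` list merely stands in for whatever a DIH data source would return.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Record = Dict[str, str]
# A "command" takes one record and emits zero or more records.
Command = Callable[[Record], Iterable[Record]]


def read_multi_line(record: Record) -> Iterator[Record]:
    """Hypothetical parser: indented lines continue the previous record."""
    current: List[str] = []
    for line in record["content"].splitlines():
        if not line.strip():
            continue  # ignore blank lines
        if line[:1].isspace() and current:
            current.append(line.strip())
        else:
            if current:
                yield {"source": record["path"], "message": " ".join(current)}
            current = [line.strip()]
    if current:
        yield {"source": record["path"], "message": " ".join(current)}


def add_length(record: Record) -> Iterator[Record]:
    """Hypothetical augmentation step: add a derived field."""
    yield {**record, "length": str(len(record["message"]))}


def run_pipeline(record: Record, commands: List[Command]) -> List[Record]:
    """The 'black box': push one input row through every command in order."""
    records = [record]
    for command in commands:
        records = [out for rec in records for out in command(rec)]
    return records


def morphline_entity_rows(
    source_rows: Iterable[Record], commands: List[Command]
) -> Iterator[Record]:
    """Hypothetical bridge: one data-source row in, zero or more rows out."""
    for row in source_rows:
        yield from run_pipeline(row, commands)


# Stand-in for rows coming from a data source (assumed shape, not DIH's).
source_rows = [
    {"path": "app.log", "content": "ERROR boom\n  at foo()\n  at bar()\nINFO ok\n"}
]
rows = list(morphline_entity_rows(source_rows, [read_multi_line, add_length]))
```

In this sketch `rows` holds the flattened records that would be handed back to DIH, and swapping the command list changes the parsing without touching the bridge itself.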
The Admin UI should also work.

Eventually, I think we need a harder discussion about DIH, so this partial handover could be a way to test the waters.

Does this make more sense?

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency

On Sun, Jun 8, 2014 at 8:41 PM, Jack Krupansky <[email protected]> wrote:
> It sounds more like an alternative to DIH rather than an incremental add-on
> to DIH. I mean, isn't Morphline really just "a DIH for Hadoop"?
>
> So, back to Shalin's question, which specific (please detail!) use cases of
> DIH are enhanced by Morphline?
>
> Maybe it would help if you simply elaborate what benefits would accrue to
> adding Morphline to DIH - as opposed to creating a separate module for Solr.
> I suppose it depends on whether you consider DIH a solid foundation or a
> weak link in Solr that desperately needs firming up.
>
> -- Jack Krupansky
>
> -----Original Message----- From: Alexandre Rafalovitch
> Sent: Sunday, June 8, 2014 1:40 AM
> To: [email protected]
> Subject: Re: Adding Morphline support to DIH - worth the effort?
>
>
> Well, it's the same core scenario as DIH supports (apart from actual
> data sources), but actively supported and developed by a company with
> a lot more investment in it. For the primitives supported, see
> http://cloudera.github.io/cdk/docs/current/cdk-morphlines/morphlinesReferenceGuide.html
>
> We don't bundle ALL of these with Solr, but I think we do bundle core,
> solr-core and solr-cell packages, which is a good number and range of
> functionality (e.g. readMultiLine).
>
> Regards,
> Alex.
> Personal website: http://www.outerthoughts.com/
> Current project: http://www.solr-start.com/ - Accelerating your Solr
> proficiency
>
>
> On Sun, Jun 8, 2014 at 12:23 PM, Shalin Shekhar Mangar
> <[email protected]> wrote:
>>
>> I do not know much about morphlines but I'd like to know what use-cases
>> would be possible/easier/faster with such an integration?
>>
>>
>> On Sun, Jun 8, 2014 at 10:32 AM, Alexandre Rafalovitch
>> <[email protected]>
>> wrote:
>>>
>>>
>>> Hello,
>>>
>>> I had a preliminary look around and it might be possible to plug
>>> Morphline (already shipped with Solr) into DIH by creating a bridging
>>> EntityProcessor.
>>>
>>> Two questions:
>>> 1) Do people see value in it?
>>> 2) DIH is not very supported, so any addition seems to be a bit stuck
>>> in "rickety bridge, don't rock" discussion (e.g. SOLR-4799). I don't
>>> want to suddenly be responsible for fixing the bridge before adding a
>>> standalone piece of code. So, if I write the code, how many general
>>> DIH externalities would I also have to address (e.g. lack of tests,
>>> etc)?
>>>
>>> Regards,
>>> Alex.
>>> P.s. Morphline could also be integrated in update request processor
>>> chain. So, that could be an alternative project.
>>>
>>> Personal website: http://www.outerthoughts.com/
>>> Current project: http://www.solr-start.com/ - Accelerating your Solr
>>> proficiency
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: [email protected]
>>> For additional commands, e-mail: [email protected]
>>>
>>
>>
>>
>> --
>> Regards,
>> Shalin Shekhar Mangar.
