The most common use case for DIH is RDBMS -> Solr. If we can have a simple way to achieve this using Morphlines, there is no reason why we can't move to that completely. DIH is essentially an ETL tool, and having it as part of Solr is not a viable long-term solution.
On Thu, Jun 12, 2014 at 5:51 PM, Eric Pugh <[email protected]> wrote:

> We've been using Apache Camel. It doesn't scale like Flume would, but it
> does have lots of nice orchestration features. It sits somewhere between
> DIH and Flume, and while it is not focused on Solr like some of the
> other pipelines out there, it has a lot of general-purpose features
> that can be useful.
>
> On Jun 12, 2014, at 12:48 AM, [email protected] wrote:
>
>> LOL, I had the very same reaction, Alexandre. Most of us don't have
>> all this big data software sitting around, even if it is free.
>> Complexity.
>>
>> ~ David Smiley
>> Freelance Apache Lucene/Solr Search Consultant/Developer
>> http://www.linkedin.com/in/davidwsmiley
>>
>> On Thu, Jun 12, 2014 at 12:44 AM, Alexandre Rafalovitch
>> <[email protected]> wrote:
>>
>>> On Thu, Jun 12, 2014 at 7:43 AM, Wolfgang Hoschek <[email protected]>
>>> wrote:
>>>> On Hadoop, even the JDBC/SQL portion of DIH now seems mostly covered
>>>> by a combination of Sqoop and MapReduceIndexerTool, and perhaps a bit
>>>> of Hive.
>>>
>>> I appreciate that if you are in the Big Data space, you already have
>>> most of these pieces and the installation space is not a concern
>>> either.
>>>
>>> But for everyone else, the statement above is probably why DIH is
>>> still around. It's an easy way to cover the essential "read from
>>> database" and "partial update from database" scenarios. If one has to
>>> set up Sqoop + Hive + other bits to get that, it's probably too much
>>> to ask and might be too heavy to install, certainly when they are
>>> just starting with Solr.
>>>
>>> The question to me is: what is the _minimum_ set of technologies that
>>> must be brought together to replace what DIH provides now? What
>>> Solr-specific gaps would that leave (progress indicator, SolrCloud
>>> support, etc.)? And what is the space/complexity trade-off? Then there
>>> are the remaining questions, such as: "Which tool/framework has the
>>> strongest overlapping community with Solr, so that everybody would
>>> benefit from adopting their platform?"
>>>
>>> I think Morphlines covers most, possibly all, of the Entity
>>> Processors and Transformers in DIH, and maybe the XML/File data
>>> sources too. But the SQL data source is the main issue here. I can't
>>> tell whether Flume covers the SQL DataSource scenario well enough to
>>> make it worth the upgrade.
>>>
>>> Regards,
>>> Alex.
>>>
>>> Personal website: http://www.outerthoughts.com/
>>> Current project: http://www.solr-start.com/ - Accelerating your Solr
>>> proficiency
>
> -----------------------------------------------------
> Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 |
> http://www.opensourceconnections.com | My Free/Busy
> <http://tinyurl.com/eric-cal>
> Co-Author: Apache Solr 3 Enterprise Search Server
> <http://www.packtpub.com/apache-solr-3-enterprise-search-server/book>

--
-----------------------------------------------------
Noble Paul
