Re: Adding Morphline support to DIH - worth the effort?

Noble Paul Mon, 16 Jun 2014 05:49:30 -0700

The most common use case for DIH is RDBMS -> Solr. If we can have a simple
way to achieve this using Morphlines, there is no reason why we can't move
to that completely. Ideally DIH is an ETL tool, and having that as a part of
Solr is not a viable long-term solution.



On Thu, Jun 12, 2014 at 5:51 PM, Eric Pugh <[email protected]>
wrote:

> We’ve been using Apache Camel.  It doesn’t scale like Flume would, but it
> does have lots of nice orchestration.  It’s kind of between DIH and Flume,
> and while not focused on Solr like some of the other pipelines out there,
> has a lot of general purpose features that can be useful.
>
> On Jun 12, 2014, at 12:48 AM, [email protected] wrote:
>
> LOL I had the very same reaction Alexandre.  Most of us don’t have all
> this big data software sitting around, even if it is free.  Complexity.
>
> ~ David Smiley
> Freelance Apache Lucene/Solr Search Consultant/Developer
> http://www.linkedin.com/in/davidwsmiley
>
>
> On Thu, Jun 12, 2014 at 12:44 AM, Alexandre Rafalovitch <
> [email protected]> wrote:
>
>> On Thu, Jun 12, 2014 at 7:43 AM, Wolfgang Hoschek <[email protected]>
>> wrote:
>> > On Hadoop, even the JDBC/SQL portion of DIH now seems mostly covered by
>> > a combination of Sqoop and MapReduceIndexerTool, and perhaps a bit of Hive.
>>
>> I appreciate that if you are in the Big Data space, you already have
>> most of these pieces and the installation space is not a concern
>> either.
>>
>> But for the others, the statement above is probably why DIH is still
>> around. It's an easy way to cover those essential "read from
>> database" and "partial update from database" scenarios. If one has to
>> set up Sqoop+Hive+other bits to get it, it's probably too much to ask
>> and might be too heavy to install, certainly when they are starting
>> with Solr.
>>
>> The question to me is: what is the _minimum_ set of technologies
>> that need to be brought together to replace what DIH provides now? What
>> very Solr-specific gaps does that leave (progress indicator,
>> SolrCloud, etc.)? And what's the space/complexity trade-off? Then
>> there are the rest of the questions, such as: "Which tool/framework has
>> the strongest overlapping community with Solr, so that everybody would
>> benefit from adopting their platform?"
>>
>> I think Morphline covers most, possibly all of the Entity Processors
>> and Transformers in DIH. And maybe XML/File data sources too. But SQL
>> data source is the main issue here. I can't tell whether Flume covers
>> the DataSources scenario for SQL and makes it worth the upgrade.
>>
>> Regards,
>>    Alex.
>>
>> Personal website: http://www.outerthoughts.com/
>> Current project: http://www.solr-start.com/ - Accelerating your Solr
>> proficiency
>>
>
>       -----------------------------------------------------
> *Eric Pugh **| *Principal | OpenSource Connections, LLC | 434.466.1467 |
> http://www.opensourceconnections.com | My Free/Busy
> <http://tinyurl.com/eric-cal>
> Co-Author: *Apache Solr 3 Enterprise Search Server*
> <http://www.packtpub.com/apache-solr-3-enterprise-search-server/book>
> This e-mail and all contents, including attachments, is considered to be
> Company Confidential unless explicitly stated otherwise, regardless
> of whether attachments are marked as such.
>


-- 
-----------------------------------------------------
Noble Paul
