On Thu, Jun 12, 2014 at 7:43 AM, Wolfgang Hoschek <[email protected]> wrote:
> On Hadoop, even the JDBC/SQL portion of DIH now seems mostly covered by a 
> combination of Sqoop and MapReduceIndexerTool, and perhaps a bit of Hive.

I appreciate that if you are in the Big Data space, you already have
most of these pieces and the installation space is not a concern
either.

But for the others, the statement above is probably why DIH is still
around. It's an easy way to cover those essential "read from
database" and "partial update from database" scenarios. If one has to
set up Sqoop+Hive+other bits to get that, it's probably too much to ask
and might be too heavy to install, certainly for people who are just
starting with Solr.

The question to me is: what is the _minimum_ set of technologies
that needs to be brought together to replace what DIH provides now?
And what very Solr-specific gaps does that leave (including progress
indicator, SolrCloud, etc.)? And what's the space/complexity trade-off?
Then there are the remaining questions, such as: "Which tool/framework
has the strongest overlapping community with Solr, so that everybody
would benefit from adopting their platform?"

I think Morphline covers most, possibly all, of the Entity Processors
and Transformers in DIH. And maybe XML/File data sources too. But the
SQL data source is the main issue here. I can't tell whether Flume
covers the DataSources scenario for SQL and makes it worth the upgrade.

Regards,
   Alex.

Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency
