On Thu, Jun 12, 2014 at 7:43 AM, Wolfgang Hoschek <[email protected]> wrote:
> On Hadoop, even the JDBC/SQL portion of DIH now seems mostly covered by a
> combination of Sqoop and MapReduceIndexerTool, and perhaps a bit of Hive.

I appreciate that if you are in the Big Data space, you already have most of these pieces, and the installation space is not a concern either. But for everyone else, the statement above is probably why DIH is still around. It's an easy way to cover the essential "read from database" and "partial update from database" scenarios. If one has to set up Sqoop+Hive+other bits to get that, it's probably too much to ask and might be too heavy to install, certainly for people who are just starting with Solr.

The question to me is: what is the _minimum_ set of technologies that needs to be brought together to replace what DIH provides now? What very Solr-specific gaps does that leave (including the progress indicator, SolrCloud, etc.)? And what is the space/complexity trade-off?

Then, there is the rest of the questions, such as: "Which tool/framework has the strongest overlapping community with Solr, so that everybody would benefit from adopting their platform?"

I think Morphline covers most, possibly all, of the Entity Processors and Transformers in DIH. And maybe XML/File data sources too. But the SQL data source is the main issue here. I can't tell whether Flume covers the DataSources scenario for SQL, and whether that makes it worth the upgrade.

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency
