Hi Ashish,

this sounds like a strange use case for DSpace. I have to ask: what do you expect DSpace to provide in this scenario? You could probably do just fine with Nutch + Solr and a really thin web interface on top. Solritas or ajax-solr (check them out!), or even something developed from scratch, would likely be easier than this integration.
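To give an idea of how thin such a layer can be, here is a minimal sketch in Python that talks to Solr's standard /select handler. It is only an illustration under assumptions: the base URL, the core name "nutch" and the field names "title" and "url" are hypothetical and depend on how your Nutch index is actually set up.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_select_url(solr_base, core, query, rows=10):
    """Build a URL for Solr's standard /select request handler."""
    # q, rows and wt are standard Solr query parameters.
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return f"{solr_base.rstrip('/')}/{core}/select?{params}"


def extract_hits(response_text, fields=("title", "url")):
    """Pull the requested fields out of a Solr JSON response.

    The default field names are an assumption about the index schema.
    """
    docs = json.loads(response_text)["response"]["docs"]
    return [{f: doc.get(f) for f in fields} for doc in docs]


def search(solr_base, core, query, rows=10):
    """Query Solr over HTTP and return simplified hits (needs a running Solr)."""
    with urlopen(build_select_url(solr_base, core, query, rows)) as resp:
        return extract_hits(resp.read().decode("utf-8"))
```

With a Solr instance running, something like search("http://localhost:8983/solr", "nutch", "open access") would return a list of small dicts that a web page could render directly.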
It's true that DSpace uses Solr for its Discovery functionality, but it has a specific schema for the index. I can only guess that Nutch also has its own schema requirements, so there may be significant work involved in getting the two to work together. And even then, I don't really see what DSpace brings to the table that Solr doesn't.

As for DSpace on Hadoop, I personally haven't heard of such a deployment, but that doesn't mean DSpace cannot run across multiple nodes. Depending on your workload, you could distribute Postgres or Solr (for search), or add a reverse proxy (for bitstream caching, probably not your use case). The DSpace application itself is not really distributable, apart from running its separate webapps on different nodes.

Regards,
~~helix84

Compulsory reading: DSpace Mailing List Etiquette
https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette

_______________________________________________
DSpace-tech mailing list
https://lists.sourceforge.net/lists/listinfo/dspace-tech

