[Dspace-tech] Nutch crawl and Hadoop

Ashish Kulkarni Sun, 29 Sep 2013 23:27:53 -0700

We are building an intranet search system and would like to leverage
DSpace's capabilities. We have a crawled corpus and several custom-built
applications for bookmarking, annotation, search, etc. We would like all of
these to work together seamlessly on the crawled corpus, using DSpace as the
framework. With that context, we have the following questions:
1.  How do we integrate DSpace with Nutch (the crawler) so that it
continuously crawls the intranet, ingests documents (web pages, PDFs, .doc,
.ppt, etc.), and makes them available for browse and search? We would also
like to host our custom applications (bookmarking, annotation, search, etc.)
on top of DSpace.
2.  Can DSpace be installed in a Hadoop cluster? Any pointers?

Regards,
Ashish
