Hi,
Our application stores several GB of data in a Lucene/Solr index. It reads data from the Solr index, does some processing on it, and writes the results back into Solr as an index; the data is kept in Solr so that faceted search is possible. This read-process-write cycle is very slow, so we are looking at parallel programming frameworks.

Hadoop MapReduce seems to take its input from files and produce its output as files. Since our data lives in a Solr index, should we export it from the index into a file, feed that file to Hadoop as input, and then read Hadoop's output file and write the results back into the index? Those export and import steps will still be time-consuming unless they also run in parallel. Or should we get rid of the Solr index altogether and just store the data in HDFS?

Also, the index is stored in one folder, which means one disk; we do not use multiple disks. Is the use of multiple disks a must for Hadoop?

I am new to Hadoop and am trying to figure out whether Hadoop is the right solution for our application.

Thanks,
SR
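
P.S. In case it helps to be concrete, below is a minimal sketch of the export step we are considering, assuming a reasonably recent SolrJ client; the URL, collection name, and the field names "id" and "text" are placeholders, not our real schema. It pages through the index with a cursor and writes one tab-separated line per document, which we could then copy into HDFS as MapReduce input.

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.params.CursorMarkParams;

import java.io.BufferedWriter;
import java.nio.file.Files;
import java.nio.file.Paths;

public class SolrExport {
    public static void main(String[] args) throws Exception {
        // Placeholder Solr URL/collection and output path -- adjust to the real setup.
        try (HttpSolrClient solr =
                     new HttpSolrClient.Builder("http://localhost:8983/solr/mycollection").build();
             BufferedWriter out = Files.newBufferedWriter(Paths.get("solr-export.tsv"))) {

            SolrQuery q = new SolrQuery("*:*");
            q.setRows(1000);
            // Cursor-based deep paging needs a stable sort on the unique key field.
            q.setSort(SolrQuery.SortClause.asc("id"));
            String cursor = CursorMarkParams.CURSOR_MARK_START;

            while (true) {
                q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursor);
                QueryResponse rsp = solr.query(q);
                for (SolrDocument doc : rsp.getResults()) {
                    // "id" and "text" are placeholder field names.
                    out.write(doc.getFieldValue("id") + "\t" + doc.getFieldValue("text"));
                    out.newLine();
                }
                String next = rsp.getNextCursorMark();
                if (next.equals(cursor)) {
                    break; // cursor did not advance, so all documents have been exported
                }
                cursor = next;
            }
        }
    }
}

The resulting solr-export.tsv could be pushed to HDFS with "hadoop fs -put" and used as the MapReduce input; the reverse step (indexing the job's output back into Solr) would be the mirror image of this, unless there is a better-established pattern we should use instead.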
