Thanks for your reply. Is Hadoop only for distributed applications?
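
Just to check my understanding of your suggestion below (feeding the MapReduce jobs from files stored in HDFS rather than reading from the Solr index), here is a rough sketch of the kind of job I am picturing. It assumes the Solr records have already been exported as plain text files on HDFS; the class names and the input/output paths are only placeholders, not anything from our actual application:

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class HdfsRecordCount {

      // Mapper: reads each line of the exported records from HDFS and emits (term, 1).
      public static class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
          for (String token : value.toString().split("\\s+")) {
            if (!token.isEmpty()) {
              word.set(token);
              context.write(word, ONE);
            }
          }
        }
      }

      // Reducer: sums the counts for each term.
      public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
          int sum = 0;
          for (IntWritable v : values) {
            sum += v.get();
          }
          context.write(key, new IntWritable(sum));
        }
      }

      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "hdfs record count");
        job.setJarByClass(HdfsRecordCount.class);
        job.setMapperClass(TokenMapper.class);
        job.setCombinerClass(SumReducer.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // Input: the exported Solr records as plain text files on HDFS (placeholder path).
        FileInputFormat.addInputPath(job, new Path(args[0]));
        // Output: results written back to HDFS (placeholder path).
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }

The output files would then be re-indexed into Solr in a separate step so that the faceted search still works. Is that roughly what you had in mind?
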
-----Original Message-----
From: Otis Gospodnetic [mailto:[email protected]]
Sent: Wednesday, January 20, 2010 2:03 PM
To: [email protected]
Subject: Re: Data currently stored in Solr index. Should it be moved to HDFS?

Hello,

Reading large result sets from Solr is not the way we typically advise people to use Solr. It's not designed for that (nor is Lucene, the search library at its core). There is some work being done right now on getting Solr better at retrieving large result sets, but my feeling is you'd be better off avoiding Solr and getting data to your MR jobs from files stored in HDFS.

Otis
--
Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch

----- Original Message ----
> From: "Ranganathan, Sharmila" <[email protected]>
> To: [email protected]
> Sent: Tue, January 19, 2010 5:15:36 PM
> Subject: Data currently stored in Solr index. Should it be moved to HDFS?
>
> Hi,
>
> Our application stores GBs of data in a Lucene/Solr index. It reads data
> from the Solr index, does some processing on it, and writes the results
> back to the index. The data is kept in Solr so that faceted search is
> possible. This cycle of reading from Solr, processing the data, and
> writing back to the index is very slow, so we are looking at parallel
> programming frameworks. Hadoop MapReduce seems to take its input from
> files and write its output to files. Since our data is in a Solr index,
> should we read the data from the index, convert it to a file, feed that
> file to Hadoop as input, and then read Hadoop's output file and write
> the results back to the index? That read and write to the index will
> still be time consuming if it is not run in parallel. Or should we get
> rid of the Solr index and just store the data in HDFS? Also, the index
> is stored in one folder, which means one disk; we do not use multiple
> disks. Is the use of multiple disks a must for Hadoop?
>
> I am new to Hadoop and am trying to figure out whether Hadoop is the
> solution for our application.
>
> Thanks
> SR
