Hm, yes. See how few hits this shows:
http://search-hadoop.com/?q=non-distributed&fc_project=Hadoop

You can set it up on 1 box, but that's really useful only for development.

Otis
--
Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch


----- Original Message ----
> From: "Ranganathan, Sharmila" <[email protected]>
> To: [email protected]
> Sent: Wed, January 20, 2010 3:23:34 PM
> Subject: RE: Data currently stored in Solr index. Should it be moved to HDFS?
>
> Thanks for your reply. Is Hadoop only for distributed applications?
>
>
> -----Original Message-----
> From: Otis Gospodnetic [mailto:[email protected]]
> Sent: Wednesday, January 20, 2010 2:03 PM
> To: [email protected]
> Subject: Re: Data currently stored in Solr index. Should it be moved to HDFS?
>
> Hello,
>
> Reading large result sets from Solr is not the way we typically advise
> people to use Solr. It's not designed for that (nor is Lucene, the
> search library at its core). There is some work being done right now
> about getting Solr better at retrieving large result sets, but my
> feeling is you'd be better off avoiding Solr and getting data to your MR
> jobs from files stored in HDFS.
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch
>
>
>
> ----- Original Message ----
> > From: "Ranganathan, Sharmila"
> > To: [email protected]
> > Sent: Tue, January 19, 2010 5:15:36 PM
> > Subject: Data currently stored in Solr index. Should it be moved to HDFS?
> >
> > Hi,
> >
> > Our application stores GBs of data in Lucene Solr index. It reads from
> > Solr index and does some processing on the data and stores it back in
> > Solr as index. It is stored in Solr index so that faceted search is
> > possible. The process of reading from Solr, processing data and writing
> > back to index is very slow. So we are looking at some parallel
> > programming frameworks. Hadoop MapReduce seems to take input in form of
> > file and creates output as a file. Since we have data in Solr index,
> > should we read data from index, convert to a file and send it as input to
> > Hadoop and read its output file and write the results to index? This
> > read and write to index will still be time consuming if not run in
> > parallel. Or should we get rid of Solr index and just store data in
> > HDFS. Also the index is stored in one folder which means one disk. We
> > do not use multiple disks. Is use of multiple disks a must for Hadoop?
> >
> > I am new to Hadoop and trying to figure out whether Hadoop is the
> > solution for our application.
> >
> > Thanks
> >
> > SR
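
For anyone following the "file in, file out" point in this thread, here is a rough sketch of what a map step and a reduce step over exported records could look like. It is plain Python, not the Hadoop API. The record layout (one document per line, "doc_id<TAB>text") and the word-count processing are assumptions made up for illustration; the thread does not say what the real export format or processing is.

```python
from itertools import groupby


def map_record(line):
    """Map step: turn one exported record into (key, value) pairs.

    Assumes a hypothetical "doc_id<TAB>text" line format.
    """
    _doc_id, _, text = line.rstrip("\n").partition("\t")
    for token in text.lower().split():
        yield token, 1


def reduce_counts(sorted_pairs):
    """Reduce step: sum the values for each key.

    Expects pairs already sorted by key, which is how a MapReduce
    framework hands grouped data to a reducer.
    """
    for key, group in groupby(sorted_pairs, key=lambda kv: kv[0]):
        yield key, sum(value for _, value in group)


def run_local(lines):
    """Run map, sort, and reduce in one process (single-box style)."""
    mapped = [pair for line in lines for pair in map_record(line)]
    return list(reduce_counts(sorted(mapped)))
```

Calling `run_local(open("export.tsv"))` on a hypothetical export file would return the per-token totals as a list of pairs. On a real cluster the sort between the two steps is done by the framework and the map step runs on many input splits at once, which is where the speed-up would come from.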
