Hi,
   
  We're interested in using Hadoop in our application to replicate our data and 
distribute query execution, but I have some questions as to whether it's a good 
fit.  We have essentially written a search engine using Jena (a Semantic Web 
framework) and its accompanying Lucene interface, LARQ (Lucene + ARQ), which 
allows free-text search over the RDF graphs stored in Jena. 
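For context, our indexing and query path looks roughly like the following. This is a minimal sketch based on the LARQ documentation, not our actual code; the RDF file name and the query string are placeholders:

```java
import com.hp.hpl.jena.query.QueryExecution;
import com.hp.hpl.jena.query.QueryExecutionFactory;
import com.hp.hpl.jena.query.ResultSetFormatter;
import com.hp.hpl.jena.query.larq.IndexBuilderString;
import com.hp.hpl.jena.query.larq.IndexLARQ;
import com.hp.hpl.jena.query.larq.LARQ;
import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.rdf.model.ModelFactory;

public class LarqSketch {
    public static void main(String[] args) {
        // Load an RDF graph into a Jena model (placeholder file name).
        Model model = ModelFactory.createDefaultModel();
        model.read("file:data.rdf");

        // Build a Lucene index over the string literals in the model.
        IndexBuilderString larqBuilder = new IndexBuilderString();
        larqBuilder.indexStatements(model.listStatements());
        larqBuilder.closeWriter();
        IndexLARQ index = larqBuilder.getIndex();
        LARQ.setDefaultIndex(index);

        // Free-text search via LARQ's pf:textMatch property function,
        // combined with a normal SPARQL graph pattern.
        String q =
            "PREFIX pf: <http://jena.hpl.hp.com/ARQ/property#> " +
            "SELECT ?s ?lit WHERE { " +
            "  ?lit pf:textMatch 'hadoop' . " +
            "  ?s ?p ?lit " +
            "}";
        QueryExecution qe = QueryExecutionFactory.create(q, model);
        try {
            ResultSetFormatter.out(qe.execSelect());
        } finally {
            qe.close();
        }
    }
}
```

It is the Lucene index built in the middle step that we expect to grow very large, which is why we are looking at Hadoop.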

We expect the Lucene indexes to get very large, hence the need for Hadoop.  I 
have gone through the documentation on the site, but would like to clarify some 
points that we could not answer from the wiki, FAQ, etc.: 

1.  We're not using Nutch, but the documentation references it frequently.  Is 
that a problem?  Can Lucene indexes be used with Hadoop on their own, without 
Nutch?

2.  Are there any best practices for using Hadoop behind such a setup, in terms 
of creating, querying, and managing the Lucene indexes?  I found this thread ( 
http://www.mail-archive.com/[email protected]/msg00573.html ), but 
could use clarification on several of the points it raises. 
   
3.  How does Hadoop access, process, and replicate the Lucene indexes if we 
generate them on our local file system rather than in HDFS?
   
4.  Could you outline the standard flow of execution when a Lucene query is 
served through Hadoop?
   
   
  Thanks,
  Vinaya 
