Hi...
  I am a Final Year Undergrad.My Final year project is about search  engine for 
 XML Document..I am currently building this system  using Lucene.
  
  The example of XML element from an XML document :
  ----------------------------------------------
  <article>
      <body>
          <section>
          This is my first text
          </section>
         <p>
          This is my second text
        </p>
      </body>
  </article>
  ----------------------------------------------
  After the XML Document is parsed,I will get
  
  tag       : article
    value    : 
  
  tag       : article/body
    value    : 
  
  tag       : article/body/section
  value    : This is my first text
  
    tag       : article/body/p
    value    : This is my second text
  
  
  Constructing the Lucene Index, I treat : 
  1.the XML tag as the field
  2.the value of it as the terms to be indexed
  
  
  Before implementing this search engine,I have designed to build the index in 
such a way that every XML tag is converted using binary value,in order to 
reduce the size index and perhaps for faster searching.To illustrate:
  
  article will be converted to 0
  article/body will be converted to 0.0
  article/body/section will be converted to 0.0.0
    article/body/p will be converted to 0.0.1
  
  Now,because of using lucene for the implementation,i wonder wheter such 
conversion will still be useful for efficiency..I  wonder wheter inside the 
lucene index itself, such kind of conversion  or perhaps even further 
optimization is already done in order to reduce  the size index or for  faster 
searching.
  
  Can anyone give me some information?
  
  Really need help...Thanks a lot
  
  
  Regards, 
  
  Maureen
  
  
  
  
 
---------------------------------
Be a PS3 game guru.
Get your game face on with the latest PS3 news and previews at Yahoo! Games.

Reply via email to