Just to add: since you're likely on a Windows platform, check out IFilters
and how to use them. They are probably the easiest way you have to extract
text from PDF/HTML/XML.
 
This article is a good starting point for using the IFilter interface:
http://www.codeproject.com/KB/cs/IFilter.aspx?msg=2428047
 
Once you have extracted the plain text, that is where Lucene comes in: it
analyzes that plain text and creates an index.
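
To make the two steps concrete, here is a rough sketch in plain Python. It
uses neither IFilter nor Lucene: extract_text() is a hypothetical stand-in for
the extraction step, and build_index()/search() are a toy inverted index that
only illustrates the idea of what an indexer does with plain text. All names
here are made up for illustration.

```python
from collections import defaultdict
from html.parser import HTMLParser
import re


class TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML document, ignoring the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def extract_text(html):
    """Hypothetical stand-in for the extraction step (the job an IFilter does)."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Toy inverted index: map each token to the set of doc ids containing it."""
    index = defaultdict(set)
    for doc_id, html in docs.items():
        for token in tokenize(extract_text(html)):
            index[token].add(doc_id)
    return index


def search(index, term):
    """Return the sorted ids of documents that contain the term."""
    return sorted(index.get(term.lower(), set()))


# Made-up sample documents, keyed by a made-up file name.
docs = {
    "a.html": "<html><body><h1>Lucene</h1><p>Index plain text.</p></body></html>",
    "b.html": "<p>IFilters extract <b>text</b> from files.</p>",
}
index = build_index(docs)
```

With Lucene.Net you would hand the extracted text to the indexer as document
fields instead of building the dictionary yourself, and you would get
analyzers, scoring and a query syntax on top.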
 
~P




> From: [email protected]
> Date: Wed, 2 Feb 2011 12:09:01 +1100
> Subject: Re: Question
> To: [email protected]
> 
> Lucene.Net uses the same binary index format that Lucene uses, which is
> generally stored on the file system (it depends on what Directory instance
> you provide to the indexer & searcher).
> 
> Some projects, such as NHibernate.Search and RavenDB, use Lucene.Net
> internally and handle synchronizing the data stores (DB & Lucene).
> If you're trying to index things such as HTML, XML, PDF, etc., you have to
> write your own way to read the data into Lucene though.
> Aaron Powell
> Umbraco Core Team Member <http://umbraco.codeplex.com> | FunnelWeb Team
> Member <http://funnelweblog.com>
> 
> http://www.aaron-powell.com | http://twitter.com/slace | Skype:
> aaron.l.powell | MSN: [email protected]
> 
> 
> On Wed, Feb 2, 2011 at 12:03 PM, Lucas E Wall <[email protected]> wrote:
> 
> >
> > Thanks, Aaron. I went through your blog and it makes a lot of sense.
> > Given that Lucene is ASP-friendly, can I call the library from MSSQL? Where
> > does the index get stored? Do I need to provide a database for the files
> > I need indexed, and for the index as well? Maybe my questions are a little
> > bit too entry-level.
> >
> > > From: [email protected]
> > > Date: Wed, 2 Feb 2011 11:04:45 +1100
> > > Subject: Re: Question
> > > To: [email protected]
> > >
> > > You don't actually install Lucene.Net; it's just a library which you
> > > reference from your application. Solr is an installable Lucene service,
> > > which essentially provides RESTful endpoints to Lucene (Java), or so goes
> > > my understanding.
> > >
> > > With regard to what you can search with Lucene, well, that really comes
> > > down to anything you can push into the index. Keep in mind that Lucene is
> > > just an indexer and searcher; it's not a crawler or anything. You have to
> > > push the data to the indexer, and you have to write queries to get it
> > > back out.
> > > I've got some blogs on my site about getting started with Lucene.Net -
> > > http://www.aaron-powell.com/lucene-net-overview
> > > Aaron Powell
> > > Umbraco Core Team Member <http://umbraco.codeplex.com> | FunnelWeb Team
> > > Member <http://funnelweblog.com>
> > >
> > > http://www.aaron-powell.com | http://twitter.com/slace | Skype:
> > > aaron.l.powell | MSN: [email protected]
> > >
> > >
> > > On Wed, Feb 2, 2011 at 10:57 AM, Lucas E Wall <[email protected]> wrote:
> > >
> > > >
> > > > I am new to Lucene and have the following questions. What is the best
> > > > way to understand what is required to install Lucene on a server? Also,
> > > > can I make Lucene run searches on links to XML data on the web? Thanks
> > > >
> >
> >                                       