Just to add: since you're likely on a Windows platform, check out IFilters and how to use them. They are probably the easiest way you have to extract text from PDF/HTML/XML. This article is a good starting point for using the IFilter interface: http://www.codeproject.com/KB/cs/IFilter.aspx?msg=2428047

Once you have extracted the plain text, that is where Lucene comes in to analyze that plain text and create an index.

~P
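
To make the two stages concrete (extract plain text first, index it second), here is a rough, self-contained Python sketch. It is not IFilter and not Lucene: the standard-library HTML parser stands in for the extraction step, and a toy inverted index stands in for what Lucene does far more thoroughly. The names TextExtractor, extract_text, build_index and search are made up for this illustration.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML document (stand-in for an IFilter)."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called for every run of text between tags.
        self.chunks.append(data)


def extract_text(html):
    """Stage 1: turn markup into plain text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Stage 2: map each term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, html in docs.items():
        for term in tokenize(extract_text(html)):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


if __name__ == "__main__":
    docs = {
        "a.html": "<html><body><h1>Lucene</h1><p>Index plain text</p></body></html>",
        "b.html": "<p>Plain <b>text</b> only</p>",
    }
    index = build_index(docs)
    print(sorted(search(index, "plain text")))
    print(sorted(search(index, "lucene")))
```

The point is only the shape of the pipeline: the indexer never sees markup, just the text that the extraction step hands it, and you get results back by querying the index rather than the original files.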

> From: [email protected]
> Date: Wed, 2 Feb 2011 12:09:01 +1100
> Subject: Re: Question
> To: [email protected]
>
> Lucene.Net uses the same binary data store that Lucene uses, which is
> stored on the file system (generally; it depends on what Directory
> instance you provide to the indexer & searcher).
>
> Some projects, such as NHibernate.Search and RavenDB, use Lucene.Net
> internally and handle synchronizing the data stores (DB & Lucene).
>
> If you're trying to index things such as HTML/XML/PDF/etc. you have to
> write your own way to read the data into Lucene though.
>
> Aaron Powell
> Umbraco Core Team Member <http://umbraco.codeplex.com> | FunnelWeb Team
> Member <http://funnelweblog.com>
>
> http://www.aaron-powell.com | http://twitter.com/slace | Skype:
> aaron.l.powell | MSN: [email protected]
>
> On Wed, Feb 2, 2011 at 12:03 PM, Lucas E Wall <[email protected]> wrote:
>
> > Thanks, Aaron. I went through your blog and it makes a lot of sense.
> > Given that Lucene is ASP friendly, can I call the library from MSSQL?
> > Where does the index get stored? Do I need to provide a database for
> > the files I need indexed, and for the index as well? Maybe my
> > questions are a little bit too entry level.
> >
> > > From: [email protected]
> > > Date: Wed, 2 Feb 2011 11:04:45 +1100
> > > Subject: Re: Question
> > > To: [email protected]
> > >
> > > You don't actually install Lucene.Net; it's just a library which
> > > you reference in your application. Solr is an installable Lucene
> > > service, which essentially provides RESTful endpoints to Lucene
> > > (Java), or so goes my understanding.
> > >
> > > With regards to what you can search with Lucene, well that really
> > > comes down to anything you can push into the index. Keep in mind
> > > that Lucene is just an indexer and searcher, it's not a crawler or
> > > anything. You have to push the data to the indexer, and you have
> > > to write queries to get it back out.
> > >
> > > I've got some blogs on my site about getting started with
> > > Lucene.Net - http://www.aaron-powell.com/lucene-net-overview
> > >
> > > Aaron Powell
> > >
> > > On Wed, Feb 2, 2011 at 10:57 AM, Lucas E Wall <[email protected]> wrote:
> > >
> > > > I am new to Lucene and have the following questions. What is the
> > > > best way to understand what is required to install Lucene on a
> > > > server? Also, can I make Lucene run searches on links to XML
> > > > data on the web? Thanks
