You probably want to look at this page:

https://wiki.apache.org/solr/Solrj

About halfway down, it starts explaining how to index.  I'm assuming that
since you use Java you can get the server set up and running (I don't use
Java and was able to get it running under Tomcat in a matter of a few
hours).

Basically, you have to define your schema as an XML document.  Each field
that you want to store must be defined along with its data type, whether to
store the original value, whether to index it, whether it is multi-valued,
etc.  See here:

https://wiki.apache.org/solr/SchemaXml

As they mention there, you usually add an "id" field that is unique (Solr
calls this the "uniqueKey").  For you that might be the URL, or whatever
part of the URL makes it unique.  You can add text, numbers, etc. and it'll
all be indexed.  Note that fields normally have to be defined up front, but
you can use "dynamicField" to match field names by pattern, which gives you
great flexibility in storing data without having to define the exact fields
in advance.

After that, the first wiki link above shows how to create documents and add
them to the index.  You'll also want to create something to keep your index
updated as your data is added, changed, or removed.
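
To make that a bit more concrete, here is a rough sketch of the XML messages
that Solr's update handler accepts for adding and deleting documents.  It
only builds the strings and uses nothing but the JDK.  The class name and
the field names ("id", "title", "content") are made up for illustration and
would have to match your schema.  In practice SolrJ builds and sends these
for you, so treat this as a picture of what goes over the wire rather than
the way you'd actually write it.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: builds XML bodies for Solr update requests.
// Field names used below are assumptions and must exist in the schema.
class SolrUpdateXml {

    // Escape characters that are special in XML text and attribute values.
    // "&" is replaced first so the entities added afterwards are not re-escaped.
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    // One document: <add><doc><field name="...">value</field>...</doc></add>
    // Adding a document whose unique key already exists replaces the old one.
    static String toAddXml(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("<add><doc>");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            sb.append("<field name=\"").append(escape(e.getKey())).append("\">")
              .append(escape(e.getValue()))
              .append("</field>");
        }
        return sb.append("</doc></add>").toString();
    }

    // Remove a document by its unique key: <delete><id>...</id></delete>
    static String toDeleteXml(String id) {
        return "<delete><id>" + escape(id) + "</id></delete>";
    }

    public static void main(String[] args) {
        Map<String, String> page = new LinkedHashMap<>();
        page.put("id", "/docs/intro.html");   // URL path used as the unique key
        page.put("title", "Intro & Overview");
        page.put("content", "Plain text of the page, no markup.");
        System.out.println(toAddXml(page));
        System.out.println(toDeleteXml("/docs/old-page.html"));
    }
}
```

Each message gets POSTed to the update handler, and nothing shows up in
search results until a commit is sent (a "<commit/>" message, or a commit
call in SolrJ).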

On the search side you just create queries and run them.  The query will
return the data that you want, probably just the id, but you might also want
it to do highlighting for text fields that you've stored.  You can look
through the documentation to see how to set field weights; it's not
difficult.  You can also do faceting if that's helpful.
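
As a sketch of what such a query looks like if you go straight to the
server over HTTP, the snippet below builds a select URL with field weights,
highlighting, and faceting turned on.  The parameter names (q, fl, defType,
qf, hl, hl.fl, facet, facet.field) are standard Solr request parameters.
The base URL, the class name, and the field names ("title", "content",
"section") are assumptions for the example, so adjust them to your install
and schema.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: turns a map of Solr request parameters into a URL.
class SolrQueryUrl {

    // URL-encode one key or value as UTF-8 form data.
    static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException ex) {
            // UTF-8 is always available on the JVM.
            throw new IllegalStateException(ex);
        }
    }

    // Append the parameters to the base URL in insertion order.
    static String build(String baseUrl, Map<String, String> params) {
        StringBuilder sb = new StringBuilder(baseUrl);
        char sep = '?';
        for (Map.Entry<String, String> e : params.entrySet()) {
            sb.append(sep).append(enc(e.getKey())).append('=').append(enc(e.getValue()));
            sep = '&';
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("q", "search engine");       // what the user typed
        p.put("fl", "id");                 // return only the id field
        p.put("defType", "edismax");       // query parser that understands qf
        p.put("qf", "title^2 content");    // weight title matches 2x (assumed fields)
        p.put("hl", "true");               // turn on highlighting
        p.put("hl.fl", "content");         // highlight within this field
        p.put("facet", "true");            // turn on faceting
        p.put("facet.field", "section");   // facet on an assumed "section" field
        // Assumed base URL; depends on how and where Solr is deployed.
        System.out.println(build("http://localhost:8983/solr/select", p));
    }
}
```

Weighting results by where the user is on the site could be done the same
way, by changing the weights or adding a filter per section, but that part
depends on how your content is laid out.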

There are also various other plugins that you can throw into the mix.  One
that is on my backlog to use is the autocompleter.  You also have to decide
which stemmer to use, although I think it has one by default that works
well.

I also recommend this page for more info:

https://wiki.apache.org/solr/FrontPage#Tips.2C_Tricks_and_Use_Cases

This is a very complete search engine that should cover pretty much anything
you need for searching.  It takes a few days to learn what you need for
basic indexing, but it's all well documented and pretty straightforward.

One other thing to note is that if you're going to index more than one data
set (or think that you might in the future) it probably makes sense to
create a multi-core setup right off the bat.

That should be enough to keep you busy for a while.

Michael

On Mon, Jul 20, 2015 at 10:48 AM, Andrew Farnsworth <[email protected]>
wrote:

> We are planning on using it for the internal websites we create,
> providing detailed search into the content but also weighting the search
> results depending on where you are on the site.  So the content we would be
> indexing would use the URL as the key (not sure of the Solr terminology for
> this) and it would index the content of the page.  Since we are generating
> the page, we can just feed the search engine the content rather than
> crawling the pages.  I would rather not index the web pages' HTML, CSS, and
> JavaScript, though, which is another reason I want to feed it rather than
> just spidering the website and indexing that.
>
> Andy
>
> On Mon, Jul 20, 2015 at 9:11 AM, Michael Chaney <
> [email protected]> wrote:
>
>> My search shows "SolrJ" as a Java client for accessing Solr.  What kind
>> of data are you trying to index?
>>
>> To give you an idea, I use Solr for doing music catalog searches.  I'm
>> using Ruby and there's a Ruby gem that allows pretty easy indexing of data
>> from a Rails app.  I get to describe the data that I want to be indexed and
>> searching is really simple.  Even if you're going directly to the server
>> it's not difficult to search.
>>
>> I was using Thinking Sphinx (based on the Sphinx search engine), but Solr
>> is much faster at indexing.
>>
>> Michael
>>
>> On Sat, Jul 18, 2015 at 10:35 AM, Andrew Farnsworth <[email protected]>
>> wrote:
>>
>>> Java is the language of choice at the moment, though I would be
>>> interested in Perl too.
>>>
>>> Andrew Farnsworth
>>> (804) 405-3630
>>>
>>> On Jul 18, 2015, at 9:36 AM, Michael Chaney <[email protected]>
>>> wrote:
>>>
>>> What language are you using?  Typically there'll be a library to make it
>>> easy to interface.
>>>
>>> On Friday, July 17, 2015, Andrew Farnsworth <[email protected]> wrote:
>>>
>>>> Does anyone have any experience with Apache Solr?  I'm trying to get my
>>>> own content into it and am not sure how it works.
>>>>
>>>> Andy Farnsworth
>>>>
>>>> --
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "NLUG" group.
>>>> To post to this group, send email to [email protected]
>>>> To unsubscribe from this group, send email to
>>>> [email protected]
>>>> For more options, visit this group at
>>>> http://groups.google.com/group/nlug-talk?hl=en
>>>>
>>>> ---
>>>> You received this message because you are subscribed to the Google
>>>> Groups "NLUG" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to [email protected].
>>>> For more options, visit https://groups.google.com/d/optout.
>>>>
>>>
>>>
>>> --
>>> Michael Darrin Chaney, Sr.
>>> [email protected]
>>> http://www.michaelchaney.com/
>>>
>>>
>>
>>
>>
>> --
>> Michael Darrin Chaney, Sr.
>> [email protected]
>> http://www.michaelchaney.com/
>>
>>
>
>



-- 
Michael Darrin Chaney, Sr.
[email protected]
http://www.michaelchaney.com/

