Re: Does solr 4.8.1 support these features?

Mark Bennett Wed, 11 Jun 2014 07:25:03 -0700

Derek,

Yes, you have several options.


1: You can maintain the 3 separate indexes, each of which Solr would typically 
call a "collection".

2: You could also combine the data into one larger collection and use a field 
to filter on.

3: A third option is to keep them separate (as in 1), but if you occasionally 
want to search all 3, you can do that from a single search with the 
collection= parameter.  Or, if using SolrCloud, you can also create a collection 
alias.  This way you can easily search just 1 collection, or all 3, by changing 
just 1 parameter.
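
For illustration only, here is a minimal Python sketch of option 3. It assumes 
SolrCloud, the localhost:8983 address used in the example URLs further down 
this thread, and three hypothetical collection names (products, suppliers, 
categories). build_search_url is a made-up helper, not part of Solr; it only 
builds the request URL and sends nothing over the network.

```python
from urllib.parse import urlencode

# Assumed base URL, matching the example URLs quoted later in this thread.
SOLR_BASE = "http://localhost:8983/solr"


def build_search_url(query, collections, handler="select"):
    """Build a search URL that targets one or more collections.

    The request is addressed to the first collection. When more than one
    collection is given, the full comma-separated list is passed in the
    `collection` parameter so a single search covers all of them.
    """
    if not collections:
        raise ValueError("at least one collection is required")
    params = {"q": query}
    if len(collections) > 1:
        params["collection"] = ",".join(collections)
    return f"{SOLR_BASE}/{collections[0]}/{handler}?{urlencode(params)}"


# Hypothetical collection names, one per index described in the question.
print(build_search_url("text:budget", ["products"]))
print(build_search_url("text:budget", ["products", "suppliers", "categories"]))
```

Switching between "just products" and "everything" is then a matter of 
changing the list passed in, which mirrors the "change just 1 parameter" idea.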

--
Mark Bennett / LucidWorks: Search & Big Data / mark.benn...@lucidworks.com
Office: 408-898-4201 / Telecommute: 408-733-0387 / Cell: 408-829-6513

On Jun 10, 2014, at 9:03 PM, Derek Poh <d...@globalsources.com> wrote:

> Mark
> 
> Looks like "edismax" supports it; I will read more on it.
> 
> On our current search application, we have a couple of indexes, each on 
> specific types of data.
> Eg. 1 index of product data, 1 index on supplier data, 1 index on category 
> data.
> We query against each index for different searches (like product search or 
> supplier search).
> It is commonly referred to as an application/pipeline in Endeca.
> 
> Does Solr support such a setup?
> 
> 
> On 6/11/2014 6:23 AM, Mark Bennett wrote:
>> Derek,
>> 
>> The "edismax" parser is pretty amazing.  If I understand your questions, I 
>> think the answer is yes.
>> 
>> When people tune relevancy, they sometimes apply very strong rules; they 
>> "yell" at the engine.  But it sounds like you already have a good instinct: 
>> to "whisper" at relevancy, at least at the start, and to think in terms of 
>> tie breakers.
>> 
>> When you specify the fields that edismax is to search, you can give each of 
>> them a different weight.  I think this will do most of what you want.
>> 
>> Whether matches are combined via addition or multiplication can be 
>> controlled with different options in edismax, although sometimes you have to 
>> do a bit of reading and experimenting.
>> 
>> Another trick that I sometimes use is to use copyField so that the same 
>> field is indexed several different ways.  Then the indexed field with an 
>> exact match is given a weight of 1.0, while the field with a "fuzzy" match 
>> (for example with synonyms / thesaurus) is given a weight of only 0.5 or 0.3.
>> 
>> --
>> Mark Bennett / LucidWorks: Search & Big Data / mark.benn...@lucidworks.com
>> Office: 408-898-4201 / Telecommute: 408-733-0387 / Cell: 408-829-6513
>> 
>> On Jun 10, 2014, at 12:26 AM, Derek Poh <d...@globalsources.com> wrote:
>> 
>>> Hi Mark
>>> 
>>> Appreciate you taking the time to reply and with references.
>>> 
>>> Regarding 3. Configure and define the relevance ranking and matching logic 
>>> of the returned results.
>>> 
>>> Can each search handler be configured to
>>> - search on a few fields
>>> - assign a numeric rank to each of the fields, such that a match on the 
>>> field with the highest rank will rank the document higher in the returned 
>>> search results.
>>> - the ranking of each field will also act as a tie-breaker.
>>> Eg.
>>> Category = 3
>>> SPPKeyWord = 2
>>> KeySpecification = 1
>>> 
>>> A document that has a match on field Category will be ranked higher in the 
>>> results than a document that has a match on SPPKeyWord.
>>> A document that has a match only on field KeySpecification will rank the 
>>> lowest in the results.
>>> 
>>> 
>>> On 6/10/2014 12:27 AM, Mark Bennett wrote:
>>>> Hello Derek,
>>>> 
>>>> See answers inline.
>>>> 
>>>> --
>>>> Mark Bennett / LucidWorks: Search & Big Data / mark.benn...@lucidworks.com
>>>> Office: 408-898-4201 / Telecommute: 408-733-0387 / Cell: 408-829-6513
>>>> 
>>>> On Jun 9, 2014, at 12:00 AM, Derek Poh <d...@globalsources.com> wrote:
>>>> 
>>>>> My company is actively looking at alternative search engine applications 
>>>>> to replace our current Endeca application.
>>>>> 
>>>>> I have no experience with or knowledge of Solr and Lucene.
>>>>> Please bear with me, I would like to find out if the following features 
>>>>> are available on Solr.
>>>>> 
>>>>> 1. Aggregate results (rollups).
>>>>> Eg. From a list of product search results (each has field = supplier 
>>>>> id), can the results be aggregated by supplier id with the original 
>>>>> result ordering retained?
>>>> Yes it can:
>>>> http://wiki.apache.org/solr/FieldCollapsing
>>>> 
>>>>> 2. Filter/Navigator, counts.
>>>>> List out a field's possible values and their counts from the indexed data 
>>>>> and from the returned results.
>>>>> The field's values can be sorted by the value's description or by the 
>>>>> value's count in the returned results.
>>>> Yes, Solr calls these "Facets" and offers several types:
>>>> http://wiki.apache.org/solr/SimpleFacetParameters
>>>> http://wiki.apache.org/solr/HierarchicalFaceting
>>>> 
>>>>> Eg. Field "Business Type" below with its possible values and the count 
>>>>> for each value (in brackets). Can the field be returned in the result with 
>>>>> its values sorted either by description or by counts?
>>>>> Business Type
>>>>>    Manufacturer (15269)
>>>>>    Exporter (12493)
>>>>>    Trading Company (5541)
>>>>>    Agent (1324)
>>>>>    Wholesaler (1202)
>>>>>    Importer (682)
>>>>>    Buying Office (394)
>>>>>    Distributor (278)
>>>>>    Other (157)
>>>>>    Retailer (116)
>>>>>    Consultant (54)
>>>> Absolutely, and Solr is very fast and accurate.
>>>> 
>>>>> 3. Configure and define the relevance ranking and matching logic of the 
>>>>> returned results.
>>>> Yes, though not by that name.
>>>> Step 1:
>>>> Configure default edismax parameters in your solrconfig.xml
>>>> 
>>>> Step 2:
>>>> Create additional search handlers in solrconfig.xml, and each search 
>>>> handler can have its own edismax configuration.
>>>> 
>>>> Normally the format of the search URL is:
>>>>     http://localhost:8983/solr/collection_name/select?q=text:budget
>>>> 
>>>> You would replace the "select" with the name of the search handler that 
>>>> has the edismax config you want.
>>>> 
>>>> With multiple search handlers, you'd use something like:
>>>>     http://localhost:8983/solr/collection_name/search_freshest?q=text:budget
>>>>     http://localhost:8983/solr/collection_name/search_most_popular?q=text:budget
>>>> 
>>>>> 4. Define and configure the thesaurus (1-way or 2-way), stemming and stop 
>>>>> words.
>>>> Yes, Solr is very good at this; you have both options.
>>>> 
>>>> Also, Solr lets you choose:
>>>> * Index time, or query time, or both
>>>> * Use expansion or reduction
>>>> 
>>>> You can even have more than one thesaurus file and have them each handled 
>>>> differently.
>>>> 
>>>> For example:
>>>> * Use an english_language thesaurus, which rarely changes, and expand that 
>>>> at index time
>>>> * Use your company_synonyms, which may change frequently, and expand them 
>>>> at search time.
>>>> 
>>>> I'll let you find these in the wiki, http://wiki.apache.org
>>>> 
>>>>> 5. Multi-language support for Simplified Chinese and Spanish.
>>>> Yes!
>>>> 
>>>> And for simplified Chinese, please make sure to use the SmartCN analyzer, 
>>>> and not the simplistic "CJK"; SmartCN actually looks for Chinese language 
>>>> word breaks using statistical methods, and therefore should give better 
>>>> results.
>>>> 
>>>>> 6. Scalability.
>>>>> At present, we are indexing 4 million records and the number is expected 
>>>>> to increase more than tenfold in the near future.
>>>> 40 million documents can normally be handled on a single machine, assuming 
>>>> it has enough RAM and doesn't have a lot of other stuff running.
>>>> You might want a second machine for failover.
>>>> 
>>>> When people use multiple machines, then the way to do that is via 
>>>> SolrCloud.
>>>> 
>>>>> 7. Search results debugging. Eg. why a record was matched or why a record 
>>>>> was ranked as such.
>>>> Yes.
>>>> 
>>>> You typically add &debugQuery=true&debug.explain.structured=true to the 
>>>> URL.
>>>> 
>>>> The output is a bit technical; it takes some practice to understand.
>>>> 
>>>> There's also a graphical relevancy debugger with a free eval period:
>>>> http://www.lucidworks.com/market_app/lucidworks-relevancy-workbench/
>>>> 
>>>>> Derek
>>>>> 
>>>>> ----------------------
>>>>> CONFIDENTIALITY NOTICE
>>>>> This e-mail (including any attachments) may contain confidential and/or 
>>>>> privileged information. If you are not the intended recipient or have 
>>>>> received this e-mail in error, please inform the sender immediately and 
>>>>> delete this e-mail (including any attachments) from your computer, and 
>>>>> you must not use, disclose to anyone else or copy this e-mail (including 
>>>>> any attachments), whether in whole or in part.
>>>>> This e-mail and any reply to it may be monitored for security, legal, 
>>>>> regulatory compliance and/or other appropriate reasons.
>>>> 
>>> 
>> 
>> 
> 
> 
