Hello Derek,

See answers inline.

--
Mark Bennett / LucidWorks: Search & Big Data / mark.benn...@lucidworks.com
Office: 408-898-4201 / Telecommute: 408-733-0387 / Cell: 408-829-6513

On Jun 9, 2014, at 12:00 AM, Derek Poh <d...@globalsources.com> wrote:

> My company is actively looking at alternative search engine applications to 
> replace our current Endeca application.
> 
> I have no experience and knowledge on Solr and Lucene.
> Please bear with me, I would like to find out if the following features are 
> available on Solr.
> 
> 1. Aggregate results (rollups).
> E.g. from a list of search results of products (each has field = supplier id), 
> can the results be aggregated by supplier id with the original result 
> ordering retained?
Yes, it can:
http://wiki.apache.org/solr/FieldCollapsing
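
As a rough sketch of what a grouped request looks like, the Python below 
builds a select URL using Solr's result grouping parameters. The field name 
"supplier_id" and the helper name are my assumptions about your setup, not 
something from your schema. By default the groups come back in the order of 
each group's best-ranked document.

```python
from urllib.parse import urlencode

def grouped_query_url(base_url, query, group_field, per_group=1):
    """Build a Solr select URL that rolls results up by one field."""
    params = [
        ("q", query),
        ("group", "true"),             # turn on result grouping
        ("group.field", group_field),  # field to roll up on (assumed name)
        ("group.limit", per_group),    # documents returned per group
        ("group.ngroups", "true"),     # also report how many groups matched
    ]
    return base_url + "/select?" + urlencode(params)

url = grouped_query_url(
    "http://localhost:8983/solr/collection_name", "text:budget", "supplier_id"
)
print(url)
```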

> 2. Filter/Navigator, counts.
> List out a field's possible values and their counts from the indexed data and 
> from the returned results.
> The field's values can be sorted by the value's description or by the value's 
> counts in the returned results.
Yes, Solr calls these "Facets" and offers several types:
http://wiki.apache.org/solr/SimpleFacetParameters
http://wiki.apache.org/solr/HierarchicalFaceting

> E.g. Field "Business Type" below with its possible values and the count for 
> each value (in brackets). Can the field be returned in the result with its 
> values sorted either by description or by counts?
> Business Type
>    Manufacturer (15269)
>    Exporter (12493)
>    Trading Company (5541)
>    Agent (1324)
>    Wholesaler (1202)
>    Importer (682)
>    Buying Office (394)
>    Distributor (278)
>    Other (157)
>    Retailer (116)
>    Consultant (54)

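Absolutely. The values can be returned sorted by count (facet.sort=count) or 
by the value itself (facet.sort=index), and Solr is very fast and accurate at 
this.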
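
Here is a minimal sketch of a facet request for a field like your "Business 
Type". The field name "business_type" is an assumption (use whatever your 
schema calls it), and the helper is just for illustration. The facet.* 
parameters are the standard ones described on the SimpleFacetParameters page.

```python
from urllib.parse import urlencode

def facet_query_url(base_url, query, facet_field, sort_by="count"):
    """Build a Solr select URL that returns value counts for one field."""
    if sort_by not in ("count", "index"):
        raise ValueError("sort_by must be 'count' or 'index'")
    params = [
        ("q", query),
        ("rows", 0),                  # skip the documents, return only counts
        ("facet", "true"),            # turn on faceting
        ("facet.field", facet_field), # field whose values are counted
        ("facet.sort", sort_by),      # "count" = by count, "index" = by value
        ("facet.mincount", 1),        # hide values with zero matches
    ]
    return base_url + "/select?" + urlencode(params)

base = "http://localhost:8983/solr/collection_name"
print(facet_query_url(base, "text:budget", "business_type", "count"))
print(facet_query_url(base, "text:budget", "business_type", "index"))
```

The counts are computed over everything the query matches, not just the page 
of results you display.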

> 3. Configure and define the relevance ranking and matching logic of the 
> returned result.
Yes, though not by that name.
Step 1:
Configure default edismax parameters in your solrconfig.xml

Step 2:
Create additional search handlers in solrconfig.xml, and each search handler 
can have its own edismax configuration.

Normally the format of the search URL is:
    http://localhost:8983/solr/collection_name/select?q=text:budget

You would replace the "select" with the name of the search handler that has the 
edismax config you want.

With multiple search handlers, you'd use something like:
    http://localhost:8983/solr/collection_name/search_freshest?q=text:budget
    http://localhost:8983/solr/collection_name/search_most_popular?q=text:budget
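
To make the handler switch concrete, here is a small sketch that builds the 
URL for a named handler. The handler names are the example ones above. The 
second call shows that edismax settings can also be passed per request while 
you experiment; the "title" field and its boost are hypothetical, and in 
practice you would move the settings you like into the handler's defaults in 
solrconfig.xml.

```python
from urllib.parse import urlencode

def handler_url(base_url, handler, query, **extra_params):
    """Build a search URL for a given request handler name."""
    params = {"q": query, **extra_params}
    return f"{base_url}/{handler}?{urlencode(params)}"

base = "http://localhost:8983/solr/collection_name"

# Same query, different handler, so different relevance configuration.
print(handler_url(base, "search_freshest", "text:budget"))
print(handler_url(base, "search_most_popular", "text:budget"))

# Per-request edismax overrides (field names and boosts are made up).
print(handler_url(base, "select", "budget", defType="edismax", qf="title^3 text"))
```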

> 4. Define and configure the thesaurus (1-way or 2-way), stemming and stop 
> words.
Yes, Solr handles this well, and it supports both 1-way and 2-way synonyms.

Also, Solr lets you choose:
* Index time, or query time, or both
* Use expansion or reduction

You can even have more than one thesaurus file and have them each handled 
differently.

For example:
* Use an english_language thesaurus, which rarely changes, and expand that at 
index time
* Use your company_synonyms, which may change frequently, and expand them at 
search time.
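
To illustrate 1-way vs 2-way and expansion vs reduction, here is a simplified 
Python sketch of how rules in Solr's synonyms.txt style are interpreted. It 
is not Solr's code, just a toy model: a line with "=>" is a 1-way mapping, and 
a plain comma-separated line is a 2-way group that is either expanded to all 
terms or reduced to the first term. The sample rules are made up, and the 
sketch ignores multi-word terms and escaping.

```python
def parse_synonyms(lines, expand=True):
    """Map each source term to the list of terms it is rewritten to."""
    mapping = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if "=>" in line:
            # 1-way rule: terms on the left are replaced by terms on the right
            left, right = line.split("=>", 1)
            sources = [t.strip() for t in left.split(",") if t.strip()]
            targets = [t.strip() for t in right.split(",") if t.strip()]
        else:
            # 2-way group: expand to every term, or reduce to the first one
            terms = [t.strip() for t in line.split(",") if t.strip()]
            sources = terms
            targets = terms if expand else terms[:1]
        for source in sources:
            bucket = mapping.setdefault(source, [])
            for target in targets:
                if target not in bucket:
                    bucket.append(target)
    return mapping

rules = ["# sample rules", "tv, television", "i-pod, ipod => ipod"]
print(parse_synonyms(rules, expand=True))
print(parse_synonyms(rules, expand=False))
```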

I'll let you find these in the wiki, http://wiki.apache.org

> 
> 5. Multi-language support for Simplified Chinese and Spanish.
Yes!

And for simplified Chinese, please make sure to use the SmartCN analyzer, and 
not the simplistic "CJK"; SmartCN actually looks for Chinese language word 
breaks using statistical methods, and therefore should give better results.

> 
> 6. Scalability.
> At present, we are indexing 4 million records and the number is expected to 
> increase more than 10-fold in the near future.
40 million documents can normally be handled on a single machine, assuming it 
has enough RAM and doesn't have a lot of other stuff running.
You might want a second machine for failover.

If you do end up spreading the index across multiple machines, the way to do 
that is SolrCloud.
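
For a feel of what that involves, here is a sketch that builds a SolrCloud 
Collections API request to create a sharded, replicated collection. The 
collection name "products" and the shard and replica counts are placeholders 
I picked for illustration, not a sizing recommendation.

```python
from urllib.parse import urlencode

def create_collection_url(solr_root, name, num_shards, replication_factor):
    """Build a Collections API URL that creates a SolrCloud collection."""
    params = [
        ("action", "CREATE"),
        ("name", name),                              # placeholder name
        ("numShards", num_shards),                   # pieces the index is split into
        ("replicationFactor", replication_factor),   # copies of each shard
    ]
    return solr_root + "/admin/collections?" + urlencode(params)

total_docs = 40_000_000
shards = 2
print(create_collection_url("http://localhost:8983/solr", "products", shards, 2))
print(total_docs // shards, "documents per shard, assuming an even split")
```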

> 7. Search results debugging. E.g. why a record was matched or why a record 
> was ranked as such.
Yes.

You typically add &debugQuery=true&debug.explain.structured=true to the URL.

The output is a bit technical, and it takes some practice to understand.
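
As a starting point, here is a sketch that builds a debug URL and prints a 
structured explanation as an indented tree. The sample explanation is 
hand-written to show the general shape (a value, a description, and nested 
details); it is not real Solr output, and the numbers are invented.

```python
from urllib.parse import urlencode

def debug_url(base_url, query):
    """Build a select URL that asks Solr to explain its scoring."""
    params = [
        ("q", query),
        ("debugQuery", "true"),
        ("debug.explain.structured", "true"),
        ("wt", "json"),  # JSON is easier to walk programmatically
    ]
    return base_url + "/select?" + urlencode(params)

def flatten_explain(node, depth=0):
    """Turn one nested explanation into indented 'value = description' lines."""
    lines = ["  " * depth + f"{node['value']} = {node['description']}"]
    for child in node.get("details", []):
        lines.extend(flatten_explain(child, depth + 1))
    return lines

# Hand-written sample in the general shape of a structured explanation.
sample = {
    "match": True,
    "value": 1.5,
    "description": "sum of:",
    "details": [
        {"match": True, "value": 1.0, "description": "weight(text:budget)"},
        {"match": True, "value": 0.5, "description": "boost"},
    ],
}

print(debug_url("http://localhost:8983/solr/collection_name", "text:budget"))
print("\n".join(flatten_explain(sample)))
```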

There's also a graphical relevancy debugger with a free eval period:
http://www.lucidworks.com/market_app/lucidworks-relevancy-workbench/

> 
> Derek
> 
