Hello Derek,

See answers inline.
--
Mark Bennett / LucidWorks: Search & Big Data / mark.benn...@lucidworks.com
Office: 408-898-4201 / Telecommute: 408-733-0387 / Cell: 408-829-6513

On Jun 9, 2014, at 12:00 AM, Derek Poh <d...@globalsources.com> wrote:

> My company is actively looking at alternative search engine applications to
> replace our current Endeca application.
>
> I have no experience and knowledge on Solr and Lucene.
> Please bear with me, I would like to find out if the following features are
> available on Solr.
>
> 1. Aggregate results (rollups).
> Eg. From a list of search result of products (each has field = supplier id),
> can the results be aggregated by supplier id with the original results
> ordering retain.

Yes it can:
http://wiki.apache.org/solr/FieldCollapsing

> 2. Filter/Navigator, counts.
> List out a field's possible values and their counts from the indexed data and
> from the return results.
> The field's values can be sorted by the values description or by the values
> counts in the return results.

Yes, Solr calls these "Facets" and offers several types:
http://wiki.apache.org/solr/SimpleFacetParameters
http://wiki.apache.org/solr/HierarchicalFaceting

> Eg. Field "Business Type" below with it's possible values and the count for
> each value (in bracket). Can the field be return in the result with it's
> values sorted either by description or by counts?
> Business Type
> Manufacturer (15269)
> Exporter (12493)
> Trading Company (5541)
> Agent (1324)
> Wholesaler (1202)
> Importer (682)
> Buying Office (394)
> Distributor (278)
> Other (157)
> Retailer (116)
> Consultant (54)

Absolutely, and Solr is very fast and accurate.

> 3. Configure and defined the relevance ranking and matching logic of the return
> result.

Yes, though not by that name.

Step 1: Configure default edismax parameters in your solrconfig.xml.
Step 2: Create additional search handlers in solrconfig.xml; each search handler can have its own edismax configuration.
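
To make "edismax configuration" a little more concrete, here is a minimal Python sketch of two ranking profiles. The edismax parameters (defType, qf, bf) are real, but the handler names reuse the examples in this mail and the field names and boosts (title, description, last_modified, popularity) are hypothetical. In practice these values would usually sit as defaults on each search handler in solrconfig.xml; sending them per request, as below, is just a convenient way to experiment before committing them to config.

```python
from urllib.parse import urlencode

# Hypothetical handler names, field names, and boosts, for illustration only.
EDISMAX_PROFILES = {
    "search_freshest": {
        "defType": "edismax",
        "qf": "title^3 description",  # fields to search, with per-field boosts
        "bf": "recip(ms(NOW,last_modified),3.16e-11,1,1)",  # favor recent docs
    },
    "search_most_popular": {
        "defType": "edismax",
        "qf": "title^3 description",
        "bf": "popularity",  # favor docs with a higher popularity value
    },
}


def edismax_query_string(profile_name, user_query):
    """Return a URL query string combining a ranking profile with the user's query."""
    params = dict(EDISMAX_PROFILES[profile_name])  # copy so the profile is not mutated
    params["q"] = user_query
    return urlencode(params)
```
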
Normally the format of the search URL is:
http://localhost:8983/solr/collection_name/select?q=text:budget

You would replace "select" with the name of the search handler that has the edismax config you want. With multiple search handlers, you'd use something like:
http://localhost:8983/solr/collection_name/search_freshest?q=text:budget
http://localhost:8983/solr/collection_name/search_most_popular?q=text:budget

> 4. Defined and configure the thesaurus (1-way or 2-way), stemming and stop
> words.

Yes, Solr is very good at this, and you have both options. Also, Solr lets you choose:
* Index time, or query time, or both
* Expansion or reduction

You can even have more than one thesaurus file and have each one handled differently. For example:
* Use an english_language thesaurus, which rarely changes, and expand it at index time
* Use your company_synonyms, which may change frequently, and expand them at search time

I'll let you find these in the wiki: http://wiki.apache.org

>
> 5. Multi-language support for Simplified Chinese and Spanish.

Yes! For Simplified Chinese, please make sure to use the SmartCN analyzer and not the simplistic "CJK" one; SmartCN actually looks for Chinese-language word breaks using statistical methods, and therefore should give better results.

>
> 6. Scalability.
> At present, we are indexing 4 million records and the number is expected to
> increase by more than 10 folds in the near future.

40 million documents can normally be handled on a single machine, assuming it has enough RAM and doesn't have a lot of other stuff running. You might want a second machine for failover. When people use multiple machines, the way to do that is via SolrCloud.

> 7. Search results debugging. Eg. why record was matched or why record was
> ranked as such.

Yes. You typically add &debugQuery=true&debug.explain.structured=true to the URL. The output is a bit technical; it takes some practice to understand.
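
As a small illustration, a debug request URL could be assembled like this in Python. This is only a sketch: the host, collection name, and handler are the placeholder values used earlier in this mail, and wt=json is my own addition to request JSON output.

```python
from urllib.parse import urlencode


def debug_url(base_url, handler, query):
    """Build a Solr request URL with score explanations switched on."""
    params = {
        "q": query,
        "debugQuery": "true",                # include debug info in the response
        "debug.explain.structured": "true",  # nested explanation instead of flat text
        "wt": "json",                        # assumption: ask for JSON output
    }
    return f"{base_url}/{handler}?{urlencode(params)}"


# Placeholder host and collection, as in the URLs above.
url = debug_url("http://localhost:8983/solr/collection_name", "select", "text:budget")
```

Note that urlencode escapes the colon in text:budget as %3A, which Solr decodes back on its side.
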
There's also a graphical relevancy debugger with a free eval period:
http://www.lucidworks.com/market_app/lucidworks-relevancy-workbench/

>
> Derek
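
P.S. For questions 1 and 2, the rollup and the facet counts can be requested in the same query. Here is a minimal Python sketch of the request parameters. The parameter names (group, group.field, facet, facet.field, facet.sort) are the ones described on the wiki pages linked above; the field names supplier_id and business_type are hypothetical stand-ins for whatever your schema uses.

```python
from urllib.parse import urlencode


def rollup_and_facet_query(query, group_field, facet_field, facet_sort="count"):
    """Build a query string that groups results by one field and facets on another."""
    # facet.sort=count orders values by frequency; facet.sort=index orders by value.
    if facet_sort not in ("count", "index"):
        raise ValueError("facet_sort must be 'count' or 'index'")
    params = [
        ("q", query),
        ("group", "true"),             # collapse results into groups
        ("group.field", group_field),  # e.g. one group per supplier
        ("facet", "true"),             # return facet counts with the results
        ("facet.field", facet_field),
        ("facet.sort", facet_sort),
    ]
    return urlencode(params)


# Hypothetical field names; substitute the ones in your own schema.
qs = rollup_and_facet_query("text:budget", "supplier_id", "business_type")
```

Passing facet_sort="index" instead would return the Business Type values ordered by the value itself rather than by count.
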