Derek,

Yes, you have several options.

1: You can maintain the 3 separate indexes, each of which Solr would typically call a "collection".

2: You could also combine the data into one larger collection and use a field to filter on.

3: A third option is to keep them separate (as in 1), but if you occasionally want to search all 3 you can do that as well from a single search with collection=. Or, if using SolrCloud, you can also create a collection alias.

So this way you can easily search just 1 collection, or all 3, by changing just 1 parameter.

--
Mark Bennett / LucidWorks: Search & Big Data / mark.benn...@lucidworks.com
Office: 408-898-4201 / Telecommute: 408-733-0387 / Cell: 408-829-6513

On Jun 10, 2014, at 9:03 PM, Derek Poh <d...@globalsources.com> wrote:

> Mark
>
> Looks like "edismax" supports it, will read more on it.
>
> On our current search application, we have a couple of indexes, each on
> specific types of data.
> Eg. 1 index of product data, 1 index on supplier data, 1 index on category
> data.
> We query against each index for different searches (like product search or
> supplier search).
> It is commonly referred to as application/pipeline in Endeca.
>
> Does Solr support such a setup?
>
>
> On 6/11/2014 6:23 AM, Mark Bennett wrote:
>> Derek,
>>
>> The "edismax" parser is pretty amazing. If I understand your questions, I
>> think the answer is yes.
>>
>> When people tune relevancy sometimes they apply very strong rules, they
>> "yell" at the engine. But it sounds like you already have a good instinct,
>> to "whisper" at Relevancy, at least at the start, and to think in terms of
>> tie breakers.
>>
>> When you specify the fields that edismax is to search, you can give each of
>> them a different weight. I think this will do most of what you want.
>>
>> Whether matches are combined via addition or multiplication can be
>> controlled with different options in edismax, although sometimes you have to
>> do a bit of reading and experimenting.
>>
>> Another trick that I sometimes use is to use copyField so that the same
>> field is indexed several different ways. Then, the indexed field with an
>> exact match is given a weight of 1.0, while a "fuzzy" match (for example with
>> synonyms / thesaurus) is given only a weight of 0.5 or 0.3.
>>
>> --
>> Mark Bennett / LucidWorks: Search & Big Data / mark.benn...@lucidworks.com
>> Office: 408-898-4201 / Telecommute: 408-733-0387 / Cell: 408-829-6513
>>
>> On Jun 10, 2014, at 12:26 AM, Derek Poh <d...@globalsources.com> wrote:
>>
>>> Hi Mark
>>>
>>> Appreciate you taking the time to reply, and with references.
>>>
>>> Regarding 3. Configure and define the relevance ranking and matching logic
>>> of the returned result.
>>>
>>> Can each search handler be configured to
>>> - search on a few fields
>>> - assign a numeric rank to each of the fields, such that a match on a field
>>> with the highest rank will rank the document higher in the returned search
>>> result.
>>> - the ranking of each field will also act as a tie-breaker.
>>> Eg.
>>> Category = 3
>>> SPPKeyWord = 2
>>> KeySpecification = 1
>>>
>>> A document that has a match on field Category will be ranked higher in the
>>> result than a document that has a match on SPPKeyWord.
>>> A document that has a match only on field KeySpecification will rank the
>>> lowest in the result.
>>>
>>>
>>> On 6/10/2014 12:27 AM, Mark Bennett wrote:
>>>> Hello Derek,
>>>>
>>>> See answers inline.
>>>>
>>>> --
>>>> Mark Bennett / LucidWorks: Search & Big Data / mark.benn...@lucidworks.com
>>>> Office: 408-898-4201 / Telecommute: 408-733-0387 / Cell: 408-829-6513
>>>>
>>>> On Jun 9, 2014, at 12:00 AM, Derek Poh <d...@globalsources.com> wrote:
>>>>
>>>>> My company is actively looking at alternative search engine applications
>>>>> to replace our current Endeca application.
>>>>>
>>>>> I have no experience with or knowledge of Solr and Lucene.
>>>>> Please bear with me, I would like to find out if the following features
>>>>> are available on Solr.
>>>>>
>>>>> 1. Aggregate results (rollups).
>>>>> Eg. From a list of search results of products (each has field = supplier
>>>>> id), can the results be aggregated by supplier id with the original
>>>>> results ordering retained?
>>>> Yes it can:
>>>> http://wiki.apache.org/solr/FieldCollapsing
>>>>
>>>>> 2. Filter/Navigator, counts.
>>>>> List out a field's possible values and their counts from the indexed data
>>>>> and from the returned results.
>>>>> The field's values can be sorted by the values' descriptions or by the
>>>>> values' counts in the returned results.
>>>> Yes, Solr calls these "Facets" and offers several types:
>>>> http://wiki.apache.org/solr/SimpleFacetParameters
>>>> http://wiki.apache.org/solr/HierarchicalFaceting
>>>>
>>>>> Eg. Field "Business Type" below with its possible values and the count
>>>>> for each value (in brackets). Can the field be returned in the result with
>>>>> its values sorted either by description or by counts?
>>>>> Business Type
>>>>> Manufacturer (15269)
>>>>> Exporter (12493)
>>>>> Trading Company (5541)
>>>>> Agent (1324)
>>>>> Wholesaler (1202)
>>>>> Importer (682)
>>>>> Buying Office (394)
>>>>> Distributor (278)
>>>>> Other (157)
>>>>> Retailer (116)
>>>>> Consultant (54)
>>>> Absolutely, and Solr is very fast and accurate.
>>>>
>>>>> 3. Configure and define the relevance ranking and matching logic of the
>>>>> returned result.
>>>> Yes, though not by that name.
>>>> Step 1:
>>>> Configure default edismax parameters in your solrconfig.xml
>>>>
>>>> Step 2:
>>>> Create additional search handlers in solrconfig.xml, and each search
>>>> handler can have its own edismax configuration.
>>>>
>>>> Normally the format of the search URL is:
>>>> http://localhost:8983/solr/collection_name/select?q=text:budget
>>>>
>>>> You would replace the "select" with the name of the search handler that
>>>> has the edismax config you want.
>>>>
>>>> With multiple search handlers, you'd use something like:
>>>>
>>>> http://localhost:8983/solr/collection_name/search_freshest?q=text:budget
>>>>
>>>> http://localhost:8983/solr/collection_name/search_most_popular?q=text:budget
>>>>
>>>>> 4. Define and configure the thesaurus (1-way or 2-way), stemming and stop
>>>>> words.
>>>> Yes, Solr is very good about this, you have both options.
>>>>
>>>> Also, Solr lets you choose:
>>>> * Index time, or query time, or both
>>>> * Use expansion or reduction
>>>>
>>>> You can even have more than one thesaurus file and have them each handled
>>>> differently.
>>>>
>>>> For example:
>>>> * Use an english_language thesaurus, which rarely changes, and expand that
>>>> at index time
>>>> * Use your company_synonyms, which may change frequently, and expand them
>>>> at search time.
>>>>
>>>> I'll let you find these in the wiki, http://wiki.apache.org
>>>>
>>>>> 5. Multi-language support for Simplified Chinese and Spanish.
>>>> Yes!
>>>>
>>>> And for Simplified Chinese, please make sure to use the SmartCN analyzer,
>>>> and not the simplistic "CJK"; SmartCN actually looks for Chinese language
>>>> word breaks using statistical methods, and therefore should give better
>>>> results.
>>>>
>>>>> 6. Scalability.
>>>>> At present, we are indexing 4 million records and the number is expected
>>>>> to increase more than tenfold in the near future.
>>>> 40 million documents can normally be handled on a single machine, assuming
>>>> it has enough RAM and doesn't have a lot of other stuff running.
>>>> You might want a second machine for failover.
>>>>
>>>> When people use multiple machines, then the way to do that is via
>>>> SolrCloud.
>>>>
>>>>> 7. Search results debugging. Eg. why a record was matched or why a record
>>>>> was ranked as such.
>>>> Yes.
>>>>
>>>> You typically add &debugQuery=true&debug.explain.structured=true to the
>>>> URL.
>>>>
>>>> The output is a bit technical, it takes some practice to understand.
>>>>
>>>> There's also a graphical relevancy debugger with a free eval period:
>>>> http://www.lucidworks.com/market_app/lucidworks-relevancy-workbench/
>>>>
>>>>> Derek
>>>>>
>>>>> ----------------------
>>>>> CONFIDENTIALITY NOTICE
>>>>> This e-mail (including any attachments) may contain confidential and/or
>>>>> privileged information. If you are not the intended recipient or have
>>>>> received this e-mail in error, please inform the sender immediately and
>>>>> delete this e-mail (including any attachments) from your computer, and
>>>>> you must not use, disclose to anyone else or copy this e-mail (including
>>>>> any attachments), whether in whole or in part.
>>>>> This e-mail and any reply to it may be monitored for security, legal,
>>>>> regulatory compliance and/or other appropriate reasons.
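
For anyone following this thread who wants to try the suggestions, here is a minimal Python sketch that only builds request URLs; it does not contact Solr. It illustrates three ideas from the replies above: searching several collections at once with collection=, filtering a combined collection on a field, and giving edismax fields different weights with qf. The collection names (products, suppliers, categories), the doc_type field, and the helper name build_search_url are assumptions made up for illustration. The host and URL layout follow the example URLs in the thread, and the field names and weights come from Derek's example.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Assumed host and port, matching the example URLs quoted in the thread.
SOLR_BASE = "http://localhost:8983/solr"


def build_search_url(collection, query, handler="select",
                     search_collections=None, filter_query=None,
                     field_weights=None):
    """Build a Solr search URL (hypothetical helper, illustration only).

    collection         -- collection (or alias) the request is sent to
    handler            -- search handler name, "select" by default
    search_collections -- optional list for the collection= parameter
    filter_query       -- optional fq= value, e.g. "doc_type:product"
    field_weights      -- optional {field: weight} mapping for edismax qf
    """
    params = [("q", query)]
    if search_collections:
        # Option 3 in the reply: search several collections in one request.
        params.append(("collection", ",".join(search_collections)))
    if filter_query:
        # Option 2 in the reply: one combined collection, filtered by a field.
        params.append(("fq", filter_query))
    if field_weights:
        # Per-field weights for the edismax parser, e.g. Category^3.
        qf = " ".join(f"{field}^{weight}" for field, weight in field_weights.items())
        params.append(("defType", "edismax"))
        params.append(("qf", qf))
    return f"{SOLR_BASE}/{collection}/{handler}?{urlencode(params)}"


if __name__ == "__main__":
    all_three = build_search_url(
        "products", "budget",
        search_collections=["products", "suppliers", "categories"],
    )
    weighted = build_search_url(
        "products", "budget",
        field_weights={"Category": 3, "SPPKeyWord": 2, "KeySpecification": 1},
    )
    for url in (all_three, weighted):
        # Show the URL and its decoded query parameters.
        print(url)
        print(parse_qs(urlsplit(url).query))
```

In practice the qf weights would more likely live in a search handler's defaults in solrconfig.xml, as described in the reply to question 3, so that the client only changes the handler name or the collection parameter.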