Re: Why isn't the DateField implementation of ISO 8601 broader?
On 6 Oct 09, at 5:31 PM, Chris Hostetter wrote:
> ...your expectations may be different than everyone else's. By requiring that the dates be explicit there is no ambiguity; you are in control of the behavior.

The power of some of the other formats in ISO 8601 is that you don't introduce false levels of precision. The October 2009 issue of a magazine is precisely tagged as 200910 or 2009-10. It doesn't have a day, hour or minute. Most books come with a copyright year: no month, no day ... In the library/book/periodical world these are a common set of expectations.

Walter
Re: ExtractingRequestHandler unknown field 'stream_source_info'
On 1 Oct 09, at 12:46 PM, Tricia Williams wrote: ...

STREAM_SOURCE_INFO appears to be a constant from this page: http://lucene.apache.org/solr/api/constant-values.html

and is discussed in this article: https://www.packtpub.com/article/indexing-data-solr-1.4-enterprise-search-server-2

This has it embedded as an arr in the results: http://www.nabble.com/Solr-question-td25271706.html

Whether any of these help or not ...

Walter
Logging errors from multiple solr instances
I'm running solr 1.1 under Tomcat 5.5. On the development machine there is a modest number of instances of solr indexes (six). In the logs, currently the only way to distinguish them is to compare the [EMAIL PROTECTED], where the someIdentifier changes each time Tomcat is restarted (depressingly frequently in my programming style). This value isn't written out for commits and queries, as well as a variety of other instances where distinguishing between the activities posted against the various indexes would be useful.

Is this addressed in 1.2, or is running multiple instances of indexes such a Bad Idea that supporting this would be leading a fool further astray?

Walter Lewis
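One possible workaround, sketched here without having tested it against Solr 1.1 (the handler prefix and file names are assumptions to adjust per instance): Tomcat 5.5's JULI logging can be configured per webapp, so each Solr index writes to its own log file. A `logging.properties` dropped into each webapp's WEB-INF/classes might look like:

```
# Hypothetical per-webapp logging.properties for Tomcat 5.5 (JULI).
# "solr-index1." is an arbitrary label; use a different prefix per index.
handlers = org.apache.juli.FileHandler
org.apache.juli.FileHandler.level = INFO
org.apache.juli.FileHandler.directory = ${catalina.base}/logs
org.apache.juli.FileHandler.prefix = solr-index1.
```

With one such file per webapp, the commits and queries for each index at least land in distinguishable files, regardless of what identifier Solr itself writes.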
host logging options (was Re: Schema validator/debugger)
Andrew Nagy wrote:
> Yonik Seeley wrote:
>> I dropped your schema.xml directly into the Solr example (using Jetty), fired it up, and everything works fine!?
> Okay, I switched over to Jetty and now I get a different error: SEVERE: org.apache.solr.core.SolrException: undefined field text

As someone who has used both Jetty and Tomcat in production (and has come to prefer Tomcat), what are my choices for getting the "undefined field xxx" error into the catalina log files (or is it stashed somewhere I'm overlooking)?

Walter Lewis
Re: Using solr on windows
Erik Hatcher wrote:
> Cygwin needs curl installed. It should be fairly easy to select that and have it installed. It's been a while since I've used cygwin, but I do recall a list of packages to install.

I would just note that, while the examples are designed around Cygwin, it is by no means a dependency for running SOLR. Curl is available as a binary for Win32, in its latest release, and runs from an ordinary command window. The implication is that you can also build batch files that mimic the functionality of the .sh files in the examples.

The following would need to be adjusted for the host name and port, and the name of the instance of solr, but they will run from the command prompt in the directory where curl is installed:

curl http://localhost/solr/update --data-binary @test.xml
curl http://localhost/solr/update --data-binary "<commit/>"
curl http://localhost/solr/update --data-binary "<optimize/>"

Tomcat, or apache/tomcat, on Windows 2003 is a perfectly adequate servlet container for solr. For that matter, SirsiDynix's Horizon runs Jetty over Windows 2003 in many of their production environments; it's just a bear to configure as a service.

Walter Lewis
Re: Fuzzy searching, tildes and solr
Yonik Seeley wrote:
> +(+text:jame +text:sutherland) +searchSet:testSet
> +(+text:james~0.75 +text:sutherland~0.75) +searchSet:testSet
> I can tell from the first that this is a stemmed field... james is transformed to jame

James being the plural of Jame, according to the stemmer. I guess my mind hadn't run in that direction. :) I wasn't expecting the fuzzy query logic to bypass the stemming.

Would it be correct that if I were to add james to the protwords.txt file, this *specific* problem would go away? Obviously there is a significant quantity of proper names where this would have an impact, so a more generic solution is preferable.

> So, you could index the field twice using copyField, and then do fuzzy queries on the non-stemmed version. [plus two other good suggestions]

As I look at the field types in the example schema, would you recommend something like text_lu without the EnglishPorterFilterFactory, or are there other issues I'm overlooking?

Walter Lewis (aka Walt Lewi, apparently)
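For what it's worth, the copyField suggestion might look like the following schema.xml sketch. This is only an illustration, not the thread's confirmed fix: the field and type names (text_exact, text_nostem) are invented here, and the analyzer is simply the usual tokenizer and lowercase filter with the EnglishPorterFilterFactory left out.

```
<!-- Hypothetical unstemmed twin of the "text" field -->
<fieldType name="text_nostem" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- note: no solr.EnglishPorterFilterFactory here -->
  </analyzer>
</fieldType>

<field name="text_exact" type="text_nostem" indexed="true" stored="false"/>
<copyField source="text" dest="text_exact"/>
```

A fuzzy query would then target the unstemmed field, e.g. text_exact:james~0.75, while ordinary queries keep using the stemmed text field.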
Re: Fuzzy searching, tildes and solr
Yonik Seeley wrote:
> On 1/23/07, Walter Lewis [EMAIL PROTECTED] wrote:
>> This is quite possibly a Lucene question rather than a solr one, so my apologies if you think it's out of scope. Underlying the solr search are some very useful Lucene constructs. One of the most powerful, imho, is the tilde-number combination for a fuzzy search. In one of my data sets:
>> q=Sutherland returns 41 results
>> q=Sutherland~0.75 returns 275
>> q=Sutherland~0.70 returns 484
>> etc., all of which fits a pattern. Add a first name and:
>> q=(James Sutherland) returns 13
>> q=(James~0.75 Sutherland~0.75) returns 1
>> q=(James~0.70 Sutherland~0.70) returns 97
>> Qualify only one term and there is a consistent pattern. But routinely, qualifying two terms yields a smaller number than a string match. Trying q=(James~0.75 AND Sutherland~0.75) returns the same record (the schema has default set to AND). Why would the ~0.75 *narrow* rather than broaden a search? Is there some pattern in the solr syntax I'm overlooking?
> That's a great question... that doesn't make sense. Could you post your debug-query output (add debugQuery=on)?

My apologies for the delay, and for the generally excessive top quoting here; I thought it might save a bit of time to keep the alternatives together.

I should also note that I simplified the queries above. Each ran with a searchSet constraint, which was the same value. The normal queries also carry significant baggage of fields and facets, which is likewise consistent across the whole set of them.

I ran the debug against the two following queries:

q=(James Sutherland) returns 13
q=(James~0.75 Sutherland~0.75) returns 1

I have attached the debug fragments below.
Walter

<lst name="debug">
  <str name="rawquerystring">(james sutherland) searchSet:testSet</str>
  <str name="querystring">(james sutherland) searchSet:testSet</str>
  <str name="parsedquery">+(+text:jame +text:sutherland) +searchSet:testSet</str>
  <str name="parsedquery_toString">+(+text:jame +text:sutherland) +searchSet:testSet</str>
  <lst name="explain">
    <str name="id=MHGL.502,internal_docid=80313">
2.2928324 = (MATCH) sum of:
  2.2204013 = (MATCH) sum of:
    0.444597 = (MATCH) weight(text:jame in 80313), product of:
      0.46986106 = queryWeight(text:jame), product of:
        4.370453 = idf(docFreq=3085)
        0.107508555 = queryNorm
      0.94623077 = (MATCH) fieldWeight(text:jame in 80313), product of:
        1.7320508 = tf(termFreq(text:jame)=3)
        4.370453 = idf(docFreq=3085)
        0.125 = fieldNorm(field=text, doc=80313)
    1.7758043 = (MATCH) weight(text:sutherland in 80313), product of:
      0.8738745 = queryWeight(text:sutherland), product of:
        8.128418 = idf(docFreq=71)
        0.107508555 = queryNorm
      2.0321045 = (MATCH) fieldWeight(text:sutherland in 80313), product of:
        2.0 = tf(termFreq(text:sutherland)=4)
        8.128418 = idf(docFreq=71)
        0.125 = fieldNorm(field=text, doc=80313)
  0.072431125 = (MATCH) weight(searchSet:testSet in 80313), product of:
    0.124795556 = queryWeight(searchSet:testSet), product of:
      1.1607965 = idf(docFreq=76441)
      0.107508555 = queryNorm
    0.58039826 = (MATCH) fieldWeight(searchSet:testSet in 80313), product of:
      1.0 = tf(termFreq(searchSet:testSet)=1)
      1.1607965 = idf(docFreq=76441)
      0.5 = fieldNorm(field=searchSet, doc=80313)
    </str>
    <str name="id=MHGL.503,internal_docid=80314">
2.1340907 = (MATCH) sum of:
  2.0616596 = (MATCH) sum of:
    0.43047923 = (MATCH) weight(text:jame in 80314), product of:
      0.46986106 = queryWeight(text:jame), product of:
        4.370453 = idf(docFreq=3085)
        0.107508555 = queryNorm
      0.91618407 = (MATCH) fieldWeight(text:jame in 80314), product of:
        2.236068 = tf(termFreq(text:jame)=5)
        4.370453 = idf(docFreq=3085)
        0.09375 = fieldNorm(field=text, doc=80314)
    1.6311804 = (MATCH) weight(text:sutherland in 80314), product of:
      0.8738745 = queryWeight(text:sutherland), product of:
        8.128418 = idf(docFreq=71)
        0.107508555 = queryNorm
      1.8666072 = (MATCH) fieldWeight(text:sutherland in 80314), product of:
        2.4494898 = tf(termFreq(text:sutherland)=6)
        8.128418 = idf(docFreq=71)
        0.09375 = fieldNorm(field=text, doc=80314)
  0.072431125 = (MATCH) weight(searchSet:testSet in 80314), product of:
    0.124795556 = queryWeight(searchSet:testSet), product of:
      1.1607965 = idf(docFreq=76441)
      0.107508555 = queryNorm
    0.58039826 = (MATCH) fieldWeight(searchSet:testSet in 80314), product of:
      1.0 = tf(termFreq(searchSet:testSet)=1)
      1.1607965 = idf(docFreq=76441)
      0.5 = fieldNorm(field=searchSet, doc=80314)
    </str>
    <str name="id=MHGL.501,internal_docid=80312">
1.5031691 = (MATCH) sum of:
  1.430738 = (MATCH) sum of:
    0.32086027 = (MATCH) weight(text:jame in 80312), product of:
      0.46986106 = queryWeight(text:jame), product of:
        4.370453 = idf(docFreq=3085)
        0.107508555 = queryNorm
      0.68288326 = (MATCH) fieldWeight(text:jame
Fuzzy searching, tildes and solr
This is quite possibly a Lucene question rather than a solr one, so my apologies if you think it's out of scope. Underlying the solr search are some very useful Lucene constructs. One of the most powerful, imho, is the tilde-number combination for a fuzzy search.

In one of my data sets:

q=Sutherland returns 41 results
q=Sutherland~0.75 returns 275
q=Sutherland~0.70 returns 484

etc., all of which fits a pattern. Add a first name and:

q=(James Sutherland) returns 13
q=(James~0.75 Sutherland~0.75) returns 1
q=(James~0.70 Sutherland~0.70) returns 97

Qualify only one term and there is a consistent pattern. But routinely, qualifying two terms yields a smaller number than a string match. Trying q=(James~0.75 AND Sutherland~0.75) returns the same record (the schema has default set to AND).

Why would the ~0.75 *narrow* rather than broaden a search? Is there some pattern in the solr syntax I'm overlooking?

Walter
Re: solr + cocoon problem
[EMAIL PROTECTED] wrote:
> Any ideas on how to implement a cocoon layer above solr?

You're far from the only one approaching solr via cocoon ... :)

The approach we took passes the search parameters to a solrsearch stylesheet, the heart of which is a cinclude block that embeds the solr results. A further transformation prepares the results of the solr query for display. The latest rewrite is getting more complicated as we work in flowscript to manipulate the values more before presenting them to solr, but the heart of the solution is below.

Walter

From the sitemap.xmap:

<map:match pattern="results">
  <map:generate type="request">
    <map:parameter name="generate-attributes" value="true"/>
  </map:generate>
  <map:transform type="xslt" src="style/solrsearch.xsl">
    <map:parameter name="use-request-parameters" value="true"/>
  </map:transform>
  <map:transform type="cinclude"/>
  <map:transform type="xslt" src="style/search_result.xsl"/>
  <map:serialize type="html"/>
</map:match>

From solrsearch.xsl [assuming parameters of q, start and rows]:

<cinclude:includexml>
  <cinclude:src>http://localhost:8080/solr/select?q=<xsl:value-of select="$q"/>&amp;start=<xsl:value-of select="$start"/>&amp;rows=<xsl:value-of select="$rows"/></cinclude:src>
</cinclude:includexml>
Re: MatchAllDocsQuery in solr?
Walter Underwood wrote:
> I was thinking something similar, maybe _solr:all. At Infoseek, we hardcoded url:http to match all docs.

I suppose that different data would yield different responses, but a space ( ) works on our data.

the other Walter
Sorting facets
My apologies if this has been answered before, but I couldn't see it in the FAQ, tutorial, wiki or the solr-user mail archives.

The explanation for sorting the documents returned by a search is quite straightforward. I'm currently seeing the facets arrive in a more or less random order. Is the expectation that the code processing the result will apply its own sorts to the facet lists? Or is there some other option I'm overlooking?

Walter Lewis
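For reference, one hedged sketch of a server-side option, assuming a Solr build that supports the facet.sort request parameter (the field name "subject" is invented for illustration):

```
http://localhost:8983/solr/select?q=solr&facet=true&facet.field=subject&facet.sort=true
```

With facet.sort enabled, the facet values come back ordered by count; otherwise the client code processing the response would apply its own sort to the facet lists.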