Re: Why isn't the DateField implementation of ISO 8601 broader?

2009-10-06 Thread Walter Lewis

On 6 Oct 09, at 5:31 PM, Chris Hostetter wrote:

...your expectations may be different than everyone else's.  By requiring
that the dates be explicit there is no ambiguity, you are in control of
the behavior.


The power of some of the other formats in ISO 8601 is that you don't
introduce false levels of precision.  The October 2009 issue of a
magazine is precisely tagged as 200910 or 2009-10.  It doesn't
have a day, hour or minute.  Most books come with a copyright year: no
month, no day ...


In the library/book/periodical world these are a common set of  
expectations.
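
One way to carry a reduced-precision date into an index that wants explicit timestamps is to expand it into a range and keep the precision alongside it. The following is only a sketch: the function names are hypothetical, only the hyphenated forms (YYYY, YYYY-MM, YYYY-MM-DD) are handled, and the choice of UTC and of an exclusive end point are assumptions rather than anything DateField prescribes.

```python
import re
from datetime import datetime, timedelta, timezone

# Accepts YYYY, YYYY-MM or YYYY-MM-DD (hyphenated forms only; an assumption of this sketch).
_PARTIAL = re.compile(r"^(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?$")


def partial_date_range(value):
    """Return (start, end, precision) for a reduced-precision date string.

    start is inclusive, end is exclusive, both in UTC.
    precision is "year", "month" or "day".
    """
    match = _PARTIAL.match(value)
    if match is None:
        raise ValueError("unsupported date form: %r" % value)
    year = int(match.group(1))
    month = int(match.group(2)) if match.group(2) else None
    day = int(match.group(3)) if match.group(3) else None

    if day is not None:
        start = datetime(year, month, day, tzinfo=timezone.utc)
        return start, start + timedelta(days=1), "day"
    if month is not None:
        start = datetime(year, month, 1, tzinfo=timezone.utc)
        if month == 12:
            end = datetime(year + 1, 1, 1, tzinfo=timezone.utc)
        else:
            end = datetime(year, month + 1, 1, tzinfo=timezone.utc)
        return start, end, "month"
    start = datetime(year, 1, 1, tzinfo=timezone.utc)
    return start, datetime(year + 1, 1, 1, tzinfo=timezone.utc), "year"


def solr_timestamp(moment):
    """Format a datetime in the fully explicit form, e.g. 2009-10-01T00:00:00Z."""
    return moment.strftime("%Y-%m-%dT%H:%M:%SZ")
```

Storing the precision next to the start of the range lets a display layer show "October 2009" rather than an invented first-of-the-month.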


Walter







Re: ExtractingRequestHandler unknown field 'stream_source_info'

2009-10-01 Thread Walter Lewis

On 1 Oct 09, at 12:46 PM, Tricia Williams wrote:


STREAM_SOURCE_INFO



https://www.packtpub.com/article/indexing-data-solr-1.4-enterprise-search-server-2

appears to be a constant from this page:
  http://lucene.apache.org/solr/api/constant-values.html

This thread shows it embedded as an arr element in the results:
http://www.nabble.com/Solr-question-td25271706.html

Whether any of these help or not ...

Walter


Logging errors from multiple solr instances

2007-06-07 Thread Walter Lewis
I'm running solr 1.1 under Tomcat 5.5.  On the development machine there 
are a modest number of instances of solr indexes (six).


In the logs currently the only way to distinguish them is to compare the 
[EMAIL PROTECTED], where the someIdentifier changes each time 
Tomcat is restarted (depressingly frequently in my programming style).


This value isn't written out for commits and queries, nor in a
variety of other cases where distinguishing between the activities
posted against the various indexes would be useful.


Is this addressed in 1.2 or is running multiple instances of indexes 
such a Bad Idea that supporting this would be leading a fool further astray?


Walter Lewis


host logging options (was Re: Schema validator/debugger)

2007-06-07 Thread Walter Lewis

Andrew Nagy wrote:

Yonik Seeley wrote:

I dropped your schema.xml directly into the Solr example (using
Jetty), fired it up, and everything works fine!?

Okay, I switched over to Jetty and now I get a different error:
SEVERE: org.apache.solr.core.SolrException: undefined field text 

As someone who has used both Jetty and Tomcat in production (and has
come to prefer Tomcat), what are my choices for getting the "undefined
field xxx" error into the catalina log files (or is it stashed
somewhere I'm overlooking?)


Walter Lewis


Re: Using solr on windows

2007-02-14 Thread Walter Lewis

Erik Hatcher wrote:
Cygwin needs curl installed.  It should be fairly easy to select that 
and have it installed.  It's been a while since I've used cygwin, but 
I do recall a list of packages to install. 

I would just note that, while the examples are designed around Cygwin, it
is by no means a dependency for running Solr.


Curl is available as a binary for Win32, in its latest release, and runs 
from an ordinary command window. The implication is that you can also 
build batch files that mimic the functionality of the .sh files in the 
examples.


The following would need to be adjusted for the host name and port, and 
the name of the instance of solr, but they will run from the command 
prompt in the directory where curl is installed.


curl http://localhost/solr/update --data-binary @test.xml
curl http://localhost/solr/update --data-binary "<commit />"
curl http://localhost/solr/update --data-binary "<optimize />"
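
For anyone who would rather not depend on curl at all, the same three calls can be sketched with nothing but a scripting language's standard library. The sketch below uses Python; the function names are hypothetical, the URL is the one from the curl lines above and needs the same adjustments, and the Content-Type header is an assumption of the sketch (the curl lines do not set one).

```python
import urllib.request

# Same endpoint as the curl lines above; adjust host, port and instance name.
UPDATE_URL = "http://localhost/solr/update"


def build_update_request(body, url=UPDATE_URL):
    """Build (but do not send) a POST carrying an XML update message."""
    data = body.encode("utf-8") if isinstance(body, str) else body
    return urllib.request.Request(
        url,
        data=data,
        # Assumption: declare the payload as XML rather than form data.
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


def send(request):
    """Send a prepared request and return the response body as text."""
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


def post_file(path, url=UPDATE_URL):
    """Equivalent of: curl URL --data-binary @path"""
    with open(path, "rb") as handle:
        return send(build_update_request(handle.read(), url))


def commit(url=UPDATE_URL):
    """Equivalent of posting a commit message."""
    return send(build_update_request("<commit/>", url))


def optimize(url=UPDATE_URL):
    """Equivalent of posting an optimize message."""
    return send(build_update_request("<optimize/>", url))
```

Calling post_file("test.xml") followed by commit() mirrors the first two curl lines.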

Tomcat or apache/tomcat on Windows 2003 is a perfectly adequate servlet
container for solr.  For that matter, SirsiDynix's Horizon runs Jetty
over Windows 2003 in many of their production environments; it's just a
bear to configure as a service.


Walter Lewis


Re: Fuzzy searching, tildes and solr

2007-01-26 Thread Walter Lewis

Yonik Seeley wrote:

+(+text:jame +text:sutherland) +searchSet:testSet

+(+text:james~0.75 +text:sutherland~0.75) +searchSet:testSet


I can tell from the first that this is a stemmed field... james is
transformed to jame

"James" being the plural of "Jame" according to the stemmer.  I guess my
mind hadn't run in that direction. :)


I guess I wasn't expecting the fuzzy query logic to bypass the
stemming.  Would it be correct that if I were to add james to the
protwords.txt file, this *specific* problem would go away? Obviously
there is a significant number of proper names where this would have
an impact, so a more generic solution is preferable.
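
To make the interaction concrete, here is a toy model of a fuzzy term compared against a stemmed index term. It is only a sketch under stated assumptions: that similarity is computed as 1 - (edit distance / length of the shorter term), and that a term matches only when its similarity is strictly greater than the number after the tilde. Both should be checked against the Lucene version in use; the function names are hypothetical.

```python
def edit_distance(a, b):
    """Plain Levenshtein distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, 1):
        current = [i]
        for j, char_b in enumerate(b, 1):
            current.append(min(
                previous[j] + 1,                      # deletion
                current[j - 1] + 1,                   # insertion
                previous[j - 1] + (char_a != char_b)  # substitution
            ))
        previous = current
    return previous[-1]


def fuzzy_similarity(query_term, indexed_term):
    """Assumed similarity: 1 - distance / length of the shorter term."""
    shorter = min(len(query_term), len(indexed_term))
    if shorter == 0:
        return 0.0
    return 1.0 - edit_distance(query_term, indexed_term) / shorter


def fuzzy_matches(query_term, indexed_term, minimum_similarity):
    """Assumed cutoff: strictly greater than the minimum similarity."""
    return fuzzy_similarity(query_term, indexed_term) > minimum_similarity


# The fuzzy query term is not stemmed, but the index holds the stem.
print(fuzzy_similarity("james", "jame"))
print(fuzzy_matches("james", "jame", 0.75))
print(fuzzy_matches("james", "jame", 0.70))
```

Under those assumptions "james" against the stored "jame" sits exactly on the 0.75 boundary and so fails a strict comparison, while it passes at 0.70. That would be consistent with the counts reported earlier in the thread (1 result at ~0.75, 97 at ~0.70), though it is not a confirmed diagnosis.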



So, you could
- index the field twice using copyField, and then do fuzzy queries on
the non-stemmed version. [plus two other good suggestions]

As I look at the field types in the example schema, would you recommend
something like text_lu without the EnglishPorterFilterFactory, or are
there other issues I'm overlooking?


Walter Lewis
(aka Walt Lewi apparently)


Re: Fuzzy searching, tildes and solr

2007-01-25 Thread Walter Lewis

Yonik Seeley wrote:

On 1/23/07, Walter Lewis [EMAIL PROTECTED] wrote:

This is quite possibly a Lucene question rather than a solr one, so my
apologies if you think it's out of scope.

Underlying the solr search, are some very useful Lucene constructs.

One of the most powerful, imho, is the tilde number combination for a
fuzzy search.

In one of my data sets
   q=Sutherland returns 41 results
   q=Sutherland~0.75 returns 275
   q=Sutherland~0.70 returns 484
etc., all of which fits a pattern.  Add a first name and
   q=(James Sutherland) returns 13
   q=(James~0.75 Sutherland~0.75) returns 1
   q=(James~0.70 Sutherland~0.70) returns 97
Qualify only one term and there is a consistent pattern.  But routinely
qualifying two terms yields a smaller number than a string match.

Trying
   q=(James~0.75 AND Sutherland~0.75) returns the same record (the
schema has default set to AND)

Why would the ~0.75 *narrow* rather than broaden a search? Is there some
pattern in the solr syntax I'm overlooking?


That's a great question... that doesn't make sense.
Could you post your debug-query output (add debugQuery=on)?

My apologies for the delay and for the generally excessive top quoting
here.  I thought it might save a bit of time to keep the alternatives
together.  I should also note that I simplified the queries above.
Each ran with a searchSet constraint, which was the same value.  The
normal queries also carry a significant baggage of fields and facets,
which are also consistent across the whole set of them.


I ran the debug against the two following queries:

  q=(James Sutherland) returns 13
  q=(James~0.75 Sutherland~0.75) returns 1

I have attached the debug fragments below.

Walter



lst name=debug
str name=rawquerystring(james sutherland) searchSet:testSet/str
str name=querystring(james sutherland) searchSet:testSet/str
-
   str name=parsedquery
+(+text:jame +text:sutherland) +searchSet:testSet
/str
-
   str name=parsedquery_toString
+(+text:jame +text:sutherland) +searchSet:testSet
/str
-
   lst name=explain
-
   str name=id=MHGL.502,internal_docid=80313

2.2928324 = (MATCH) sum of:
 2.2204013 = (MATCH) sum of:
   0.444597 = (MATCH) weight(text:jame in 80313), product of:
 0.46986106 = queryWeight(text:jame), product of:
   4.370453 = idf(docFreq=3085)
   0.107508555 = queryNorm
 0.94623077 = (MATCH) fieldWeight(text:jame in 80313), product of:
   1.7320508 = tf(termFreq(text:jame)=3)
   4.370453 = idf(docFreq=3085)
   0.125 = fieldNorm(field=text, doc=80313)
   1.7758043 = (MATCH) weight(text:sutherland in 80313), product of:
 0.8738745 = queryWeight(text:sutherland), product of:
   8.128418 = idf(docFreq=71)
   0.107508555 = queryNorm
 2.0321045 = (MATCH) fieldWeight(text:sutherland in 80313), product of:
   2.0 = tf(termFreq(text:sutherland)=4)
   8.128418 = idf(docFreq=71)
   0.125 = fieldNorm(field=text, doc=80313)
 0.072431125 = (MATCH) weight(searchSet:testSet in 80313), product of:
   0.124795556 = queryWeight(searchSet:testSet), product of:
 1.1607965 = idf(docFreq=76441)
 0.107508555 = queryNorm
   0.58039826 = (MATCH) fieldWeight(searchSet:testSet in 80313), product of:
 1.0 = tf(termFreq(searchSet:testSet)=1)
 1.1607965 = idf(docFreq=76441)
 0.5 = fieldNorm(field=searchSet, doc=80313)
/str
-
   str name=id=MHGL.503,internal_docid=80314

2.1340907 = (MATCH) sum of:
 2.0616596 = (MATCH) sum of:
   0.43047923 = (MATCH) weight(text:jame in 80314), product of:
 0.46986106 = queryWeight(text:jame), product of:
   4.370453 = idf(docFreq=3085)
   0.107508555 = queryNorm
 0.91618407 = (MATCH) fieldWeight(text:jame in 80314), product of:
   2.236068 = tf(termFreq(text:jame)=5)
   4.370453 = idf(docFreq=3085)
   0.09375 = fieldNorm(field=text, doc=80314)
   1.6311804 = (MATCH) weight(text:sutherland in 80314), product of:
 0.8738745 = queryWeight(text:sutherland), product of:
   8.128418 = idf(docFreq=71)
   0.107508555 = queryNorm
 1.8666072 = (MATCH) fieldWeight(text:sutherland in 80314), product of:
   2.4494898 = tf(termFreq(text:sutherland)=6)
   8.128418 = idf(docFreq=71)
   0.09375 = fieldNorm(field=text, doc=80314)
 0.072431125 = (MATCH) weight(searchSet:testSet in 80314), product of:
   0.124795556 = queryWeight(searchSet:testSet), product of:
 1.1607965 = idf(docFreq=76441)
 0.107508555 = queryNorm
   0.58039826 = (MATCH) fieldWeight(searchSet:testSet in 80314), product of:
 1.0 = tf(termFreq(searchSet:testSet)=1)
 1.1607965 = idf(docFreq=76441)
 0.5 = fieldNorm(field=searchSet, doc=80314)
/str
-
   str name=id=MHGL.501,internal_docid=80312

1.5031691 = (MATCH) sum of:
 1.430738 = (MATCH) sum of:
   0.32086027 = (MATCH) weight(text:jame in 80312), product of:
 0.46986106 = queryWeight(text:jame), product of:
   4.370453 = idf(docFreq=3085)
   0.107508555 = queryNorm
 0.68288326 = (MATCH) fieldWeight(text:jame
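
The arithmetic in the explain output above can be reproduced by hand, which helps when reading it. The sketch below takes the idf, queryNorm and fieldNorm values straight from the entry for document 80313 and recombines them the way the output labels them: queryWeight = idf * queryNorm, fieldWeight = tf * idf * fieldNorm, with tf equal to the square root of termFreq (as the tf lines in the output show). The function name is hypothetical.

```python
import math


def term_weight(term_freq, idf, query_norm, field_norm):
    """Recombine one term's score the way the explain output lays it out."""
    query_weight = idf * query_norm                         # queryWeight(...)
    field_weight = math.sqrt(term_freq) * idf * field_norm  # fieldWeight(...)
    return query_weight * field_weight


QUERY_NORM = 0.107508555  # queryNorm, shared by every clause of the query

# Values copied from the explain entry for internal_docid=80313.
jame = term_weight(3, 4.370453, QUERY_NORM, 0.125)        # reported as 0.444597
sutherland = term_weight(4, 8.128418, QUERY_NORM, 0.125)  # reported as 1.7758043
search_set = term_weight(1, 1.1607965, QUERY_NORM, 0.5)   # reported as 0.072431125

# The document score is the sum of the clause weights (reported as 2.2928324).
total = jame + sutherland + search_set
print(jame, sutherland, search_set, total)
```

The results agree with the reported figures to within single-precision rounding. The high idf for sutherland (docFreq=71) is what makes it dominate the score.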

Fuzzy searching, tildes and solr

2007-01-23 Thread Walter Lewis
This is quite possibly a Lucene question rather than a solr one, so my
apologies if you think it's out of scope.


Underlying the solr search, are some very useful Lucene constructs.

One of the most powerful, imho, is the tilde number combination for a 
fuzzy search.


In one of my data sets
   q=Sutherland returns 41 results
   q=Sutherland~0.75 returns 275
   q=Sutherland~0.70 returns 484
etc., all of which fits a pattern.  Add a first name and
   q=(James Sutherland) returns 13
   q=(James~0.75 Sutherland~0.75) returns 1
   q=(James~0.70 Sutherland~0.70) returns 97
Qualify only one term and there is a consistent pattern.  But routinely
qualifying two terms yields a smaller number than a string match.

Trying
   q=(James~0.75 AND Sutherland~0.75) returns the same record (the
schema has default set to AND)


Why would the ~0.75 *narrow* rather than broaden a search? Is there some 
pattern in the solr syntax I'm overlooking?


Walter



  
  


Re: solr + cocoon problem

2007-01-16 Thread Walter Lewis

[EMAIL PROTECTED] wrote:

Any ideas on how to implement a cocoon layer above solr?

You're far from the only one approaching solr via cocoon ... :)

The approach we took passes the search parameters to a solrsearch
stylesheet, the heart of which is a cinclude block that embeds the
solr results.  A further transformation prepares the results of the solr
query for display.


The latest rewrite is getting more complicated as we work in flowscript 
to manipulate the values more before presenting them to solr, but the 
heart of the solution is below.


Walter


=== From the sitemap.xmap ===
   <map:match pattern="results">
      <map:generate type="request">
         <map:parameter name="generate-attributes" value="true"/>
      </map:generate>
      <map:transform type="xslt" src="style/solrsearch.xsl">
         <map:parameter name="use-request-parameters" value="true"/>
      </map:transform>
      <map:transform type="cinclude"/>
      <map:transform type="xslt" src="style/search_result.xsl"/>
      <map:serialize type="html"/>
   </map:match>

=== From solrsearch.xsl ===
[assuming parameters of q, start and rows]

   <cinclude:includexml>
      <cinclude:src>http://localhost:8080/solr/select?q=<xsl:value-of select='$q'/>&amp;start=<xsl:value-of select='$start'/>&amp;rows=<xsl:value-of select='$rows'/></cinclude:src>
   </cinclude:includexml>
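
The URL that the cinclude block assembles can also be sketched outside Cocoon, which is handy for testing a query by hand. This is only an illustration: the function name and the default values for start and rows are assumptions, and the host and port are simply those from the fragment above. Unlike plain string concatenation, urlencode escapes characters such as spaces and ampersands in the query.

```python
from urllib.parse import urlencode

# Host, port and path as in the cinclude:src above; adjust to suit.
SOLR_SELECT = "http://localhost:8080/solr/select"


def solr_select_url(q, start=0, rows=10, base=SOLR_SELECT):
    """Build a select URL from the q, start and rows parameters.

    The defaults for start and rows are assumptions of this sketch.
    """
    return base + "?" + urlencode({"q": q, "start": start, "rows": rows})


print(solr_select_url("James Sutherland", start=0, rows=20))
```

If the search terms may contain characters that are special in a URL, the stylesheet version needs equivalent escaping before the value reaches the src URL.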




Re: MatchAllDocsQuery in solr?

2006-11-21 Thread Walter Lewis

Walter Underwood wrote:

I was thinking something similar, maybe _solr:all. At Infoseek, we
hardcoded url:http to match all docs.

I suppose that different data would yield different responses, but a
space ( ) works on our data.


the other Walter


Sorting facets

2006-11-04 Thread Walter Lewis
My apologies if this has been answered before but I couldn't see it in 
the FAQ, tutorial or wiki or the solr-user mail archives.


The explanation for sorting the documents returned by a search is quite
straightforward. I'm currently seeing the facets arriving in a more or
less random order.


Is the expectation that the code processing the result will apply its
own sorts to the facet lists?  Or is there some other option I'm
overlooking?
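
If the sorting does fall to the client, it amounts to very little code. The sketch below assumes the facet counts have already been pulled out of the response into a mapping of value to count; the function name and the sample data are hypothetical.

```python
def sort_facets(counts, by="count"):
    """Sort facet (value, count) pairs on the client side.

    by="count" orders by descending count, ties broken alphabetically;
    by="value" orders alphabetically by facet value.
    """
    if by == "count":
        return sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    if by == "value":
        return sorted(counts.items())
    raise ValueError("unknown sort order: %r" % by)


# Hypothetical facet counts, in the arbitrary order they arrived.
sample = {"maps": 12, "newspapers": 40, "books": 12, "photographs": 7}
for value, count in sort_facets(sample):
    print(value, count)
```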


Walter Lewis