On Mon, 14 Jul 2008 23:25:25 +
sundar shankar [EMAIL PROTECTED] wrote:
Thanks for your patient response. I don't want to know the classes changed,
but I want to get hold of the wiki page for the same. I tried to search for
these classes in the solr wiki. I was getting a page does not
On Thu, 10 Jul 2008 17:55:55 -0600
Galen Pahlke [EMAIL PROTECTED] wrote:
Could this perhaps be because a date field has so many possible unique
values? I don't know how to find out exactly, but I'd guess there are
at least a few million unique dates in the index. Would increasing the
On Fri, 11 Jul 2008 15:22:35 +
sundar shankar [EMAIL PROTECTED] wrote:
I recently was looking to find details of 1.3 specific analysers and filters
in the solr wiki and was unable to do so. Could anyone please point me to a
place where I can find some documentation of the same.
Any
On Wed, 9 Jul 2008 08:48:35 +0530
Shalin Shekhar Mangar [EMAIL PROTECTED] wrote:
Yes, SOLR-350 added that capability. Look at
http://wiki.apache.org/solr/MultiCore for details.
ahh loving SOLR more every day :P
thx
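For reference, a minimal multicore setup along the lines of the MultiCore wiki page above might look like this in solr.xml (the core names and instanceDir paths here are illustrative, not from the thread):

```xml
<!-- solr.xml: sketch of a Solr 1.3 multicore setup; names/paths are examples -->
<solr persistent="true">
  <cores adminPath="/admin/cores">
    <core name="core0" instanceDir="core0" />
    <core name="core1" instanceDir="core1" />
  </cores>
</solr>
```

Each core then gets its own conf/ and data/ under its instanceDir, and the admin path lets you manage cores at runtime.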
_
{Beto|Norberto|Numard} Meijome
I used to hate
On Wed, 9 Jul 2008 19:51:45 +0530
Noble Paul [EMAIL PROTECTED] wrote:
You can put it into a 'string' field directly
if we refer to the default string field, you won't be able to search for the
contents of the XML (unless you search for the whole
On Thu, 10 Jul 2008 09:36:01 +0530
Noble Paul [EMAIL PROTECTED] wrote:
2. We're assuming we'll have thousands of users with independent data; any
good way to partition multiple indexes with solr? With Lucene we could
just save those in independent
On Tue, 8 Jul 2008 10:20:15 -0300
Hugo Barauna [EMAIL PROTECTED] wrote:
Hi,
I have already asked this, but I didn't get any good answer, so I will try
again. I need to pre-process a stored field before it is saved, just like a
field that is going to be indexed. It would be good to apply an
On Tue, 8 Jul 2008 21:10:51 +0530
Shalin Shekhar Mangar [EMAIL PROTECTED] wrote:
Also note that you'll need to specify spellcheck.build=true only on the
first request when it will build the spell check index. The subsequent
requests need not have spellcheck.build=true.
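To illustrate the build-once pattern described above, here is a small Python sketch that constructs the two kinds of request URLs; the host, port, handler path and query term are assumptions based on the stock example setup, not from the thread:

```python
from urllib.parse import urlencode

# Assumed stock example URL; adjust host/port/handler for your setup.
base = "http://localhost:8983/solr/select"

# First request: ask Solr to build the spellcheck index alongside the query.
first = base + "?" + urlencode({
    "q": "hte",                  # deliberately misspelled term
    "spellcheck": "true",
    "spellcheck.build": "true",  # only needed once
})

# Subsequent requests: spellcheck.build can be dropped.
later = base + "?" + urlencode({
    "q": "hte",
    "spellcheck": "true",
})

print(first)
print(later)
```

The only difference between the two URLs is the one-time spellcheck.build=true parameter.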
as a matter of fact,
On Tue, 8 Jul 2008 12:05:45 -0400
Willie Wong [EMAIL PROTECTED] wrote:
I think the snapshooter will work fine for creating the indexes and then I
can use the multicore capabilities to make them available to users. One
final question though: after the snapshot has been created, is there a way
On Fri, 4 Jul 2008 10:39:28 -0300
Alexander Ramos Jardim [EMAIL PROTECTED] wrote:
3. Did you mean feature
3.1. Does Solr implement that?
http://wiki.apache.org/solr/SpellCheckComponent
_
{Beto|Norberto|Numard} Meijome
And that's one reason we like to believe in
On Tue, 01 Jul 2008 17:04:07 +0530
Jacob Singh [EMAIL PROTECTED] wrote:
a).
Add jetty to a group called jetty
Somehow get jetty6 to use that group
Create another user (solr) and add it to the group jetty
Let it run the snapshooter
This seems the best option.
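Option a) above might be sketched like this on a typical Linux box (user names, group names and paths are assumptions, and exact commands vary by distro; this needs root):

```shell
# Shared group so the servlet container and the snapshooter
# can both read/write the index files.
groupadd jetty               # 1. the group
usermod -aG jetty jetty6     # 2. the user running jetty6 joins it
useradd -m -G jetty solr     # 3. solr user, member of the same group
# make the index group-writable so the solr user can run snapshooter:
chmod -R g+rw /var/lib/jetty6/solr/data
```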
B
_
hi there,
when defining a field type, I understand the meaning of analyzer type="index"
or type="query". What does it mean when the type is missing? Does it apply at
both index and query time?
This can be found in the example's schema.xml :
!--
Setup simple analysis for spell checking
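For the question above: an analyzer element without a type attribute is applied at both index and query time. A sketch (the fieldType name is illustrative; the factory classes are the stock Solr ones):

```xml
<!-- One analyzer, no type attribute: used for both indexing and querying -->
<fieldType name="text_simple" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```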
On Mon, 30 Jun 2008 05:52:33 -0400
Erik Hatcher [EMAIL PROTECTED] wrote:
Yes, that's exactly what it means.
Erik
great, thanks for the clarification.
B
_
{Beto|Norberto|Numard} Meijome
A dream you dream together is reality.
John Lennon
I speak for myself,
On Sun, 29 Jun 2008 19:40:44 -0300
Hugo Barauna [EMAIL PROTECTED] wrote:
I am having problems with a stored field. The problem is that the field is not
being stored as I need it to be. It has a tokenizer
(class="solr.HTMLStripWhitespaceTokenizerFactory"), but when it is stored,
that tokenizer is not
Hi there,
Short and sweet :
Is SCRH intended to honour qt= ?
longer...
I'm testing the newest SCRH ( SOLR-572), using last night's nightly build.
I have defined a 'dismax' request handler which searches across a number of
fields. When I use the SCRH in a query, and I pass the qt=dismax
Hi there,
I am using an almost-default config of the spellcheck component ( details @
very end of email). I have the 3 spellcheckers defined, 'default',
'jarowinkler' and 'file'.
I tried adding spellcheck.name=jarowinkler&spellcheck.build=true , and with
spellcheck.reload=true as well ,
On Fri, 27 Jun 2008 01:44:38 +1000
Norberto Meijome [EMAIL PROTECTED] wrote:
I am using an almost-default config of the spellcheck component ( details @
very end of email). I have the 3 spellcheckers defined, 'default',
'jarowinkler' and 'file'.
I tried adding spellcheck.name
On Thu, 26 Jun 2008 16:25:46 -0500 (CDT)
Geoffrey Young [EMAIL PROTECTED] wrote:
it seems like it ought to work as a component of your dismax handler. this
works for me:
[]
ah i see now. cool. too bad about the crash.
I don't know what the policy is for opening bugs in JIRA...should
On Thu, 26 Jun 2008 16:25:46 -0500 (CDT)
Geoffrey Young [EMAIL PROTECTED] wrote:
well *almost* - it works most excellently with q=$term but when I add
spellchecker.q=$term things implode:
HTTP Status 500 - null java.lang.NullPointerException at
On Wed, 25 Jun 2008 08:37:35 +0200
Brian Carmalt [EMAIL PROTECTED] wrote:
There is a plugin for jetty: http://webtide.com/eclipse. Insert this as
an update site and let eclipse install the plugin for you. You can then
start the jetty server from eclipse and debug it.
Thanks Brian, good
On Tue, 24 Jun 2008 19:17:58 -0700
Ryan McKinley [EMAIL PROTECTED] wrote:
also, check the LukeRequestHandler
if there is a document you think *should* match, you can see what
tokens it has actually indexed...
hi Ryan,
I can't see the tokens generated using LukeRequestHandler.
I can get
Hi,
where can I find these sources? I have the binary jars included with the
nightly builds,but I'd like to look @ the code of some of the objects.
In particular,
http://svn.apache.org/viewvc/lucene/java/
doesn't have any reference to 2.4, and
On Wed, 25 Jun 2008 20:22:06 -0400
Grant Ingersoll [EMAIL PROTECTED] wrote:
Note, also, that the Manifest file in the JAR has information about
the exact SVN revision so that you can check it out from there.
On Jun 25, 2008, at 12:37 PM, Yonik Seeley wrote:
trunk is the latest
On Wed, 25 Jun 2008 15:37:09 -0300
Jonathan Ariel [EMAIL PROTECTED] wrote:
I've been trying to use the NGramTokenizer and I ran into a problem.
It seems like solr is trying to match documents with all the tokens that the
analyzer returns from the query term. So if I index a document with a
On Thu, 26 Jun 2008 10:44:32 +1000
Norberto Meijome [EMAIL PROTECTED] wrote:
On Wed, 25 Jun 2008 15:37:09 -0300
Jonathan Ariel [EMAIL PROTECTED] wrote:
I've been trying to use the NGramTokenizer and I ran into a problem.
It seems like solr is trying to match documents with all the tokens
On Thu, 26 Jun 2008 01:15:34 -0300
Jonathan Ariel [EMAIL PROTECTED] wrote:
Ok. Played a bit more with that.
So I had a difference between my unit test and solr. In solr I'm actually
using a solr.RemoveDuplicatesTokenFilterFactory when querying. Tried to add
that to the test, and it fails.
So
hi all,
( I'm using 1.3 nightly build from 15th June 08.)
Is there some documentation about how analysers + tokenizers are applied in
fields ? In particular, my question :
- If I define 2 tokenizers in a fieldtype, only the first one is applied, the
other is ignored. Is that because the 2nd
On Tue, 24 Jun 2008 00:14:57 -0700
Ryan McKinley [EMAIL PROTECTED] wrote:
best docs are here:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
yes, I've been reading that already , thanks :)
- If I define 2 tokenizers in a fieldtype, only the first one is
applied, the
On Tue, 24 Jun 2008 16:04:24 +0100
Dave Searle [EMAIL PROTECTED] wrote:
At the moment I have an index of forum messages (each message being a
separate doc). Results are displayed on a per message basis, however, I would
like to group the results via their thread. Apart from using a facet on
On Tue, 24 Jun 2008 16:34:44 +0100
Dave Searle [EMAIL PROTECTED] wrote:
I am currently storing the thread id within the message index, however,
although this would allow me to sort, it doesn't help with the grouping of
threads based on relevancy. See the idea is to index message data in the
hi,
I'm trying to understand why a search on a field tokenized with the nGram
tokenizer, with minGramSize=n and maxGramSize=m doesn't find any matches for
queries of length (in characters) of n+1..m (n works fine).
analysis.jsp shows that it SHOULD match, but /select doesn't bring anything
back.
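One common workaround for this symptom, sketched below, is to run the NGram tokenizer only at index time and a plain tokenizer at query time, so a query term of length n+1..m is matched as a whole gram instead of being split into tokens that must all match. The type name and gram sizes are illustrative:

```xml
<fieldType name="text_ngram" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.NGramTokenizerFactory" minGramSize="3" maxGramSize="15"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```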
On Tue, 24 Jun 2008 19:17:58 -0700
Ryan McKinley [EMAIL PROTECTED] wrote:
also, check the LukeRequestHandler
if there is a document you think *should* match, you can see what
tokens it has actually indexed...
right, I will look into that a bit more.
I am actually using the lukeall.jar
hi there,
my use case : I want to be able to match documents when only a partial word is
provided. ie, searching for 'roc' or 'ock' should match documents containing
'rock'.
As I understand, the way to solve this problem is to use the nGram tokenizer @
index time and the nGram analyser @
On Mon, 23 Jun 2008 16:23:55 +1000
Norberto Meijome [EMAIL PROTECTED] wrote:
hi there,
my use case : I want to be able to match documents when only a partial word
is provided. ie, searching for 'roc' or 'ock' should match documents
containing 'rock'.
As I understand, the way to solve
On Mon, 23 Jun 2008 14:23:14 -0700
Jon Drukman [EMAIL PROTECTED] wrote:
Ok, well, let's say that I can live without john/jon in the short term.
What I really need today is a case-insensitive wildcard search with
literal matching (no fancy stemming: bobby is bobby, not bobbi).
What are my
On Mon, 23 Jun 2008 05:33:49 -0700 (PDT)
Otis Gospodnetic [EMAIL PROTECTED] wrote:
Hi,
When you add debugQuery=true to the request, what does your query look like
after parsing?
BTW, I've tested the same data + similar config using EdgeNGramTokenizer and this
works properly - I can
Hi all,
I'm curious , what is the cost (memory / processing time @ load? performance
hit ? ) of having several unused fieldTypes defined in schema.xml ?
cheers,
B
_
{Beto|Norberto|Numard} Meijome
Egotism is the anesthetic that dulls the pain of stupidity.
Frank Leahy
On Mon, 16 Jun 2008 14:22:12 -0400
Yonik Seeley [EMAIL PROTECTED] wrote:
There are two levels of dynamic field support.
Specific dynamic fields can be queried with dismax, but you can't
wildcard the qf or other field parameters.
Thanks Yonik. ok, that matches what I've seen - if i know the
Hi everyone,
I just wanted to confirm that dynamic fields cannot be used with dismax
By this I mean that the following :
schema.xml
[...]
<dynamicField name="dyn_1_*" type="text" indexed="true"
stored="true" required="false" />
[..]
solrconfig.xml
[..]
requestHandler name=dismax1
On Sun, 15 Jun 2008 14:38:15 +0200
Roberto Nieto [EMAIL PROTECTED] wrote:
Hi Otis,
Thanks a lot for your interest.
The main thing I can't understand very well is: if I have 8 machines that
will be searchers, for example, why will they have a higher cost of hw if I
have one big index.
On Thu, 15 May 2008 12:54:25 -0700 (PDT)
Otis Gospodnetic [EMAIL PROTECTED] wrote:
5) Hardware recommendations are hard to do. While people may make
suggestions, the only way to know how *your* hardware works with *your* data
and *your* shards and *your* type of queries is by benchmarking.
On Thu, 15 May 2008 09:23:03 -0700
William Pierce [EMAIL PROTECTED] wrote:
[...]
Our app in brief: We get merchant sku files (in either xml/csv) which we
process and index and make available to our site visitors to search. Our
current plan calls for us to support approx 10,000 merchants
On Mon, 12 May 2008 16:16:28 +0530
Sachit P. Menon [EMAIL PROTECTED] wrote:
My project requires having the same content (mostly) in multiple languages.
hi Sachit,
please search the archives of the list. this topic seems to come up twice a
week or thereabouts :)
You are of course encouraged
On Wed, 7 May 2008 11:26:50 -0400 (EDT)
Phillip Rhodes [EMAIL PROTECTED] wrote:
I currently have a java-based application that stores all objects on the file
system (text, blobs) and uses lucene to search the objects. If I can store
these objects in solr, I would greatly increase the
On Thu, 8 May 2008 09:24:45 -0400 (EDT)
Phillip Rhodes [EMAIL PROTECTED] wrote:
B,
My thoughts are coming from experience writing and using Stitches.
Stitches is a java-based project that allows local and remote java clients
(using hessian for java, xfire for dotnet) to search,
On Tue, 29 Apr 2008 10:10:09 +0200
Nico Heid [EMAIL PROTECTED] wrote:
So now the question:
Is there a way to split an index that is too big into smaller ones? Do I have to
create more instances at the beginning, so that I will not run out of power
and space? (which will add quite a bit of redundancy of
On Mon, 7 Apr 2008 16:37:48 -0400
Yonik Seeley [EMAIL PROTECTED] wrote:
On Mon, Apr 7, 2008 at 4:30 PM, Mike Klaas [EMAIL PROTECTED] wrote:
'top', 'vmstat' tell exactly what's going on in terms of io and cpu on
unix. Perhaps someone has gotten these to work under windows with cygwin.
On Thu, 3 Apr 2008 18:14:56 -0300
Jonathan Ariel [EMAIL PROTECTED] wrote:
I'm experiencing really poor performance when using date ranges in a solr
query. Is it a known issue? Is there any special consideration when using
date ranges? It seems weird because I always thought dates are
On Wed, 2 Apr 2008 15:31:43 -0500
[EMAIL PROTECTED] wrote:
This is very general requirement and I am sure somebody might have thought
about the solution.
Hi Sunil,
- please don't hijack the thread :)
- why don't you use the right tool for the problem? from what you said, a RDBMS
sounds like
On Mon, 24 Mar 2008 22:58:18 -0700 (PDT)
Vinci [EMAIL PROTECTED] wrote:
*Hadoop is more focused on the distributed crawler as far as I know...
Hadoop is distributed processing based on the MapReduce algorithm/approach.
Nutch is a Lucene-related project that uses Hadoop for the crawler and
On Thu, 20 Mar 2008 09:07:08 -0700 (PDT)
Raghav Kapoor [EMAIL PROTECTED] wrote:
[...]
Any particular reason why you need the server in this
situation? Pretty much
everything you are doing can be done locally.
Except, probably, cross-linking
between the clients' documents. I have no idea in
On Wed, 19 Mar 2008 21:22:42 -0700 (PDT)
Raghav Kapoor [EMAIL PROTECTED] wrote:
I am new to Solr and I am wondering whether Solr can be helpful in a
project that I'm working on.
welcome :)
The project is a client/server app that requires a client app to index the
documents and send the
On Wed, 19 Mar 2008 17:04:34 -0700 (PDT)
swarag [EMAIL PROTECTED] wrote:
In Lucene there is a RAM-based index,
org.apache.lucene.store.RAMDirectory.
Is there a way to set up my index in solr to use a RAMDirectory?
create a mountpoint on a ramdrive (tmpfs in Linux, I think), and put your index
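The ramdrive suggestion above might look like this on Linux (mountpoint, size and dataDir are assumptions; it needs root, and the index vanishes on reboot, so keep a copy on disk):

```shell
mkdir -p /mnt/solr-ram
mount -t tmpfs -o size=2g tmpfs /mnt/solr-ram
# then point solrconfig.xml at it:
#   <dataDir>/mnt/solr-ram/data</dataDir>
```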
On Fri, 7 Mar 2008 17:59:48 -0800 (PST)
Chris Hostetter [EMAIL PROTECTED] wrote:
I believe Norberto meant he was handling it in his update client code --
before sending the docs to Solr.
Indeed, this is what we do. We have a process that parses certain files, generates
documents following the
On Thu, 6 Mar 2008 11:33:38 -0500
Jon Baer [EMAIL PROTECTED] wrote:
I'm interested to know if composite keys are now possible or if there
is anything to copyField I can use to get composite keys working for
my doc ids?
FWIW, we just do this @ doc generation time - grab several fields,
On Fri, 29 Feb 2008 13:02:21 -0500
Yonik Seeley [EMAIL PROTECTED] wrote:
On Fri, Feb 29, 2008 at 12:45 AM, Walter Underwood
[EMAIL PROTECTED] wrote:
Good point. My numbers are from a full rebuild. Let's collect maximum
times, to keep it simple. --wunder
You may see more variation than
On Fri, 15 Feb 2008 11:09:45 +0100
Maximilian Hütter [EMAIL PROTECTED] wrote:
Hi,
is there a way to transform a Solr update response with a XSLT-Stylesheet?
It looks like the XSLTResponseWriter is only used for searches.
Best regards,
Max
Hi Maximilian,
yes, it is definitely
On Wed, 16 Jan 2008 16:54:56 +0100
Philippe Guillard [EMAIL PROTECTED] wrote:
Hi here,
It seems that Lucene accepts any kind of XML document but Solr accepts only
flat name/value pairs inside a document to be indexed.
You'll find below what I'd like to do, Thanks for help of any kind !
On Wed, 2 Jan 2008 16:25:58 +0530
Laxmilal Menaria [EMAIL PROTECTED] wrote:
I have tried Solr using jetty; it runs on the command prompt, but now I want to
configure solr on tomcat-6, so does anyone know how to configure it as a windows
service using tomcat?
Any particular reason you don't use Jetty as
On Wed, 12 Dec 2007 20:04:00 -0500
Norskog, Lance [EMAIL PROTECTED] wrote:
... SOLR-303 (Distributed Search over HTTP)...
Woo-hoo!
hear hear!!!
_
{Beto|Norberto|Numard} Meijome
Your reasoning is excellent -- it's only your basic assumptions that are wrong.
I speak
On Tue, 27 Nov 2007 18:18:16 +0100
Siegfried Goeschl [EMAIL PROTECTED] wrote:
Hi folks,
working on a closed source project for an IP concerned company is not
always fun ... we combined SOLR with JAMon
(http://jamonapi.sourceforge.net/) to keep an eye of the query times and
this might be
On Tue, 27 Nov 2007 18:12:13 -0500
Brian Whitman [EMAIL PROTECTED] wrote:
On Nov 27, 2007, at 6:08 PM, bbrown wrote:
I couldn't tell if this was asked before. But I want to perform a
nutch crawl
without any solr plugin which will simply write to some index
directory. And
then
On Fri, 23 Nov 2007 21:37:14 -0800 (PST)
Chris Hostetter [EMAIL PROTECTED] wrote:
2) the issue with replication/distribution and windows isn't rsync (that
is available as part of cygwin) the issue relates to the fact that even
though windows has hardlinks, you can't move a hard link to a
On Thu, 22 Nov 2007 10:41:41 -0500
George Everitt [EMAIL PROTECTED] wrote:
After a lot of googling, I came across Heritrix, which seems to be the
most robust well supported open source crawler out there. Heritrix
has an integration with Nutch (NutchWax), but not with Solr. I'm
On Thu, 22 Nov 2007 19:10:46 -0800 (PST)
Otis Gospodnetic [EMAIL PROTECTED] wrote:
The answer to that question, Norberto, would depend on versions.
Otis, would that relate to what underlying version of Lucene is being used in
either Solr or Nutch?
_
On Tue, 20 Nov 2007 17:39:58 -0500
Jae Joo [EMAIL PROTECTED] wrote:
Hi,
Can anyone tell me how to facet and/or search for associated fields? -
http://wiki.apache.org/solr/SimpleFacetParameters
_
{Beto|Norberto|Numard} Meijome
Fear not the path of truth for the lack
On Mon, 19 Nov 2007 16:53:17 +1100
climbingrose [EMAIL PROTECTED] wrote:
The easiest solution I know is:
<delete><query>id:1 OR id:2 OR ...</query></delete>
If you know that all of these ids can be found by issuing a query, you
can do delete by query:
<delete><query>YOUR_DELETE_QUERY_HERE</query></delete>
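Such a delete can be posted to the update handler with curl, e.g. (the local example URL is an assumption; a commit is needed before the deletes become visible):

```shell
curl http://localhost:8983/solr/update -H 'Content-Type: text/xml' \
  --data-binary '<delete><query>id:1 OR id:2</query></delete>'
curl http://localhost:8983/solr/update -H 'Content-Type: text/xml' \
  --data-binary '<commit/>'
```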
On Fri, 9 Nov 2007 09:03:01 -0300
Isart Montane [EMAIL PROTECTED] wrote:
I've read there's a kernel limitation for a 32-bit architecture of 2GB
per process, and I just want to know if anybody knows an alternative to
getting a new 64-bit server.
You don't say what CPU you have. But the 32 bit limit
On Fri, 9 Nov 2007 10:30:16 -0300
Isart Montane [EMAIL PROTECTED] wrote:
I've got a dual Xeon. Here is my cpuinfo. I've read the limit on
a 2.6 Linux kernel is 4GB of user space and 4GB for kernel... that's
why I asked
if there's any way to reach 4GB per process.
ok - i'm obviously too
On Fri, 9 Nov 2007 11:58:53 -0300
Isart Montane [EMAIL PROTECTED] wrote:
More info.
The kernel is compiled with HIGHMEM64 and PAE
Sorry, I haven't dealt with Linux kernel options for years.
PAE will give you 36 bits of address, but if the kernel is still limiting the
user space to 2 GB /
On Wed, 7 Nov 2007 20:18:25 -0800 (PST)
David Neubert [EMAIL PROTECTED] wrote:
I am sure this is a 101 question, but I am a bit confused about indexing xml data
using SOLR.
I have rich xml content (books) that needs to be searched at granular levels
(specifically paragraph and sentence levels
On Tue, 9 Oct 2007 10:12:51 -0400
David Whalen [EMAIL PROTECTED] wrote:
So, how would you build it if you could? Here are the specs:
a) the index needs to hold at least 25 million articles
b) the index is constantly updated at a rate of 10,000 articles
per minute
c) we need to have
On Tue, 02 Oct 2007 15:26:33 -0700
Walter Underwood [EMAIL PROTECTED] wrote:
Here at Netflix, we switched over our site search to Solr two weeks ago.
We've seen zero problems with the server. We average 1.2 million
queries/day on a 250K item index. We're running four Solr servers
with simple
On Thu, 20 Sep 2007 09:58:17 +0200
David Welton [EMAIL PROTECTED] wrote:
That seems to be how Sphinx works:
http://www.sphinxsearch.com/doc.html#distributed
Of course, the details of this are far over my head for either system,
so I don't really know if that's a sensible way of doing
On Thu, 20 Sep 2007 09:53:46 -0400
Yonik Seeley [EMAIL PROTECTED] wrote:
On 9/19/07, Norberto Meijome [EMAIL PROTECTED] wrote:
Maybe I got this wrong...but isn't this what mapreduce is meant to deal
with?
Not really... you could force a *lot* of different problems into
map-reduce
On Wed, 19 Sep 2007 01:46:53 -0400
Ryan McKinley [EMAIL PROTECTED] wrote:
Stu is referring to Federated Search - where each index has some of the
data and results are combined before they are returned. This is not yet
supported out of the box
Maybe this is related. How does this compare to
On Wed, 19 Sep 2007 10:29:54 -0400
Yonik Seeley [EMAIL PROTECTED] wrote:
Maybe this is related. How does this compare to the map-reduce
functionality in Nutch/Hadoop ?
map-reduce is more for batch jobs. Nutch only uses map-reduce for
parallel indexing, not searching.
I see... so in
On Thu, 20 Sep 2007 09:37:51 +0800
Jarvis [EMAIL PROTECTED] wrote:
If we use the RPC call in nutch .
Hi,
I wasn't suggesting using nutch in solr... I'm only a young grasshopper in this
league to be suggesting architecture stuff :) but I imagine there's nothing
wrong with using what they've built
On Thu, 20 Sep 2007 10:02:08 +0800
Jarvis [EMAIL PROTECTED] wrote:
You can see the code in org.apache.nutch.searcher.NutchBean class . :)
thx for the pointer.
_
{Beto|Norberto|Numard} Meijome
In order to avoid being called a flirt, she always yielded easily.
Charles,
On Thu, 20 Sep 2007 10:21:39 +0800
Jarvis [EMAIL PROTECTED] wrote:
What you say is done by hadoop, which supports hardware failure, data
replication and more.
If we want to implement such a good system by ourselves without HDFS
but Solr, it would be very complex work, I think.
On Wed, 05 Sep 2007 17:18:09 +0200
Brian Carmalt [EMAIL PROTECTED] wrote:
I've been trying to index a 300MB file to solr 1.2. I keep getting out-of-memory heap errors.
Even on an empty index with one gig of VM memory it still won't work.
Hi Brian,
VM != heap memory.
VM = OS memory
heap
On Thu, 9 Aug 2007 15:23:03 -0700
Lance Norskog [EMAIL PROTECTED] wrote:
Underlying this all, you have a sneaky network performance problem. Your
successive posts do not reuse a TCP socket. Obvious: re-opening a new socket
each post takes time. Not obvious: your server has sockets building up