Re: Wiki for 1.3

2008-07-14 Thread Norberto Meijome
On Mon, 14 Jul 2008 23:25:25 + sundar shankar [EMAIL PROTECTED] wrote: Thanks for your patient response. I don't wanna know the classes changed, but I wanna get a hand on the wiki page for the same. I tried to search for these classes in the solr wiki. I was getting a page does not

Re: Search slow on a field with many unique values (date)

2008-07-11 Thread Norberto Meijome
On Thu, 10 Jul 2008 17:55:55 -0600 Galen Pahlke [EMAIL PROTECTED] wrote: Could this perhaps be because a date field has so many possible unique values? I don't know how to find out exactly, but I'd guess there are at least a few million unique dates in the index. Would increasing the

Re: Wiki for 1.3

2008-07-11 Thread Norberto Meijome
On Fri, 11 Jul 2008 15:22:35 + sundar shankar [EMAIL PROTECTED] wrote: I recently was looking to find details of 1.3 specific analysers and filters in the solr wiki and was unable to do so. Could anyone please point me to a place where I can find some documentation of the same. Any

Re: Automated Index Creation

2008-07-09 Thread Norberto Meijome
On Wed, 9 Jul 2008 08:48:35 +0530 Shalin Shekhar Mangar [EMAIL PROTECTED] wrote: Yes, SOLR-350 added that capability. Look at http://wiki.apache.org/solr/MultiCore for details. ahh loving SOLR more every day :P thx _ {Beto|Norberto|Numard} Meijome I used to hate

Re: Indexing xml data

2008-07-09 Thread Norberto Meijome
On Wed, 9 Jul 2008 19:51:45 +0530 Noble Paul [EMAIL PROTECTED] wrote: You can put it into a 'string' field directly if we refer to the default string field, you won't be able to search for the contents of the XML (unless you search for the whole

Re: tagging application, best way to architect?

2008-07-09 Thread Norberto Meijome
On Thu, 10 Jul 2008 09:36:01 +0530 Noble Paul [EMAIL PROTECTED] wrote: 2. We're assuming we'll have thousands of users with independent data; any good way to partition multiple indexes with solr? With Lucene we could just save those in independent

Re: Pre-processor for stored fields

2008-07-08 Thread Norberto Meijome
On Tue, 8 Jul 2008 10:20:15 -0300 Hugo Barauna [EMAIL PROTECTED] wrote: Hi, I have already asked this, but I didn't get any good answer, so I will try again. I need to pre-process a stored field before it is saved, just like a field that is going to be indexed. It would be good to apply an

Re: problems with SpellCheckComponent

2008-07-08 Thread Norberto Meijome
On Tue, 8 Jul 2008 21:10:51 +0530 Shalin Shekhar Mangar [EMAIL PROTECTED] wrote: Also note that you'll need to specify spellcheck.build=true only on the first request when it will build the spell check index. The subsequent requests need not have spellcheck.build=true. as a matter of fact,
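
The point quoted above, that spellcheck.build=true is only needed on the first request, can be sketched as a small URL builder. This is a minimal illustration, not code from the thread: the host, port and handler path are assumptions, and it presumes the spellcheck component is attached to that handler.

```python
from urllib.parse import urlencode


def spellcheck_url(base_url, term, build=False):
    """Return a query URL with spellchecking enabled for `term`."""
    params = [("q", term), ("spellcheck", "true")]
    if build:
        # Only needed on the first request, to build the spell check index.
        params.append(("spellcheck.build", "true"))
    return base_url + "?" + urlencode(params)


# Assumed host, port and handler path; adjust to the actual install.
BASE = "http://localhost:8983/solr/select"

first = spellcheck_url(BASE, "hello wrld", build=True)  # builds the index
later = spellcheck_url(BASE, "hello wrld")              # reuses it
print(first)
print(later)
```

Keeping the build flag out of routine requests avoids rebuilding the spell check index on every query.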

Re: Automated Index Creation

2008-07-08 Thread Norberto Meijome
On Tue, 8 Jul 2008 12:05:45 -0400 Willie Wong [EMAIL PROTECTED] wrote: I think the snapshooter will work fine for creating the indexes and then I can use the multicore capabilities to make them available to users one final question though, after snapshot has been created is there a way

Re: Some non-standard implementations

2008-07-04 Thread Norberto Meijome
On Fri, 4 Jul 2008 10:39:28 -0300 Alexander Ramos Jardim [EMAIL PROTECTED] wrote: 3. Did you mean feature 3.1. Does Solr implement that? http://wiki.apache.org/solr/SpellCheckComponent _ {Beto|Norberto|Numard} Meijome And that's one reason we like to believe in

Re: Best practices for permissions in DistrobutionScripts

2008-07-02 Thread Norberto Meijome
On Tue, 01 Jul 2008 17:04:07 +0530 Jacob Singh [EMAIL PROTECTED] wrote: a). Add jetty to a group called jetty Somehow get jetty6 to use that group Create another user (solr) and add it to the group jetty Let it run the snapshooter This seems the best option. B _

analyzer index vs query vs {missing}

2008-06-30 Thread Norberto Meijome
hi there, when defining a field type, i understand the meaning of 'analyzer type=index', or type=query. What does it mean when the type is missing? does it apply at both index and query? This can be found in the example's schema.xml : <!-- Setup simple analysis for spell checking

Re: analyzer index vs query vs {missing}

2008-06-30 Thread Norberto Meijome
On Mon, 30 Jun 2008 05:52:33 -0400 Erik Hatcher [EMAIL PROTECTED] wrote: Yes, that's exactly what it means. Erik great, thanks for the clarification. B _ {Beto|Norberto|Numard} Meijome A dream you dream together is reality. John Lennon I speak for myself,

Re: Problems with Stored Field

2008-06-29 Thread Norberto Meijome
On Sun, 29 Jun 2008 19:40:44 -0300 Hugo Barauna [EMAIL PROTECTED] wrote: I am having problems with a stored field. The problem is that field is not being stored as I need it to be. It has a tokenizer class=solr.HTMLStripWhitespaceTokenizerFactory, but when it is stored, that tokenizer is not

SpellCheckerRequestHandler qt parameter

2008-06-26 Thread Norberto Meijome
Hi there, Short and sweet : Is SCRH intended to honour qt= ? longer... I'm testing the newest SCRH ( SOLR-572), using last night's nightly build. I have defined a 'dismax' request handler which searches across a number of fields. When I use the SCRH in a query, and I pass the qt=dismax

SpellCheckComponent = choosing which one...

2008-06-26 Thread Norberto Meijome
Hi there, I am using an almost default config of the spellcheck component (details @ very end of email). I have the 3 spellcheckers defined: 'default', 'jarowinkler' and 'file'. I tried adding spellcheck.name=jarowinkler&spellcheck.build=true , and with spellcheck.reload=true as well ,

Re: SpellCheckComponent = choosing which one...

2008-06-26 Thread Norberto Meijome
On Fri, 27 Jun 2008 01:44:38 +1000 Norberto Meijome [EMAIL PROTECTED] wrote: I am using an almost default config of the spellcheck component (details @ very end of email). I have the 3 spellcheckers defined: 'default', 'jarowinkler' and 'file'. I tried adding spellcheck.name

Re: SpellCheckerRequestHandler qt parameter

2008-06-26 Thread Norberto Meijome
On Thu, 26 Jun 2008 16:25:46 -0500 (CDT) Geoffrey Young [EMAIL PROTECTED] wrote: it seems like it ought to work as a component of your dismax handler. this works for me: [] ah i see now. cool. too bad about the crash. I don't know what the policy is for opening bugs in JIRA...should

Re: SpellCheckerRequestHandler qt parameter

2008-06-26 Thread Norberto Meijome
On Thu, 26 Jun 2008 16:25:46 -0500 (CDT) Geoffrey Young [EMAIL PROTECTED] wrote: well *almost* - it works most excellently with q=$term but when I add spellchecker.q=$term things implode: HTTP Status 500 - null java.lang.NullPointerException at

Re: How to debug ?

2008-06-25 Thread Norberto Meijome
On Wed, 25 Jun 2008 08:37:35 +0200 Brian Carmalt [EMAIL PROTECTED] wrote: There is a plugin for jetty: http://webtide.com/eclipse. Insert this as an update site and let eclipse install the plugin for you. You can then start the jetty server from eclipse and debug it. Thanks Brian, good

Re: How to debug ?

2008-06-25 Thread Norberto Meijome
On Tue, 24 Jun 2008 19:17:58 -0700 Ryan McKinley [EMAIL PROTECTED] wrote: also, check the LukeRequestHandler if there is a document you think *should* match, you can see what tokens it has actually indexed... hi Ryan, I can't see the tokens generated using LukeRequestHandler. I can get

Lucene 2.4-dev source ?

2008-06-25 Thread Norberto Meijome
Hi, where can I find these sources? I have the binary jars included with the nightly builds, but I'd like to look @ the code of some of the objects. In particular, http://svn.apache.org/viewvc/lucene/java/ doesn't have any reference to 2.4, and

Re: Lucene 2.4-dev source ?

2008-06-25 Thread Norberto Meijome
On Wed, 25 Jun 2008 20:22:06 -0400 Grant Ingersoll [EMAIL PROTECTED] wrote: Note, also, that the Manifest file in the JAR has information about the exact SVN revision so that you can check it out from there. On Jun 25, 2008, at 12:37 PM, Yonik Seeley wrote: trunk is the latest
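
To look at what the Manifest file in a JAR actually says, a short sketch along these lines may help. It only prints whatever headers are present; it does not assume the name of the header that carries the SVN revision, and the jar file name in the usage comment is hypothetical.

```python
import zipfile


def read_manifest(jar_path):
    """Return the main-section headers of META-INF/MANIFEST.MF as a dict."""
    with zipfile.ZipFile(jar_path) as jar:
        text = jar.read("META-INF/MANIFEST.MF").decode("utf-8")
    headers = {}
    last_key = None
    for line in text.splitlines():
        if not line.strip():
            break  # a blank line ends the main section
        if line.startswith(" ") and last_key is not None:
            headers[last_key] += line[1:]  # continuation of the previous value
        elif ":" in line:
            key, _, value = line.partition(":")
            last_key = key.strip()
            headers[last_key] = value.strip()
    return headers


# Usage (hypothetical file name):
#   for key, value in read_manifest("lucene-core-2.4-dev.jar").items():
#       print(key, "=", value)
```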

Re: NGramTokenizer issue

2008-06-25 Thread Norberto Meijome
On Wed, 25 Jun 2008 15:37:09 -0300 Jonathan Ariel [EMAIL PROTECTED] wrote: I've been trying to use the NGramTokenizer and I ran into a problem. It seems like solr is trying to match documents with all the tokens that the analyzer returns from the query term. So if I index a document with a

Re: NGramTokenizer issue

2008-06-25 Thread Norberto Meijome
On Thu, 26 Jun 2008 10:44:32 +1000 Norberto Meijome [EMAIL PROTECTED] wrote: On Wed, 25 Jun 2008 15:37:09 -0300 Jonathan Ariel [EMAIL PROTECTED] wrote: I've been trying to use the NGramTokenizer and I ran into a problem. It seems like solr is trying to match documents with all the tokens

Re: NGramTokenizer issue

2008-06-25 Thread Norberto Meijome
On Thu, 26 Jun 2008 01:15:34 -0300 Jonathan Ariel [EMAIL PROTECTED] wrote: Ok. Played a bit more with that. So I had a difference between my unit test and solr. In solr I'm actually using a solr.RemoveDuplicatesTokenFilterFactory when querying. Tried to add that to the test, and it fails. So

several tokenizers in one field type

2008-06-24 Thread Norberto Meijome
hi all, ( I'm using 1.3 nightly build from 15th June 08.) Is there some documentation about how analysers + tokenizers are applied in fields ? In particular, my question : - If I define 2 tokenizers in a fieldtype, only the first one is applied, the other is ignored. Is that because the 2nd

Re: several tokenizers in one field type

2008-06-24 Thread Norberto Meijome
On Tue, 24 Jun 2008 00:14:57 -0700 Ryan McKinley [EMAIL PROTECTED] wrote: best docs are here: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters yes, I've been reading that already , thanks :) - If I define 2 tokenizers in a fieldtype, only the first one is applied, the

Re: SOLR-139 (Support updateable/modifiable documents)

2008-06-24 Thread Norberto Meijome
On Tue, 24 Jun 2008 16:04:24 +0100 Dave Searle [EMAIL PROTECTED] wrote: At the moment I have an index of forum messages (each message being a separate doc). Results are displayed on a per message basis, however, I would like to group the results via their thread. Apart from using a facet on

Re: SOLR-139 (Support updateable/modifiable documents)

2008-06-24 Thread Norberto Meijome
On Tue, 24 Jun 2008 16:34:44 +0100 Dave Searle [EMAIL PROTECTED] wrote: I am currently storing the thread id within the message index, however, although this would allow me to sort, it doesn't help with the grouping of threads based on relevancy. See the idea is to index message data in the
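
One way to picture grouping messages by thread while keeping relevancy is a client-side pass over the hits that were returned. This is only a sketch of that idea, not something proposed in the thread: the field names thread_id and score are hypothetical, and it only groups the results actually fetched.

```python
def group_by_thread(results):
    """Group hits by thread id, ordering threads by their best-scoring message."""
    threads = {}
    for hit in sorted(results, key=lambda h: h["score"], reverse=True):
        threads.setdefault(hit["thread_id"], []).append(hit)
    # dicts keep insertion order, so threads are already ranked by their top hit
    return list(threads.items())


# Hypothetical hits, as a client might receive them.
hits = [
    {"id": "m1", "thread_id": "t1", "score": 0.4},
    {"id": "m2", "thread_id": "t2", "score": 0.9},
    {"id": "m3", "thread_id": "t1", "score": 0.7},
]
grouped = group_by_thread(hits)
for thread_id, messages in grouped:
    print(thread_id, [m["id"] for m in messages])
```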

How to debug ?

2008-06-24 Thread Norberto Meijome
hi, I'm trying to understand why a search on a field tokenized with the nGram tokenizer, with minGramSize=n and maxGramSize=m doesn't find any matches for queries of length (in characters) of n+1..m (n works fine). analysis.jsp shows that it SHOULD match, but /select doesn't bring anything back.

Re: How to debug ?

2008-06-24 Thread Norberto Meijome
On Tue, 24 Jun 2008 19:17:58 -0700 Ryan McKinley [EMAIL PROTECTED] wrote: also, check the LukeRequestHandler if there is a document you think *should* match, you can see what tokens it has actually indexed... right, I will look into that a bit more. I am actually using the lukeall.jar

n-Gram, only works with queries of 2 letters

2008-06-23 Thread Norberto Meijome
hi there, my use case : I want to be able to match documents when only a partial word is provided. ie, searching for 'roc' or 'ock' should match documents containing 'rock'. As I understand, the way to solve this problem is to use the nGram tokenizer @ index time and the nGram analyser @
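
The partial-word use case above can be illustrated with a plain substring enumeration. This is a minimal sketch of what n-gramming a term means, assuming gram sizes 2 to 4; it is not the Solr tokenizer itself, whose output details may differ.

```python
def ngrams(text, min_gram, max_gram):
    """Return the set of all substrings of `text` with length min_gram..max_gram."""
    grams = set()
    for size in range(min_gram, max_gram + 1):
        for start in range(len(text) - size + 1):
            grams.add(text[start:start + size])
    return grams


indexed = ngrams("rock", 2, 4)
print(sorted(indexed))      # ['ck', 'oc', 'ock', 'ro', 'roc', 'rock']
print("roc" in indexed)     # True
print("ock" in indexed)     # True
```

Both 'roc' and 'ock' appear among the grams indexed for 'rock', which is why a query for either partial word is expected to match.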

Re: n-Gram, only works with queries of 2 letters

2008-06-23 Thread Norberto Meijome
On Mon, 23 Jun 2008 16:23:55 +1000 Norberto Meijome [EMAIL PROTECTED] wrote: hi there, my use case : I want to be able to match documents when only a partial word is provided. ie, searching for 'roc' or 'ock' should match documents containing 'rock'. As I understand, the way to solve

Re: Wildcard search question

2008-06-23 Thread Norberto Meijome
On Mon, 23 Jun 2008 14:23:14 -0700 Jon Drukman [EMAIL PROTECTED] wrote: ok well let's say that i can live without john/jon in the short term. what i really need today is a case insensitive wildcard search with literal matching (no fancy stemming. bobby is bobby, not bobbi.) what are my

Re: n-Gram, only works with queries of 2 letters

2008-06-23 Thread Norberto Meijome
On Mon, 23 Jun 2008 05:33:49 -0700 (PDT) Otis Gospodnetic [EMAIL PROTECTED] wrote: Hi, When you add debugQuery=true to the request, what does your query look like after parsing? BTW, I've tested same data + similar config using EdgeNGramTokenizer and this works properly - I can

Cost of having fieldTypes defined but not used

2008-06-23 Thread Norberto Meijome
Hi all, I'm curious , what is the cost (memory / processing time @ load? performance hit ? ) of having several unused fieldTypes defined in schema.xml ? cheers, B _ {Beto|Norberto|Numard} Meijome Egotism is the anesthetic that dulls the pain of stupidity. Frank Leahy

Re: Dismax + Dynamic fields

2008-06-17 Thread Norberto Meijome
On Mon, 16 Jun 2008 14:22:12 -0400 Yonik Seeley [EMAIL PROTECTED] wrote: There are two levels of dynamic field support. Specific dynamic fields can be queried with dismax, but you can't wildcard the qf or other field parameters. Thanks Yonik. ok, that matches what I've seen - if i know the
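
Since the qf parameter cannot be wildcarded, one workaround is to expand the pattern on the client into the specific field names before building the request. The sketch below assumes the client already knows which dynamic fields exist; the field names are hypothetical.

```python
from fnmatch import fnmatchcase


def expand_qf(known_fields, patterns):
    """Expand wildcard patterns into an explicit, space-separated qf value."""
    selected = []
    for pattern in patterns:
        for name in known_fields:
            if fnmatchcase(name, pattern) and name not in selected:
                selected.append(name)
    return " ".join(selected)


# Hypothetical field names known to the client.
fields = ["title", "dyn_1_color", "dyn_1_size", "dyn_2_note"]
qf = expand_qf(fields, ["title", "dyn_1_*"])
print(qf)  # title dyn_1_color dyn_1_size
```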

Dismax + Dynamic fields

2008-06-16 Thread Norberto Meijome
Hi everyone, I just wanted to confirm that dynamic fields cannot be used with dismax. By this I mean the following: schema.xml [...] <dynamicField name="dyn_1_*" type="text" indexed="true" stored="true" required="false" /> [..] solrconfig.xml [..] <requestHandler name="dismax1"

Re: doubt with an index of 300gb

2008-06-15 Thread Norberto Meijome
On Sun, 15 Jun 2008 14:38:15 +0200 Roberto Nieto [EMAIL PROTECTED] wrote: Hi Otis, Thanks a lot for your interest. The main thing I can't understand very well is why, if I have 8 machines that will be searchers, for example, they will have a higher hardware cost if I have one big index.

Re: Some advice on scalability

2008-05-18 Thread Norberto Meijome
On Thu, 15 May 2008 12:54:25 -0700 (PDT) Otis Gospodnetic [EMAIL PROTECTED] wrote: 5) Hardware recommendations are hard to do. While people may make suggestions, the only way to know how *your* hardware works with *your* data and *your* shards and *your* type of queries is by benchmarking.

Re: Some advice on scalability

2008-05-18 Thread Norberto Meijome
On Thu, 15 May 2008 09:23:03 -0700 William Pierce [EMAIL PROTECTED] wrote: [...] Our app in brief: We get merchant sku files (in either xml/csv) which we process and index and make available to our site visitors to search. Our current plan calls for us to support approx 10,000 merchants

Re: MultiLingual Search

2008-05-12 Thread Norberto Meijome
On Mon, 12 May 2008 16:16:28 +0530 Sachit P. Menon [EMAIL PROTECTED] wrote: My project requires having the same content (mostly) in multiple languages. hi Sachit, please search the archives of the list. this topic seems to come up twice a week or thereabouts :) You are of course encouraged

Re: using solr as master for data storage/retrieval?

2008-05-08 Thread Norberto Meijome
On Wed, 7 May 2008 11:26:50 -0400 (EDT) Phillip Rhodes [EMAIL PROTECTED] wrote: I currently have a java-based application that stores all objects on the file system (text, blobs) and uses lucene to search the objects. If I can store these objects in solr, I would greatly increase the

Re: using solr as master for data storage/retrieval?

2008-05-08 Thread Norberto Meijome
On Thu, 8 May 2008 09:24:45 -0400 (EDT) Phillip Rhodes [EMAIL PROTECTED] wrote: B, My thoughts are coming from experience while writing and using stitches. Stitches is a java-based project that allows local and remote java clients (using hessian for java, xfire for dotnet) to search,

Re: Index splitting

2008-04-29 Thread Norberto Meijome
On Tue, 29 Apr 2008 10:10:09 +0200 Nico Heid [EMAIL PROTECTED] wrote: So now the question: Is there a way to split an index that has grown too big into smaller ones? Do I have to create more instances at the beginning, so that I will not run out of power and space? (which will add quite a bit of redundancy of

Re: indexing slow, IO-bound?

2008-04-09 Thread Norberto Meijome
On Mon, 7 Apr 2008 16:37:48 -0400 Yonik Seeley [EMAIL PROTECTED] wrote: On Mon, Apr 7, 2008 at 4:30 PM, Mike Klaas [EMAIL PROTECTED] wrote: 'top', 'vmstat' tell exactly what's going on in terms of io and cpu on unix. Perhaps someone has gotten these to work under windows with cygwin.

Re: Date range performance

2008-04-03 Thread Norberto Meijome
On Thu, 3 Apr 2008 18:14:56 -0300 Jonathan Ariel [EMAIL PROTECTED] wrote: I'm experiencing really poor performance when using date ranges in a solr query. Is it a known issue? Is there any special consideration when using date ranges? It seems weird because I always thought dates are

Re: searching like RDBMS way

2008-04-02 Thread Norberto Meijome
On Wed, 2 Apr 2008 15:31:43 -0500 [EMAIL PROTECTED] wrote: This is very general requirement and I am sure somebody might have thought about the solution. Hi Sunil, - please don't hijack the thread :) - why don't you use the right tool for the problem? from what you said, a RDBMS sounds like

Re: What are the limits? Billions of records anyone?

2008-03-25 Thread Norberto Meijome
On Mon, 24 Mar 2008 22:58:18 -0700 (PDT) Vinci [EMAIL PROTECTED] wrote: *Hadoop is more focused on the distributed crawler as far as I know... Hadoop is distributed processing based on the MapReduce algorithm/approach. Nutch is a lucene related project that uses Hadoop for the crawler and

Re: Help Requested

2008-03-24 Thread Norberto Meijome
On Thu, 20 Mar 2008 09:07:08 -0700 (PDT) Raghav Kapoor [EMAIL PROTECTED] wrote: [...] Any particular reason why you need the server in this situation? pretty much everything you are doing can be done locally. Except, probably, cross linking between client's documents. I have no idea in

Re: Help Requested

2008-03-20 Thread Norberto Meijome
On Wed, 19 Mar 2008 21:22:42 -0700 (PDT) Raghav Kapoor [EMAIL PROTECTED] wrote: I am new to Solr and I am facing a question if solr can be helpful in a project that I'm working on. welcome :) The project is a client/server app that requires a client app to index the documents and send the

Re: RAM Based Index for Solr

2008-03-20 Thread Norberto Meijome
On Wed, 19 Mar 2008 17:04:34 -0700 (PDT) swarag [EMAIL PROTECTED] wrote: In Lucene there is a Ram Based Index org.apache.lucene.store.RAMDirectory. Is there a way to setup my index in solr to use a RAMDirectory? create a mountpoint on a ramdrive (tmpfs in linux, i think), and put your index

Re: Composite key for uniqueKeyId

2008-03-10 Thread Norberto Meijome
On Fri, 7 Mar 2008 17:59:48 -0800 (PST) Chris Hostetter [EMAIL PROTECTED] wrote: I believe Norberto meant he was handling it in his update client code -- before sending the docs to Solr. Indeed, this is what we do. We have a process that parses certain files, generates documents following the

Re: Composite key for uniqueKeyId

2008-03-06 Thread Norberto Meijome
On Thu, 6 Mar 2008 11:33:38 -0500 Jon Baer [EMAIL PROTECTED] wrote: I'm interested to know if composite keys are now possible or if there is anything to copyField I can use to get composite keys working for my doc ids? FWIW, we just do this @ doc generation time - grab several fields,
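
Building the key at document generation time could look roughly like the following. The snippet above is cut off, so the actual fields and joining scheme used are not known; the separator, the field names and the sample document here are all assumptions for illustration.

```python
def composite_key(doc, key_fields, separator="|"):
    """Join the values of `key_fields` into a single string id."""
    parts = []
    for name in key_fields:
        value = str(doc[name])
        if separator in value:
            # Refuse ambiguous keys rather than silently colliding.
            raise ValueError(f"separator {separator!r} found in field {name!r}")
        parts.append(value)
    return separator.join(parts)


# Hypothetical document, built by the client before posting it for indexing.
doc = {"site": "au", "sku": 1234, "title": "Blue widget"}
doc["id"] = composite_key(doc, ["site", "sku"])
print(doc["id"])  # au|1234
```

Rejecting values that contain the separator keeps two different field combinations from producing the same id.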

Re: How long does optimize take on your Solr installation?

2008-03-02 Thread Norberto Meijome
On Fri, 29 Feb 2008 13:02:21 -0500 Yonik Seeley [EMAIL PROTECTED] wrote: On Fri, Feb 29, 2008 at 12:45 AM, Walter Underwood [EMAIL PROTECTED] wrote: Good point. My numbers are from a full rebuild. Let's collect maximum times, to keep it simple. --wunder You may see more variation than

Re: Transform Update responses with XSLT?

2008-02-20 Thread Norberto Meijome
On Fri, 15 Feb 2008 11:09:45 +0100 Maximilian Hütter [EMAIL PROTECTED] wrote: Hi, is there a way to transform a Solr update response with a XSLT-Stylesheet? It looks like the XSLTResponseWriter is only used for searches. Best regards, Max Hi Maximilian, yes, it is definitely

Re: conceptual issues with solr

2008-01-16 Thread Norberto Meijome
On Wed, 16 Jan 2008 16:54:56 +0100 Philippe Guillard [EMAIL PROTECTED] wrote: Hi here, It seems that Lucene accepts any kind of XML document but Solr accepts only flat name/value pairs inside a document to be indexed. You'll find below what I'd like to do, Thanks for help of any kind !

Re: How to configure Solr on Tomcat 6.0 as windows Service

2008-01-02 Thread Norberto Meijome
On Wed, 2 Jan 2008 16:25:58 +0530 Laxmilal Menaria [EMAIL PROTECTED] wrote: I have tried Solr using jetty, it runs from the command prompt, but now I want to configure solr on tomcat-6, so does anyone know how to configure it as a windows service using tomcat? Any particular reason you don't use Jetty as

Re: Solr 1.3 expected release date

2007-12-13 Thread Norberto Meijome
On Wed, 12 Dec 2007 20:04:00 -0500 Norskog, Lance [EMAIL PROTECTED] wrote: ... SOLR-303 (Distributed Search over HTTP)... Woo-hoo! hear hear!!! _ {Beto|Norberto|Numard} Meijome Your reasoning is excellent -- it's only your basic assumptions that are wrong. I speak

Re: Combining SOLR and JAMon to monitor query execution times from a browser

2007-11-27 Thread Norberto Meijome
On Tue, 27 Nov 2007 18:18:16 +0100 Siegfried Goeschl [EMAIL PROTECTED] wrote: Hi folks, working on a closed source project for an IP concerned company is not always fun ... we combined SOLR with JAMon (http://jamonapi.sourceforge.net/) to keep an eye on the query times and this might be

Re: Solr and nutch, for reading a nutch index

2007-11-27 Thread Norberto Meijome
On Tue, 27 Nov 2007 18:12:13 -0500 Brian Whitman [EMAIL PROTECTED] wrote: On Nov 27, 2007, at 6:08 PM, bbrown wrote: I couldn't tell if this was asked before. But I want to perform a nutch crawl without any solr plugin which will simply write to some index directory. And then

Re: Solr on Windows / Linux

2007-11-25 Thread Norberto Meijome
On Fri, 23 Nov 2007 21:37:14 -0800 (PST) Chris Hostetter [EMAIL PROTECTED] wrote: 2) the issue with replication/distribution and windows isn't rsync (that is available as part of cygwin) the issue relates to the fact that even though windows has hardlinks, you can't move a hard link to a

Re: Heritrix and Solr

2007-11-22 Thread Norberto Meijome
On Thu, 22 Nov 2007 10:41:41 -0500 George Everitt [EMAIL PROTECTED] wrote: After a lot of googling, I came across Heritrix, which seems to be the most robust well supported open source crawler out there. Heritrix has an integration with Nutch (NutchWax), but not with Solr. I'm

Re: Heritrix and Solr

2007-11-22 Thread Norberto Meijome
On Thu, 22 Nov 2007 19:10:46 -0800 (PST) Otis Gospodnetic [EMAIL PROTECTED] wrote: The answer to that question, Norberto, would depend on versions. Otis, would that relate to what underlying version of Lucene is being used in either Solr or Nutch? _

Re: facet - associated fields

2007-11-20 Thread Norberto Meijome
On Tue, 20 Nov 2007 17:39:58 -0500 Jae Joo [EMAIL PROTECTED] wrote: Hi, Can anyone help me how to facet and/or search for associated fields? - http://wiki.apache.org/solr/SimpleFacetParameters _ {Beto|Norberto|Numard} Meijome Fear not the path of truth for the lack

Re: multiple delete by id in one delete command?

2007-11-18 Thread Norberto Meijome
On Mon, 19 Nov 2007 16:53:17 +1100 climbingrose [EMAIL PROTECTED] wrote: The easiest solution I know is: <delete><query>id:1 OR id:2 OR ...</query></delete> If you know that all of these ids can be found by issuing a query, you can do delete by query: <delete><query>YOUR_DELETE_QUERY_HERE</query></delete>
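
The delete-by-query approach described above can be generated from a list of ids. This is a minimal sketch: the helper name is made up, the default field name id follows the quoted example, and ids containing query-syntax characters would need extra escaping that is not shown here.

```python
from xml.sax.saxutils import escape


def delete_by_ids(ids, field="id"):
    """Build one <delete> message whose query ORs together all given ids."""
    if not ids:
        raise ValueError("need at least one id")
    clauses = " OR ".join(f"{field}:{doc_id}" for doc_id in ids)
    # escape() protects XML special characters such as & and <
    return f"<delete><query>{escape(clauses)}</query></delete>"


message = delete_by_ids([1, 2, 3])
print(message)  # <delete><query>id:1 OR id:2 OR id:3</query></delete>
```

The resulting string would be posted to the update handler as a single request instead of one request per id.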

Re: 2GB limit on 32 bits

2007-11-09 Thread Norberto Meijome
On Fri, 9 Nov 2007 09:03:01 -0300 Isart Montane [EMAIL PROTECTED] wrote: I've read there's a kernel limitation for a 32 bits architecture of 2Gb per process, and i just wanna know if anybody knows an alternative to get a new 64bits server. You don't say what CPU you have. But the 32 bit limit

Re: 2GB limit on 32 bits

2007-11-09 Thread Norberto Meijome
On Fri, 9 Nov 2007 10:30:16 -0300 Isart Montane [EMAIL PROTECTED] wrote: I've got a dual Xeon. Here is my cpuinfo. I've read the limit on a 2.6 linux kernel is 4GB on user space and 4GB for kernel... that's why I asked if there's any way to reach 4GB per process. ok - i'm obviously too

Re: 2GB limit on 32 bits

2007-11-09 Thread Norberto Meijome
On Fri, 9 Nov 2007 11:58:53 -0300 Isart Montane [EMAIL PROTECTED] wrote: More info. The kernel is compiled with HIGHMEM64 and PAE Sorry, I haven't dealt with linux kernel options for years. PAE will give you 36 bits of address. but if the kernel is still limiting the user space to 2 GB /
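
The address-width arithmetic behind this exchange can be worked out directly. The sketch only converts address bits to addressable size; how a given kernel splits the 32-bit virtual space between user and kernel is a separate setting and is not modelled here.

```python
GIB = 2 ** 30  # bytes in one GiB


def addressable_gib(address_bits):
    """Return how many GiB can be addressed with the given number of bits."""
    return 2 ** address_bits // GIB


print(addressable_gib(32))  # 4  -> virtual address space of one 32-bit process
print(addressable_gib(36))  # 64 -> physical memory reachable with PAE
```

PAE widens physical addressing to 36 bits, but each process still uses 32-bit virtual addresses, so a single process cannot address more than 4 GiB regardless.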

Re: What is the best way to index xml data preserving the mark up?

2007-11-07 Thread Norberto Meijome
On Wed, 7 Nov 2007 20:18:25 -0800 (PST) David Neubert [EMAIL PROTECTED] wrote: I am sure this is a 101 question, but I am a bit confused about indexing xml data using SOLR. I have rich xml content (books) that needs to be searched at granular levels (specifically paragraph and sentence levels

Re: Availability Issues

2007-10-11 Thread Norberto Meijome
On Tue, 9 Oct 2007 10:12:51 -0400 David Whalen [EMAIL PROTECTED] wrote: So, how would you build it if you could? Here are the specs: a) the index needs to hold at least 25 million articles b) the index is constantly updated at a rate of 10,000 articles per minute c) we need to have

Re: Solr live at Netflix

2007-10-02 Thread Norberto Meijome
On Tue, 02 Oct 2007 15:26:33 -0700 Walter Underwood [EMAIL PROTECTED] wrote: Here at Netflix, we switched over our site search to Solr two weeks ago. We've seen zero problems with the server. We average 1.2 million queries/day on a 250K item index. We're running four Solr servers with simple
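
For a sense of scale, the daily figure quoted above converts to an average rate as follows. This is back-of-the-envelope arithmetic only: it assumes queries are spread evenly over the day and over the four servers, which real traffic is not, so peak load will be higher.

```python
SECONDS_PER_DAY = 24 * 60 * 60

queries_per_day = 1_200_000  # figure quoted in the message
servers = 4                  # figure quoted in the message

avg_qps = queries_per_day / SECONDS_PER_DAY
per_server_qps = avg_qps / servers  # assumes an even split across servers

print(round(avg_qps, 1))         # 13.9
print(round(per_server_qps, 1))  # 3.5
```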

Re: How can i make a distribute search on Solr?

2007-09-20 Thread Norberto Meijome
On Thu, 20 Sep 2007 09:58:17 +0200 David Welton [EMAIL PROTECTED] wrote: That seems to be how Sphinx works: http://www.sphinxsearch.com/doc.html#distributed Of course, the details of this are far over my head for either system, so I don't really know if that's a sensible way of doing

Re: How can i make a distribute search on Solr?

2007-09-20 Thread Norberto Meijome
On Thu, 20 Sep 2007 09:53:46 -0400 Yonik Seeley [EMAIL PROTECTED] wrote: On 9/19/07, Norberto Meijome [EMAIL PROTECTED] wrote: Maybe I got this wrong...but isn't this what mapreduce is meant to deal with? Not really... you could force a *lot* of different problems into map-reduce

Re: How can i make a distribute search on Solr?

2007-09-19 Thread Norberto Meijome
On Wed, 19 Sep 2007 01:46:53 -0400 Ryan McKinley [EMAIL PROTECTED] wrote: Stu is referring to Federated Search - where each index has some of the data and results are combined before they are returned. This is not yet supported out of the box Maybe this is related. How does this compare to

Re: How can i make a distribute search on Solr?

2007-09-19 Thread Norberto Meijome
On Wed, 19 Sep 2007 10:29:54 -0400 Yonik Seeley [EMAIL PROTECTED] wrote: Maybe this is related. How does this compare to the map-reduce functionality in Nutch/Hadoop ? map-reduce is more for batch jobs. Nutch only uses map-reduce for parallel indexing, not searching. I see... so in

Re: How can i make a distribute search on Solr?

2007-09-19 Thread Norberto Meijome
On Thu, 20 Sep 2007 09:37:51 +0800 Jarvis [EMAIL PROTECTED] wrote: If we use the RPC call in nutch . Hi, I wasn't suggesting to use nutch in solr...I'm only a young grasshopper in this league to be suggesting architecture stuff :) but i imagine there's nothing wrong with using what they've built

Re: How can i make a distribute search on Solr?

2007-09-19 Thread Norberto Meijome
On Thu, 20 Sep 2007 10:02:08 +0800 Jarvis [EMAIL PROTECTED] wrote: You can see the code in org.apache.nutch.searcher.NutchBean class . :) thx for the pointer. _ {Beto|Norberto|Numard} Meijome In order to avoid being called a flirt, she always yielded easily. Charles,

Re: How can i make a distribute search on Solr?

2007-09-19 Thread Norberto Meijome
On Thu, 20 Sep 2007 10:21:39 +0800 Jarvis [EMAIL PROTECTED] wrote: What you say is done by hadoop, which supports Hardware Failure, Data Replication and more. If we want to implement such a good system by ourselves with Solr but without HDFS, it's very, very complex work, I think.

Re: Indexing very large files.

2007-09-05 Thread Norberto Meijome
On Wed, 05 Sep 2007 17:18:09 +0200 Brian Carmalt [EMAIL PROTECTED] wrote: I've been trying to index a 300MB file to solr 1.2. I keep getting out of memory heap errors. Even on an empty index with one Gig of vm memory it still won't work. Hi Brian, VM != heap memory. VM = OS memory heap

Re: Any clever ideas to inject into solr? Without http?

2007-08-09 Thread Norberto Meijome
On Thu, 9 Aug 2007 15:23:03 -0700 Lance Norskog [EMAIL PROTECTED] wrote: Underlying this all, you have a sneaky network performance problem. Your successive posts do not reuse a TCP socket. Obvious: re-opening a new socket each post takes time. Not obvious: your server has sockets building up
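
The socket-reuse point can be sketched with the standard library: open one connection and send every post over it instead of reconnecting each time. This is an illustration under assumptions, not the poster's code; the host, port and update path in the usage comment are hypothetical, and reuse only happens if the server keeps the connection alive.

```python
import http.client


def post_all(conn, path, bodies, content_type="text/xml"):
    """Send every body over the one connection `conn`; return the status codes."""
    statuses = []
    for body in bodies:
        conn.request("POST", path, body=body,
                     headers={"Content-Type": content_type})
        response = conn.getresponse()
        response.read()  # drain the response so the socket can be reused
        statuses.append(response.status)
    return statuses


def post_to_server(host, port, path, bodies):
    """Open a single connection, post all bodies through it, then close it."""
    conn = http.client.HTTPConnection(host, port)  # one socket for all posts
    try:
        return post_all(conn, path, bodies)
    finally:
        conn.close()


# Usage (hypothetical host, port and path):
#   post_to_server("localhost", 8983, "/solr/update",
#                  ["<add>...</add>", "<commit/>"])
```

Reading each response fully before the next request is what lets the same socket carry the following post.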
