Re: Question on StreamingUpdateSolrServer

2009-04-14 Thread vivek sar
The machine's ulimit is set to 9000 and the OS has upper limit of 12000 on files. What would explain this? Has anyone tried Solr with 25 cores on the same Solr instance? Thanks, -vivek 2009/4/13 Noble Paul നോബിള്‍ नोब्ळ् : > On Tue, Apr 14, 2009 at 7:14 AM, vivek sar wrote: >> Some more update.

Re: indexing txt file

2009-04-14 Thread Alejandro Gonzalez
You should construct the XML containing the fields defined in your schema.xml and give them the values from the text files. For example, if you have a schema defining two fields, "title" and "text", you should construct an XML document with a field "title" and its value and another called "text" containing th
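
For anyone following along, here is a minimal sketch of the kind of XML this reply describes. It assumes a schema with just the two fields from the example ("title" and "text"); the class name, method names, and sample values are made up for illustration and are not part of Solr or SolrJ.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: builds a Solr <add> message for a single document.
// Class, method, and sample values are hypothetical.
class AddXmlSketch {

    // Escape the characters that must not appear raw in XML text or attributes.
    static String escape(String s) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    // One <field name="..."> element per map entry, in insertion order.
    static String buildAdd(Map<String, String> fields) {
        StringBuilder xml = new StringBuilder("<add><doc>");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            xml.append("<field name=\"").append(escape(e.getKey())).append("\">")
               .append(escape(e.getValue()))
               .append("</field>");
        }
        return xml.append("</doc></add>").toString();
    }

    public static void main(String[] args) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("title", "example title");                 // assumed sample value
        doc.put("text", "contents of the text file");      // assumed sample value
        System.out.println(buildAdd(doc));
    }
}
```

The resulting string is what would be sent in the body of an HTTP POST to the update handler; reading the file and posting it are left out of the sketch.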

Re: solr 1.4 memory jvm

2009-04-14 Thread sunnyfr
do you have an idea? sunnyfr wrote: > > Hi Noble, > > Yes exactly that, > I would like to know how people do during a replication ? > Do they turn off servers and put a high autowarmCount which turn off the > slave for a while like for my case, 10mn to bring back the new index and > then autow

Re: commit / new searcher delay?

2009-04-14 Thread sunnyfr
Hi Hossman, I would also love to know how you manage this. Thanks, Shalin Shekhar Mangar wrote: > > On Fri, Mar 6, 2009 at 8:47 AM, Steve Conover wrote: > >> That's exactly what I'm doing, but I'm explicitly replicating, and >> committing. Even under these circumstances, what could

Boolean query in Solr

2009-04-14 Thread Sagar Khetkade
Hi, I am using SolrJ and firing the query on Solr indexes. The index contains three fields, viz. 1. Document_id (type=integer required= true) 2. Ticket Id (type= integer) 3. Content (type=text) Here the query formulation is such that I have a query with an “AND” clause. So t

Re: solr 1.4 memory jvm

2009-04-14 Thread Noble Paul നോബിള്‍ नोब्ळ्
We do not have such high update frequency. So we never encountered this problem. If it is possible to take the slave offline during auto warming that is a good solution. --Noble On Thu, Apr 9, 2009 at 2:02 PM, sunnyfr wrote: > > Hi Noble, > > Yes exactly that, > I would like to know how people do

Re: Search included in *all* fields

2009-04-14 Thread Erik Hatcher
Or in schema.xml you can set the defaultOperator to AND: which applies only to the Lucene/SolrQueryParser, not dismax. Erik On Apr 13, 2009, at 10:49 PM, Ryan McKinley wrote: > what about: > fieldA:value1 AND fieldB:value2 > > this can also be written as: > +fieldA:value1 +fieldB:value2
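
To make the two equivalent forms from Ryan's message concrete, here is a small plain-Java sketch that builds both query strings from the same set of field/value pairs. It does not use SolrJ; the class and method names are hypothetical, and the values are assumed to need no escaping.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Sketch only: two ways of writing "every clause is required"
// in Lucene query syntax. Names are hypothetical.
class RequiredClausesSketch {

    // Produces e.g. "fieldA:value1 AND fieldB:value2".
    static String andForm(Map<String, String> clauses) {
        StringJoiner joiner = new StringJoiner(" AND ");
        for (Map.Entry<String, String> e : clauses.entrySet()) {
            joiner.add(e.getKey() + ":" + e.getValue());
        }
        return joiner.toString();
    }

    // Produces e.g. "+fieldA:value1 +fieldB:value2".
    static String plusForm(Map<String, String> clauses) {
        StringJoiner joiner = new StringJoiner(" ");
        for (Map.Entry<String, String> e : clauses.entrySet()) {
            joiner.add("+" + e.getKey() + ":" + e.getValue());
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        Map<String, String> clauses = new LinkedHashMap<>();
        clauses.put("fieldA", "value1");
        clauses.put("fieldB", "value2");
        System.out.println(andForm(clauses));
        System.out.println(plusForm(clauses));
    }
}
```

Either string can be passed as the q parameter; real values containing spaces or query-syntax characters would need quoting or escaping first.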

Re: Use more then one <document> tag with Dataimporthandler ?

2009-04-14 Thread Noble Paul നോബിള്‍ नोब्ळ्
nope, but it is possible to have multiple root entities within a document and you can execute one at a time. --Noble On Tue, Apr 14, 2009 at 4:15 PM, gateway0 wrote: > > Hi, > > is it possible to use more than one <document> tag within my data-config.xml > file? > > Like: > > > url="jdbc:mysql://localh

Re: Search included in *all* fields

2009-04-14 Thread Johnny X
Cheers guys, got it working! Erik Hatcher wrote: > > Or in schema.xml you can set the defaultOperator to AND: > which applies only to the > Lucene/SolrQueryParser, not dismax. > > Erik > > On Apr 13, 2009, at 10:49 PM, Ryan McKinley wrote: > >> what about: >> fieldA:value1 AND fie

Re: indexing txt file

2009-04-14 Thread Erik Hatcher
On Apr 14, 2009, at 2:01 AM, Noble Paul നോബിള്‍ नोब्ळ् wrote: > what is the content of your text file? > Solr does not directly index file Solr's ExtractingRequestHandler (aka Solr Cell) does index text (and Word, PDF, etc) files directly. This is a Solr 1.4/trunk feature. Erik

Re: Using ExtractingRequestHandler to index a large PDF ~solved

2009-04-14 Thread Fergus McMenemie
>On Apr 6, 2009, at 10:16 AM, Fergus McMenemie wrote: > >> Hmmm, >> >> Not sure how this all hangs together. But editing my solrconfig.xml >> as follows >> sorted the problem:- >> >>> multipartUploadLimitInKB="2048" /> >> to >> >>> multipartUploadLimitInKB="20048" /> >> > >We should docum

Customizing solr with my lucene

2009-04-14 Thread mirage1987
Hey, I am trying to modify the Lucene code by adding payload functionality to it. Now if I want to use this Lucene with Solr, what should I do? I have added it to the lib folder of solr.war, replacing the old Lucene. Is this enough? Plus I am also using a different schema than the default

Re: Maintaining XML Layout

2009-04-14 Thread Johnny X
Pre tag fixed it instantly! Thanks! Shalin Shekhar Mangar wrote: > > On Tue, Apr 14, 2009 at 4:56 PM, Johnny X > wrote: > >> >> Hey, >> >> >> One of the fields returned from my queries (Content) is essentially the >> body >> of an e-mail. However, it's returned as one long stream of text

Maintaining XML Layout

2009-04-14 Thread Johnny X
Hey, One of the fields returned from my queries (Content) is essentially the body of an e-mail. However, it's returned as one long stream of text (or at least, that's how it appears on the web page). Viewing the source of the page it appears with the right layout characteristics (paragraphs, nam

Re: Maintaining XML Layout

2009-04-14 Thread Shalin Shekhar Mangar
On Tue, Apr 14, 2009 at 4:56 PM, Johnny X wrote: > > Hey, > > > One of the fields returned from my queries (Content) is essentially the > body > of an e-mail. However, it's returned as one long stream of text (or at > least, that's how it appears on the web page). Viewing the source of the > page

Use more then one <document> tag with Dataimporthandler ?

2009-04-14 Thread gateway0
Hi, is it possible to use more than one <document> tag within my data-config.xml file? Like: ...entities ...entities ??? kind regards, Sebastian -- View this message in context: http://www.nabble.com/Use-more-then-one-%3Cdocument%3E-tag-with-Dataimporthandler---tp23037189p230

Re: Random queries extremely slow

2009-04-14 Thread sunnyfr
Hi Oleg, Did you find a way to get past this issue? Thanks a lot, oleg_gnatovskiy wrote: > > Can you expand on this? Mirroring delay on what? > > > > zayhen wrote: >> >> Use multiple boxes, with a mirroring delay from one to another, like a >> pipeline. >> >> 2009/1/22 oleg_gnatovskiy

Re: Boolean query in Solr

2009-04-14 Thread Erik Hatcher
On Apr 14, 2009, at 5:38 AM, Sagar Khetkade wrote: Hi, I am using SolrJ and firing the query on Solr indexes. The index contains three fields, viz. 1. Document_id (type=integer required= true) 2. Ticket Id (type= integer) 3. Content (type=text) Here the query formulation

Re: Customizing solr with my lucene

2009-04-14 Thread Erik Hatcher
What is the query parsed to? Add &debugQuery=true to your Solr request and let us know what the query parses to. As for whether upgrading a Lucene library is sufficient... depends on what Solr version you're starting with (payload support is already in all recent versions of Solr's Lucene

Re: synchronizing slave indexes in distributing collections

2009-04-14 Thread sunnyfr
Hi, I would like to know where you are with your script that takes the slave out of the load balancer. I have no choice but to do that during updates on the slave server. Thanks, Yu-Hui Jin wrote: > > Thanks, guys. > > Glad to know the scripts work very well in your experience. (well, indeed >

Disable logging in SOLR

2009-04-14 Thread Kraus, Ralf | pixelhouse GmbH
Hi, is there a way to disable all logging output in SOLR ? I mean the output text like : "INFO: [core_de] webapp=/solr path=/update params={wt=json} status=0 QTime=3736" greets -Ralf-

RE: Term Counts/Term Frequency Vector Info

2009-04-14 Thread Fink, Clayton R.
Grant, This works: String url = "http://localhost:8983/solr"; SolrServer server = new CommonsHttpSolrServer(url); SolrQuery query = new SolrQuery(); query.setQueryType("/autoSuggest"); query.setParam("terms", "true"); query.setParam("terms.fl", "CONTENTS"); query.setParam("terms.lower", "london"

Embedded Solr weird behaviour

2009-04-14 Thread Adrian Ivan
Hello, I am using both Solr server and Solr embedded versions in the same context. I am using the Solr Server for indexing data which can be accessed at enterprise level, and the embedded version in a desktop application. The idea is that both index the same data, have the same schema.xml and

Re: Help with relevance failure in Solr 1.3

2009-04-14 Thread Walter Underwood
Dang, had another server do this. Syncing and committing a new index does not fix it. The two servers show the same bad results. wunder On 4/11/09 9:12 AM, "Walter Underwood" wrote: > Restarting Solr fixes it. If I remember correctly, a sync and commit > does not fix it. I have disabled snappu

Re: Memory usage

2009-04-14 Thread Mark Miller
Could you give us a dump of http://localhost:port/solr/admin/luke ? A huge max field length and random terms in 2000 2 MB files is going to be a bit of a resource hog :) Can you explain why you are doing that? You will have *so* many unique terms... I can't remember if you can set it in So

Re: Help with relevance failure in Solr 1.3

2009-04-14 Thread Yonik Seeley
It just occurred to me that a query cache issue could potentially cause this... if it's caching it would most likely be a query.equals() implementation incorrectly returning true. Perhaps check the JaroWinkler.equals() first? Also, when one server starts to return bad results, have you tried using

Re: indexing txt file

2009-04-14 Thread Alex Vu
Hi all, I'm trying to use Solr 1.3 to index a text file. I wrote a schema.xsd and an XML file. *The content of my text file is * #src dst proto ok sport dport pkts bytes flows first latest 192.168.2

DIH & uniqueKey

2009-04-14 Thread ashokc
Hi, I have separate JDBC datasources (DS1 & DS2) that I want to index with DIH in a single SOLR instance. The unique record for the two sources are different. Do I have to synthesize a uniqueKey that spans both the datasources? Something like this? That is, the uniqueKey values will be like (+ in

Re: Help with relevance failure in Solr 1.3

2009-04-14 Thread Walter Underwood
The JaroWinkler equals was broken, but I fixed that a month ago. Query cache sounds possible, but those are cleared on a commit, right? I could run with a cache size of 0, since our middle tier HTTP cache is leaving almost nothing for the caches to do. I'll try that explain. The stored fields fo

Re: indexing txt file

2009-04-14 Thread Alejandro Gonzalez
Now you should post (HTTP POST) your XML file (the schema must be in the conf folder) to the URL at which you have deployed Solr. Don't forget to post a commit command after that or you won't see the results. The commit command is just an XML message like this: On Tue, Apr 14, 2009 at 6:14 PM,

Re: indexing txt file

2009-04-14 Thread Alex Vu
What about the text file? On Tue, Apr 14, 2009 at 9:23 AM, Alejandro Gonzalez < alejandrogonzalezd...@gmail.com> wrote: > now you should post (http post) your xml file (the schema must be in conf > folder) to the url at which you have deployed Solr. Don't forget > to post a commit comm

Re: indexing txt file

2009-04-14 Thread Shalin Shekhar Mangar
On Tue, Apr 14, 2009 at 9:44 PM, Alex Vu wrote: > > *schema file is * > > > http://www.w3.org/2001/XMLSchema"> > > > > > > type="xs:string" use="required"/> > use="required"/> >

Re: indexing txt file

2009-04-14 Thread Alejandro Gonzalez
Also, I'm not sure I understand what you are trying to do, but maybe you should define a text field and fill it with the text of each file so that it gets indexed, or maybe with a path to that file if that's what you want. On Tue, Apr 14, 2009 at 6:28 PM, Shalin Shekhar Mangar < shalinman...@gmail.

Re: indexing txt file

2009-04-14 Thread Alex Vu
I also wrote another schema file based on the one supplied by Solr, and I do have some questions. *The content of my text file is * #src dst proto ok sport dport pkts bytes flows first latest 192.168.220.13526.147.238.

Re: Random queries extremely slow

2009-04-14 Thread oleg_gnatovskiy
It was actually our use of the field collapse patch. Once we disabled this the random slow queries went away. We also added *:* as a warmup query in order to speed up performance after indexing. sunnyfr wrote: > > Hi Oleg > > Did you find a way to pass over this issue ?? > thanks a lot, >

Re: DIH & uniqueKey

2009-04-14 Thread Noble Paul നോബിള്‍ नोब्ळ्
use TemplateTransformer to create a key On Tue, Apr 14, 2009 at 9:49 PM, ashokc wrote: > > Hi, > > I have separate JDBC datasources (DS1 & DS2) that I want to index with DIH > in a single SOLR instance. The unique record for the two sources are > different. Do I have to synthesize a uniqueKey tha

Re: indexing txt file

2009-04-14 Thread Alex Vu
I just want to be able to index my text file, and other files that carry the same format but with different IP addresses, ports, etc. I will have the traffic flow running in real time. Do you think Solr will be able to index a bunch of my text files in real time? On Tue, Apr 14, 2009 at 9:35 AM

Using Solr from AppEngine application via SolrJ: any problematic issues?

2009-04-14 Thread Glen Newton
I was wondering if those more up on SolrJ internals could take a look if there were any serious gotchas with the AppEngine's Java urlfetch with respect to SolrJ. http://code.google.com/appengine/docs/java/urlfetch/overview.html "The URL must use the standard ports for HTTP (80) and HTTPS (443). Th

Re: Help with relevance failure in Solr 1.3

2009-04-14 Thread Yonik Seeley
On Tue, Apr 14, 2009 at 12:19 PM, Walter Underwood wrote: > The JaroWinkler equals was broken, but I fixed that a month ago. > > Query cache sounds possible, but those are cleared on a commit, > right? Yes, but if you use autowarming, those items are regenerated and if there is a problem with equ

Re: Help with relevance failure in Solr 1.3

2009-04-14 Thread Walter Underwood
But why would it work for a few days, then go bad and stay bad? It fails for every multi-term query, even those not in cache. I ran a test with more queries than the cache size. We do use autowarming. wunder On 4/14/09 10:55 AM, "Yonik Seeley" wrote: > On Tue, Apr 14, 2009 at 12:19 PM, Walter

solr 1.3 + tomcat 5.5

2009-04-14 Thread andrysha nihuhoid
Hi, I have a problem setting up Solr + Tomcat (Tomcat 5.5 + Apache Solr 1.3.0 + CentOS 5.3). I'm not familiar with Java at all, so sorry if it's a dumb question. Here is what I did: placed solr.war in the webapps folder, changed solr home to /etc/solr, copied the contents of the Solr distribution example folder to /etc/sol

Re: Analyzers and stemmer

2009-04-14 Thread Grant Ingersoll
I would say a language is supported if there is a Tokenizer available for it. Everything else after that is generally seen as an improvement. On Apr 9, 2009, at 5:26 AM, revas wrote: Hi , With respect to language support in solr ,we have analyzers for some languages and stemmers for certa

Re: Multi-language support

2009-04-14 Thread Grant Ingersoll
On Apr 9, 2009, at 7:09 AM, revas wrote: Hi, To reframe my earlier question Some languages have just analyzers only but no stemmer from snowball porter, then does the analyzer take care of stemming as well? Some languages only have the stemmer from snowball but no analyzer? Some have both. C

Re: Help with relevance failure in Solr 1.3

2009-04-14 Thread Grant Ingersoll
Are there changes occurring when it goes bad that maybe aren't committed? On Apr 14, 2009, at 1:59 PM, Walter Underwood wrote: But why would it work for a few days, then go bad and stay bad? It fails for every multi-term query, even those not in cache. I ran a test with more queries than the ca

Re: Help with relevance failure in Solr 1.3

2009-04-14 Thread Walter Underwood
Nope. This is a slave, so no indexing happens, just a sync. The sync happens once per day. It went bad at a different time. wunder On 4/14/09 11:42 AM, "Grant Ingersoll" wrote: > Are there changes occurring when it goes bad that maybe aren't committed? > > On Apr 14, 2009, at 1:59 PM, Walter Un

Re: Using Solr from AppEngine application via SolrJ: any problematic issues?

2009-04-14 Thread Smiley, David W.
SolrJ would require some modification. SolrJ internally uses Jakarta HTTP Client via Solr's "CommonsHttpSolrServer" class. It would need to be ported to a different implementation of SolrServer (the base class), one that uses java.net.URL. I suggest "JavaNetUrlHttpSolrServer". ~ David Smiley

How to manage real-time (presence) data in a large index?

2009-04-14 Thread Development Team
Hi everybody, I have a relatively large index (it will eventually contain ~4M documents and be about 3G in size, I think) that indexes user data, settings, and the like. The documents represent a community of users whereupon a subset of them may be "online" at any time. Also, we want to scor

Re: Using Solr from AppEngine application via SolrJ: any problematic issues?

2009-04-14 Thread Glen Newton
I see. So this is a show stopper for those wanting to use SolrJ with AppEngine. Any chance this could be added as a Solr issue? -glen 2009/4/14 Smiley, David W. : > SolrJ would require some modification. SolrJ internally uses Jakarta HTTP > Client via Solr’s “CommonsHttpSolrServer” class. It w

Re: Using Solr from AppEngine application via SolrJ: any problematic issues?

2009-04-14 Thread Shalin Shekhar Mangar
On Wed, Apr 15, 2009 at 12:47 AM, Glen Newton wrote: > I see. So this is a show stopper for those wanting to use SolrJ with > AppEngine. > > Any chance this could be added as a Solr issue? > > Yes, commons-httpclient tries to use Socket directly. So it may not work. It was mentioned here - http:

Distinct terms in facet field

2009-04-14 Thread Harsch, Timothy J. (ARC-SC)[PEROT SYSTEMS]
How could I get a count of distinct terms for a given query? For example: The Wiki page http://wiki.apache.org/solr/SimpleFacetParameters has a section "Facet Fields with No Zeros" which shows the query: http://localhost:8983/solr/select?q=ipod&rows=0&facet=true&facet.limit=-1&facet.field=cat&face

Hierarchal Faceting Field Type

2009-04-14 Thread Nasseam Elkarra
Background: Set up a system for hierarchal categories using the following scheme: level one# level one#level two# level one#level two#level three# Trying to find the right combination of field type and query to get the desired results. Saw some previous posts about hierarchal facets which hel
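
A rough sketch of how values in the scheme quoted above could be generated before indexing, assuming each document knows its category path as an ordered list of level names. Whether the poster builds the values this way is not stated; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch only: expands a category path into one token per ancestor level,
// each terminated by '#', matching the scheme quoted in the post.
class HierarchyTokensSketch {

    static List<String> pathTokens(List<String> levels) {
        List<String> tokens = new ArrayList<>();
        StringBuilder prefix = new StringBuilder();
        for (String level : levels) {
            // Extend the running prefix by one level and record it.
            prefix.append(level).append('#');
            tokens.add(prefix.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<String> levels = Arrays.asList("level one", "level two", "level three");
        for (String token : pathTokens(levels)) {
            System.out.println(token);
        }
    }
}
```

Each token would go into a multi-valued field, so that faceting on that field yields counts at every depth of the hierarchy.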

Re: Search on all fields and know in which field was the match

2009-04-14 Thread Chris Hostetter
: With this structure i think (correct me if i am wrong) i cant search for all : attachBody_* and know where the match was (attachBody_1, _2, _3, etc). correct : I really don't know if this is the best approach so any help would be : appreciated. one option is to index each attachment as it's

Re: How to send a parsed Query to shards?

2009-04-14 Thread Chris Hostetter
: reference some large in-memory lookup tables. After the search components : get done processing the original query, the query may contain SpanNearQueries : and DisjunctionMaxQueries. I'd like to send that query to the shards, not : the original query. : : I've come up with the following idea

Re: Custom sort based on arbitrary order

2009-04-14 Thread Chris Hostetter
: custom order that is fairly simple: there is a list of venues and some of : them are more relevant than others (there is no logic, it's arbitrary, it's : not an alphabetic order), it'd be something like this: : : Orange venue = 1 : Red venue = 2 : Blue venue = 3 : : So results where venue is "o

Re: Help with relevance failure in Solr 1.3

2009-04-14 Thread Grant Ingersoll
Is bad memory a possibility? i.e. is it the same machine all the time? Is there any recognizable pattern for when it happens? -Grant (grasping at straws) On Apr 14, 2009, at 2:51 PM, Walter Underwood wrote: Nope. This is a slave, so no indexing happens, just a sync. The sync happens once

Re: using NGramTokenizerFactory for partial matching

2009-04-14 Thread Chris Hostetter
: I want it to match "lor" "lorem" and "lorem i". However I am finding it : matches the first two but not the third - the white space is causing : problems. Here are the relevant parts of my config: : : : : NGramTokenizer doesn't do anything special wit

Sort by distance from location?

2009-04-14 Thread Development Team
Hi everybody, My index has latitude/longitude values for locations. I am required to do a search based on a set of criteria, and order the results based on how far the lat/long location is to the current user's location. Currently we are emulating such a search by adding criteria of ever-widen
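
For reference, the distance ordering being asked about can be sketched in plain Java with the haversine formula. This is not a Solr feature and not the poster's code: it sorts already-fetched latitude/longitude pairs on the client side, it assumes a mean Earth radius of 6371 km, and all names are hypothetical.

```java
import java.util.Arrays;
import java.util.Comparator;

// Sketch only: great-circle distance via the haversine formula,
// plus a client-side sort of {lat, lon} pairs by distance from the user.
class DistanceSortSketch {

    static final double EARTH_RADIUS_KM = 6371.0; // assumed mean radius

    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a));
    }

    // Sorts the {lat, lon} pairs in place, nearest to the user first.
    static void sortByDistance(double[][] points, double userLat, double userLon) {
        Arrays.sort(points, Comparator.comparingDouble(
                (double[] p) -> haversineKm(userLat, userLon, p[0], p[1])));
    }

    public static void main(String[] args) {
        double[][] points = { {0.0, 2.0}, {0.0, 1.0}, {0.0, 3.0} };
        sortByDistance(points, 0.0, 0.0);
        for (double[] p : points) {
            System.out.println(p[0] + "," + p[1] + " -> "
                    + haversineKm(0.0, 0.0, p[0], p[1]) + " km");
        }
    }
}
```

Sorting on the client only works when the candidate set is small enough to fetch in full, which is the limitation the post is trying to get around on the server side.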

Re: Help with relevance failure in Solr 1.3

2009-04-14 Thread Walter Underwood
I already ruled out cosmic rays. It has happened on different hardware and at different times of day, including low load. The only thing associated with it is load from a new faceted browse thing we turned on. wunder On 4/14/09 2:23 PM, "Grant Ingersoll" wrote: > Is bad memory a possibility?

Re: Sort by distance from location?

2009-04-14 Thread Smiley, David W.
Have you tried LocalSolr? http://www.gissearch.com/localsolr (I haven't but looks cool) On 4/14/09 5:31 PM, "Development Team" wrote: Hi everybody, My index has latitude/longitude values for locations. I am required to do a search based on a set of criteria, and order the results based on

Re: Sort by distance from location?

2009-04-14 Thread Development Team
Ah, good question: Yes, we've tried it... and it was slower. To give some avg times: Regular non-distance Searches: 100ms Our expanding-criteria solution: 600ms LocalSolr: 800ms (We also had problems with LocalSolr in that the results didn't seem to be cached in Solr upon doing a search. So eac

Re: Disable logging in SOLR

2009-04-14 Thread Bill Au
Have you tried setting logging level to OFF from Solr's admin GUI: http://wiki.apache.org/solr/SolrAdminGUI Bill On Tue, Apr 14, 2009 at 9:56 AM, Kraus, Ralf | pixelhouse GmbH < r...@pixelhouse.de> wrote: > Hi, > > is there a way to disable all logging output in SOLR ? > I mean the output text l

Re: _val:ord(field) (from wiki LargeIndexes)

2009-04-14 Thread Chris Hostetter
: I see this interesting line in the wiki page LargeIndexes : http://wiki.apache.org/solr/LargeIndexes (sorting section towards the : bottom) : : Using _val:ord(field) as a search term will sort the results without : incurring the memory cost. : : I'd like to know what this means, but I'm hav

Re: More than one language in the same document

2009-04-14 Thread Chris Hostetter
: > A related question. What does 'copyField' actually do? Does it 'append' : > content from the source field to the 'target' field? Or does it : > replace/overwrite it? Thank you. : > : > : It appends the content of the source field to the target. strictly speaking, it adds the content to th

Re: Hierarchal Faceting Field Type

2009-04-14 Thread Koji Sekiguchi
Nasseam Elkarra wrote: Background: Set up a system for hierarchal categories using the following scheme: level one# level one#level two# level one#level two#level three# Trying to find the right combination of field type and query to get the desired results. Saw some previous posts about hierar

Re: using multisearcher

2009-04-14 Thread Chris Hostetter
: As for the second part, I was thinking of trying to replace the standard : SolrIndexSearcher with one that employs a MultiSearcher. But I'm not very : familiar with the workings of Solr, especially with respect to the caching : that goes on. I thought that maybe people who are more familiar wi

Re: Access HTTP headers from custom request handler

2009-04-14 Thread Chris Hostetter
: Solr cannot assume that the request would always come from http (think : of EmbeddedSolrServer). So it assumes that there are only parameters exactly. : Your best bet is to modify SolrDispatchFilter and read the params and : set them in the SolrRequest Object SolrDispatchFilter is designed to

Index Replication or Distributed Search ?

2009-04-14 Thread ramanathan
Hi, Can someone provide practical advice on how large a Solr search index can be while keeping good performance for a consumer-facing media website? Is it good or bad to think about Distributed Search and dividing the index at an early stage of development? Thanks Ram -- View this message in context:

Re: Help with relevance failure in Solr 1.3

2009-04-14 Thread Grant Ingersoll
OK, I guess details on the new faceting stuff would be in order. Which faceting are you using? Are you sure that it never occurred before (i.e. it slipped under the radar)? Obviously, the key is reproducibility here, but this has all the earmarks of some weird threading issue, it seems, at le

Re: How to manage real-time (presence) data in a large index?

2009-04-14 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Wed, Apr 15, 2009 at 12:39 AM, Development Team wrote: > Hi everybody, >       I have a relatively large index (it will eventually contain ~4M > documents and be about 3G in size, I think) that indexes user data, > settings, and the like. The documents represent a community of users > whereupon

Re: Using Solr from AppEngine application via SolrJ: any problematic issues?

2009-04-14 Thread Noble Paul നോബിള്‍ नोब्ळ्
I guess SOLR-599 can be easily fixed if we do not implement Multipart-support (which is non-essential) --Noble On Wed, Apr 15, 2009 at 1:12 AM, Shalin Shekhar Mangar wrote: > On Wed, Apr 15, 2009 at 12:47 AM, Glen Newton wrote: > >> I see. So this is a show stopper for those wanting to use SolrJ