CachedSqlEntityProcessor's purpose

2008-11-25 Thread Amit Nithian
I am starting to look at Solr's Data Import Handler framework and am quite impressed with it so far. My question concerns reducing the number of SQL queries issued to the database, which is how I came across this entity processor. In the following example: <entity name="x" query="select * from x"> <entity name="y"

Unknown field error using JDBC

2008-11-25 Thread Joel Karlsson
Hello, I get an Unknown field error when I'm indexing an Oracle DB. I've reduced the number of fields/columns in order to troubleshoot. If I change the uniqueKey to timestamp (for example) and create a dynamic field <dynamicField name="*" type="text" indexed="true" stored="true"/> the indexing works fine,

Re: Using Solr for indexing emails

2008-11-25 Thread Norberto Meijome
On Tue, 25 Nov 2008 03:59:31 +0200 Timo Sirainen [EMAIL PROTECTED] wrote: would it be faster to say q=user:user AND highestuid:[ * TO *] ? Now that I read again what fq really did, yes, sounds like you're right. you may want to compare them both to see which one is better... I just went

Re: solr internationalization support

2008-11-25 Thread Shalin Shekhar Mangar
On Mon, Nov 24, 2008 at 7:56 PM, rameshgalla [EMAIL PROTECTED] wrote: 1) Which languages does Solr support out of the box other than English? Solr does not know about any languages. It will apply whatever analyzers you specify in the schema.xml for that field type. 2) What are the

Re: Unknown field error using JDBC

2008-11-25 Thread Noble Paul നോബിള്‍ नोब्ळ्
Which version of DIH are you using? On Tue, Nov 25, 2008 at 5:24 PM, Joel Karlsson [EMAIL PROTECTED] wrote: Hello, I get an Unknown field error when I'm indexing an Oracle DB. I've reduced the number of fields/columns in order to troubleshoot. If I change the uniqueKey to timestamp (for example)

Re: Unknown field error using JDBC

2008-11-25 Thread Joel Karlsson
I actually don't know which version I was using, but now I've upgraded to 1.3 and it works like a charm!! Thanks a lot! 2008/11/25 Noble Paul നോബിള്‍ नोब्ळ् [EMAIL PROTECTED] which version of DIH are you using? On Tue, Nov 25, 2008 at 5:24 PM, Joel Karlsson [EMAIL PROTECTED] wrote: Hello,

Re: Schema Design Guidance

2008-11-25 Thread Shalin Shekhar Mangar
Even if you go for the 400,000 documents way, the size of data and number of unique tokens would remain the same. With your data size, you should think about sharding and distributed search. Is the availability of a product a boolean value or the number of items? To make sure that you don't need

Re: [VOTE] Community Logo Preferences

2008-11-25 Thread Shalin Shekhar Mangar
https://issues.apache.org/jira/secure/attachment/12394266/apache_solr_b_red.jpg https://issues.apache.org/jira/secure/attachment/12394070/sslogo-solr-finder2.0.png https://issues.apache.org/jira/secure/attachment/12394282/solr2_maho_impression.png

Re: [VOTE] Community Logo Preferences

2008-11-25 Thread Marcus Stratmann
https://issues.apache.org/jira/secure/attachment/12394282/solr2_maho_impression.png https://issues.apache.org/jira/secure/attachment/12394376/solr_sp.png https://issues.apache.org/jira/secure/attachment/12393936/logo_remake.jpg

Re: Sorting and JVM heap size ....

2008-11-25 Thread Shalin Shekhar Mangar
On Tue, Nov 25, 2008 at 7:49 AM, souravm [EMAIL PROTECTED] wrote: 3. Another case is - if there are 2 search requests concurrently hitting the server, each with sorting on the same 20 character date field, then also it would need 2x2GB memory. So if I know that I need to support at least 4

Re: Analyzing CSV phrase fields

2008-11-25 Thread Yonik Seeley
The easiest solution would be to create the documents you send to solr with multiple keywords fields... they will be separated by a positionIncrement so a phrase query won't see yankees adjacent to cleveland. If you can't do that, then perhaps patch PatternTokenizer filter to put a larger
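The effect of a position increment between field values can be illustrated with a small Python sketch. This is a simplified model, not Lucene's actual implementation: the gap of 100, the helper names `index_values` and `phrase_match`, and the sample team names are all assumptions made for illustration.

```python
# Simplified model of token positions in a multi-valued field.
# GAP is an assumed value; in Solr the gap is configured per field type.
GAP = 100


def index_values(values, gap=GAP):
    """Map each token to the list of positions where it occurs.

    Tokens inside one value get consecutive positions; a gap is added
    between successive values of the same field.
    """
    positions = {}
    pos = 0
    for i, value in enumerate(values):
        if i > 0:
            pos += gap  # separate this value from the previous one
        for token in value.lower().split():
            positions.setdefault(token, []).append(pos)
            pos += 1
    return positions


def phrase_match(positions, phrase):
    """Return True if the phrase tokens occur at adjacent positions."""
    tokens = phrase.lower().split()
    if not tokens:
        return False
    return any(
        all(start + k in positions.get(tok, []) for k, tok in enumerate(tokens))
        for start in positions.get(tokens[0], [])
    )


# Two separate keyword values versus one concatenated value.
multi = index_values(["new york yankees", "cleveland indians"])
single = index_values(["new york yankees cleveland indians"])
```

With separate values, `phrase_match(multi, "yankees cleveland")` fails because the two tokens are no longer adjacent, while the same phrase matches against the single concatenated value.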

Re: CachedSqlEntityProcessor's purpose

2008-11-25 Thread Shalin Shekhar Mangar
On Tue, Nov 25, 2008 at 1:52 PM, Amit Nithian [EMAIL PROTECTED] wrote: I like the concept of having multiple entity blocks for clarity but why wouldn't I have (for DB efficiency), the following as one entity's SQL statement select * from X,Y where x.id=y.xid and have two fields pointing at

Re: Using Solr for indexing emails

2008-11-25 Thread Shalin Shekhar Mangar
On Mon, Nov 24, 2008 at 11:51 PM, Timo Sirainen [EMAIL PROTECTED] wrote: DIH seems to be about Solr pulling data into it from an external source. That's not really practical with Dovecot since there's no central repository of any kind of data, so there's no way to know what has changed since

Re: port of Nutch CommonGrams to Solr for help with slow phrase queries

2008-11-25 Thread Shalin Shekhar Mangar
Hi Tom, I don't think anybody has worked on adding this to Solr yet. Do you mind opening a jira issue? On Tue, Nov 25, 2008 at 12:01 AM, Burton-West, Tom [EMAIL PROTECTED] wrote: Hello all, We are having problems with extremely slow phrase queries when the phrase query contains a common

Re: CachedSqlEntityProcessor's purpose

2008-11-25 Thread Noble Paul നോബിള്‍ नोब्ळ्
Every row emitted by an outer entity results in a new SQL query in the inner entity (yes, 50 queries on the inner entity). So, if you wish to join multiple tables, then nested entities are the way to go. CachedSqlEntityProcessor is meant to help you reduce the number of queries fired on sub-entities.
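The difference between one inner query per outer row and a cached lookup can be sketched in Python with sqlite3. This is only an illustration of the idea, not DIH code: the tables `x` and `y`, their columns, and the functions `nested` and `cached` are hypothetical, and the dictionary stands in for the in-memory hashmap that a later reply in this thread mentions.

```python
import sqlite3

# Hypothetical sample data: outer table x, inner table y keyed by xid.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE x (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE y (xid INTEGER, tag TEXT);
    INSERT INTO x VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO y VALUES (1, 'red'), (1, 'blue'), (3, 'green');
""")


def nested(conn):
    """Fire one inner query for every row of the outer entity."""
    outer = conn.execute("SELECT id, name FROM x ORDER BY id").fetchall()
    queries = 1
    docs = []
    for xid, name in outer:
        rows = conn.execute(
            "SELECT tag FROM y WHERE xid = ? ORDER BY rowid", (xid,)
        ).fetchall()
        queries += 1
        docs.append({"id": xid, "name": name, "tags": [t for (t,) in rows]})
    return docs, queries


def cached(conn):
    """Read the inner table once and answer lookups from a dict."""
    outer = conn.execute("SELECT id, name FROM x ORDER BY id").fetchall()
    cache = {}
    for xid, tag in conn.execute("SELECT xid, tag FROM y ORDER BY rowid"):
        cache.setdefault(xid, []).append(tag)
    queries = 2  # one for x, one for y
    docs = [
        {"id": xid, "name": name, "tags": cache.get(xid, [])}
        for xid, name in outer
    ]
    return docs, queries


nested_docs, nested_queries = nested(conn)
cached_docs, cached_queries = cached(conn)
```

Both functions build the same documents; the nested version issues one query per outer row plus one for the outer table, while the cached version issues two in total, at the cost of holding the inner table in memory.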

RE: Sorting and JVM heap size ....

2008-11-25 Thread souravm
Hi Shalin, Thanks for the clarifications. Could you please explain a bit more on how the new searcher can double the memory? Based on your explanation, when a new set of documents gets committed a new searcher is created. So what I understand is whenever an update/delete query and search

matching exact terms

2008-11-25 Thread Brian Whitman
This is probably severe user error, but I am curious about how to index docs to make this query work: happy birthday to return the doc with n_name:Happy Birthday before the doc with n_name:Happy Birthday, Happy Birthday . As it is now, the latter appears first for a query of n_name:happy

Re: Sorting and JVM heap size ....

2008-11-25 Thread Shalin Shekhar Mangar
On Tue, Nov 25, 2008 at 9:37 PM, souravm [EMAIL PROTECTED] wrote: Could you please explain a bit more on how the new searcher can double the memory? Take a look at slide 13 of Yonik's presentation available at http://people.apache.org/~yonik/ApacheConEU2006/Solr.ppt Each searcher in Solr

Re: matching exact terms

2008-11-25 Thread Ryan McKinley
On Nov 25, 2008, at 11:40 AM, Brian Whitman wrote: This is probably severe user error, but I am curious about how to index docs to make this query work: happy birthday to return the doc with n_name:Happy Birthday before the doc with n_name:Happy Birthday, Happy Birthday . As it is now, the

Re: CachedSqlEntityProcessor's purpose

2008-11-25 Thread Amit Nithian
Thanks for the responses. A few follow-ups: 1) It seems that the CachedSqlEntityProcessor performs the where clause in memory on the cache. Is this cache an in-memory RDBMS or maps? 2) In the example, there were two use cases, one that is like query=select * from Y where xid=${X.ID} and another

Re: [VOTE] Community Logo Preferences

2008-11-25 Thread Chris Harris
https://issues.apache.org/jira/secure/attachment/12394282/solr2_maho_impression.png https://issues.apache.org/jira/secure/attachment/12394475/solr2_maho-vote.png

Re: [VOTE] Community Logo Preferences

2008-11-25 Thread Brendan Grainger
https://issues.apache.org/jira/secure/attachment/12394264/apache_solr_a_red.jpg https://issues.apache.org/jira/secure/attachment/12394282/solr2_maho_impression.png https://issues.apache.org/jira/secure/attachment/12394266/apache_solr_b_red.jpg

Re: [VOTE] Community Logo Preferences

2008-11-25 Thread Thomas Dowling
https://issues.apache.org/jira/secure/attachment/12394266/apache_solr_b_red.jpg https://issues.apache.org/jira/secure/attachment/12394314/apache_soir_001.jpg https://issues.apache.org/jira/secure/attachment/12394264/apache_solr_a_red.jpg

newbie question on SOLR distributed searches with many shards

2008-11-25 Thread Gerald De Conto
I wasn't able to find examples/anything via Google so thought I'd ask: Say I want to implement a solution using distributed searches with many shards in SOLR 1.3.0. Also, say there are too many shards to pass in via the URL (dozens, hundreds, whatever). Is there a way to specify in

Keyword extraction

2008-11-25 Thread Plaatje, Patrick
Hi all, Struggling with a question I recently got from a colleague: is it possible to extract keywords from indexed content? In my opinion it should be possible to find out on what words the ranking of the indexed content is the highest (Lucene or Solr), but have no clue where to begin. Anyone

Re: Keyword extraction

2008-11-25 Thread Ryan McKinley
Lots of approaches out there... the easiest off the shelf method would be to use the MoreLikeThisHandler and get the top interesting terms; http://wiki.apache.org/solr/MoreLikeThisHandler ryan On Nov 25, 2008, at 2:09 PM, Plaatje, Patrick wrote: Hi all, Struggling with a question I

Re: Using Solr for indexing emails

2008-11-25 Thread Timo Sirainen
On Tue, 2008-11-25 at 20:45 +0530, Shalin Shekhar Mangar wrote: On Mon, Nov 24, 2008 at 11:51 PM, Timo Sirainen [EMAIL PROTECTED] wrote: DIH seems to be about Solr pulling data into it from an external source. That's not really practical with Dovecot since there's no central repository

Spellcheck for phrase queries

2008-11-25 Thread Manepalli, Kalyan
Hi, I am trying to implement spell check functionality on a particular field. I need to do a complete phrase spell check when the user enters multiple words. For example: if the user enters great Hyat the current implementation would suggest great Hyatt, just correcting the word hyatt.

Stuck threads on Weblogic

2008-11-25 Thread Alexander Ramos Jardim
Hello guys, I am getting some stuck threads on my application when it connects to Solr. The stuck threads occur at regular intervals, in such a way that every 3 days the app is online it hangs up the entire cluster. I don't know if there's any direct relation to Solr, but I get the following exception

Re: Unknown field error using JDBC

2008-11-25 Thread Jon Baer
This sounds like exactly the same issue I had when going from 1.3 to 1.4 ... it sounds like DIH is trying to automagically figure out the columns :-\ - Jon On Nov 25, 2008, at 6:37 AM, Joel Karlsson wrote: Hello, I get Unknown field error when I'm indexing an Oracle dB. I've reduced the number of

Re: port of Nutch CommonGrams to Solr for help with slow phrase queries

2008-11-25 Thread Norberto Meijome
On Mon, 24 Nov 2008 13:31:39 -0500 Burton-West, Tom [EMAIL PROTECTED] wrote: The approach to this problem used by Nutch looks promising. Has anyone ported the Nutch CommonGrams filter to Solr? Construct n-grams for frequently occurring terms and phrases while indexing. Optimize phrase

Re: port of Nutch CommonGrams to Solr for help with slow phrase queries

2008-11-25 Thread Norberto Meijome
On Wed, 26 Nov 2008 10:08:03 +1100 Norberto Meijome [EMAIL PROTECTED] wrote: We didn't notice any severe performance hit, but: - data set isn't huge (ca. 1 MM docs). - reindexed nightly via DIH from MS-SQL, so we can use a separate cache layer to lower the number of hits to SOLR. To make

Increased garbage with Solr 1.3?

2008-11-25 Thread Walter Underwood
We are moving from Solr 1.1 to 1.3, and have noticed that 1.3 is working the garbage collector a lot more. Has anyone else seen this? wunder

Re: Increased garbage with Solr 1.3?

2008-11-25 Thread Yonik Seeley
On Tue, Nov 25, 2008 at 7:56 PM, Walter Underwood [EMAIL PROTECTED] wrote: We are moving from Solr 1.1 to 1.3, and have noticed that 1.3 is working the garbage collector a lot more. Has anyone else seen this? During indexing or searching? Indexing uses the SolrDocument class as an intermediate

Re: Increased garbage with Solr 1.3?

2008-11-25 Thread Walter Underwood
Searching. No facets, but fuzzy matching. --wunder On 11/25/08 5:08 PM, Yonik Seeley [EMAIL PROTECTED] wrote: On Tue, Nov 25, 2008 at 7:56 PM, Walter Underwood [EMAIL PROTECTED] wrote: We are moving from Solr 1.1 to 1.3, and have noticed that 1.3 is working the garbage collector a lot more.

copyField stored values question

2008-11-25 Thread Michael Henson
Hello, I am using copyField to send the raw name of an entity into different fields for indexing: # schema.xml snippet <field name="raw_name" type="string" indexed="false" stored="true" /> <field name="indexed_name" type="some_custom_type" indexed="true" stored="true" /> <field name="other_indexed_name"

Re: copyField stored values question

2008-11-25 Thread Yonik Seeley
On Tue, Nov 25, 2008 at 9:24 PM, Michael Henson [EMAIL PROTECTED] wrote: I set the indexed fields to be stored so that I could see what exactly my custom types' filters produce. The Analyzer utility in the Admin webapp seems to apply the filters properly. However, query results against this

Re: CachedSqlEntityProcessor's purpose

2008-11-25 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Tue, Nov 25, 2008 at 11:35 PM, Amit Nithian [EMAIL PROTECTED] wrote: Thanks for the responses. A few follow-ups: 1) It seems that the CachedSqlEntityProcessor performs the where clause in memory on the cache. Is this cache an in-memory RDBMS or maps? It is a hashmap in memory. 2) In the

Re: newbie question on SOLR distributed searches with many shards

2008-11-25 Thread Noble Paul നോബിള്‍ नोब्ळ्
Anything that is passed as a request parameter can be put into the SearchHandler's defaults or invariants section. This is equivalent to passing the shard URL in the request. However, this expects that you may need to set up a load balancer if a shard has more than one host. On Wed, Nov 26, 2008
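How handler-level defaults and invariants combine with request parameters can be sketched roughly in Python. This is a conceptual model under stated assumptions rather than Solr's code: the function `resolve_params` and the host names are hypothetical, and the precedence shown (defaults apply only when the request omits a parameter, invariants always win) is the commonly described behaviour of those two sections.

```python
def resolve_params(request, defaults=None, invariants=None):
    """Merge request parameters with handler-level settings.

    Assumed precedence: defaults < request < invariants.
    """
    resolved = dict(defaults or {})
    resolved.update(request)           # the request overrides defaults
    resolved.update(invariants or {})  # invariants override everything
    return resolved


# Hypothetical shard list configured once instead of sent on every URL.
handler_defaults = {"shards": "host1:8983/solr,host2:8983/solr"}

plain = resolve_params({"q": "ipod"}, defaults=handler_defaults)
overridden = resolve_params(
    {"q": "ipod", "shards": "host3:8983/solr"}, defaults=handler_defaults
)
pinned = resolve_params(
    {"q": "ipod", "shards": "host3:8983/solr"}, invariants=handler_defaults
)
```

Under these assumptions, placing the shard list in defaults keeps the request URL short while still allowing a per-request override, whereas placing it in invariants fixes the list regardless of what the client sends.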

Facet Query and Query

2008-11-25 Thread Jae Joo
I am having some trouble utilizing the facet query. As far as I know, the facet query has better performance than a simple query (q). Here is the example.