Re: ICUTokenizer class not found with Solr 4.4

2013-08-28 Thread Tom Burton-West
it to yours or just add this information (i.e. other scenarios where class loading not working) to your JIRA? Details below: Tom The documentation in the collections1/conf directory is confusing. For example the collections1/conf/solrconfig.xml file says you should put a ./lib dir in your

ICUTokenizer class not found with Solr 4.4

2013-08-27 Thread Tom Burton-West
making some kind of a configuration error. I also don't understand the workaround in SOLR-4852. Is this an ICU issue? A java 7 issue? a Solr 4.4 issue, or did I simply not understand the README.txt? Tom -- org.apache.solr.common.SolrException

How to set discountOverlaps=true in Solr 4x schema.xml

2013-08-22 Thread Tom Burton-West
=solr.SchemaSimilarityFactory discountOverlaps=true / Tom

Re: How to set discountOverlaps=true in Solr 4x schema.xml

2013-08-22 Thread Tom Burton-West
. Is the default for Solr 4 true? <similarity class="solr.BM25SimilarityFactory"> <float name="k1">1.2</float> <float name="b">0.75</float> <bool name="discountOverlaps">false</bool> </similarity> On Thu, Aug 22, 2013 at 4:58 PM, Markus Jelsma markus.jel...@openindex.io wrote: Hi Tom, Don't

Re: How to set discountOverlaps=true in Solr 4x schema.xml

2013-08-22 Thread Tom Burton-West
I should have said that I have set it both to true and to false and restarted Solr each time and the rankings and info in the debug query showed no change. Does this have to be set at index time? Tom

Solr 4.2.1 limit on number of rows or number of hits per shard?

2013-07-25 Thread Tom Burton-West
requested to 100,000, I have no problems. Does Solr have a limit on number of rows that can be requested or is this a bug? Tom INFO: [core] webapp=/dev-1 path=/select params={shards=XXX:8111/dev-1/core,XXX:8111/dev-2/core,XXX:8111/dev-3/core&fl=vol_id&indent=on&start=0&q=*:*&rows=100} hits

Re: Solr 4.2.1 limit on number of rows or number of hits per shard?

2013-07-25 Thread Tom Burton-West
, which would result in about 3 billion pages. So testing the scalability of queries used by our current production system, such as the query against the index that is not released to production to get a list of the unique ids that are actually indexed in Solr is part of that testing process. Tom

Re: Solr 4.2.1 limit on number of rows or number of hits per shard?

2013-07-25 Thread Tom Burton-West
are sending to the head shard and actually get a good measure of how many bytes are being sent around. I'll poke around and look at multipartUploadLimitInKB, and also see if there is some servlet container limit config I might need to mess with. Tom On Thu, Jul 25, 2013 at 2:46 PM, Shawn

Re: Solr 4.2.1 limit on number of rows or number of hits per shard?

2013-07-25 Thread Tom Burton-West
=52952 Tom INFO: [core] webapp=/dev-1 path=/select params={fl=vol_id&indent=on&start=700&q=*:*&rows=100} hits=119220943 status=0 QTime=9772 Jul 25, 2013 5:39:43 PM org.apache.solr.core.SolrCore execute INFO: [core] webapp=/dev-1 path=/select params={fl=vol_id&indent=on&start=800&q

Re: What does too many merges...stalling in indexwriter log mean?

2013-07-12 Thread Tom Burton-West
. Tom On Thu, Jul 11, 2013 at 5:29 PM, Shawn Heisey s...@elyograg.org wrote: On 7/11/2013 1:47 PM, Tom Burton-West wrote: We are seeing the message "too many merges...stalling" in our indexwriter log. Is this something to be concerned about? Does it mean we need to tune something in our

What does too many merges...stalling in indexwriter log mean?

2013-07-11 Thread Tom Burton-West
Hello, We are seeing the message "too many merges...stalling" in our indexwriter log. Is this something to be concerned about? Does it mean we need to tune something in our indexing configuration? Tom

When not to use NRTCachingDirectory and what to use instead.

2013-07-10 Thread Tom Burton-West
? Does the NRTCachingDirectory have any benefit for indexing under the use case noted above? I'm guessing we should just use the solr.StandardDirectoryFactory instead. Is this correct? Tom --- <!-- The DirectoryFactory to use for indexes

Re: Sorting results by last update date

2013-05-30 Thread Tom Gullo
sort=last_updated_date desc Maybe adding %20 will help: sort=last_updated_date%20desc -- View this message in context: http://lucene.472066.n3.nabble.com/Sorting-results-by-last-update-date-tp4066692p4066986.html Sent from the Solr - User mailing list archive at Nabble.com.
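The %20 suggestion above amounts to URL-encoding the space in the sort clause. A minimal sketch of doing that programmatically, assuming a Python client; the `q` value is a placeholder and only the sort clause comes from the thread:

```python
from urllib.parse import urlencode, quote

# Hypothetical request parameters; only the sort clause is taken from the thread.
params = {"q": "*:*", "sort": "last_updated_date desc"}

# quote_via=quote encodes the space as %20; the default (quote_plus) would emit '+'.
query_string = urlencode(params, quote_via=quote)
print(query_string)
```

Building the query string with a library call rather than by hand avoids the unencoded space in the first place.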

Solr 4.x replacement for termsIndexDivisor

2013-05-21 Thread Tom Burton-West
%28int%29 This is followed by an example of how to set the min and max block size in Lucene. Is the ability to set the min and max block size available in Solr? If not, should I open a JIRA? Tom -- Excerpt from the Solr 4.3 latest rev of the example/solrconfig.xml file: http

Re: solr 4.2.1 still has problems with index version and index generation

2013-04-08 Thread Tom Gullo
I'm on 4.1 and I have a similar problem. Except for the version number everything else seems to be fine. Is that what other people are seeing? -- View this message in context:

Re: Slow queries for common terms

2013-03-22 Thread Tom Burton-West
warming with your most common terms. On the other hand as Jan pointed out, you may be cpu bound because Solr doesn't have early termination and has to rank all 90 million docs in order to show the top 10 or 25. Did you try the OR search to see if your CPU is at 100%? Tom On Fri, Mar 22, 2013 at 10

strange edismax parsing when searching in multiple fields (#TB)

2013-03-13 Thread Burgmans, Tom
and if yes: why? Thanks for any hint, Tom

RE: [SPAM] Re: strange edismax parsing when searching in multiple fields (#TB)

2013-03-13 Thread Burgmans, Tom
same set of stop words for all fields that you search on. You might find this useful : http://bibwild.wordpress.com/2010/04/14/solr-stop-wordsdismax-gotcha/ --- On Wed, 3/13/13, Burgmans, Tom tom.burgm...@wolterskluwer.com wrote: From: Burgmans, Tom tom.burgm...@wolterskluwer.com Subject

Search in String and Text_en fields simultaneously with edismax

2013-02-28 Thread Burgmans, Tom
for body but search as a phrase for valueadd? Thanks, Tom Burgmans

RE: Search in String and Text_en fields simultaneously with edismax

2013-02-28 Thread Burgmans, Tom
type's analyzer does with its value is irrelevant to query parsing. -- Jack Krupansky -Original Message- From: Burgmans, Tom Sent: Thursday, February 28, 2013 10:48 AM To: solr-user@lucene.apache.org Subject: Search in String and Text_en fields simultaneously with edismax I have a field

ngrams or truncation for multilingual searching in Solr

2013-02-05 Thread Tom Burton-West
York, NY, USA, 75-82. DOI=10.1145/1571941.1571957 http://doi.acm.org/10.1145/1571941.1571957 Tom Burton-West http://www.hathitrust.org/blogs/large-scale-search

Why does debugQuery/explain output sometimes include queryNorm and sometimes not for same query?

2013-01-25 Thread Tom Burton-West
result (and show up in each explain from the debugQuery?) This is Solr 3.6. Tom - str name=parsedqueryocr:aardvark/str lst name=explain str name=mdp.390150591683130.4395488 = (MATCH) fieldWeight(ocr:aardvark in 504374), product

Re: Why does debugQuery/explain output sometimes include queryNorm and sometimes not for same query?

2013-01-25 Thread Tom Burton-West
Thanks Hoss, Yes it is a distributed query. Tom On Fri, Jan 25, 2013 at 2:32 PM, Chris Hostetter hossman_luc...@fucit.orgwrote: : I have a one term query: ocr:aardvark When I look at the explain : output, for some matches the queryNorm and fieldWeight are shown and for : some matches

coord missing from debugQuery explain?

2013-01-08 Thread Tom Burton-West
Hello, I'm trying to understand some Solr relevance issues using debugQuery=on, but I don't see the coord factor listed anywhere in the explain output. My understanding is that the coord factor is not included in either the querynorm or the fieldnorm. What am I missing? Tom

Best practices for Solr highlighter for CJK

2013-01-02 Thread Tom Burton-West
. i.e. ABC = searched as AB BC only AB gets highlighted even if the matching string is ABC. (Where ABC are chinese characters such as 大亚湾 = searched as 大亚 亚湾, but only 大亚 is highlighted rather than 大亚湾) Is there some highlighting parameter that might fix this? Tom Burton-West

How to I let the FVH highlight individual terms instead of the complete phrase?

2012-12-21 Thread Burgmans, Tom
/boundaryScanner Thanks, Tom -- Tom Burgmans [cid:image001.jpg@01CDDFA4.2B7968E0] Search Specialist Tel: +31 (0)17 246 66 33 Mobile: +31 (0)6 306 821 78 Platform Technologies Global Platform Organization Zuidpoolsingel 2 2408 ZE, Alphen aan den Rijn

Solr/Lucene Engineer - Contract Opportunity - Raleigh, NC

2012-12-20 Thread Polak, Tom
: Description: C:\Users\dhil2\AppData\Roaming\Microsoft\Signatures\ExperisIT.jpg] Tom Polak IT Recruiter Experis IT Staffing 1122 Oberlin Road Raleigh, NC 27605 T: 919 755 5838 F: 919 755 5828 C: 919 457 8530 tom.po...@experis.commailto:tom.po...@experis.com www.experis.comhttp

ICUTokenizer labels number as Han character?

2012-12-19 Thread Tom Burton-West
. This doesn't seem right. Couldn't fit the whole analysis output on one screen so there are two screenshots attached. Any clues as to what is going on and whether it is a problem? Tom

configuring per-field similarity in Solr 4: the global similarity does not support it

2012-12-17 Thread Tom Burton-West
here. Can someone point me to documentation or examples? Tom Simplified schema.xml excerpt: <fieldType name="CJKFullText" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="false"> <analyzer type="index"> <tokenizer class="solr.ICUTokenizerFactory"

How to configure termvectors to not store positions/offsets

2012-12-13 Thread Tom Burton-West
frequencies 2) Shows how to configure termvectors in Solr schema.xml to only store term frequencies, and not positions and offsets? Tom

edismax: implicit AND changes into implicit OR

2012-12-12 Thread Burgmans, Tom
OR, in case an Explicit OR is added to the query expression. The parsedquery information confirms this behavior. Why is edismax doing this? Tested on a Solr 4.0.0 instance. Thanks, Tom -- Tom Burgmans [cid:image001.jpg@01CDD86E.DC411F70] Search Specialist Tel: +31 (0)17 246 66 33 Mobile

RE: edismax: implicit AND changes into implicit OR

2012-12-12 Thread Burgmans, Tom
to prefix all my search terms with a +. -Original Message- From: Shawn Heisey [mailto:s...@elyograg.org] Sent: Wednesday 12 December 2012 05:46 To: solr-user@lucene.apache.org Subject: Re: edismax: implicit AND changes into implicit OR On 12/12/2012 5:51 AM, Burgmans, Tom wrote: I have some

RE: edismax: implicit AND changes into implicit OR

2012-12-12 Thread Burgmans, Tom
AND changes into implicit OR On 12/12/2012 10:27 AM, Burgmans, Tom wrote: I have set <solrQueryParser defaultOperator="AND"/> in the schema (and restarted Solr), and tested again with http://localhost:8983/solr/collection1/browse?defType=edismax&q=(Thomas+Michael)+OR+xxxmatchesnothingxxx&q.op=AND note

RE: Can a field with defined synonym be searched without the synonym?

2012-12-12 Thread Burgmans, Tom
In our case it's the opposite. For our clients it is very important that every synonym gets equal chances in the relevancy calculation. The fact that nol scores higher than net operating loss, simply because its document frequency is lower, is unacceptable and a reason to look for ways to

RE: score calculation

2012-12-12 Thread Burgmans, Tom
at the EXPLAIN information to see how the final score is calculated. Tom -Original Message- From: Sangeetha [mailto:sangeetha...@gmail.com] Sent: Thursday 13 December 2012 08:33 To: solr-user@lucene.apache.org Subject: score calculation I want to know how score is calculated? what

Re: Restricting search results by field value

2012-12-06 Thread Tom Mortimer
Sounds like it's worth a try! Thanks Andre. Tom On 5 Dec 2012, at 17:49, Andre Bois-Crettez andre.b...@kelkoo.com wrote: If you do grouping on source_id, it should be enough to request 3 times more documents than you need, then reorder and drop the bottom. Is a 3x overhead acceptable
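The approach Andre describes (group on source_id, over-fetch, then reorder and drop the bottom) could be sketched on the client side as follows. This is only an illustration under assumptions: the function name, the dict-of-lists input shape, and the `score` key are hypothetical and are not Solr's response format.

```python
def reorder_and_trim(groups, wanted):
    """Flatten grouped hits, restore score order, and keep the top `wanted`.

    `groups` maps a source_id to its hits; each group is assumed to have
    already been capped (for example at 3 hits) by the grouped query.
    """
    flat = [doc for docs in groups.values() for doc in docs]
    flat.sort(key=lambda doc: doc["score"], reverse=True)
    return flat[:wanted]


# Hypothetical grouped hits, over-fetched relative to the 3 results wanted.
groups = {
    "a": [{"id": 1, "score": 9.0}, {"id": 2, "score": 4.0}],
    "b": [{"id": 3, "score": 7.5}],
    "c": [{"id": 4, "score": 6.0}, {"id": 5, "score": 1.0}],
}
top = reorder_and_trim(groups, wanted=3)
print([doc["id"] for doc in top])
```

The over-fetch factor matters because some of the fetched groups contribute only low-scoring hits that are dropped after the reorder.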

Re: Restricting search results by field value

2012-12-06 Thread Tom Mortimer
Thanks, but even with group.main=true the results are not in relevancy (score) order, they are in group order. Which is why I can't use it as is. Tom On 6 Dec 2012, at 19:00, Way Cool way1.wayc...@gmail.com wrote: Grouping should work: group=true&group.field=source_id&group.limit=3&group.main

Restricting search results by field value

2012-12-05 Thread Tom Mortimer
docs that way, but the potential overhead is large. Is there any way of doing this in Solr without hacking in a custom Lucene Collector? (which doesn't look all that straightforward). cheers, Tom

Re: BM25 model for solr 4?

2012-11-15 Thread Tom Burton-West
) production implementations that have tested the new ranking models available in Solr. Tom On Wed, Nov 14, 2012 at 9:16 PM, Floyd Wu floyd...@gmail.com wrote: Hi there, Does anybody can kindly tell me how to setup solr to use BM25? By the way, are there any experiment or research shows BM25

URL parameters to use FieldAnalysisRequestHandler

2012-11-13 Thread Tom Burton-West
work either: mysolr.umich.edu/analysis/field?name=title&q=fire-fly No matter what field I specify, the analysis returned is for the default field. (See response excerpt below) Is there a page somewhere that shows the correct syntax for sending get requests to the FieldAnalysisRequestHandler? Tom

Re: URL parameters to use FieldAnalysisRequestHandler

2012-11-13 Thread Tom Burton-West
Thanks Robert, Somehow I read the doc but still entered the params wrong. Should have been analysis.fieldname instead of analysis.name Works fine now. Tom On Tue, Nov 13, 2012 at 2:11 PM, Robert Muir rcm...@gmail.com wrote: I think the UI uses this behind the scenes, as in no more
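For reference, a corrected request along the lines of this fix might be built as below. This is a sketch under assumptions: the host and path are the ones quoted in the thread, the http scheme is assumed, and keeping the original `q` parameter alongside `analysis.fieldname` is a guess rather than something the thread confirms.

```python
from urllib.parse import urlencode

# Host and path as quoted in the thread; the scheme is an assumption.
base_url = "http://mysolr.umich.edu/analysis/field"

# analysis.fieldname is the parameter name confirmed in the reply above;
# reusing q=fire-fly from the original request is an assumption.
params = {"analysis.fieldname": "title", "q": "fire-fly"}

request_url = f"{base_url}?{urlencode(params)}"
print(request_url)
```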

Re: Skewed IDF in multi lingual index

2012-11-08 Thread Tom Burton-West
Hi Markus, No answers, but I am very interested in what you find out. We currently index all languages in one index, which presents different IDF issues, but are interested in exploring alternatives such as the one you describe. Tom Burton-West http://www.hathitrust.org/blogs/large-scale

Solr 4.0 error message: Unsupported ContentType: Content-type:text/xml

2012-11-02 Thread Tom Burton-West
, application/csv, application/javabin, text/xml, application/json] We use exactly the same code without problem with Solr 3.6. We are sending a ContentType 'text/xml'. Is it likely that there is some other problem and this is just not quite the right error message? Tom

Re: Solr 4.0 error message: Unsupported ContentType: Content-type:text/xml

2012-11-02 Thread Tom Burton-West
Thanks Jack, That is exactly the problem. Apparently earlier versions of Solr ignored the extra text, which is why we didn't catch the bug in our code earlier. Thanks for the quick response. Tom On Fri, Nov 2, 2012 at 5:34 PM, Jack Krupansky j...@basetechnology.comwrote: That message makes

Re: AutoIndexing

2012-09-25 Thread Tom Mortimer
Hi Darshan, Can you give us some more details, e.g. what do you mean by database? A RDBMS? Which software? How are you indexing it (or intending to index it) to Solr? etc... cheers, Tom On 25 Sep 2012, at 09:55, darshan dk...@dreamsoftech.com wrote: Hi All, Is there any

Re: How can I create about 100000 independent indexes in Solr?

2012-09-25 Thread Tom Mortimer
Hi, Why do you think that the indexes should be independent? What would be the problem with using a single index and filter queries? Tom On 25 Sep 2012, at 03:21, 韦震宇 weizhe...@win-trust.com wrote: Dear all, The company I'm working in have a website to server more than 10 customers

Re: AutoIndexing

2012-09-25 Thread Tom Mortimer
/DataImportHandlerDeltaQueryViaFullImport Tom On 25 Sep 2012, at 11:28, darshan dk...@dreamsoftech.com wrote: My Document is Database(yes RDBMS) and software for it is postgresql, where any change in it's table should be reflected, without re-indexing. I am indexing it via DIH process Thanks, Darshan -Original Message

Re: ID reference field - Needed but not searchable or retrievable

2012-09-20 Thread Tom Mortimer
remove the whole doc, not just the uniqueID field. Tom On 20 Sep 2012, at 13:38, Spadez james_will...@hotmail.com wrote: Hi. My SQL database assigns a uniqueID to each item. I want to keep this uniqueID associated to the items that are in Solr even though I won't ever need to display

Solr 4.0 - disappointing results sharding on 1 machine

2012-09-20 Thread Tom Mortimer
-Xmx500M. I must be doing something stupid - surely this result is unexpected? Does anybody have any thoughts where it might be going wrong? cheers, Tom

Re: Solr 4.0 - disappointing results sharding on 1 machine

2012-09-20 Thread Tom Mortimer
Before anyone asks, these results were obtained warm. On 20 Sep 2012, at 14:39, Tom Mortimer tom.m.f...@gmail.com wrote: Hi all, After reading http://carsabi.com/car-news/2012/03/23/optimizing-solr-7x-your-search-speed/ , I thought I'd do my own experiments. I used 2M docs from wikipedia

Re: Personalized Boosting

2012-09-19 Thread Tom Mortimer
/SolrRelevancyFAQ#How_can_I_change_the_score_of_a_document_based_on_the_.2Avalue.2A_of_a_field_.28say.2C_.22popularity.22.29 Tom On 19 Sep 2012, at 02:49, deniz denizdurmu...@gmail.com wrote: Hello Tom Thank you for your link, but after overviewing it, I dont think it will help... In my case

Re: Solr4 how to make it do this?

2012-09-18 Thread Tom Mortimer
Surrey - q=Surrey fq=bed:3 I guess this kind of thing could also be implemented as a Solr query plug-in. Don't know if anything like it exists. Tom On 18 Sep 2012, at 11:30, george123 daniel.tarase...@gmail.com wrote: I guess I could come up with a synonyms.txt file and every instance of 3

Re: Personalized Boosting

2012-09-18 Thread Tom Mortimer
Hi, Would this do the job? http://wiki.apache.org/solr/QueryElevationComponent Tom On 18 Sep 2012, at 01:36, deniz denizdurmu...@gmail.com wrote: Hello All, I have a requirement or a pre=requirement for our search application. Basically the engine will be on a website with plenty

Solr 4.0 Beta: Admin UI does not correctly implement dismax/edismax query

2012-09-13 Thread Tom Burton-West
name="parsedquery">text:fire text:fly</str> If a correct dismax query were being sent to Solr, the parsedquery would look something like the following: <str name="parsedquery">(+DisjunctionMaxQuery(((text:fire text:fly))) Tom Burton-West

Re: Solr 4.0 Beta: Admin UI does not correctly implement dismax/edismax query

2012-09-13 Thread Tom Burton-West
Thanks Erik, Just found out that there is already a bug report for this open as https://issues.apache.org/jira/browse/SOLR-3811. Tom On Thu, Sep 13, 2012 at 12:52 PM, Erik Hatcher erik.hatc...@gmail.comwrote: That's definitely a bug. dismax=true is not the correct parameter to send. Should

Solr 4.0 Beta, termIndexInterval vs termIndexDivisor vs termInfosIndexDivisor

2012-09-07 Thread Tom Burton-West
in the example file does not explain what the termIndexDivisor does. Would it be appropriate to add these back to the wiki page? If not, could someone add a line or two to the comments in the Solr 4.0 example file explaining what the termIndexDivisor does? Tom

Re: Solr 4.0 Beta, termIndexInterval vs termIndexDivisor vs termInfosIndexDivisor

2012-09-07 Thread Tom Burton-West
make sense for the default codec, then maybe they need to be commented out or removed from the solr example solrconfig.xml. Tom On Fri, Sep 7, 2012 at 1:33 PM, Robert Muir rcm...@gmail.com wrote: Hi Tom: I already enhanced the javadocs about this for Lucene, putting warnings everywhere in bold

Re: Solr 4.0 Beta, termIndexInterval vs termIndexDivisor vs termInfosIndexDivisor

2012-09-07 Thread Tom Burton-West
or solrconfig.xml? Is there some simple way to specify minBlockSize and maxBlockSize in schema.xml? Once I get this all working and understand it, I'll be happy to draft some documentation. I'm really looking forward to experimenting with 4.0! Tom Tom On Fri, Sep 7, 2012 at 2:58 PM, Robert Muir rcm

Solr 4.0 beta : Is collection1 hard coded somewhere?

2012-08-23 Thread Tom Burton-West
org.apache.solr.core.CoreContainer create INFO: Creating SolrCore 'collection1' using instanceDir: /l/solrs/dev/solrs/4.0/3/collection1 Aug 23, 2012 12:06:02 PM org.apache.solr.core.SolrResourceLoader init I think somehow the previous solr.xml configuration is being stored on disk somewhere and loaded. Any clues? Tom

Re: Solr 4.0 beta : Is collection1 hard coded somewhere?

2012-08-23 Thread Tom Burton-West
wiki page and the release notes should point this out. Tom

Re: Solr 4.0 beta : Is collection1 hard coded somewhere?

2012-08-23 Thread Tom Burton-West
-3753 On Thu, Aug 23, 2012 at 1:04 PM, Tom Burton-West tburt...@umich.edu wrote: I did not describe the problems correctly. I have 3 solr shards with solr homes .../solrs/4.0/1 .../solrs/4.0/2 and .../solrs/4.0/2solrs/3 For shard 1 I have a solr.xml file with the modifications described

Re: Solr 4.0 Beta missing example/conf files?

2012-08-23 Thread Tom Burton-West
, it was very weird to still get a message about a missing collection1 core directory. See this JIRA issue: https://issues.apache.org/jira/browse/SOLR-3753 Tom On Thu, Aug 23, 2012 at 7:56 PM, Erik Hatcher erik.hatc...@gmail.com wrote: Tom - I corrected, on both trunk and 4_x, a reference to solr

Re: Scalability of Solr Result Grouping/Field Collapsing: Millions/Billions of documents?

2012-08-22 Thread Tom Burton-West
Hi Lance, I don't understand enough of how the field collapsing is implemented, but I thought it worked with distributed search. Are you saying it only works if everything that needs collapsing is on the same shard? Tom On Wed, Aug 22, 2012 at 2:41 AM, Lance Norskog goks...@gmail.com wrote

Re: Scalability of Solr Result Grouping/Field Collapsing: Millions/Billions of documents?

2012-08-22 Thread Tom Burton-West
Hi Tirthankar, Can you give me a quick summary of what won't work and why? I couldn't figure it out from looking at your thread. You seem to have a different issue, but maybe I'm missing something here. Tom On Tue, Aug 21, 2012 at 7:10 PM, Tirthankar Chatterjee tchatter...@commvault.com

Re: Scalability of Solr Result Grouping/Field Collapsing: Millions/Billions of documents?

2012-08-22 Thread Tom Burton-West
numFound = 325. This shows that the items in the group are distributed between different shards. What am I missing here? What is it that you are saying does not work? Tom Field Collapse query ( IP address changed, and newlines added and shard urls simplified for readability) http://solr

Solr 4.0 Beta missing example/conf files?

2012-08-22 Thread Tom Burton-West
Hello, Usually in the example/solr file in Solr distributions there is a populated conf file. However in the distribution I downloaded of solr 4.0.0-BETA, there is no /conf directory. Has this been moved somewhere? Tom ls -l apache-solr-4.0.0-BETA/example/solr total 107 drwxr-sr-x 2 tburtonw

Re: Solr 4.0 Beta missing example/conf files?

2012-08-22 Thread Tom Burton-West
Thanks Markus! Should the README.txt file in solr/example be updated to reflect this? Is that something I need to enter a JIRA issue for? Tom On Wed, Aug 22, 2012 at 3:12 PM, Markus Jelsma markus.jel...@openindex.iowrote: Hi - The example has been moved to collection1/ -Original

Re: Scalability of Solr Result Grouping/Field Collapsing: Millions/Billions of documents?

2012-08-22 Thread Tom Burton-West
and take a look at the memory use using JConsole. Tom On Wed, Aug 22, 2012 at 4:02 PM, Tirthankar Chatterjee tchatter...@commvault.com wrote: Hi Tom, We had an issue where we are keeping millions of docs in a single node and we were trying to group them on a string field which

Scalability of Solr Result Grouping/Field Collapsing: Millions/Billions of documents?

2012-08-21 Thread Tom Burton-West
users the choice of a list of the most relevant pages, or a list of the books containing the most relevant pages. We have approximately 3 billion pages. Does anyone have experience using field collapsing on this sort of scale? Tom Tom Burton-West Information Retrieval Programmer Digital Library

Re: edismax parser ignores mm parameter when tokenizer splits tokens (hypenated words, WDF splitting etc)

2012-07-02 Thread Tom Burton-West
Opened a JIRA issue: https://issues.apache.org/jira/browse/SOLR-3589, which also lists a couple other related mailing list posts. On Thu, Jun 28, 2012 at 12:18 PM, Tom Burton-West tburt...@umich.eduwrote: Hello, My previous e-mail with a CJK example has received no replies. I verified

edismax parser ignores mm parameter when tokenizer splits tokens (hypenated words, WDF splitting etc)

2012-06-28 Thread Tom Burton-West
, but want to find out if I am missing something here. Details of several queries are appended below. Tom Burton-West edismax query mm=2 query with hyphenated word [fire-fly] <lst name="debug"> <str name="rawquerystring">{!edismax mm=2}fire-fly</str> <str name="querystring">{!edismax mm=2}fire-fly</str> <str name

edismax parser ignores mm parameter when tokenizer splits tokens (i.e. CJK)

2012-06-26 Thread Tom Burton-West
] turns into a Boolean OR query for ( [two] OR [thirds] ). Is there some way to tell the edismax query parser to stick with mm =100%? Appended below is the debugQuery output for these two queries and an excerpt from our schema.xml. Tom Tom Burton-West http://www.hathitrust.org/blogs/large-scale

Fwd: suggester/autocomplete locks file preventing replication

2012-06-22 Thread tom
, Simon Willnauer simon.willna...@googlemail.com mailto:simon.willna...@googlemail.com wrote: On Fri, Jun 22, 2012 at 10:37 AM, tom dev.tom.men...@gmx.net mailto:dev.tom.men...@gmx.net wrote: cross posting this issue to the dev list in the hope to get

Re: solrj and replication

2012-06-21 Thread tom
ok tested it myself and a slave running embedded works, just not within my application -- yet... On 20.06.2012 18:14, tom wrote: hi, i was just wondering if i need to do smth special if i want to have an embedded slave to get replication working ? my setup is like so: - in my clustered

suggester/autocomplete locks file preventing replication

2012-06-21 Thread tom
hi, i'm using the suggester with a file like so: <searchComponent class="solr.SpellCheckComponent" name="suggest"> <lst name="spellchecker"> <str name="name">suggest</str> <str name="classname">org.apache.solr.spelling.suggest.Suggester</str> <str

Re: suggester/autocomplete locks file preventing replication

2012-06-21 Thread tom
BTW: a core unload doesnt release the lock either ;( On 21.06.2012 14:39, tom wrote: hi, i'm using the suggester with a file like so: <searchComponent class="solr.SpellCheckComponent" name="suggest"> <lst name="spellchecker"> <str name="name">suggest</str> <str name

Re: suggester/autocomplete locks file preventing replication

2012-06-21 Thread tom
reason for this or rather a bug? should i move the topic to the dev list? On 21.06.2012 14:49, tom wrote: BTW: a core unload doesnt release the lock either ;( On 21.06.2012 14:39, tom wrote: hi, i'm using the suggester with a file like so: <searchComponent class="solr.SpellCheckComponent"

solrj and replication

2012-06-20 Thread tom
runs in a jetty. - the embedded codes dont expose any of the solr servlets note: that the slave config, if started in jetty, does proper replication, while when embedded it doesnt. using solr 3.5 thx tom

What is the docs number in Solr explain query results for fieldnorm?

2012-05-25 Thread Tom Burton-West
, maxDocs=17707) 0.625 = fieldNorm(field=ocr, doc=16624) /str Tom Burton-West - str name=78562575E066497D-518 0.42061833 = (MATCH) fieldWeight(ocr:the in 8396), product of: 7.071068 = tf(termFreq(ocr:the)=50) 1.087715 = idf(docFreq=16219, maxDocs=17707) 0.0546875 = fieldNorm(field
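The numbers in the second explain entry above can be reproduced by hand. The sketch below assumes Lucene's classic TF-IDF similarity (tf as the square root of the term frequency, idf as 1 + ln(maxDocs / (docFreq + 1))), which the figures in the excerpt are consistent with; the variable names are illustrative only.

```python
import math

# Values copied from the explain output for the term ocr:the.
term_freq = 50
doc_freq = 16219
max_docs = 17707
field_norm = 0.0546875

tf = math.sqrt(term_freq)                      # explain shows 7.071068
idf = 1 + math.log(max_docs / (doc_freq + 1))  # explain shows 1.087715
field_weight = tf * idf * field_norm           # explain shows 0.42061833

print(round(tf, 6), round(idf, 6), round(field_weight, 6))
```

Lucene computes these values in single precision, so the last digits may differ slightly from a double-precision calculation.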

boost not showing up in Solr 3.6 debugQueries?

2012-05-17 Thread Tom Burton-West
and this is one of the queries from our log. Tom Burton-West lst name=debug str name=rawquerystring 兵にな^1000 OR hanUnigrams:兵にな/str str name=querystring 兵にな^1000 OR hanUnigrams:兵にな/str str name=parsedquery((+ocr:兵に +ocr:にな)^1000.0) hanUnigrams:兵/str str name=parsedquery_toString((+ocr:兵に +ocr:にな

RE: CJKBigram filter questons: single character queries, bigrams created across sript/character types

2012-04-30 Thread Burton-West, Tom
understanding of Japanese, I can see how perhaps bigramming a Han and Hiragana character might make sense but what about Han and Katakana? Lance, how did you weight the unigram vs bigram fields for CJK? or did you just OR them together assuming that idf will give the bigrams more weight? Tom

RE: CJKBigram filter questons: single character queries, bigrams created across sript/character types

2012-04-30 Thread Burton-West, Tom
Thanks wunder, I really appreciate the help. Tom

CJKBigram filter questons: single character queries, bigrams created across sript/character types

2012-04-27 Thread Burton-West, Tom
characters are formed: いろは革命歌 =“いろ” ”ろは“ “は革” ”革命” “命歌” Is there a way to specify that you don’t want bigrams across character types? Tom Tom Burton-West Digital Library Production Service University of Michigan Library http://www.hathitrust.org/blogs/large-scale-search
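The overlapping bigrams shown above, and the alternative of not forming bigrams across character types, can be illustrated with a small sketch. This is not how the Lucene CJKBigram filter is implemented: the helper names are hypothetical, and using the first word of the Unicode character name as the script label is a crude stand-in for real script detection.

```python
import unicodedata


def bigrams(text):
    """Overlapping character bigrams, as in the example above."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


def script_of(ch):
    """Crude script label: first word of the Unicode name (HIRAGANA, CJK, ...)."""
    return unicodedata.name(ch).split()[0]


def bigrams_within_script(text):
    """Split text into same-script runs, then form bigrams inside each run only."""
    runs, current = [], ""
    for ch in text:
        if current and script_of(ch) != script_of(current[-1]):
            runs.append(current)
            current = ""
        current += ch
    if current:
        runs.append(current)
    return [bg for run in runs for bg in bigrams(run)]


print(bigrams("いろは革命歌"))
print(bigrams_within_script("いろは革命歌"))
```

Note that in this sketch a run of a single character yields no bigram at all, so a real analyzer would need a separate rule (for example emitting a unigram) for that case.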

maxMergeDocs in Solr 3.6

2012-04-19 Thread Burton-West, Tom
solrconfig was 2,147,483,647 we would never hit this limit, but I was wondering about why it is no longer in the example. Tom

Re: possible spellcheck bug in 3.5 causing erroneous suggestions

2012-03-27 Thread tom
so any one has a clue what's (might be) going wrong ? or do i have to debug and myself and post a jira issue? PS: unfortunately i cant give anyone the index for testing due to NDA. cheers On 22.03.2012 10:17, tom wrote: same On 22.03.2012 10:00, Markus Jelsma wrote: Can you try

possible spellcheck bug in 3.5 causing erroneous suggestions

2012-03-22 Thread tom
hi folks, i think i found a bug in the spellchecker but am not quite sure: this is the query i send to solr: http://lh:8983/solr/CompleteIndex/select? rows=0 echoParams=all spellcheck=true spellcheck.onlyMorePopular=true spellcheck.extendedResults=no q=a+bb+ccc++ and this is the result:

Re: possible spellcheck bug in 3.5 causing erroneous suggestions

2012-03-22 Thread tom
same On 22.03.2012 10:00, Markus Jelsma wrote: Can you try spellcheck.q ? On Thu, 22 Mar 2012 09:57:19 +0100, tom dev.tom.men...@gmx.net wrote: hi folks, i think i found a bug in the spellchecker but am not quite sure: this is the query i send to solr: http://lh:8983/solr/CompleteIndex

RE: autoGeneratePhraseQueries sort of silently set to false

2012-02-23 Thread Burton-West, Tom
Seems like a change in default behavior like this should be included in the changes.txt for Solr 3.5. Not sure how to do that. Tom -Original Message- From: Naomi Dushay [mailto:ndus...@stanford.edu] Sent: Thursday, February 23, 2012 1:57 PM To: solr-user@lucene.apache.org Subject

RE: autoGeneratePhraseQueries sort of silently set to false

2012-02-23 Thread Burton-West, Tom
was also noted in changes.txt. Is it possible to revise the changes.txt for 3.5? Do you by any chance know where the change in the default behavior was discussed? I know it has been a contentious issue. Tom -Original Message- From: Erik Hatcher [mailto:erik.hatc...@gmail.com] Sent

RE: Can Apache Solr Handle TeraByte Large Data

2012-01-16 Thread Burton-West, Tom
on-the-fly. That way they can search within the document and get page level results. More details about our setup: http://www.hathitrust.org/blogs/large-scale-search Tom Burton-West University of Michigan Library www.hathitrust.org -Original Message-

Re: IllegalStateException, response already committed - replication related

2011-12-08 Thread Tom Lianza
so much as a side effect of a larger problem (like why the operation is taking so long). -- Tom Lianza CTO, Wishpot.com skype: tlianza

Re: Huge Performance: Solr distributed search

2011-12-02 Thread Tom Gullo
Interesting info. You should look into using Solid State Drives. I moved my search engine to SSD and saw dramatic improvements. -- View this message in context: http://lucene.472066.n3.nabble.com/Huge-Performance-Solr-distributed-search-tp3530627p346.html Sent from the Solr - User

RE: Can dynamic fields defined by a prefix be used with LatLonType?

2011-10-27 Thread Tom Cooke
. -Original Message- From: Tom Cooke [mailto:tom.co...@gossinteractive.com] Sent: 26 October 2011 20:06 To: solr-user@lucene.apache.org Subject: Can dynamic fields defined by a prefix be used with LatLonType? Hi, I'm adding support for lat/lon data into an existing schema which uses prefix

Can dynamic fields defined by a prefix be used with LatLonType?

2011-10-26 Thread Tom Cooke
, Tom Sign-up to our newsletter for industry best practice and thought leadership: http://www.gossinteractive.com/newsletter Registered Office: c/o Bishop Fleming, Cobourg House, Mayflower Street, Plymouth, PL1 1LG. Company Registration No: 3553908 This email contains proprietary

Re: millions of records problem

2011-10-18 Thread Tom Gullo
Getting a solid-state drive might help -- View this message in context: http://lucene.472066.n3.nabble.com/millions-of-records-problem-tp3427796p3431309.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: is there any attribute in schema.xml to avoid duplication in solr?

2011-10-04 Thread Tom Gullo
UniqueId avoids entries with the same id. -- View this message in context: http://lucene.472066.n3.nabble.com/is-there-any-attribute-in-schema-xml-to-avoid-duplication-in-solr-tp3392408p3393085.html Sent from the Solr - User mailing list archive at Nabble.com.

RE: Getting facet counts for 10,000 most relevant hits

2011-10-03 Thread Burton-West, Tom
and do some performance tests on my kludge. That might work for us as an interim measure until I have time to dive into the Solr/Lucene distributed faceting code. Tom -Original Message- From: Chris Hostetter [mailto:hossman_luc...@fucit.org] Sent: Friday, September 30, 2011 9:20 PM

RE: Getting facet counts for 10,000 most relevant hits

2011-09-30 Thread Burton-West, Tom
result set. In my use case the top 10,000 hits versus all 170,000. Tom -Original Message- From: Lan [mailto:dung@gmail.com] Sent: Thursday, September 29, 2011 7:40 PM To: solr-user@lucene.apache.org Subject: Re: Getting facet counts for 10,000 most relevant hits I implemented

Getting facet counts for 10,000 most relevant hits

2011-09-23 Thread Burton-West, Tom
facet values with the highest counts for those relevant documents. Is this possible or would it require writing some lucene or Solr code? Tom Burton-West http://www.hathitrust.org/blogs/large-scale-search

RE: Example setting TieredMergePolicy for Solr 3.3 or 3.4?

2011-09-19 Thread Burton-West, Tom
in the example configuration. Took a quick look at the code, but I'm obviously looking in the wrong place. Is mergeFactor=10 interpreted by TieredMergePolicy as segmentsPerTier=10 and maxMergeAtOnce=10? If I specify values for these is the mergeFactor setting ignored? Tom -Original
