Re: Detecting query errors with SolrJ

2012-01-06 Thread Michael Sokolov
See SOLR-141; there are a few patches - currently all you get back is a 400 error with no actual information equivalent to what is logged in the solr exception. On 1/6/2012 12:46 PM, Shawn Heisey wrote: On 1/5/2012 7:25 AM, Erick Erickson wrote: Somewhere you have access to a

Re: Detecting query errors with SolrJ

2012-01-08 Thread Michael Sokolov
. It's relatively straightforward to fix, and will make a substantial improvement for solrj users. If you're still looking for tickets to push forward :) -Mike On 1/8/2012 4:41 PM, Shawn Heisey wrote: On 1/6/2012 3:57 PM, Michael Sokolov wrote: See SOLR-141; there are a few patches - currently

Re: Deciding whether to stem at query time

2012-04-23 Thread Michael Sokolov
Yes, and you might choose to use different options for different fields. For dictionary searches, where users are searching for specific words, and a high degree of precision is called for, stemming is less helpful, but for full text searches, more so. -Mike On 4/23/2012 3:35 PM, Walter

Re: SolrCloud as my primary data store

2013-02-28 Thread Michael Sokolov
On 02/21/2013 12:02 AM, jimtronic wrote: Now that I've been running Solr Cloud for a couple months and gotten comfortable with it, I think it's time to revisit this subject. ... I'd really like to hear from someone who has made the leap. Cheers, Jim We use Solr as our primary

Re: Preserve XML hierarchy

2011-07-26 Thread Michael Sokolov
Here's an idea: if you index the full text of your XML document using XmlCharFilter - available as a patch (or HtmlCharFilter), and then highlight the entire document (you will need to fiddle with highlighter parameters a bit to make sure you get 1 fragment that covers the entire file) with

Re: SolrJ and class versions

2011-07-26 Thread Michael Sokolov
It's not clear to me (from the wiki, or the jira issue) whether the compatibility break goes both ways - maybe I should just try and see, but just to get this out there on the list: is the 3.X javabin client able to talk to 1.4 servers? If so, then there is a nicely decoupled upgrade path:

Re: slow highlighting because of stemming

2011-07-30 Thread Michael Sokolov
On 7/30/2011 3:46 AM, Orosz György wrote: Hi, Thanks for the answer! I am doing some logging about stemming, and what I can see is that a lot of tokens are stemmed for the highlighting. It is the strange part, since I don't understand why does any highlighter need stemming again. Consider that

Re: Solr request filter and indexing process

2011-07-31 Thread Michael Sokolov
The first thing that comes to mind is to check whether you are committing after every insert. A number of things may happen when you commit, including merges, rebuilding the spelling dictionary (is this still true in 3.3? maybe not). It's better to commit after a batch of inserts. -Mike
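
The batching advice can be sketched roughly as follows. This is an illustration only: `add` and `commit` are hypothetical callables standing in for whatever client is in use (they are not SolrJ methods), and the batch size of 1000 is an arbitrary assumption.

```python
def index_in_batches(docs, add, commit, batch_size=1000):
    """Send every doc, committing once per batch instead of once per insert.

    `add` and `commit` are hypothetical callables supplied by the caller.
    Returns the number of commits issued.
    """
    pending = 0
    commits = 0
    for doc in docs:
        add(doc)
        pending += 1
        if pending >= batch_size:
            commit()
            commits += 1
            pending = 0
    if pending:
        # Commit the final, partially filled batch.
        commit()
        commits += 1
    return commits
```

With 2500 documents and the default batch size this issues three commits rather than 2500.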

Re: Solr can not index F**K!

2011-07-31 Thread Michael Sokolov
On 7/31/2011 7:29 PM, randohi wrote: org.apache.solr.analysis.KeywordMarkerFilterFactory args:{protected: protwords.txt luceneMatchVersion: LUCENE_33 } Could something be going on here? What's in your protwords.txt ? -Mike

Re: Store complete XML record (DIH XPathEntityProcessor)

2011-08-01 Thread Michael Sokolov
On 8/1/2011 6:17 AM, Chantal Ackermann wrote: If you are looking for a config-only solution - i'm not sure that there is one. Someone else might be able to comment on that? You might want to take a look at SOLR-2597; it has a patch for XmlStripCharFilter, which will strip tags from XML for

Re: Some questions about SolrJ

2011-08-13 Thread Michael Sokolov
On 8/12/2011 4:18 PM, Shawn Heisey wrote: On 8/12/2011 1:49 PM, Shawn Heisey wrote: I am sure that I have more questions, but I may be able to answer a lot of them myself if I can see better examples. Thought of another question. My Perl build system uses DIH for all indexing, but with the

Re: Some questions about SolrJ

2011-08-13 Thread Michael Sokolov
Shawn, my experience with SolrJ in that configuration (no autoCommit) is that you have control over commits: if you don't issue an explicit commit, it won't happen. Re lifecycle: we don't use a static instance; rather our app maintains a small pool of CommonsHttpSolrServer instances that we

Re: solr equivalent of select distinct

2011-09-11 Thread Michael Sokolov
You can get what you want - unique lists of values from docs matching your query - for a single field (using facets), but not for the co-occurrence of two field values. So you could combine the two fields together, if you know what they are going to be in advance. Facets also give you

Re: solr 1.4 highlighting issue

2011-09-14 Thread Michael Sokolov
The highlighter gives you snippets of text surrounding words (terms) drawn from the query. The whole document should satisfy the query (ie it probably has ships/s somewhere else in it), but each snippet won't generally have all the terms. -Mike On 9/14/2011 2:54 AM, Dmitry Kan wrote: Hello

Re: Lucene-SOLR transition

2011-09-18 Thread Michael Sokolov
On 9/15/2011 8:30 PM, Scott Smith wrote: 2. Assuming that the answer to 1 is correct, then is there an easy way to take a lucene query (with nested Boolean queries, filter queries, etc.) and generate a SOLR query string with q and fq components? I believe that Query.toString() will

Re: Lucene-SOLR transition

2011-09-19 Thread Michael Sokolov
On 9/19/2011 5:27 AM, Erik Hatcher wrote: On Sep 18, 2011, at 19:43 , Michael Sokolov wrote: On 9/15/2011 8:30 PM, Scott Smith wrote: 2. Assuming that the answer to 1 is correct, then is there an easy way to take a lucene query (with nested Boolean queries, filter queries, etc

Re: Example setting TieredMergePolicy for Solr 3.3 or 3.4?

2011-09-21 Thread Michael Sokolov
I wonder if config-file validation would be helpful here :) I posted a patch in SOLR-1758 once. -Mike On 9/21/2011 6:22 PM, Michael Ryan wrote: I think the problem is that the mergePolicy config needs to be inside of the indexDefaults config, rather than after it as you have. -Michael

Re: escaping HTML tags within XML file

2011-09-25 Thread Michael Sokolov
Yes - you can index HTML text only while keeping the tags in place in the stored field using HTMLCharFilter (or possibly XMLCharFilter). But you will find that embedding HTML inside XML can be problematic since HTML tags don't have to follow the well-formed constraints that XML requires. For

Re: In-document highlighting DocValues?

2011-10-16 Thread Michael Sokolov
On 10/14/2011 7:20 PM, Jan Høydahl wrote: Hi, The Highlighter is way too slow for this customer's particular use case - which is very large documents. We don't need highlighted snippets for now, but we need to accurately decide what words (offsets) in the real HTML display of the resulting

Re: Field Collapsing and Record Filtering

2011-10-16 Thread Michael Sokolov
On 10/13/2011 5:04 PM, lee carroll wrote: current: bool //for fq which searches only current versions last_current_at: date time // for date range queries or group sorting what was current for a given date sorry if i've missed a requirement lee c Lee the idea of last_current_at is

Re: Query/Delete performance difference between straight HTTP and SolrJ

2011-10-26 Thread Michael Sokolov
Have you checked to see when you are committing? Is the pattern the same in both instances? If you are committing after each delete request in Java, but not in Perl, that could slow things down. On 10/25/2011 5:53 PM, Shawn Heisey wrote: On 10/20/2011 11:00 AM, Shawn Heisey wrote: I've got

Re: Query/Delete performance difference between straight HTTP and SolrJ

2011-10-27 Thread Michael Sokolov
think that's what it's called) can give better throughput for large request batches. If you're not using that, you may be having problems w/closing and re-opening connections? -Mike On 10/26/2011 9:56 PM, Shawn Heisey wrote: On 10/26/2011 6:16 PM, Michael Sokolov wrote: Have you checked

Re: question from a beginner

2011-10-31 Thread Michael Sokolov
You might also consider indexing each paragraph as a separate document if the documents are very large. -Mike On 10/30/2011 11:51 PM, Phil Scadden wrote: Look up highlighting. http://wiki.apache.org/solr/HighlightingParameters Notice: This email and any attachments are confidential. If

trouble with CollationKeyFilter

2011-11-23 Thread Michael Sokolov
I'm using CollationKeyFilter to sort my documents using the Unicode root collation, and my documents do appear to be getting sorted correctly, but I'm getting weird results when performing range filtering using the sort key field. For example: ifp_sortkey_ls:[youth culture TO youth culture]

Re: trouble with CollationKeyFilter

2011-11-23 Thread Michael Sokolov
Thanks for confirming that, and laying out the options, Robert. -Mike On 11/23/2011 9:03 PM, Robert Muir wrote: hi, locale sensitive range queries don't work with these filters, only sort, although erick erickson has a patch that will enable this (the lowercasing wildcards patch, then you

Re: trouble with CollationKeyFilter

2011-11-26 Thread Michael Sokolov
That's great news! We can't really track trunk, but it looks like this is targeted for 3.6, right? As a short-term alternative, I was considering using ICUFoldingFilter; this won't preserve some of the finer distinctions, but will at least sort the accented characters in with their unaccented

xml-aware highlighting

2010-10-09 Thread Michael Sokolov
I have a requirement to highlight search results, and to display documents with matching terms highlighted in the context of the original XML document structure. It seems like this must be a very common use case, but I am having trouble finding a way to accomplish what we need to do using

Re: xml-aware highlighting

2010-10-09 Thread Michael Sokolov
is the existing implementation :) - anyone? -Mike On 10/9/2010 12:51 PM, Michael Sokolov wrote: I have a requirement to highlight search results, and to display documents with matching terms highlighted in the context of the original XML document structure. It seems like this must be a very

Re: xml-aware highlighting

2010-10-09 Thread Michael Sokolov
Yes - that looks right; I was thrown a bit by the name - Thanks! On 10/9/2010 5:23 PM, Ahmet Arslan wrote: OK - I read a bit more and it appears an appropriate analysis pipeline (which would extract text from XML using SAX, say) is all that's required, and existing highlighting ought to be

Re: configuring custom CharStream in solr

2010-10-11 Thread Michael Sokolov
On 10/11/2010 6:41 PM, Koji Sekiguchi wrote: (10/10/12 5:57), Michael Sokolov wrote: I would like to inject my CharStream (or possibly it could be a CharFilter; this is all in flux at the moment) into the analysis chain for a field. Can I do this in solr using the Analyzer configuration

Re: configuring custom CharStream in solr

2010-10-11 Thread Michael Sokolov
On 10/11/2010 8:38 PM, Michael Sokolov wrote: On 10/11/2010 6:41 PM, Koji Sekiguchi wrote: (10/10/12 5:57), Michael Sokolov wrote: I would like to inject my CharStream (or possibly it could be a CharFilter; this is all in flux at the moment) into the analysis chain for a field. Can I do

Re: configuring custom CharStream in solr

2010-10-11 Thread Michael Sokolov
On 10/11/2010 10:18 PM, Chris Hostetter wrote: : OK - I found the answer pecking through the source - apparently the name of : the element to configure a CharFilter is charFilter - fancy that :) there's even an example, right there on the wiki...

RE: using HTTPClient sending solr ping request wont timeout as specified

2010-10-13 Thread Michael Sokolov
This does seem more like an HTTPClient question than a solr question - you might get more traction on their lists? Still, from what I remember HTTPClient has a number of timeouts you can set. Perhaps it's the read timeout you need? -Mike -Original Message- From: Renee Sun

RE: SOLR DateTime and SortableLongField field type problems

2010-10-18 Thread Michael Sokolov
I think if you look closely you'll find the date quoted in the Exception report doesn't match any of the declared formats in the schema. I would suggest, as a first step, hunting through your data to see where that date is coming from. -Mike -Original Message- From: Ken Stanley

RE: How do I this in Solr?

2010-10-27 Thread Michael Sokolov
You might try adding a field containing the word count and making sure that matches the query's word count? This would require you to tokenize the query and document yourself, perhaps. -Mike -Original Message- From: Varun Gupta [mailto:varun.vgu...@gmail.com] Sent: Tuesday, October
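
One possible reading of the word-count suggestion, as a rough sketch. The field names `title` and `title_wc`, and the regex tokenizer standing in for a real analyzer, are assumptions made for illustration; they do not come from the thread.

```python
import re


def tokenize(text):
    """Lowercase and split on non-word characters (a stand-in for a real analyzer)."""
    return [t for t in re.split(r"\W+", text.lower()) if t]


def word_count(text):
    """Value to store in the extra word-count field at index time."""
    return len(tokenize(text))


def exact_word_match_query(field, count_field, user_query):
    """Require every query term and a stored word count equal to the query's."""
    terms = tokenize(user_query)
    clauses = ["+%s:%s" % (field, t) for t in terms]
    clauses.append("+%s:%d" % (count_field, len(terms)))
    return " ".join(clauses)
```

Calling `exact_word_match_query("title", "title_wc", "Big Apple")` produces `+title:big +title:apple +title_wc:2`. Documents that repeat a word would need extra care, since the count and the set of distinct terms then differ.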

Ensuring stable timestamp ordering

2010-10-28 Thread Michael Sokolov
I'm curious what if any guarantees there are regarding the timestamp field that's defined in the sample solr schema.xml. Just for completeness, the definition is:

RE: Ensuring stable timestamp ordering

2010-10-28 Thread Michael Sokolov
. Is that expected? I could create my own timestamp values easily enough, but would just as soon not do so if I could use a pre-existing feature that seems tailor-made. -Mike -Original Message- From: Michael Sokolov [mailto:soko...@ifactory.com] Sent: Thursday, October 28, 2010 9:55 PM

RE: Influencing scores on values in multiValue fields

2010-10-29 Thread Michael Sokolov
How about creating another field for doing exact matches (a string); searching both and boosting the string match? -Mike -Original Message- From: Imran [mailto:imranboho...@gmail.com] Sent: Friday, October 29, 2010 6:25 AM To: solr-user@lucene.apache.org Subject: Influencing

Re: Ensuring stable timestamp ordering

2010-10-31 Thread Michael Sokolov
Hmm - personally, I wouldn't want to rely on timestamps as a unique-id generation scheme. Might we not one day want to have distributed parallel indexing that merges lazily? Keeping timestamps unique and in sync across multiple nodes would be a tough requirement. I would be happy simply

Re: Query question

2010-11-02 Thread Michael Sokolov
My impression was that city:Chicago^10 +Romantic +View would do what you want (with the standard lucene query parser and default operator OR), and I'm not sure about this, but I have a feeling that the version with Boolean operators AND/OR and parens might actually net out to the same thing,

Re: Tomcat special character problem

2010-11-07 Thread Michael Sokolov
Is it possible that your original search is being posted (HTTP POST), and the character encoding of the page with the form is not UTF-8? In that case, I believe a header gets sent with the request specifying a different character set (different from parameters in the URL, for which it's not

Re: multi-core solr, specifying the data directory

2011-02-28 Thread Michael Sokolov
I spent a few hours chasing my tail on this one; I really just assumed that the core's data dir would be under the core's instance dir. But I ended up doing exactly what you did (copying from something I found on the web). seems like a design flaw that could be difficult to fix without

Re: multi-core solr, specifying the data directory

2011-03-01 Thread Michael Sokolov
I tried this in my 1.4.0 installation (commenting out what had been working, hoping the default would be as you said works in the example): <solr persistent="true" sharedLib="lib"> <cores adminPath="/admin/cores"> <core name="bpro" instanceDir="bpro"> <!-- <property name="solr.data.dir" value="solr/bpro/data"/> -->

Re: Solr chained exclusion query

2011-03-05 Thread Michael Sokolov
It sounds as if what you have done is to index sales events (with fields customer, product, and date), and now you want to retrieve customers, which are not documents. The most natural way to handle this is to index customers as documents (with fields cust id, last sale date). Whenever a new

Re: True master-master fail-over without data gaps

2011-03-09 Thread Michael Sokolov
Yes, I think this should be pushed upstream - insert a tee in the document stream so that all documents go to both masters. Then use a load balancer to make requests of the masters. The tee itself then becomes a possible single point of failure, but you didn't say anything about the

Re: Fwd: some relational-type groupig with search

2011-03-09 Thread Michael Sokolov
Probably you can just sort by date (one way and then the other) and limit your result set to a single document. That should free up enough budget for the bonuses of the highly-placed people, I think :) On 3/9/2011 4:05 PM, l.blev...@comcast.net wrote: - Forwarded Message - From: l

Re: Matching on a multi valued field

2011-04-05 Thread Michael Sokolov
Could you try creating fields dynamically: common_names_1, common_names_2, etc. Keep track of the max number of fields and generate queries listing all the fields? Gross, but it handles all the cases mentioned in the thread (wildcards, phrases, etc). -Mike On 3/29/2011 4:57 PM, Brian
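
A small sketch of generating a query that lists all of the numbered fields. The `common_names` prefix comes from the message; the function name and the choice to join clauses with `OR` are assumptions.

```python
def multi_field_query(prefix, max_fields, clause):
    """Repeat one clause against prefix_1 .. prefix_<max_fields>, joined with OR."""
    fields = ["%s_%d" % (prefix, i) for i in range(1, max_fields + 1)]
    return " OR ".join("%s:%s" % (field, clause) for field in fields)
```

For example, `multi_field_query("common_names", 3, '"white oak"')` expands the phrase clause across `common_names_1`, `common_names_2` and `common_names_3`. The caller still has to track the largest field number in use.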

Re: XML not coming through from nabble to Gmail

2011-04-11 Thread Michael Sokolov
I see the same problem (missing markup) in Thunderbird. Seems like Nabble might be the culprit? -Mike On 4/11/2011 8:13 AM, Erick Erickson wrote: All: Lately I've been seeing a lot of posts where people paste in parts of their schema.xml or solrconfig.xml and the results

Re: updates not reflected in solr admin

2011-05-02 Thread Michael Sokolov
No - this is all running against an external tomcat-based solr. I'm back to being mystified now. Maybe I'll see if I can isolate this a bit more. I'll post back if I do, although I'm beginning to wonder if we should just move to 3.1 and not worry about it. -Mike On 5/2/2011 8:39 PM,

Re: updates not reflected in solr admin

2011-05-02 Thread Michael Sokolov
Right I read those comments in the config, and it all sounds reasonable - presumably a new Searcher is opened when (or shortly after) we commit, from whatever source. That was my operating assumption, and the reason I was so confused when I saw different result in two different clients. I

Re: [POLL] How do you (like to) do logging with Solr

2011-05-17 Thread Michael Sokolov
On 5/16/2011 7:50 PM, Chris Hostetter wrote: : This poll is to investigate how you currently do or would like to do : logging with Solr when deploying solr.war to a SEPARATE java application : server (such as Tomcat, Resin etc) outside of the bundled FWIW... a) the context of this poll is

Re: Debugging a Solr/Jetty Hung Process

2011-06-02 Thread Michael Sokolov
If you have an SNMP infrastructure available (nagios or similar) you should be able to set up a polling monitor that will keep statistics on the number of threads in your jvm and even allow you to inspect their stacks remotely. You can set alarms so you will be notified if cpu thread count

Re: Index vs. Query Time Aware Filters

2011-06-02 Thread Michael Sokolov
It doesn't look like this is supported in any way that is at all straightforward. http://wiki.apache.org/solr/SolrPlugins talks about the easy ways to parameterize plugins, and they don't include what you're after. I think maybe you could extend the query parser you are currently using,

Re: Newbie question: how to deal with different # of search results per page due to pagination then grouping

2011-06-02 Thread Michael Sokolov
Just keep one extra facet value hidden; ie request one more than you need to show the current page. If you get it, there are more (show the next button), otherwise there aren't. You can't page arbitrarily deep like this, but you can have a next button reliably enabled or disabled. On
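
The "request one more than you show" trick might look like this in outline. `search` is a hypothetical callable taking `(start, rows)` and returning a list; nothing here is a Solr API.

```python
def fetch_page(search, start, page_size):
    """Return (items_to_show, has_next) for one page.

    `search(start, rows)` is a hypothetical callable returning a list of results.
    One extra row is requested and kept hidden; its presence means a next page exists.
    """
    results = search(start, page_size + 1)
    has_next = len(results) > page_size
    return results[:page_size], has_next
```

The hidden extra item only tells you whether to enable the next button; it does not give a total count.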

Re: Pattern: Is there a method of resolving multivalued date ranges into a single document?

2011-06-11 Thread Michael Sokolov
Juidoo - there's no field wildcarding in Solr as your example shows. You might want to consider building a document for each movie time that includes all the information you need to search on: times, movie name, and other details. Otherwise you need a join operation to search across related

Re: High 100% CPU usage with SOLR 1.4.1

2011-06-15 Thread Michael Sokolov
Or another way of saying this is - what is the maximum throughput you get from the system (qps / indexing speed, etc) since that is what you really (should) care about - and how does it compare to the previous setup? -Mike On 6/15/2011 3:52 PM, Erick Erickson wrote: Yes, 100% CPU utilization

Re: paging and maintaingin a cursor just like ScrollableResultSet

2011-06-19 Thread Michael Sokolov
One technique I've used to page through huge result sets that could help: if you have a sortable key (like an id), you can just fetch all docs, sorted by the key, and then on subsequent page requests use the last value from the previous page as a filter in a range term like: id:[last-id TO *]
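
A sketch of that paging loop, under some assumptions: ids are unique and sort the same way in the index as they compare here, and `search(fq, rows)` is a hypothetical callable that returns ids in ascending order. Because the range `[last-id TO *]` is inclusive, the first hit of each later page is the last id already seen, so the sketch asks for one extra row and drops it.

```python
def range_filter(last_id=None):
    """Filter for the next page: everything from last_id upward (inclusive)."""
    if last_id is None:
        return "*:*"
    return "id:[%s TO *]" % last_id


def iterate_all(search, page_size):
    """Yield every id by repeatedly filtering on the last id of the previous page.

    `search(fq, rows)` is a hypothetical callable returning ids sorted ascending.
    """
    last_id = None
    while True:
        # After the first page, fetch one extra row to absorb the repeated last id.
        rows = page_size if last_id is None else page_size + 1
        ids = search(range_filter(last_id), rows)
        if last_id is not None and ids and ids[0] == last_id:
            ids = ids[1:]
        if not ids:
            break
        for doc_id in ids:
            yield doc_id
        last_id = ids[-1]
```

Ids containing query-syntax characters would need escaping before being placed in the range, which this sketch does not attempt.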

Re: Extending Solr Highlighter to pull information from external source

2011-06-20 Thread Michael Sokolov
I found https://issues.apache.org/jira/browse/SOLR-1397 but there is not much going on there LUCENE-1522 https://issues.apache.org/jira/browse/LUCENE-1522 has a lot of fascinating discussion on this topic though There is a couple of long lived issues in jira for this (I'd like to try to

Re: TermVectors and custom queries

2011-07-01 Thread Michael Sokolov
I think that's all you can do, although there is a callback-style interface that might save some time (or space). You still need to iterate over all of the vectors, at least until you get the one you want. -Mike On 6/30/2011 4:53 PM, Jamie Johnson wrote: Perhaps a better question, is this

Re: Match only documents which contain all query terms

2011-07-02 Thread Michael Sokolov
I believe you should be able to get results ordered so that the documents you want will always come first, so you can truncate the results efficiently on the client side. You could also try a regexp query (untested): a b c -/~(a|b|c)/ -Mike On 7/1/2011 7:50 PM, Spyros Kapnissis wrote:

Re: How do I add a custom field?

2011-07-03 Thread Michael Sokolov
You'll need to index the field. I would think you would want to index/store the field along with the associated document, in which case you'll have to reindex the documents as well - there's no single-field update capability in Lucene (yet?). -Mike On 7/3/2011 1:09 PM, Gabriele Kahlout

Re: Preserve XML hierarchy

2011-07-14 Thread Michael Sokolov
Have a look at http://wiki.apache.org/solr/DataImportHandler#XPathEntityProcessor It might be just what you need? -Mike On 7/14/2011 3:31 AM, Lucas Miguez wrote: Hi, yes, I was asking about it, is it possible to index an XML file? Is it possible to know which node of the XML the search

Re: XInclude Multiple Elements

2011-07-21 Thread Michael Sokolov
The various XInclude specs were never really fully implemented by XML parsers. IMO it's really best for including whole XML files. If I remember right, the situation is that the xpointer() scheme (the most flexible) wasn't implemented. There are two other schemes for addressing content

discovery-based core enumeration with embedded solr

2013-03-13 Thread Michael Sokolov
) at org.apache.solr.core.SolrConfig.init(SolrConfig.java:117) at org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:989) ... 11 more even though I have a solr.properties file in solr-multi (which is my solr.home), and core.properties in some subdirectories of that -- Michael Sokolov Senior Architect

Re: discovery-based core enumeration with embedded solr

2013-03-15 Thread Michael Sokolov
not be straightforward... Thanks, Erick On Wed, Mar 13, 2013 at 5:28 PM, Michael Sokolov msoko...@safaribooksonline.com wrote: Has the new core enumeration strategy been implemented in the CoreContainer.Initializer.initialize() code path? It doesn't seem like it has. I get this exception: Caused

Re: discovery-based core enumeration with embedded solr

2013-03-16 Thread Michael Sokolov
On 3/16/2013 9:52 AM, Erick Erickson wrote: A, good catch! Coincidentally yesterday while in the midst of looking at some other JIRAs, I noticed that some pages on the Wiki said 4.2 and changed what I ran across to 4.3. I originally started the Wikis when I thought I would go fast enough to

Re: iterate through each document in Solr

2013-05-05 Thread Michael Sokolov
On 5/5/13 7:48 PM, Mingfeng Yang wrote: Dear Solr Users, Does anyone know what is the best way to iterate through each document in a Solr index with billion entries? I tried to use select?q=*:*&start=xx&rows=500 to get 500 docs each time and then change start value, but it got very slow after
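
For reference, the start/rows loop described in the question can be sketched as a URL generator. The base URL is an assumed placeholder, and the function names are made up for illustration; only the `q`, `start` and `rows` parameters come from the message.

```python
from urllib.parse import urlencode


def select_url(base, start, rows, query="*:*"):
    """Build a /select URL with properly separated and encoded parameters."""
    params = urlencode({"q": query, "start": start, "rows": rows})
    return base + "/select?" + params


def page_urls(base, total, rows=500):
    """Yield one URL per page, advancing start by rows each time."""
    for start in range(0, total, rows):
        yield select_url(base, start, rows)
```

Each request in this scheme asks the server to skip `start` documents, which is consistent with the slowdown reported once `start` grows large.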

Solr 4.x/3.x update javabin incompatibility?

2013-05-10 Thread Michael Sokolov
I upgraded one of my solrj clients to 4.2.0, and am testing using it with a 3.4 server. We generally use a BinaryRequestWriter (ie javabin). With the 3.4 solrj client, this caused update requests to be directed to /update/javabin. However, in 4.2, the dispatch seems to be getting handled

Re: Solr 4.x/3.x update javabin incompatibility?

2013-05-10 Thread Michael Sokolov
- is there any way to do it without patching SolrJ? On 5/10/2013 9:42 PM, Michael Sokolov wrote: I upgraded one of my solrj clients to 4.2.0, and am testing using it with a 3.4 server. We generally use a BinaryRequestWriter (ie javabin). With the 3.4 solrj client, this caused update requests

Re: Solr 4.x/3.x update javabin incompatibility?

2013-05-10 Thread Michael Sokolov
On 5/10/2013 10:18 PM, Shawn Heisey wrote: On 5/10/2013 7:42 PM, Michael Sokolov wrote: My question is: is this intentional? It's unfortunate that we don't seem to be able to update the client and have it continue to work with (ie send updates to) the old servers. We have a centralized client

Re: Solr 4.x/3.x update javabin incompatibility?

2013-05-10 Thread Michael Sokolov
On 5/10/2013 10:18 PM, Shawn Heisey wrote: I don't know why I'm not having any trouble. I'm certainly glad that I'm not, though! Thanks, Shawn Shawn, one question - in your server setup do you have: _querySolr.setRequestWriter(new BinaryRequestWriter()); ? I didn't see that - it (used to

Re: Solr 4.x/3.x update javabin incompatibility?

2013-05-11 Thread Michael Sokolov
On 5/10/2013 11:39 PM, Shawn Heisey wrote: On 5/10/2013 8:56 PM, Michael Sokolov wrote: On 5/10/2013 10:18 PM, Shawn Heisey wrote: I don't know why I'm not having any trouble. I'm certainly glad that I'm not, though! Thanks, Shawn Shawn, one question - in your server setup do you have

Re: Solr 4.x/3.x update javabin incompatibility?

2013-05-11 Thread Michael Sokolov
On 5/11/2013 11:14 AM, Steve Rowe wrote: On May 11, 2013 7:27 AM, Michael Sokolov msoko...@safaribooksonline.com wrote: If somebody grants me access to the wiki, I'd be happy to write something there to let people know about this issue. What's your wiki username? sokolov

Re: Solr 4.x/3.x update javabin incompatibility?

2013-05-11 Thread Michael Sokolov
On 5/11/2013 11:31 AM, Michael Sokolov wrote: On 5/11/2013 11:14 AM, Steve Rowe wrote: On May 11, 2013 7:27 AM, Michael Sokolov msoko...@safaribooksonline.com wrote: If somebody grants me access to the wiki, I'd be happy to write something there to let people know about this issue. What's

Re: Looking to see if solrj 3.5 could be used with solr server 4.2.1

2013-05-12 Thread Michael Sokolov
On 5/11/2013 11:36 PM, Lee, Peter wrote: If you have any information regarding whether or not this might work (as in yeah, we did that and it worked okay...or...no, that won't work because protocol XYZ changed between versions and yada,yada,yada) I would appreciate it. As stated above, simple

Re: Reindexing strategy

2013-05-30 Thread Michael Sokolov
On 5/30/2013 8:30 AM, Dotan Cohen wrote: On Wed, May 29, 2013 at 5:37 PM, Shawn Heisey s...@elyograg.org wrote: It's impossible for us to give you hard numbers. You'll have to experiment to know how fast you can reindex without killing your servers. A basic tenet for such experimentation, and

Re: How can a Tokenizer be CoreAware?

2013-06-03 Thread Michael Sokolov
Benson, I think the idea is that Tokenizers are created as needed (from the TokenizerFactory), while those other objects are singular (one created for each corresponding stanza in solrconfig.xml). So Tokenizers should be short-lived; they'll be cleaned up after each use, and the assumption is

Re: Solr + Groovy

2013-06-03 Thread Michael Sokolov
On 6/3/13 3:07 AM, Achim Domma wrote: Hi, I have some query building and result processing code, which is currently running as normal Solr client outside of Solr. I think it would make a lot of sense to move parts of this code into a custom SearchHandler or SearchComponent. Because I'm not a

Re: Slow Highlighter Performance Even Using FastVectorHighlighter

2013-06-15 Thread Michael Sokolov
If you have very large documents (many MB) that can lead to slow highlighting, even with FVH. See https://issues.apache.org/jira/browse/LUCENE-3234 and try setting phraseLimit=1 (or some bigger number, but not infinite, which is the default) -Mike On 6/14/13 4:52 PM, Andy Brown wrote:

[ANN] Lux XML search engine

2013-06-18 Thread Michael Sokolov
I'm pleased to announce the first public release of Lux (version 0.9.1), an XML search engine embedding Saxon 9 and Lucene/Solr 4. Lux offers many features found in XML databases: persistent XML storage, index-optimized querying, an interactive query window, and some application support

Re: [ANN] Lux XML search engine

2013-06-18 Thread Michael Sokolov
On 06/18/2013 09:20 AM, Alexandre Rafalovitch wrote: On Tue, Jun 18, 2013 at 7:44 AM, Michael Sokolov msoko...@safaribooksonline.com wrote: I'm pleased to announce the first public release of Lux (version 0.9.1), an XML search engine embedding Saxon 9 and Lucene/Solr 4

Re: Is there a way to capture div tag by id?

2013-06-26 Thread Michael Sokolov
, including //div[@id='myDiv'], for example. -- Michael Sokolov Senior Architect Safari Books Online

Re: [POLL] Who how does use admin-extra ?

2013-08-08 Thread Michael Sokolov
On 8/7/13 10:56 PM, Chris Hostetter wrote: : Didn't somebody once say this is used for customization of admin pages? it can be yes, that's why it originally existed -- Stefan's question was whether anyone was actually using it for that. I used it quite a bit back in the day at CNET as a way to

Re: Indexing an XML file in Apache Solr

2013-08-18 Thread Michael Sokolov
You might be interested in trying Lux, which is a Solr extension that indexes XML documents using the element and attribute names and the contents of those nodes in your document. It also allows you to define XPath indexes (like DIH, I think, but with the full XPath 2.0 syntax), and to query

Re: Indexing an XML file in Apache Solr

2013-08-19 Thread Michael Sokolov
. Now what I have inferred is that I need to format my XML to fit the format of Solr. Do I have to code that by hand, or is there some kind of parser that will convert the XML I feed it into the Solr format? I couldn't find any code examples in Lux. On Sun, Aug 18, 2013 at 11:20 PM, Michael Sokolov

Re: Different Responses for 4.4 and 3.5 solr index

2013-08-28 Thread Michael Sokolov
We've been seeing changes in our rankings as well. I don't have a definite answer yet, since we're waiting on an index rebuild, but our current working theory is that the change to default omitNorms=true for primitive types may have had an effect, possibly due to follow on confusion: our

distributed query result order tie break question

2013-09-02 Thread Michael Sokolov
My question is about how query results are ordered in a distributed query when sorting by relevance and all the documents have the same score, for example, when querying for *:*. It looks to me as if score ties are broken by shard and then within each shard, by docid. So for example, if I

Re: distributed query result order tie break question

2013-09-02 Thread Michael Sokolov
that behavior? -- Jack Krupansky -Original Message- From: Michael Sokolov Sent: Monday, September 02, 2013 7:42 PM To: solr-user@lucene.apache.org Subject: distributed query result order tie break question My question is about how query results are ordered in a distributed query when sorting

Re: distributed query result order tie break question

2013-09-03 Thread Michael Sokolov
On 09/03/2013 12:50 PM, Chris Hostetter wrote: : like to understand how the ordering is defined so that I can compute an : integer that is sorted in the same way. For example (shard id << 24) | : docid or something like that. If you want to ensure a consistent ordering, you have to index a
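
The idea in this thread of packing a shard id and a docid into one integer that sorts by shard first and docid second can be sketched roughly as follows. This is only an illustration, not anything Solr provides: the function names are hypothetical, the 24-bit width is taken from the "24" in the message, and it assumes docids fit in that many bits (real docids may need more).

```python
DOCID_BITS = 24  # assumption: width taken from the "24" in the message


def tie_break_key(shard_id, docid):
    """Pack (shard_id, docid) into one integer that sorts by shard, then docid."""
    if shard_id < 0:
        raise ValueError("shard_id must be non-negative")
    if not 0 <= docid < (1 << DOCID_BITS):
        raise ValueError("docid does not fit in DOCID_BITS bits")
    # Shard id occupies the high bits, docid the low bits.
    return (shard_id << DOCID_BITS) | docid


def unpack_key(key):
    """Recover (shard_id, docid) from a key built by tie_break_key."""
    return key >> DOCID_BITS, key & ((1 << DOCID_BITS) - 1)
```

Sorting by this key gives the same order as sorting the (shard id, docid) pairs directly, which is the property the message is after. Whether that order matches what a distributed query actually returns is the open question of the thread.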

Re: No or limited use of FieldCache

2013-09-11 Thread Michael Sokolov
On 9/11/13 3:11 AM, Per Steffensen wrote: Hi We have a SolrCloud setup handling huge amounts of data. When we do group, facet or sort searches Solr will use its FieldCache, and add data in it for every single document we have. For us it is not realistic that this will ever fit in memory and

Re: No or limited use of FieldCache

2013-09-11 Thread Michael Sokolov
On 09/11/2013 08:40 AM, Per Steffensen wrote: The reason I mention sort is that half a year ago, in my project, we dealt with the FieldCache OOM problem when doing sort requests. We basically just reject sort requests unless they hit fewer than X documents - in case they do we just find them

Re: Solr Patent

2013-09-14 Thread Michael Sokolov
On 9/13/2013 9:14 PM, Zaizen Ushio wrote: Hello, I have a question about patents. I believe the Apache license protects Solr developers from patent issues within the Solr community. But has there been any case where Solr developers or Solr users faced allegations from outside the Solr community? Are there any cases

[ANN] Lux Release 0.10.5

2013-09-19 Thread Michael Sokolov
I'm pleased to announce the release of the XML search engine Lux, version 0.10.5. There has been a lot of progress made since our last announced release, which was 0.9.1. Some highlights: The app server now provides full access to HTTP request data and control of HTTP responses. We've

Re: Pubmed XML indexing

2013-09-27 Thread Michael Sokolov
You might be interested in Lux (http://luxdb.org), which is designed for indexing and querying XML using Solr and Lucene. It can run index-supported XPath/XQuery over your documents, and you can define arbitrary XPath indexes. -Mike On 9/27/13 6:28 AM, Francisco Fernandez wrote: Hi, I'm a

Re: Nagle's Algorithm

2013-09-29 Thread Michael Sokolov
I dunno, but this makes it look as if this may already be taken care of: http://jira.codehaus.org/browse/JETTY-1196 On 9/29/2013 9:22 PM, William Bell wrote: How do I set TCP_NODELAY on the http sockets for Jetty in SOLR 4? Is there an option in jetty.xml ? /* Create new stream socket */
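
For context on what the question is asking for: at the raw socket level, turning off Nagle's algorithm means setting the TCP_NODELAY option on the socket. The sketch below shows that with Python's standard socket module. It says nothing about how Jetty or Solr expose the option, and the function name is made up for the example.

```python
import socket


def make_nodelay_socket():
    """Create a TCP stream socket with Nagle's algorithm disabled."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # A non-zero value for TCP_NODELAY disables Nagle's algorithm on this socket.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock
```

The caller is responsible for connecting and closing the socket; reading the option back with getsockopt is a quick way to confirm that it took effect.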

Re: App server?

2013-10-03 Thread Michael Sokolov
On 10/02/2013 06:44 PM, Mark wrote: Is Jetty sufficient for running Solr or should I go with something a little more enterprise like tomcat? Any others? FWIW we use tomcat for all of our installs, and it works fine. I don't claim it's any better than Jetty, but it doesn't cause any problems,

Re: limiting deep pagination

2013-10-09 Thread Michael Sokolov
On 10/8/13 6:51 PM, Peter Keegan wrote: Is there a way to configure Solr 'defaults/appends/invariants' such that the product of the 'start' and 'rows' parameters doesn't exceed a given value? This would be to prevent deep pagination. Or would this require a custom requestHandler? Peter Just
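
One way to picture the kind of limit being asked about is a check applied to the paging parameters before a request is sent to Solr at all. The sketch below is an assumption-laden illustration: the names and the limit are hypothetical, it is not a Solr configuration option, and it bounds start + rows (the number of top documents a page request has to collect) rather than the product mentioned in the question.

```python
MAX_DEPTH = 10000  # hypothetical limit on how deep a client may page


def check_paging(start, rows, max_depth=MAX_DEPTH):
    """Raise ValueError if the requested page reaches past max_depth documents."""
    if start < 0 or rows < 0:
        raise ValueError("start and rows must be non-negative")
    if start + rows > max_depth:
        raise ValueError(
            "page reaches document %d, limit is %d" % (start + rows, max_depth)
        )
```

A front end would call check_paging(start, rows) before building the query and turn the ValueError into an error response for the user.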

Re: Solr - what's the next big thing?

2013-10-29 Thread Michael Sokolov
On 10/26/2013 8:31 PM, Bill Bell wrote: Full JSON support deep complex object indexing and search Game changer Bill Bell Sent from mobile Not JSON (yet?) but take a look at http://luxdb.org which does XML indexing and search. We index all the text of all the nodes in your tree: no

[ANN] Lux release 0.11.2

2013-11-05 Thread Michael Sokolov
I'm pleased to announce the release of Lux, version 0.11.2, the Dublin edition. There has been the usual round of bug fixes and enhancements, but the main news with this release is the inclusion of support for SolrCloud. You can now store and search XML documents in a distributed index

Re: character encoding issue...

2013-11-10 Thread Michael Sokolov
Don't feel bad: character encoding problems are often said to be among the hardest in software engineering. There's no simple answer to problems like this since as Erick said, any tool in your chain could be the culprit. I doubt anyone on this list will be able to guess the answer since the
