Re: Tokenizing managed synonyms
I think the question makes sense: since SynonymGraphFilterFactory accepts a tokenizerFactory parameter, he asked whether the managed version of SynonymGraphFilter could accept it as well. https://lucene.apache.org/solr/guide/8_5/filter-descriptions.html#synonym-graph-filter The answer seems to be NO. Koji On 2020/07/07 8:18, Erick Erickson wrote: This question doesn't really make sense. You don't specify tokenizers on filters, they're specified at the _field_ level. You can certainly define as many field(type)s as you want, each with a different analysis chain and those chains can be made up of whatever you want to use, and there are lots of choices. If you are asking to do _additional_ tokenization on the output of a synonym filter, no. Perhaps if you defined the problem you're trying to solve we could make some suggestions. Best, Erick On Jul 6, 2020, at 6:43 PM, Thomas Corthals wrote: Hi, Is it possible to specify a Tokenizer Factory on a Managed Synonym Graph Filter? I would like to use a Standard Tokenizer or Keyword Tokenizer on some fields. Best, Thomas
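For reference, this is roughly how the non-managed filter takes the parameter (a sketch; the type and file names are placeholders, and per the answer above the managed variant has no equivalent):

<fieldType name="text_syn" class="solr.TextField">
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt"
            tokenizerFactory="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>

Note the Reference Guide recommends applying the graph filter at query time; at index time it must be followed by FlattenGraphFilterFactory.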
per field mm
Hi, I have a use case where one of our customers wants to set a different mm parameter per field: in some fields of qf, unexpectedly many terms are produced because they are N-gram fields, while in other fields few terms are produced because they are normal text fields. If it is reasonable, I want to add a per-field mm feature. What do you think about this? And if there is an existing JIRA issue, let me know. Thanks, Koji
Re: Implementing NeuralNetworkModel RankNet in Solr LTR
Hi Edwin, > Just to check, is this supported in Solr 7.4.0? Yes, it is. https://github.com/LTR4L/ltr4l/blob/master/ltr4l-solr/ivy-jars.properties#L17 Koji On 2018/09/19 19:40, Zheng Lin Edwin Yeo wrote: Hi Koji, Thanks for your reply and for providing the information. Just to check, is this supported in Solr 7.4.0? Regards, Edwin On Wed, 19 Sep 2018 at 11:02, Koji Sekiguchi wrote: Hi, > https://github.com/airalcorn2/Solr-LTR#RankNet > > Has anyone tried on this before? And what is the format of the training > data that this model requires? I haven't tried it, but I'd like to inform you that there is another LTR project we've been developing: https://github.com/LTR4L/ltr4l It has many LTR algorithms based on neural networks, SVM and boosting. Koji On 2018/09/12 11:44, Zheng Lin Edwin Yeo wrote: Hi, I am working on implementing Solr LTR in Solr 7.4.0 by using the NeuralNetworkModel for the feature selection and model training, and I have found this site which uses RankNet: https://github.com/airalcorn2/Solr-LTR#RankNet Has anyone tried this before? And what is the format of the training data that this model requires? Regards, Edwin
Re: Return only matched multi-valued field
Hi, I don't think Lucene/Solr can tell you which value of a multivalued field matched the query you posted. You would usually use the highlighter to find that out. Koji On 2017/08/22 2:46, ruby wrote: Is there a way to return only the matched field from a multivalued field using filtering? -- View this message in context: http://lucene.472066.n3.nabble.com/Return-only-matched-multi-valued-field-tp4351494.html Sent from the Solr - User mailing list archive at Nabble.com.
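For example, a request along these lines (core and field names are hypothetical) returns, per document, highlighted fragments showing which values matched:

http://localhost:8983/solr/collection1/select?q=tags:solr&hl=true&hl.fl=tags&hl.snippets=10

hl.snippets defaults to 1, so raise it if several values of the field can match.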
Re: Issues trying to boost phrase containing stop word
Hi Shamik, I'm sorry but I don't understand why you use KeywordRepeatFilter. I think it's normal to create separate fields to solve this kind of problem. Why don't you have another separate field which has ShingleFilter, as I mentioned in the previous reply? Koji On 2017/07/20 12:13, shamik wrote: Thanks Koji, I've tried KeywordRepeatFilterFactory which keeps the original term, but the stopword filter in the analysis chain will remove it nonetheless. That's why I thought of creating a separate field devoid of stopwords/stemmers. Let me know if I'm missing something here. -- View this message in context: http://lucene.472066.n3.nabble.com/Problem-trying-to-boost-phrase-containing-stop-word-tp4346860p4346909.html Sent from the Solr - User mailing list archive at Nabble.com.
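A minimal sketch of such a separate field (all names hypothetical; note there is no stopword filter in the chain, so shingles like "about dynamic" survive):

<fieldType name="text_shingle" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ShingleFilterFactory" minShingleSize="2" maxShingleSize="2" outputUnigrams="true"/>
  </analyzer>
</fieldType>

<field name="titleShingle" type="text_shingle" indexed="true" stored="false"/>
<copyField source="title" dest="titleShingle"/>

The titleShingle field can then be given a high boost in qf/pf.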
Re: Issues trying to boost phrase containing stop word
Hi Shamik, How about using ShingleFilter which constructs token n-grams from a token stream? http://lucene.apache.org/core/6_6_0/analyzers-common/org/apache/lucene/analysis/shingle/ShingleFilter.html As for "about dynamic block", ShingleFilter produces "about dynamic" and "dynamic block". Thanks, Koji On 2017/07/20 5:54, Shamik Bandopadhyay wrote: Hi, I'm trying to show titles with exact query phrase match at the top of the result. That includes supporting stop words as part of the phrase. For e.g. if I'm using "about dynamic block", I expect the title with "About Dynamic Blocks" to appear at the top. Since the title field uses stopword filter factory as part of its analysis chain, I decided to create a copyfield of title and use that in search with a higher boost. That didn't seem to work either. Although it brought back the expected document at the top, it excluded documents with title "Dynamic Block Grip Reference", to be precise content which doesn't have "about" in title or subject. Even setting the default operator to OR didn't make any difference. Here's the entry from config. Request handler: explicit velocity browse layout Solritas AND edismax title^5 titleExact^15 subject^3 description^2 100% *:* 10 *,score Sample data: SOLR1000 About Dynamic Blocks Dynamic blocks contain rules, or parameters, for how to change the appearance of the block reference when it is inserted in the drawing. With dynamic blocks you can insert one block that can change shape, size, or configuration instead of inserting one of many static block definitions. For example, instead of creating multiple interior door blocks of different sizes, you can create one resizable door block. You author dynamic blocks with either constraint parameters or action parameters. Note: Using both constraint parameters and action parameters in the same block definition is not recommended. Add Constraints In a block definition, constraint parameters Associate objects with one another Restrict geometry or dimensions The following example shows a block reference with a constraint (in gray) and a constraint parameter (blue with grip). Once the block is inserted into the drawing, the constraint parameters can be edited as properties by using the Properties palette. Add Actions and Parameters In a block definition, actions and parameters provide rules for the behavior of a block once it is inserted into the drawing. Depending on the specified block geometry or parameter, you can associate an action to that parameter. The parameter is represented as a grip in the drawing. When you edit the grip, the associated action determines what will change in the block reference. Like constraint parameters, action parameters can be changed using the Properties palette. Dynamic blocks contain rules, or parameters, for how to change the appearance of the block reference when it is inserted in the drawing. SOLR1001 About Creating Dynamic Blocks This table gives an overview of the steps required add behaviors that make blocks dynamic. Plan the block content. Know how the block should change or move, and what parts will depend on the others. Example: The block will be resizable, and after it is resized, additional geometry is displayed. Draw the geometry. Draw the block geometry in the drawing area or the Block Editor. Note: If you will use visibility states to change how geometry is displayed, you may not want to include all the geometry at this point. Add parameters.
Add either individual parameters or parameter sets to define geometry that will be affected by an action or manipulation. Keep in mind the objects that will be dependent on one another. Add actions. If you are working with action parameters, if necessary, add actions to define what will happen to the geometry when it is manipulated. Define custom properties. Add properties that determine how the block is displayed in the drawing area. Custom properties affect grips, labels, and preset values for block geometry. Test the block. On the ribbon, in the Block Editor contextual tab, Open/Save panel, click Test Block to test the block before you save it. This table gives an overview of the steps required add behaviors that make blocks dynamic. SOLR1002 About Modifying Dynamic Block Definitions Use the Block Editor to edit, correct, and save a block definition. Correct Errors in Action Parameters A yellow alert icon ( ) is displayed when A parameter is not associated with an action An action is not associated with a parameter or selection set To correct these errors, hover over the yellow alert icon until the tooltip displays a description of the problem. Then double-click the constraint and follow the prompts. Save Dynamic Blocks When you save a block definition, the current values of the geometry and parameters in the
Re: Is there any particular reason why ExternalFileField is read from data directory
Hi, ExternalFileField was introduced via SOLR-351. https://issues.apache.org/jira/browse/SOLR-351 The author thought values could optionally be updated often... I think that explains why it is read not from the config directory but from the data directory. Koji On 2017/06/29 17:17, apoorvqwerty wrote: Hi, As per the documentation for ExternalFileField we need to put the external_field file with the map in parallel with the data directory on all the shards. Is it possible to read the file from a central location or ZooKeeper? -- View this message in context: http://lucene.472066.n3.nabble.com/Is-there-any-particular-reason-why-ExternalFileField-is-read-from-data-directory-tp4343374.html Sent from the Solr - User mailing list archive at Nabble.com.
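For context, the file lives in the data directory and maps each uniqueKey to a float, one entry per line; a sketch (field name hypothetical) of a file named external_myRank:

doc1=5.5
doc2=2.0
doc3=0.1

Because the values sit outside the index, they can be rewritten and reloaded frequently without re-indexing, which is the "updated often" rationale above.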
Re: Filtering results by minimum relevancy score
Hi Walter, May I ask a tangential question? I'm curious about the following line you wrote: > Solr is a vector-space engine. Some early engines (Verity VDK) were probabilistic engines. Those do give an absolute estimate of the relevance of each hit. Unfortunately, the relevance of results is just not as good as vector-space engines. So, probabilistic engines are mostly dead. Can you elaborate on this? I thought Okapi BM25, which is the default Similarity in Solr, is based on the probabilistic model. Did you mean that Lucene/Solr is still based on the vector space model but built BM25Similarity on top of it, and therefore BM25Similarity is not a pure probabilistic scoring system? Or that Okapi BM25 is not originally probabilistic? As for me, I prefer the idea of vector space over probabilistic for information retrieval, and I stick with ClassicSimilarity for my projects. Thanks, Koji On 2017/04/13 4:08, Walter Underwood wrote: Fine. It can't be done. If it was easy, Solr/Lucene would already have the feature, right? Solr is a vector-space engine. Some early engines (Verity VDK) were probabilistic engines. Those do give an absolute estimate of the relevance of each hit. Unfortunately, the relevance of results is just not as good as vector-space engines. So, probabilistic engines are mostly dead. But, "you don't want to do it" is very good advice. Instead of trying to reduce bad hits, work on increasing good hits. It is really hard, sometimes not possible, to optimize both. Increasing the good hits makes your customers happy. Reducing the bad hits makes your UX team happy. Here is a process. Start collecting the clicks on the search results page (SRP) with each query. Look at queries that have below average clickthrough. See if those can be combined into categories, then address each category. Some categories that I have used: * One word or two? "babysitter", "baby-sitter", and "baby sitter" are all valid. Use synonyms or shingles (and maybe the word delimiter filter) to match these. * Misspellings. These should be about 10% of queries. Use fuzzy matching. I recommend the patch in SOLR-629. * Alternate vocabulary. You sell a "laptop", but people call it a "notebook". People search for "kids movies", but your movie genre is "Children and Family". Use synonyms. * Missing content. People can't find anything about beach parking because there isn't a page about that. Instead, there are scraps of info about beach parking in multiple other pages. Fix the content. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) On Apr 12, 2017, at 11:44 AM, David Kramer wrote: The idea is to not return poorly matching results, not to limit the number of results returned. One query may have hundreds of excellent matches and another query may have 7. So cutting off by the number of results is trivial but not useful. Again, we are not doing this for performance reasons. We're doing this because we don't want to show products that are not very relevant to the search terms specified by the user for UX reasons. I had hoped that the responses would have been more focused on "it can't be done" or "here's how to do it" than "you don't want to do it". I'm still left not knowing if it's even possible. The one concrete answer of using frange doesn't help as referencing score in either the q or the fq produces an "undefined field" error. Thanks. On 4/11/17, 8:59 AM, "Dorian Hoxha" wrote: Can't the filter be used in cases when you're paginating in sharded-scenario ?
So if you do limit=10, offset=10, each shard will return 20 docs ? While if you do limit=10, _score<=last_page.min_score, then each shard will return 10 docs ? (they will still score all docs, but merging will be faster) Makes sense ? On Tue, Apr 11, 2017 at 12:49 PM, alessandro.benedetti
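Incidentally, the "undefined field" error David hit usually comes from referencing score directly in frange; the common workaround is to wrap the main query in the query() function, roughly like this (the threshold value is arbitrary):

q=ipod&fq={!frange l=0.5}query($q)

This filters out documents whose raw score for $q is below 0.5. But since Lucene scores are not normalized, a fixed threshold is fragile, which is exactly Walter's point above.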
Re: Classify document using bag of words
Hi, I'm not sure that it can help you, but I'd like to show you a link to an article I wrote about document classification years ago: Comparing Document Classification Functions of Lucene and Mahout http://soleami.com/blog/comparing-document-classification-functions-of-lucene-and-mahout.html Thanks! -- koji On 2017/03/27 1:05, marotosg wrote: Hi, I have a very simple use case where I would need to classify a document using a bag of words. Basically, if a field within the document contains any of the words in my bag, then I use a new field to assign a category to the document. Is this something achievable in Solr? I was thinking of using Lucene document classification: https://wiki.apache.org/solr/SolrClassification. From what I understand, I need to feed the category on some documents first. New documents would then be classified. Is there anything else I can't find? Thanks a lot. -- View this message in context: http://lucene.472066.n3.nabble.com/Classify-document-using-bag-of-words-tp4326865.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Query/Field Index Analysis corrected but return no docs in search
Hi Peter, I'm not sure I can correctly see the result you attached, but it seems reasonable that you got no search results, because your query 均匀肤色 is used as-is without being analyzed, whereas the same string 均匀肤色 is tokenized into 均匀 匀肤 肤色 in the index. So the tokenizers you're using at indexing time and at query time clearly don't match. Please check which tokenizers you're using in your schema.xml. Thanks, koji On 2017/02/04 23:18, Peter Liu wrote: hi all: I was using Solr 3.6 and tried to solve a recall problem today, but encountered a weird problem. There's a doc with field value 均匀肤色 (just treat that word as a symbol if you don't know it; I just want to describe the problem as exactly as possible). And below is the analysis result (tokenization) [inline image omitted; text version follows]: Index Analyzer: 均匀肤色 → 均匀 匀肤 肤色 Query Analyzer: 均匀肤色 → 均匀肤色 The tokenization result indicates the query should recall/hit the doc. But the doc did not appear in the result when I searched with "均匀肤色". I tried to simplify the qf/bf/fq/q and test with a single field and a single document, to make sure it was not caused by other problems, but failed. It's knotty to debug because it only reproduces in the production environment; I tried the same config/index/query but could not reproduce it in the dev environment. I'm asking here for help in case you have met a similar problem; any clues or debugging methods would be really helpful.
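In Solr, a single <analyzer> element with no type attribute guarantees the same chain at index and query time; a sketch for a CJK bigram field in the 3.6 era (names are placeholders):

<fieldType name="text_cn" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.CJKTokenizerFactory"/>
  </analyzer>
</fieldType>

Splitting it into <analyzer type="index"> and <analyzer type="query"> blocks is only safe when the two chains produce compatible tokens.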
Re: How to train the model using user clicks when use ltr(learning to rank) module?
Hi, NLP4L[1] has not only a Learning-to-Rank module but also a module which calculates a click model and converts it into pointwise annotation data. NLP4L has a comprehensive manual[2], but you may want to read the "Click Log Analysis" section[3] first to see if it suits your requirements. Hope this helps. Thanks! Koji -- T: @kojisays [1] https://github.com/NLP4L/nlp4l [2] https://github.com/NLP4L/manuals [3] https://github.com/NLP4L/manuals/blob/master/ltr/ltr_import.md On 2017/01/05 17:02, Jeffery Yuan wrote: Thanks very much for integrating machine learning into Solr. https://github.com/apache/lucene-solr/blob/f62874e47a0c790b9e396f58ef6f14ea04e2280b/solr/contrib/ltr/README.md In the Assemble training data part: the third column indicates the relative importance or relevance of that doc. Could you please give more info about how to give a score based on what users click? I have read https://static.aminer.org/pdf/PDF/000/472/865/optimizing_search_engines_using_clickthrough_data.pdf http://www.cs.cornell.edu/people/tj/publications/joachims_etal_05a.pdf http://alexbenedetti.blogspot.com/2016/07/solr-is-learning-to-rank-better-part-1.html But I still have no clue how to translate the partial pairwise feedback into the importance or relevance of that doc. From a user's perspective, steps such as setting up the feature and model in Solr are simple, but collecting the feedback data and training/updating the model is much more complex. It would be great if Solr could provide some detailed instructions or sample code about how to translate the partial pairwise feedback and use it to train and update the model. Thanks again for your help. -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-train-the-model-using-user-clicks-when-use-ltr-learning-to-rank-module-tp4312462.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: I cannot get phrases highlighted correctly without using the Fast Vector highlighter
Hello Panagiotis, I'm sorry but it's a feature, not a bug. As for the hl.usePhraseHighlighter parameter: when you turn it off, you may get only <em>foo</em> or <em>bar</em> highlighted in your snippets. Koji On 2016/09/18 15:55, Panagiotis T wrote: I'm using Solr 6.2 (tried with 6.1 also). I created a new core and the only change I made is adding the following line in my schema.xml I've indexed two simple xml files. Here's a sample: foo bar foo bar I'm executing a simple query: http://localhost:8983/solr/test/select?hl.fl=body_text_en&hl=on&indent=on&q=%22foo%20bar%22&wt=json And here is the response: "response":{"numFound":2,"start":0,"docs":[ { "id":"foo bar", "body_text_en":["foo bar"], "_version_":1545790848171507712}, { "id":"foo bar2", "body_text_en":["I strongly suspect that foo bar"], "_version_":1545790848184090624}] }, "highlighting":{ "foo bar":{ "body_text_en":["<em>foo</em> <em>bar</em>"]}, "foo bar2":{ "body_text_en":["I strongly suspect that <em>foo</em> <em>bar</em>"]}}} If I append hl.useFastVectorHighlighter=true to my query the highlighter correctly highlights the phrase as <em>foo bar</em>. Of course I've tried explicitly appending hl.usePhraseHighlighter=true to my query but I get the same result. I would like to get the same result with the standard highlighter if possible. Regards
Re: Query Elevation
Hello, I'm curious: why do you want the particular document to be placed second, not top, of the results for a particular query? Sorry this isn't the answer to your question, but I think you can implement it rather easily if you study the existing query elevation component. Koji On 2016/07/08 19:59, Swathika wrote: A new requirement is to get a particular document as the second result in the result page. For example, if the query is "coal", this document (id: 222) should come as the second result. Please let me know if you have any solution. -- View this message in context: http://lucene.472066.n3.nabble.com/Query-Elevation-tp4286332.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: FW: Difference Between Tokenizer and filter
Hi, An <analyzer> must have one and only one <tokenizer>, and it can have zero or more <filter>s. From the point of view of these rules, your first <analyzer> is not correct because it has more than one <tokenizer>, and your second is not correct because it has no <tokenizer>. Koji On 2016/03/02 20:25, G, Rajesh wrote: Hi Team, Can you please clarify the below. My understanding is that the tokenizer is used to say how the content should be indexed physically in the file system, and filters are used on the query result. The below lines are from my setup. But I have seen examples that include filters inside the <analyzer> together with a tokenizer, and that confused me. My goal is to use Solr to find the best match among technology names, e.g. actual tech names: 1. Microsoft Visual Studio 2. Microsoft Internet Explorer 3. Microsoft Visio When a user types Microsoft Visal Studio, the user should get Microsoft Visual Studio. Basically, misspelled and jumbled words should match the closest tech name.
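A well-formed chain, for reference (a sketch using standard factories; for the misspelling use case an n-gram filter is one common choice):

<analyzer>
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="3"/>
</analyzer>

Exactly one <tokenizer>, then any number of <filter>s, applied in order.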
Re: Help With Phrase Highlighting
Hi Teague, I couldn't understand the "document size" part of your question, but if you'd like Solr to return the snippet <b>My search phrase</b> instead of <b>My</b> <b>search</b> <b>phrase</b>, you should use FastVectorHighlighter. To use FVH, your highlight field (hl.fl=text) needs to be indexed with the options termVectors=true, termPositions=true and termOffsets=true. Good luck! Koji On 2015/12/02 5:36, Teague James wrote: Hello everyone, I am having difficulty enabling phrase highlighting and am hoping someone here can offer some help. This is what I have currently: Solr 4.9 solrconfig.xml (partial snip) xml explicit 10 text on text html 100 schema.xml (partial snip) Query (partial snip): ...select?fq=id:43040&q="my%20search%20phrase" Response (partial snip): ... ipsum dolor sit amet, pro ne verear prompta, sea te aeterno scripta assentior. (my search phrase facilitates highlighting). Et option molestiae referrentur ius. Viris quaeque legimus an pri The document in which this phrase is found is very long. If I reduce the document to a single sentence, such as "My search phrase facilitates highlighting", then the response I get from Solr is: <b>My</b> <b>search</b> <b>phrase</b> facilitates highlighting What I am trying to achieve instead, regardless of the document size, is: <b>My search phrase</b> with a single indicator at the beginning and end rather than three separate words that may get distributed between two different snippets depending on the placement of the snippet in the larger document. I tried to follow this guide: http://stackoverflow.com/questions/25930180/solr-how-to-highlight-the-whole-search-phrase-only/25970452#25970452 but got zero results. I suspect that this is due to the hl parameters in my solrconfig file, but I cannot find any specific guidance on what the correct parameters should be. I tried commenting out all of the hl parameters and also got no results. Can anyone offer any solutions for searching large documents and returning a single phrase highlight? -Teague
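A sketch of the pieces involved (field and type names hypothetical): in schema.xml,

<field name="content" type="text_general" indexed="true" stored="true"
       termVectors="true" termPositions="true" termOffsets="true"/>

and at query time,

...&hl=true&hl.fl=content&hl.useFastVectorHighlighter=true&q="my search phrase"

With term vectors in place, FVH tags the whole matched phrase as a single unit. Note the field must be re-indexed after adding the termVectors options.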
Re: Tokenize ShingleFilterFactory results and apply filters to tokens
Hi Vitaly, I'm not sure I understand you correctly, but why don't you put EdgeNGramFilter just after ShingleFilter? That is, something like the chain sketched after the quoted message below. Koji On 2015/10/15 22:47, vitaly bulgakov wrote: I want to rephrase the question I asked in another post. As far as I understand, ShingleFilterFactory creates shingles as strings. But I want to apply more filters (like EdgeNGram) to each token of a shingle. For example, from "Home Improvement Service" I have two shingles: "Home Improvement" and "Improvement Service". I want to apply EdgeNGram to be able to do an exact match on "Hom Improvem" and "Improvemen Servi" as new phrases. Any help or ideas are welcome and appreciated. -- View this message in context: http://lucene.472066.n3.nabble.com/Tokenize-ShingleFilterFactory-results-and-apply-filters-to-tokens-tp4234574.html Sent from the Solr - User mailing list archive at Nabble.com.
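The config in the original reply was stripped by the archive; it was presumably along these lines (gram sizes are arbitrary):

<analyzer>
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.ShingleFilterFactory" minShingleSize="2" maxShingleSize="2" outputUnigrams="false"/>
  <filter class="solr.EdgeNGramFilterFactory" minGramSize="5" maxGramSize="20"/>
</analyzer>

Note that EdgeNGramFilter placed after ShingleFilter produces prefixes of the whole shingle string ("Home Improvem", ...), not per-word prefixes, so whether this matches the desired "Hom Improvem" form needs testing in the analysis screen.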
Re: highlighting
Hi Mark, I think I saw a similar requirement recently on the mailing list. The feature sounds reasonable to me. > If not, how do I go about posting this as a feature request? JIRA can be used for that purpose, but there is no guarantee that the feature will be implemented. :( Koji On 2015/10/01 20:07, Mark Fenbers wrote: Yeah, I thought about using markers, but then I'd have to search the text for the markers to determine the locations. This is a clunky way of getting the results I want, and it would save two steps if Solr merely had an option to return a start/length array (of what should be highlighted) in the original string rather than returning an altered string with tags inserted. Mark On 9/29/2015 7:04 AM, Upayavira wrote: You can change the strings that are inserted into the text, and could place markers that you use to identify the start/end of highlighting elements. Does that work? Upayavira On Mon, Sep 28, 2015, at 09:55 PM, Mark Fenbers wrote: Greetings! I have highlighting turned on in my Solr searches, but what I get back is tags surrounding the found term. Since I use a SWT StyledText widget to display my search results, what I really want is the offset and length of each found term, so that I can highlight it in my own way without HTML. Is there a way to configure Solr to do that? I couldn't find it. If not, how do I go about posting this as a feature request? Thanks, Mark
Re: solr.SynonymFilterFactory
Hi Vincenzo, Intuitively, regardless of what value you set for attributes such as expand or ignoreCase, I think synonym records whose LHS equals their RHS are meaningless. That is, you can remove those lines. Koji On 2015/09/17 16:51, Vincenzo D'Amore wrote: Hello, this may be a silly question. I have found a synonyms file with a lot of cases where the LHS is equal to the RHS. airmax=>airmax airplane=>airplane airwell=>airwell akai=>akai akasa=>akasa akea=>akea akg=>akg Given that the solr.SynonymFilterFactory is configured with expand="false" ignoreCase="true" May I remove all these lines? Bests, Vincenzo
Re: How to export the list of terms indexed in Solr?
Hi brent3600, You can use NLP4L for this purpose. NLP4L is good at counting the number of words not only in the whole index but also in a set of documents. There is a tutorial for this function: Count the number of words http://nlp4l.github.io/tutorial_ja.html#useNLP Sorry, but the tutorial is written in Japanese for now. We'll provide an English tutorial soon. Until then, please use a translation service to read it in English. :) Koji On 2015/04/30 7:34, brent3600 wrote: We are indexing collections of documents (files) with SOLR, and would like the following capability: Export or pull from SOLR the list of terms that have been indexed for a document or set of documents, along with the term frequency count. 1. Does SOLR already provide an API or method to accomplish this? 2. If not, is there an add-on module that provides this functionality? 3. If not, is it technically feasible at a low level of effort to add this functionality? - brent3600 -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-export-the-list-of-terms-indexed-in-Solr-tp4203124.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Sorting and Rerank
Hi, You're right. Those sets are the same as each other; only the document order is different. Koji On 2015/03/26 0:53, innoculou wrote: If I do an initial search without any field sorting, and then do the exact same query but also sort on one field, will I get the same result set in the subsequent query, but sorted? In other words, does simply applying a sort criterion affect the re-rank on the full search, or does it just sort the result from the main query? -- View this message in context: http://lucene.472066.n3.nabble.com/Sorting-and-Rerank-tp4195187.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Lucene cosine similarity score for more like this query
Lucene uses the TFIDFSimilarity class to calculate similarity. It is based on the idea of the cosine measure, but it modifies the cosine formula. Please take a look at the Lucene Practical Scoring Function in the following Javadoc: http://lucene.apache.org/core/4_10_3/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html Koji -- http://soleami.com/blog/comparing-document-classification-functions-of-lucene-and-mahout.html On 2015/02/03 5:39, Ali Nazemian wrote: Dear Erik, Thank you for your response. Would you please tell me why this score could be higher than 1? While cosine similarity cannot be higher than 1. On Feb 2, 2015 7:32 PM, Erik Hatcher erik.hatc...@gmail.com wrote: The scoring is the same as Lucene. To get deeper insight into how a score is computed, use Solr's debug=true mode to see the explain details in the response. Erik On Feb 2, 2015, at 10:49 AM, Ali Nazemian alinazem...@gmail.com wrote: Hi, I was wondering what the range of the score brought by a more like this query in Solr is? I know that Lucene uses cosine similarity in the vector space model for calculating similarity between two documents. I also know that cosine similarity is between -1 and 1, but the fact that I don't understand is why the score brought by a more like this query could be 12, for example?! Would you please explain what the calculation process in Solr is? Thank you very much. Best regards. -- A.Nazemian
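For reference, the practical scoring function from that Javadoc is, in essence:

score(q,d) = coord(q,d) * queryNorm(q) * sum over terms t in q of [ tf(t in d) * idf(t)^2 * t.getBoost() * norm(t,d) ]

Unlike the textbook cosine, the document vector is not divided by its full Euclidean length (norm(t,d) is a cheaper, lossy length factor), so scores are unbounded and can exceed 1; this is why a value like 12 is possible.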
[ANN] word2vec for Lucene
Hello, It's my pleasure to share that I have an interesting tool word2vec for Lucene available at https://github.com/kojisekig/word2vec-lucene . As you can imagine, you can use word2vec for Lucene to extract word vectors from Lucene index. Thank you, Koji -- http://soleami.com/blog/comparing-document-classification-functions-of-lucene-and-mahout.html
Re: [ANN] word2vec for Lucene
Hi Paul, I cannot compare it to SemanticVectors as I don't know SemanticVectors. But word vectors that are produced by word2vec have interesting properties. Here is the description of the original word2vec web site: https://code.google.com/p/word2vec/#Interesting_properties_of_the_word_vectors Interesting properties of the word vectors It was recently shown that the word vectors capture many linguistic regularities, for example vector operations vector('Paris') - vector('France') + vector('Italy') results in a vector that is very close to vector('Rome'), and vector('king') - vector('man') + vector('woman') is close to vector('queen') Thanks, Koji (2014/11/20 20:01), Paul Libbrecht wrote: Hello Koji, how would you compare that to SemanticVectors? paul On 20 nov. 2014, at 10:10, Koji Sekiguchi k...@r.email.ne.jp wrote: Hello, It's my pleasure to share that I have an interesting tool word2vec for Lucene available at https://github.com/kojisekig/word2vec-lucene . As you can imagine, you can use word2vec for Lucene to extract word vectors from Lucene index. Thank you, Koji -- http://soleami.com/blog/comparing-document-classification-functions-of-lucene-and-mahout.html - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org -- http://soleami.com/blog/comparing-document-classification-functions-of-lucene-and-mahout.html
Re: [ANN] word2vec for Lucene
Thanks Glen for the URL. I'd like to check it when I am available. Thanks Paul for giving me the difference between them. I like your description! Koji (2014/11/21 2:18), Paul Libbrecht wrote: As far as I could tell, word2vec seems more mathematical, which is rather nice. At least I see more transparent math in the web-page. Maybe this helps a bit? SemanticVectors has always rather pleasant for the LSI/LSA-like approach, but precisely this is mathematically opaque. Maybe it's more a question of presentation. Paul On 20 nov. 2014, at 16:24, Koji Sekiguchi k...@r.email.ne.jp wrote: Hi Paul, I cannot compare it to SemanticVectors as I don't know SemanticVectors. But word vectors that are produced by word2vec have interesting properties. Here is the description of the original word2vec web site: https://code.google.com/p/word2vec/#Interesting_properties_of_the_word_vectors Interesting properties of the word vectors It was recently shown that the word vectors capture many linguistic regularities, for example vector operations vector('Paris') - vector('France') + vector('Italy') results in a vector that is very close to vector('Rome'), and vector('king') - vector('man') + vector('woman') is close to vector('queen') Thanks, Koji (2014/11/20 20:01), Paul Libbrecht wrote: Hello Koji, how would you compare that to SemanticVectors? paul On 20 nov. 2014, at 10:10, Koji Sekiguchi k...@r.email.ne.jp wrote: Hello, It's my pleasure to share that I have an interesting tool word2vec for Lucene available at https://github.com/kojisekig/word2vec-lucene . As you can imagine, you can use word2vec for Lucene to extract word vectors from Lucene index. Thank you, Koji -- http://soleami.com/blog/comparing-document-classification-functions-of-lucene-and-mahout.html - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org -- http://soleami.com/blog/comparing-document-classification-functions-of-lucene-and-mahout.html - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org -- http://soleami.com/blog/comparing-document-classification-functions-of-lucene-and-mahout.html
Re: [ANN] word2vec for Lucene
Hi Joseph, Thank you for asking. If you want to do it in the interactive sense, it won't work well practically because it takes several minutes for learning. If you accept working in batch sense, the feature can be implemented, but I've not done it yet. I have the open ticket for that: accept filter query https://github.com/kojisekig/word2vec-lucene/issues/2 Thanks, Koji (2014/11/21 8:22), Joseph Obernberger wrote: Hi Koji - is it possible to execute word2vec on a subset of documents from Solr? - ie could I run a query, get back the top n results and pass only those to word2vec? Will this work with Solr Cloud? Thank you! -Joe On Thu, Nov 20, 2014 at 12:18 PM, Paul Libbrecht p...@hoplahup.net wrote: As far as I could tell, word2vec seems more mathematical, which is rather nice. At least I see more transparent math in the web-page. Maybe this helps a bit? SemanticVectors has always rather pleasant for the LSI/LSA-like approach, but precisely this is mathematically opaque. Maybe it's more a question of presentation. Paul On 20 nov. 2014, at 16:24, Koji Sekiguchi k...@r.email.ne.jp wrote: Hi Paul, I cannot compare it to SemanticVectors as I don't know SemanticVectors. But word vectors that are produced by word2vec have interesting properties. Here is the description of the original word2vec web site: https://code.google.com/p/word2vec/#Interesting_properties_of_the_word_vectors Interesting properties of the word vectors It was recently shown that the word vectors capture many linguistic regularities, for example vector operations vector('Paris') - vector('France') + vector('Italy') results in a vector that is very close to vector('Rome'), and vector('king') - vector('man') + vector('woman') is close to vector('queen') Thanks, Koji (2014/11/20 20:01), Paul Libbrecht wrote: Hello Koji, how would you compare that to SemanticVectors? paul On 20 nov. 2014, at 10:10, Koji Sekiguchi k...@r.email.ne.jp wrote: Hello, It's my pleasure to share that I have an interesting tool word2vec for Lucene available at https://github.com/kojisekig/word2vec-lucene . As you can imagine, you can use word2vec for Lucene to extract word vectors from Lucene index. Thank you, Koji -- http://soleami.com/blog/comparing-document-classification-functions-of-lucene-and-mahout.html - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org -- http://soleami.com/blog/comparing-document-classification-functions-of-lucene-and-mahout.html - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org -- http://soleami.com/blog/comparing-document-classification-functions-of-lucene-and-mahout.html
Re: boosting words from specific list
Hi Ali, I don't think Solr has such a function OOTB. One way I can think of is to implement an UpdateRequestProcessor. In the processAdd() method of the UpdateRequestProcessor you can read the field values, so you can calculate the total score and copy it to a field, e.g. total_score. Then you can sort the query result on the total_score field when you query. Koji -- http://soleami.com/blog/comparing-document-classification-functions-of-lucene-and-mahout.html (2014/09/29 4:25), Ali Nazemian wrote: Dear all, Hi, I was wondering how I can implement boosting words from a specific list in Solr? I mean I want to have a list of important words and tell Solr to score documents based on the weighted sum of these words. For example, let the word school have a weight of 2 and the word president a weight of 5. In this case a doc with 2 school words and 3 president words will have a total score of 19! I want to sort documents based on this score. How is such a procedure possible in Solr? Thank you very much. Best regards.
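A minimal sketch of that idea (the class name, field names, and weights are all hypothetical):

import java.io.IOException;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;
import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

public class WeightedWordsProcessorFactory extends UpdateRequestProcessorFactory {

  // the "important words" list; in practice this could come from init() args or a file
  private static final Map<String, Integer> WEIGHTS = new HashMap<String, Integer>();
  static {
    WEIGHTS.put("school", 2);
    WEIGHTS.put("president", 5);
  }

  @Override
  public UpdateRequestProcessor getInstance(SolrQueryRequest req,
      SolrQueryResponse rsp, UpdateRequestProcessor next) {
    return new UpdateRequestProcessor(next) {
      @Override
      public void processAdd(AddUpdateCommand cmd) throws IOException {
        SolrInputDocument doc = cmd.getSolrInputDocument();
        Object body = doc.getFieldValue("body"); // hypothetical source field
        long total = 0;
        if (body != null) {
          // naive whitespace/punctuation split; a real analyzer would be more robust
          for (String token : body.toString().toLowerCase(Locale.ROOT).split("\\W+")) {
            Integer w = WEIGHTS.get(token);
            if (w != null) total += w;
          }
        }
        doc.setField("total_score", total); // a sortable numeric field
        super.processAdd(cmd); // hand off to the rest of the chain
      }
    };
  }
}

Sorting is then just sort=total_score desc on the query.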
Re: statuscode list
Hi Jan, (2014/09/05 21:01), Jan Verweij - Reeleez wrote: Hi, If I'm correct you will get a statuscode=0 in the response if you use XML messages for updating the solr index. I think by statuscode=0 you mean status=0 here: <?xml version="1.0" encoding="UTF-8"?> <response> <lst name="responseHeader"><int name="status">0</int><int name="QTime">7</int></lst> </response> Is there a list of possible other statuscodes you can receive in case anything fails and what these errorcodes mean? I don't think we have a list of other possible status values, because Solr doesn't return a status other than 0. Instead of the status code in the XML, you should look at the HTTP status code, e.g. 200 OK, 404 Not Found, etc., because if there is something wrong on Solr while updating (or even querying) the index, Solr may not return XML at all. Koji -- http://soleami.com/blog/comparing-document-classification-functions-of-lucene-and-mahout.html
Re: ExternalFileFieldReloader and commit
Hi Peter, It seems like a bug to me, too. Please file a JIRA ticket if you can so that someone can take it. Koji -- http://soleami.com/blog/comparing-document-classification-functions-of-lucene-and-mahout.html (2014/08/05 22:34), Peter Keegan wrote: When there are multiple 'external file field' files available, Solr will reload the last one (lexicographically) with a commit, but only if changes were made to the index. Otherwise, it skips the reload and logs: No uncommitted changes. Skipping IW.commit. Has anyone else noticed this? It seems like a bug to me. (yes, I do have firstSearcher and newSearcher event listeners in solrconfig.xml) Peter
Re: Understanding the Debug explanations for Query Result Scoring/Ranking
Hi, In addition, this might be useful: Fundamentals of Information Retrieval, Illustration with Apache Lucene https://www.youtube.com/watch?v=SCsS5ePGmCs This video is about 40 minutes long, but you can fast forward to 24:00 to learn scoring based on vector space model and how Lucene customize it. Koji -- http://soleami.com/blog/comparing-document-classification-functions-of-lucene-and-mahout.html (2014/07/25 8:00), Uwe Reh wrote: Hi, to get an idea of the meaning of all this numbers, have a look on http://explain.solr.pl. I like this tool, it's great. Uwe Am 25.07.2014 00:45, schrieb O. Olson: Hi, If you add /*debug=true*/ to the Solr request /(and wt=xml if your current output is not XML)/, you would get a node in the resulting XML that is named debug. There is a child node to this called explain to this which has a list showing why the results are ranked in a particular order. I'm curious if there is some documentation on understanding these numbers/results. I am new to Solr, so I apologize that I may be using the wrong terms to describe my problem. I also aware of http://lucene.apache.org/core/4_9_0/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html though I have not completely understood it. My problem is trying to understand something like this: 1.5797625 = (MATCH) sum of: 0.4717142 = (MATCH) weight(text:televis in 44109) [DefaultSimilarity], result of: 0.4717142 = score(doc=44109,freq=1.0 = termFreq=1.0 ), product of: 0.71447384 = queryWeight, product of: 7.0424104 = idf(docFreq=896, maxDocs=377553) 0.10145303 = queryNorm 0.660226 = fieldWeight in 44109, product of: 1.0 = tf(freq=1.0), with freq of: 1.0 = termFreq=1.0 7.0424104 = idf(docFreq=896, maxDocs=377553) 0.09375 = fieldNorm(doc=44109) 1.1080483 = (MATCH) weight(text:tv in 44109) [DefaultSimilarity], result of: 1.1080483 = score(doc=44109,freq=6.0 = termFreq=6.0 ), product of: 0.6996622 = queryWeight, product of: 6.896415 = idf(docFreq=1037, maxDocs=377553) 0.10145303 = queryNorm 1.5836904 = fieldWeight in 44109, product of: 2.4494898 = tf(freq=6.0), with freq of: 6.0 = termFreq=6.0 6.896415 = idf(docFreq=1037, maxDocs=377553) 0.09375 = fieldNorm(doc=44109) *Note:* I have searched for televisions. My search field is a single catch-all field. The Edismax parser seems to break up my search term into televis and tv Is there some documentation on how to understand these numbers. They do not seem to be properly delimited. At the minimum, I can understand something like: 1.5797625 = 0.4717142 + 1.1080483 and 0.71447384 = 7.0424104 * 0.10145303 But, I cannot understand if something like 0.10145303 = queryNorm 0.660226 = fieldWeight in 44109 is used in the calculation anywhere. Also since there were only two terms /(televis and tv)/ I could use subtraction to find out 1.1080483 was the start of a new result. I'd also appreciate if someone can tell me which class dumps out the above data. If I know it, I can edit that class to make the output a bit more understandable for me. Thank you, O. O. -- View this message in context: http://lucene.472066.n3.nabble.com/Understanding-the-Debug-explanations-for-Query-Result-Scoring-Ranking-tp4149137.html Sent from the Solr - User mailing list archive at Nabble.com.
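To make the arithmetic in that explain dump concrete, the factors for the televis clause combine like this (all numbers taken from the output above):

fieldWeight = tf * idf * fieldNorm = 1.0 * 7.0424104 * 0.09375 = 0.660226
queryWeight = idf * queryNorm = 7.0424104 * 0.10145303 = 0.71447384
weight(text:televis) = queryWeight * fieldWeight = 0.71447384 * 0.660226 = 0.4717142
total score = 0.4717142 + 1.1080483 = 1.5797625

So queryNorm and fieldWeight are both used: each clause's score is the product of its queryWeight and fieldWeight, and the clause scores are summed.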
Re: Contiguous Phrase Highlighting Example
Hi Teague, If you want phrase-unit tagging from the highlighter, you need to use FastVectorHighlighter instead of the ordinary Highlighter. To turn on FVH, set hl.useFastVectorHighlighter=true when querying. In addition, when indexing, you need to set termVectors="true", termPositions="true" and termOffsets="true" on the content field in your schema.xml. http://wiki.apache.org/solr/HighlightingParameters#hl.useFastVectorHighlighter Koji -- http://soleami.com/blog/comparing-document-classification-functions-of-lucene-and-mahout.html (2014/07/18 3:19), Teague James wrote: Hi everyone! Does anyone have any good examples of generating a contiguous highlight for a phrase? Here's what I have done: curl "http://localhost/solr/collection1/update?commit=true" -H "Content-Type: text/xml" --data-binary '<add><doc><field name="id">100</field><field name="content">blah blah blah knowledge of science blah blah blah</field></doc></add>' Then, using a browser: http://localhost/solr/collection1/select?q=knowledge+of+science&fq=id:100 What I get back in highlighting is: <str>blah blah blah <b>knowledge</b> <b>of</b> <b>science</b> blah blah blah</str> What I want to get back is: <str>blah blah blah <b>knowledge of science</b> blah blah blah</str> I have the following highlighting configurations in my requestHandler in addition to hl, hl.fl, etc.: <str name="hl.mergeContiguous">false</str> <str name="usePhraseHighlighter">true</str> <str name="highlightMultiTerm">true</str> Neither of the last two seemed to have any impact on the output. I've tried every permutation of those three, but the output is the same. Any suggestions or examples of getting highlights to come back this way? I'd appreciate any advice on this! Thanks! -Teague
Re: OCR - Saving multi-term position
Hi Manuel, I think OCR error correction is one of the well-known NLP tasks. I've thought in the past that it could be implemented using Lucene. This is a brief idea: 1. You have a Lucene index. This existing index is made from correct (i.e. error-free) documents from the same domain as the OCR documents. 2. Tokenize the OCR text with ShingleFilter. From the variations, you'll get candidate phrases such as: "the quiok", "tlne quick", "the quick", ... 3. Search those phrases in the existing index. I think exact search (PhraseQuery) or FuzzyQuery could work. You should get the highest hit count when searching "the quick" among those phrases. Koji -- http://soleami.com/blog/comparing-document-classification-functions-of-lucene-and-mahout.html (2014/07/02 7:19), Manuel Le Normand wrote: Hello, Many of our indexed documents are scanned and OCR'ed documents. Unfortunately we were not able to improve the OCR quality much (less than 80% word accuracy) for various reasons, a fact which badly hurts the retrieval quality. As we use an open-source OCR, we are thinking of changing every scanned term output to its main possible variations to get a higher level of confidence. Is there any analyzer that supports this kind of need, or should I make up a syntax and analyzer of my own, e.g. the payload syntax? The quick brown fox -- The|1 Tlne|1 quick|2 quiok|2 browm|3 brown|3 fox|4 Thanks, Manuel
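A rough sketch of step 3 (assuming an IndexSearcher named searcher over the clean reference index, and a hypothetical field "body"; Lucene 4.x API):

import org.apache.lucene.index.Term;
import org.apache.lucene.search.PhraseQuery;
import org.apache.lucene.search.TotalHitCountCollector;

String[] candidates = {"tlne quick", "the quiok", "the quick"};
String best = null;
int bestCount = -1;
for (String candidate : candidates) {
  PhraseQuery phrase = new PhraseQuery();
  for (String term : candidate.split(" ")) {
    phrase.add(new Term("body", term));
  }
  TotalHitCountCollector collector = new TotalHitCountCollector();
  searcher.search(phrase, collector);
  if (collector.getTotalHits() > bestCount) {
    bestCount = collector.getTotalHits();
    best = candidate; // "the quick" should win against an error-free corpus
  }
}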
Re: Restriction on type of uniqueKey field?
In addition, KeywordTokenizer can seemingly be used, but it should be avoided for the unique key field. One of my customers used it and got an OOM during long-running indexing. As it was difficult to find the problem, I'd like to share my experience. Koji -- http://soleami.com/blog/comparing-document-classification-functions-of-lucene-and-mahout.html (2014/07/01 6:48), Alexandre Rafalovitch wrote: I wasn't thinking of shard keys, but may have been confused in the reading. Thank you everyone, the long key is working just fine for me. Regards, Alex. Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency On Tue, Jul 1, 2014 at 8:15 PM, Michael Della Bitta michael.della.bi...@appinions.com wrote: Alex, maybe you're thinking of constraints put on shard keys? Michael Della Bitta Applications Developer o: +1 646 532 3062 appinions inc. "The Science of Influence Marketing" 18 East 41st Street New York, NY 10017 t: @appinions https://twitter.com/Appinions | g+: plus.google.com/appinions https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts w: appinions.com http://www.appinions.com/ On Tue, Jul 1, 2014 at 7:05 AM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: No, you definitely can have an int or long uniqueKey. A lot of Solr's tests use such a uniqueKey. See solr/core/src/test-files/solr/collection1/conf/schema.xml On Tue, Jul 1, 2014 at 3:20 PM, Alexandre Rafalovitch arafa...@gmail.com wrote: Hello, I remember reading somewhere that id field (uniqueKey) must be String. But I cannot find the definitive confirmation, just that it should be non-analyzed. Can I use a single-valued TrieLongField type, with precision set to 0? Or am I going to hit issues? Regards, Alex. Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency -- Regards, Shalin Shekhar Mangar.
Re: Multiple highlight snippet for single field
Hi Bijan, Have you tried setting the hl.maxAnalyzedChars parameter to a larger number? hl.maxAnalyzedChars http://wiki.apache.org/solr/HighlightingParameters#hl.maxAnalyzedChars As the default value of the parameter is 51200, if the second "andy" occurs in a paragraph beyond that limit in your large stored field, the highlighter doesn't deal with it. Koji -- http://soleami.com/blog/comparing-document-classification-functions-of-lucene-and-mahout.html (2014/05/16 13:25), Bijan Pourriahi wrote: Hello all, I am trying to return multiple snippets from a single document with a field which includes many (5+) instances of the word 'andy' in the text. For some reason, I can only get it to return one snippet. Any ideas? Here's the query and the response: http://codejaw.com/2gwoozr Thanks! - Bijan
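For example (parameter values are illustrative; hl.snippets, which defaults to 1, also caps how many fragments come back per field):

.../select?q=andy&hl=true&hl.fl=text&hl.snippets=10&hl.maxAnalyzedChars=1000000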
Re: AND not as a boolean operator in Phrase
(2014/03/26 2:29), abhishek jain wrote: hi friends, when I search for "A and B" it gives me results for A, B, and I am not sure why? Please guide me on how I can exact-match when it is within a phrase/quotes. Generally speaking (w/ LuceneQParser), if you want phrase match results, use quotes, i.e. q="A B". If you want results which contain both terms A and B, do not use quotes but the boolean operator AND, i.e. q=A AND B. koji -- http://soleami.com/blog/comparing-document-classification-functions-of-lucene-and-mahout.html
Re: Solr Nutch
1. Nutch follows the links within HTML web pages to crawl the full graph of a web of pages. In addition, I think Nutch has a PageRank-like scoring function, as opposed to Lucene/Solr, which are based on vector space model scoring. koji -- http://soleami.com/blog/mahout-and-machine-learning-training-course-is-here.html
Re: document contained more than 100000 characters
Hi, I'm not sure, but you probably hit a Tika exception. Have you checked the Apache Tika mailing list? Hmm, just now I googled "Your document contained more than 100000 characters" and found a page on StackOverflow. According to it, there is an API to change the limit. But I don't know whether Solr can change the limit. If there is no way to change the limit in Solr, you can open a JIRA ticket. koji -- http://soleami.com/blog/mahout-and-machine-learning-training-course-is-here.html (13/12/23 2:17), Nutan wrote: Why is the error as follows: org.apache.tika.sax.WriteOutContentHandler$WriteLimitReachedException: Your document contained more than 100000 characters, and so your requested limit has been reached. To receive the full text of the document, increase your limit. (Text up to the limit is however available). at org.apache.tika.sax.WriteOutContentHandler.characters(WriteOutContentHandler.java:140) at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146) at org.apache.tika.sax.xpath.MatchingContentHandler.characters(MatchingContentHandler.java:85) at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146) at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146) at org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270) at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146) at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146) at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146) when I added this in solrconfig.xml: <requestDispatcher handleSelect="false"> <requestParsers enableRemoteStreaming="true" multipartUploadLimitInKB="200048"/> </requestDispatcher> -- View this message in context: http://lucene.472066.n3.nabble.com/document-contained-more-than-100000-characters-tp4107792.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: indexing from bowser
Hi, (13/12/16 19:46), Nutan wrote: how to index pdf,doc files from browser? I think you can index from a browser. You said that this query is used for indexing: curl "http://localhost:8080/solr/document/update/extract?literal.id=12&commit=true" -F myfile=@C:\solr\document\src\test1\Coding.pdf so curl works for you, but: When I try to index using this: http://localhost:8080/solr/document/update/extract?literal.id=12&commit=true -F myfile=@C:\solr\document\src\test1\Coding.pdf the document does not get indexed. so the browser doesn't work for you. (Note that -F is a curl option for uploading the file; a browser address bar cannot send it.) Why don't you look into the Solr log and compare the logs between when you use curl and when you use the browser? koji -- http://soleami.com/blog/automatically-acquiring-synonym-knowledge-from-wikipedia.html
Re: Passing a Parameter to a Custom Processor
Hi Dileepa, > The stanbolInterceptor processor chain will be used in multiple request handlers. Then I will have to pass the stanbol.enhancer.url param in each of those request handlers, which will cause redundant configuration. Therefore I need to pass the param to the processor directly. But when I pass the params to the Processor as below, the parameter is not received by my ProcessorFactory class: <processor class="com.solr.stanbol.processor.StanbolContentProcessorFactor"> <str name="stanbol.enhancer.url">http://localhost:8080/enhancer</str> </processor> Can someone point out what might be wrong here? Can someone please advise on how to pass parameters directly to the Processor? I don't know why your Processor cannot get the parameters, but a Processor should get them. For example, StatelessScriptUpdateProcessorFactory can get its script parameter like this: <processor class="solr.StatelessScriptUpdateProcessorFactory"> <str name="script">updateProcessor.js</str> </processor> http://lucene.apache.org/solr/4_5_0/solr-core/org/apache/solr/update/processor/StatelessScriptUpdateProcessorFactory.html So why don't you consult the source code of StatelessScriptUpdateProcessorFactory, etc.? koji -- http://soleami.com/blog/automatically-acquiring-synonym-knowledge-from-wikipedia.html
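For what it's worth, a factory typically reads those <str> child params in its init() method; a hedged sketch (the class name follows the post, and StanbolContentProcessor is a hypothetical processor class):

import org.apache.solr.common.params.SolrParams;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;
import org.apache.solr.update.processor.UpdateRequestProcessor;
import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

public class StanbolContentProcessorFactory extends UpdateRequestProcessorFactory {
  private String enhancerUrl;

  @Override
  public void init(NamedList args) {
    // Solr hands the <processor> element's children to init()
    SolrParams params = SolrParams.toSolrParams(args);
    enhancerUrl = params.get("stanbol.enhancer.url");
  }

  @Override
  public UpdateRequestProcessor getInstance(SolrQueryRequest req,
      SolrQueryResponse rsp, UpdateRequestProcessor next) {
    // hypothetical processor that calls the enhancer at enhancerUrl
    return new StanbolContentProcessor(enhancerUrl, next);
  }
}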
Re: SOLRJ API to do similar CURL command execution
(13/11/13 22:25), Anupam Bhattacharya wrote: How can I post the whole XML string to SOLR using its SOLRJ API ? The source code of SimplePostTool would be of some help: http://lucene.apache.org/solr/4_5_1/solr-core/org/apache/solr/util/SimplePostTool.html koji -- http://soleami.com/blog/automatically-acquiring-synonym-knowledge-from-wikipedia.html
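Alternatively, SolrJ can send a raw XML update body via DirectXmlRequest; a sketch (URL, core name, and document are placeholders; SolrJ 4.x API):

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.DirectXmlRequest;

String xml = "<add><doc><field name=\"id\">1</field></doc></add>";
SolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
DirectXmlRequest req = new DirectXmlRequest("/update", xml); // posts the XML as-is
server.request(req);
server.commit();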
Re: count links pointing to id
(13/11/10 3:43), Andreas Owen wrote: I have a multivalued field with links pointing to ids of Solr documents. I would like to calculate how many links are pointing to each document and put that number into the field links2me. How can I do this? I would prefer to do it with a query and the updater so Solr can do it internally if possible. I don't think Solr can do it internally. You should sum up the link counts per id and put the sum into the links2me field before indexing. koji -- http://soleami.com/blog/automatically-acquiring-synonym-knowledge-from-wikipedia.html
Re: solr sort facets by name
(13/11/06 9:00), PeterKerk wrote: By default Solr sorts facets by the number of hits for each value. However, I want to sort the facets alphabetically by name. Earlier I sorted the facets on the client or via my .NET code; however, this time I need Solr to return the results with alphabetically sorted facets directly. How? Isn't it facet.sort=index ? http://wiki.apache.org/solr/SimpleFacetParameters#facet.sort koji -- http://soleami.com/blog/automatically-acquiring-synonym-knowledge-from-wikipedia.html
Re: Unable to add mahout classifier
Caused by: java.lang.ClassCastException: class com.mahout.solr.classifier.CategorizeDocumentFactory at java.lang.Class.asSubclass(Unknown Source) at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:433) at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:381) at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:526) ... 21 more There seems to be a problem related to class loaders: e.g. CategorizeDocumentFactory, which extends UpdateRequestProcessorFactory, was loaded by class loader B, but the Solr core loaded UpdateRequestProcessorFactory via class loader A, or something like that... koji -- http://www.rondhuit.com/
Re: Unable to add mahout classifier
(13/10/30 22:09), lovely kasi wrote: Hi, I made a few changes to solrconfig.xml, created a jar file, added it to the lib folder of Solr and tried to start it. The changes in solrconfig.xml are:

<updateRequestProcessorChain name="mahoutclassifier" default="true">
  <processor class="com.mahout.solr.classifier.CategorizeDocumentFactory">
    <str name="inputField">LEAD_NOTES</str>
    <str name="outputField">category</str>
    <str name="defaultCategory">Others</str>
    <str name="model">naiveBayesModel</str>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory"/>
  <processor class="solr.LogUpdateProcessorFactory"/>
</updateRequestProcessorChain>

What is com.mahout.solr.classifier.CategorizeDocumentFactory? Is it a classifier delivered by the Solr community? koji -- http://soleami.com/blog/automatically-acquiring-synonym-knowledge-from-wikipedia.html
Re: Return the synonyms as part of Solr response
Hi Siva, (13/10/30 18:12), sivaprasad wrote: Hi, We have a requirement where we need to send the matched synonyms as part of Solr response. I don't think that Solr has such function. Do we need to customize the Solr response handler to do this? So the answer is yes. koji -- http://soleami.com/blog/automatically-acquiring-synonym-knowledge-from-wikipedia.html
Re: Help on solr more like this functionality
Hi Suren, (13/10/25 23:36), Suren Raju wrote: Hi, We are trying to solve a business problem by performing a Solr more like this query. We are able to perform the more like this search. We have a specific use case that requires different boosts on different match fields. Say I do more like this based on the fields title and description of products. I want to provide more boost for the match field title than for description. The query I'm trying so far is: mysolrhost:8983/solr/mlt?q=id:UTF8TEST&mlt.fl=title,description&mlt.mindf=1&mlt.mintf=1 Is there any way to provide different boosts for title and description? I don't have much experience with MLT, but index-time boosting might help you? Koji -- http://soleami.com/blog/automatically-acquiring-synonym-knowledge-from-wikipedia.html
Re: how to debug my own analyzer in solr
Hi Mingz, If you use Eclipse, you can debug Solr with your plugin like this: # go to Solr install directory $ cd $SOLR $ ant run-example -Dexample.debug=true Then connect to the JVM from Eclipse via the remote debug port 5005. Good luck! koji (13/10/21 18:58), Mingzhu Gao wrote: More information about this: the custom analyzer just implements createComponents of Analyzer. And my configuration in schema.xml is just something like:

<fieldType name="text_cn" class="solr.TextField">
  <analyzer class="my.package.CustomAnalyzer"/>
</fieldType>

From the log I cannot see any error information. However, when I want to analyze or add document data, it always hangs there. Any way to debug or narrow down the problem? Thanks in advance. -Mingz On 10/21/13 4:35 PM, Mingzhu Gao m...@adobe.com wrote: Dear solr experts, I would like to write my own analyzer (a Chinese analyzer) and integrate it into Solr as a Solr plugin. From the log information, the custom analyzer can be loaded into Solr successfully. I define my fieldType with this custom analyzer. Now the problem is that, when I try this analyzer from http://localhost:8983/solr/#/collection1/analysis , I click the analysis, then choose my FieldType, then input some text. After I click the Analyse Value button, Solr hangs there; I cannot get any result or response in a few minutes. I also try to add some data by curl http://localhost:8983/solr/update?commit=true -H "Content-Type: text/xml", or by post.sh in the exampledocs folder. The same issue: Solr hangs there, no result and no response. Can anybody give me some suggestions on how to debug Solr to work with my own custom analyzer? By the way, I wrote a Java program to call my custom analyzer, and the result is okay. For example, the following code works well:
==
Analyzer analyzer = new MyAnalyzer();
// tokenStream() needs a field name and a Reader
TokenStream ts = analyzer.tokenStream("content", new StringReader(text));
CharTermAttribute ta = ts.getAttribute(CharTermAttribute.class);
ts.reset();
while (ts.incrementToken()) {
    System.out.println(ta.toString());
}
ts.end();
ts.close();
==
Thanks, -Mingz -- http://soleami.com/blog/automatically-acquiring-synonym-knowledge-from-wikipedia.html
Re: ExtractRequestHandler, skipping errors
Hi, I think the flag cannot ignore NoSuchMethodError. There may be something wrong here? ... I've just checked my Solr 4.5 directories and I found Tika version is 1.4. Tika 1.4 seems to use commons compress 1.5: http://svn.apache.org/viewvc/tika/tags/1.4/tika-parsers/pom.xml?view=markup But I see commons-compress-1.4.1.jar in solr/contrib/extraction/lib/ directory. Can you open a JIRA issue? For now, you can get commons compress 1.5 and put it to the directory (don't forget to remove 1.4.1 jar file). koji (13/10/18 16:37), Roland Everaert wrote: Hi, We already configure the extractrequesthandler to ignore tika exceptions, but it is solr that complains. The customer manage to reproduce the problem. Following is the error from the solr.log. The file type cause this exception was WMZ. It seems that something is missing in a solr class. We use SOLR 4.4. ERROR - 2013-10-17 18:13:48.902; org.apache.solr.common.SolrException; null:java.lang.RuntimeException: java.lang.NoSuchMethodError: org.apache.commons.compress.compressors.CompressorStreamFactory.setDecompressConcatenated(Z)V at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:673) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:383) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99) at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:953) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408) at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1023) at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589) at org.apache.tomcat.util.net.AprEndpoint$SocketProcessor.run(AprEndpoint.java:1852) at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) at java.lang.Thread.run(Unknown Source) Caused by: java.lang.NoSuchMethodError: org.apache.commons.compress.compressors.CompressorStreamFactory.setDecompressConcatenated(Z)V at org.apache.tika.parser.pkg.CompressorParser.parse(CompressorParser.java:102) at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242) at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242) at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120) at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:219) at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:241) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1904) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:659) 
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:362) ... 16 more On Thu, Oct 17, 2013 at 5:19 PM, Koji Sekiguchi k...@r.email.ne.jp wrote: Hi Roland, (13/10/17 20:44), Roland Everaert wrote: Hi, I helped a customer to deployed solr+manifoldCF and everything is going quite smoothly, but every time solr is raising an exception, the manifoldcfjob feeding solr aborts. I would like to know if it is possible to configure the ExtractRequestHandler to ignore errors like it seems to be possible with dataimporthandler and entity processors. I know that it is possible to configure the ExtractRequestHandler to ignore tika exception (We already do that) but the errors that now stops the mcfjobs are generated by solr itself. While it is interesting to have such option in solr, I plan to post to the manifoldcf mailing list, anyway, to know if it is possible to configure manifolcf to be less picky about solr errors. ignoreTikaException flag might help you? https://issues.apache.org/jira/browse/SOLR-2480 koji -- http://soleami.com/blog/automatically-acquiring-synonym-knowledge-from-wikipedia.html
Re: ExtractRequestHandler, skipping errors
Hi Roland, (13/10/17 20:44), Roland Everaert wrote: Hi, I helped a customer to deployed solr+manifoldCF and everything is going quite smoothly, but every time solr is raising an exception, the manifoldcfjob feeding solr aborts. I would like to know if it is possible to configure the ExtractRequestHandler to ignore errors like it seems to be possible with dataimporthandler and entity processors. I know that it is possible to configure the ExtractRequestHandler to ignore tika exception (We already do that) but the errors that now stops the mcfjobs are generated by solr itself. While it is interesting to have such option in solr, I plan to post to the manifoldcf mailing list, anyway, to know if it is possible to configure manifolcf to be less picky about solr errors. ignoreTikaException flag might help you? https://issues.apache.org/jira/browse/SOLR-2480 koji -- http://soleami.com/blog/automatically-acquiring-synonym-knowledge-from-wikipedia.html
Re: req info : SOLRJ and TermVector
(13/10/16 17:47), elfu wrote: Hi, can I access TermVector information using SolrJ? There is the TermVectorComponent to get term vector info: http://wiki.apache.org/solr/TermVectorComponent So yes, you can access it using SolrJ. koji -- http://soleami.com/blog/automatically-acquiring-synonym-knowledge-from-wikipedia.html
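For example, something like /select?q=*:*&tv=true&tv.tf=true&tv.positions=true&tv.offsets=true against a handler that has the TermVectorComponent registered (the stock example solrconfig.xml ships a /tvrh handler for this); on the SolrJ side the data should then come back under the termVectors section of the NamedList response.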
Re: fq caching question
Hi Tim, (13/10/15 5:22), Tim Vaillancourt wrote: Hey guys, Sorry for such a simple question, but I am curious as to the differences in caching between a combined filter query and many separate filter queries. Here are 2 example queries, one with a combined fq, one separate: 1) /select?q=*:*&fq=type:bid&fq=user_id:3 2) /select?q=*:*&fq=(type:bid%20AND%20user_id:3) For query #1: am I correct that the first query will keep 2 independent entries in the filterCache for type:bid and user_id:3? Correct. For query #2: is it correct that the 2nd query will keep 1 entry in the filterCache that satisfies all conditions? Correct. Lastly, is it a fair statement that under general query patterns, many separate filter queries are more cacheable than 1 combined one? Eg, if I performed query #2 (in the filterCache) and then changed the user_id, nothing about my new query is cacheable, correct (but if I used 2 separate filter queries then 1 of 2 is still cached)? Yes, it is. koji -- http://soleami.com/blog/automatically-acquiring-synonym-knowledge-from-wikipedia.html
Re: Please help!, Highlighting exact phrases with solr
(13/10/10 18:17), Silvia Suárez wrote: I am using solrj as a client for indexing documents on the Solr server. I am new to Solr, and I am having a problem with highlighting. Highlighting exact phrases with Solr does not work. For example, if the search keyword is: dulce hogar it returns: <span class="item">dulce</span> <span class="item">hogar</span> And it should be: <span class="item">dulce hogar</span> I don't understand what the problem is. Can someone help me please? Unfortunately, that is how the standard highlighter behaves. FVH can support phrase-unit highlighting. http://wiki.apache.org/solr/HighlightingParameters#hl.useFastVectorHighlighter koji -- http://soleami.com/blog/automatically-acquiring-synonym-knowledge-from-wikipedia.html
Re: defType
See line 33 to 50 at http://svn.apache.org/viewvc/lucene/dev/trunk/solr/core/src/java/org/apache/solr/search/QParserPlugin.java?view=markup koji -- http://soleami.com/blog/automatically-acquiring-synonym-knowledge-from-wikipedia.html (13/08/11 8:05), William Bell wrote: Can you list them out? Thanks. raw lucene dismax edismax field On Sat, Aug 10, 2013 at 4:45 PM, Jack Krupansky j...@basetechnology.comwrote: The full list is in my book. What did you need in particular? (Actually, I forgot to add maxscore to my list.) -- Jack Krupansky -Original Message- From: William Bell Sent: Saturday, August 10, 2013 6:30 PM To: solr-user@lucene.apache.org Subject: defType What are the possible options for defType? lucene dismax edismax Others? -- Bill Bell billnb...@gmail.com cell 720-256-8076
Re: Proximity and highliting
(13/08/04 14:36), Alex Cougarman wrote: Hi all. I'm having some issues with highlighting and proximity searching in Solr 4.x. Matching words in the query are sometimes highlighted even if they are not within proximity, and in some cases matching words in the query are not highlighted at all. Does anyone know why this would be happening? Thanks. -Alex Do you set the hl.usePhraseHighlighter parameter to true? http://wiki.apache.org/solr/HighlightingParameters#hl.usePhraseHighlighter koji -- http://soleami.com/blog/automatically-acquiring-synonym-knowledge-from-wikipedia.html
Re: ICUTransformFilterFactory
(13/08/02 17:53), Jochen Lienhard wrote: Hello, we have a problem with some special characters, for example æ. We are using the ICUTransformFilterFactory for indexing and searching. We have some documents with urianae and some with urianæ. If I search urianae, I find only the versions with urianae but not urianæ. Only if I search urianae* do I find both versions. Is it possible (perhaps with special IDs in the ICUTransformFilterFactory) to find all of them without an asterisk? Why don't you use MappingCharFilter? https://issues.apache.org/jira/secure/attachment/12392639/character-normalization.JPG (attached at https://issues.apache.org/jira/browse/SOLR-822 ) koji -- http://soleami.com/blog/automatically-acquiring-synonym-knowledge-from-wikipedia.html
Re: Sort by document similarity counts
I have tried doing this via a custom SearchComponent, where I can find all similar documents for each document in the current search result, then add a new field into the document, hoping to use the sort parameter (q=*&sort=similarityCount). I don't understand this part very well, but: But this will not work because sorting is done before my custom search component is handled, if added via last-components. I can't add it via first-components, because then I will have no access to query results. And I do not want to override QueryComponent because I need all the functionality it covers: grouping, facets, etc. You may want to put your custom SearchComponent in last-components and inject a SortSpec in your prepare() so that QueryComponent can sort the result complying with your SortSpec? koji -- http://soleami.com/blog/automatically-acquiring-synonym-knowledge-from-wikipedia.html
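A rough sketch of that SortSpec injection. Assumptions: a similarityCount field already exists in the index, and the SortSpec constructor and accessors are as in the Solr 4.x code base - check the signatures in your version before relying on this:

import java.io.IOException;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.handler.component.SearchComponent;
import org.apache.solr.search.SortSpec;

public class SimilaritySortComponent extends SearchComponent {
  @Override
  public void prepare(ResponseBuilder rb) throws IOException {
    // sort by similarityCount descending, keeping the requested paging;
    // prepare() of a last-component runs before QueryComponent.process()
    Sort sort = new Sort(new SortField("similarityCount", SortField.Type.INT, true));
    SortSpec old = rb.getSortSpec();
    rb.setSortSpec(new SortSpec(sort, old.getOffset(), old.getCount()));
  }

  @Override
  public void process(ResponseBuilder rb) throws IOException {
    // nothing to do here; QueryComponent applies the injected sort
  }

  @Override
  public String getDescription() { return "injects sort by similarityCount"; }

  @Override
  public String getSource() { return "$URL$"; }
}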
Re: Find related words
You may want collocations for a given word? I implemented LUCENE-474 for Solr a while ago and I found it worked pretty well. https://issues.apache.org/jira/browse/LUCENE-474 Hope this helps. koji -- http://soleami.com/blog/automatically-acquiring-synonym-knowledge-from-wikipedia.html (13/07/04 21:09), Dotan Cohen wrote: How might one find the top related words for a given word in a Solr index? For instance, given the following single-field documents: 1: I love chocolate 2: I love Solr 3: I eat chocolate cake 4: You will eat chocolate candy Thus, given the word Chocolate, Solr might find these top words: I (3 times matched) eat (2 times matched) love, cake, you, will, candy (1 time each) Thanks! -- Dotan Cohen http://gibberish.co.il http://what-is-what.com
Re: Find related words
Hi Dotan, (13/07/04 23:51), Dotan Cohen wrote: Thank you Jack and Koji. I will take a look at MLT and also at the .zip files from LUCENE-474. Koji, did you have to modify the code for the latest Solr? Yes. As the Lucene APIs for accessing index have been changed, I had to modify the code. koji -- http://soleami.com/blog/automatically-acquiring-synonym-knowledge-from-wikipedia.html
Re: [blog post] Automatically Acquiring Synonym Knowledge from Wikipedia
Hi Rajesh, Thanks! I'm planning to open-source an NLP tool kit for Lucene, and the tool kit will include this synonym library. koji (13/05/28 14:12), Rajesh Nikam wrote: Hello Koji, This seems to be a pretty useful post on how to create a synonyms file. Thanks a lot for sharing this! Have you shared the source code / jar for the same so that it could be used? Thanks, Rajesh On Mon, May 27, 2013 at 8:44 PM, Koji Sekiguchi k...@r.email.ne.jp wrote: Hello, Sorry for the cross post. I just wanted to announce that I've written a blog post on how to create a synonyms.txt file automatically from Wikipedia: http://soleami.com/blog/automatically-acquiring-synonym-knowledge-from-wikipedia.html Hope that the article gives someone a good experience! koji -- http://soleami.com/blog/lucene-4-is-super-convenient-for-developing-nlp-tools.html -- http://soleami.com/blog/automatically-acquiring-synonym-knowledge-from-wikipedia.html
Re: Note on The Book
Hi Jack, I'd like to ask as a person who contributed a case study article about Automatically acquiring synonym knowledge from Wikipedia to the book. (13/05/24 8:14), Jack Krupansky wrote: To those of you who may have heard about the Lucene/Solr book that I and two others are writing on Lucene and Solr, some bad and good news. The bad news: The book contract with O’Reilly has been canceled. The good news: I’m going to proceed with self-publishing (possibly on Lulu or even Amazon) a somewhat reduced scope Solr-only Reference Guide (with hints of Lucene). The scope of the previous effort was too great, even for O’Reilly – a book larger than 800 pages (or even 600) that was heavy on reference and lighter on “guide” just wasn’t fitting in with their traditional “guide” model. In truth, Solr is just too complex for a simple guide that covers it all, let alone Lucene as well. Will the reduced Solr-only reference guide include my article? If not (for now I think it is not because my article is for Lucene case study, not Solr), I'd like to put it out on my blog or somewhere. BTW, those who want to know how to acquire synonym knowledge from Wikipedia, the summary is available at slideshare: http://www.slideshare.net/KojiSekiguchi/wikipediasolr koji -- http://soleami.com/blog/lucene-4-is-super-convenient-for-developing-nlp-tools.html
[blog post] Automatically Acquiring Synonym Knowledge from Wikipedia
Hello, Sorry for cross post. I just wanted to announce that I've written a blog post on how to create synonyms.txt file automatically from Wikipedia: http://soleami.com/blog/automatically-acquiring-synonym-knowledge-from-wikipedia.html Hope that the article gives someone a good experience! koji -- http://soleami.com/blog/lucene-4-is-super-convenient-for-developing-nlp-tools.html
Re: Note on The Book
Now my contribution can be read on soleami blog in English: Automatically Acquiring Synonym Knowledge from Wikipedia http://soleami.com/blog/automatically-acquiring-synonym-knowledge-from-wikipedia.html koji (13/05/27 21:16), Jack Krupansky wrote: If you would like to Solr-ize your contribution, that would be great. The focus of the book will be hard-core Solr. -- Jack Krupansky -Original Message- From: Koji Sekiguchi Sent: Monday, May 27, 2013 8:07 AM To: solr-user@lucene.apache.org Subject: Re: Note on The Book Hi Jack, I'd like to ask as a person who contributed a case study article about Automatically acquiring synonym knowledge from Wikipedia to the book. (13/05/24 8:14), Jack Krupansky wrote: To those of you who may have heard about the Lucene/Solr book that I and two others are writing on Lucene and Solr, some bad and good news. The bad news: The book contract with O’Reilly has been canceled. The good news: I’m going to proceed with self-publishing (possibly on Lulu or even Amazon) a somewhat reduced scope Solr-only Reference Guide (with hints of Lucene). The scope of the previous effort was too great, even for O’Reilly – a book larger than 800 pages (or even 600) that was heavy on reference and lighter on “guide” just wasn’t fitting in with their traditional “guide” model. In truth, Solr is just too complex for a simple guide that covers it all, let alone Lucene as well. Will the reduced Solr-only reference guide include my article? If not (for now I think it is not because my article is for Lucene case study, not Solr), I'd like to put it out on my blog or somewhere. BTW, those who want to know how to acquire synonym knowledge from Wikipedia, the summary is available at slideshare: http://www.slideshare.net/KojiSekiguchi/wikipediasolr koji -- http://soleami.com/blog/lucene-4-is-super-convenient-for-developing-nlp-tools.html
Re: cache disable through solrJ
(13/05/20 20:53), J Mohamed Zahoor wrote: Hi, How do I disable a cache (Solr FieldValueCache) for certain queries... using HTTP it can be done using {!cache=false}... how can I do it from SolrJ? ./zahoor How about using facet.method=enum? koji -- http://soleami.com/blog/lucene-4-is-super-convenient-for-developing-nlp-tools.html
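From SolrJ the {!cache=false} local param is just part of the parameter value, so a small sketch could look like this (the query and field values are placeholders):

import org.apache.solr.client.solrj.SolrQuery;

SolrQuery query = new SolrQuery("*:*");
query.addFilterQuery("{!cache=false}type:bid"); // skip the filterCache for this fq
query.set("facet.method", "enum");              // the suggestion above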
Re: Solr 3.6.1: changing a field from stored to not stored
(13/04/24 7:09), Petersen, Robert wrote: Hi guys, What would happen if I changed a field definition on an existing field in an existing index from stored to not stored? Would Solr just party on, ignoring the fact that this field's data is stored in the current index? I noticed I am unnecessarily storing some fields in my index and I'd like to stop storing them without having to 'reindex the world', and let the changes just naturally percolate into my index as updates come in the normal course of things. Do you guys think I could get away with this? Thanks, Robert (Robi) Petersen Senior Software Engineer Search Engineer I think Solr will just ignore the existing stored data. But I've never done it myself. Please try it. koji -- http://soleami.com/blog/lucene-4-is-super-convenient-for-developing-nlp-tools.html
Re: Returning similarity values for more like this search
(13/04/19 23:24), Achim Domma wrote: Hi, I'm executing a search including a search for similar documents (mlt=true&mlt.fl=...) which works fine so far. I would like to get the similarity value for each document. I expected this to be quite common and simple, but I could not find a hint on how to do it. Any hint would be very appreciated. kind regards, Achim Using debugQuery=true, you can find explanations in the debug section of the response. See: https://issues.apache.org/jira/browse/SOLR-860 koji -- http://soleami.com/blog/lucene-4-is-super-convenient-for-developing-nlp-tools.html
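In other words, something like /solr/mlt?q=id:UTF8TEST&mlt.fl=title,description&mlt.mindf=1&mlt.mintf=1&debugQuery=true (parameter values borrowed from an earlier thread) and then read the per-document score explanations under the debug/explain section of the response.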
Re: conditional queries?
Hi Mark, Is it possible to do a conditional query if another query has no results? For example, say I want to search against a given field for: - Search for car. If there are results, return them. - Else, search for car* . If there are results, return them. - Else, search for car~ . If there are results, return them. Is this possible in one query? Or would I need to make 3 separate queries by implementing this logic within my client? As far as I know, there is no such SearchComponent. But the idea of a FallbackRequestHandler has been discussed; see SOLR-1878, for example: https://issues.apache.org/jira/browse/SOLR-1878 koji -- http://soleami.com/blog/lucene-4-is-super-convenient-for-developing-nlp-tools.html
Re: Flow Chart of Solr
(13/04/02 21:45), Furkan KAMACI wrote: Is there any documentation, something like a flow chart of Solr? i.e. documents come into Solr (maybe indicating which classes receive documents) and go through the parsing process (i.e. stemming processes etc.), and then inverted indexes are built, and so on and so forth? There is an interesting ticket: Architecture Diagrams needed for Lucene, Solr and Nutch https://issues.apache.org/jira/browse/LUCENE-2412 koji -- http://soleami.com/blog/lucene-4-is-super-convenient-for-developing-nlp-tools.html
Re: Confusion over Solr highlight hl.q parameter
(13/04/03 5:27), Van Tassell, Kristian wrote: Thanks Koji, this helped with some of our problems, but it is still not perfect. This query, for example, returns no highlighting: ?q=id:abc123&hl.q=text_it_IT:l'assieme&hl.fl=text_it_IT&hl=true&defType=edismax But this one does (when it is, in effect, the same query): ?q=text_it_IT:l'assieme&hl=true&defType=edismax&hl.fl=text_it_IT I've tried many combinations but can't seem to get the right one to work. Is this possibly a bug? As hl.q doesn't honor the defType parameter but does honor local params, can you try putting {!edismax} in the hl.q parameter? koji -- http://soleami.com/blog/lucene-4-is-super-convenient-for-developing-nlp-tools.html
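That is, keeping everything else the same: ?q=id:abc123&hl.q={!edismax}text_it_IT:l'assieme&hl.fl=text_it_IT&hl=true - the {!edismax} local param makes the hl.q value parse with edismax regardless of defType.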
Re: Getting back highlights almost always works...
(13/03/20 6:14), Van Tassell, Kristian wrote: ...but I'm finding some examples where the stored text is so big (14,000 words) that Solr fails to highlight anything. But the data is definitely in the text field and is returned due to that hit. Does anyone have any ideas why this happens? Probably you are missing the hl.maxAnalyzedChars parameter? http://wiki.apache.org/solr/HighlightingParameters#hl.maxAnalyzedChars koji -- http://soleami.com/blog/lucene-4-is-super-convenient-for-developing-nlp-tools.html
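By default the highlighter only analyzes the first 51200 characters of a field (if memory serves), so for a 14,000-word field something like &hl.maxAnalyzedChars=1000000 would cover the whole stored text.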
Re: Retrieving Term vectors
Hi Sarita, I've not dug into your code in detail, but my first impression is that you are missing storing term vector positions?

FieldType fieldType = new FieldType();
IndexOptions indexOptions = IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS;
fieldType.setIndexOptions(indexOptions);
fieldType.setIndexed(true);
fieldType.setStoreTermVectors(true);
fieldType.setStored(true);

Document doc = new Document();
doc.add(new Field("content", "one quick brown fox jumped over one lazy dog", fieldType));

I think you need:

fieldType.setStoreTermVectorPositions(true);

if you want term vector positions later. koji -- http://soleami.com/blog/lucene-4-is-super-convenient-for-developing-nlp-tools.html
Re: Incorrect snippets using FastVectorHighlighter
Hi Jochen, There is a restriction in FVH. FVH cannot deal with variable gram sizes; that is, it requires minGramSize == maxGramSize in your NGramFilterFactory setting. koji -- http://soleami.com/blog/lucene-4-is-super-convenient-for-developing-nlp-tools.html (13/03/18 22:17), Jochen Just wrote: Hi list, I have the following field type defined in my schema.xml in order to be able to do in-word search:

<fieldType name="string_parts_back" class="solr.TextField" positionIncrementGap="100" omitNorms="true">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="1" maxGramSize="1000"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Searching itself works as expected, though highlighting causes me headaches. At first I did not use the FastVectorHighlighter, which meant highlighting did not work at all for fields of this type. Since I've been using the FastVectorHighlighter, most of the time highlighting works, but sometimes it doesn't. Given I have a document containing the word 'Superkalifragilistischexpialligetisch' and I search for 'uperkalifragilistische', I would expect as a result 'S<em>uperkalifragilistische</em>xpiallegetisch' but it is 'S<em>uperkalifragilist</em>ischexpialligetisch'. So 'ische' is missing from the highlighted part. Sadly, I am not able to create a simple setup to reproduce this; it only happens in our in-house live system. Though if I remove some fields from the qf attribute of the edismax parser in solrconfig.xml, it stops behaving like that. Some of those removed fields have the fieldType string_parts_back. Does anyone have a clue what's going on? Thanks in advance, Jochen -- Jochen Just Fon: (++49) 711/28 07 57-193 avono AG Mobil: (++49) 172/73 85 387 Breite Straße 2 Mail: jochen.j...@avono.de 70173 Stuttgart WWW: http://www.avono.de
Re: Incorrect snippets using FastVectorHighlighter
So just to be clear: there is no possibility to highlight results if I use a variable gram size. Neither the original highlighter nor FVH does the job. Or am I missing something? I don't know whether the latest original highlighter still has such a restriction today, but when FVH came in 2.9, at that time the original highlighter couldn't deal with an n-gram field if n > 1, because the (k)-th term's end offset can be larger than the (k+1)-th term's start offset. Btw, does any documentation exist on how the FVH works? See the package summary: http://lucene.apache.org/core/4_2_0/highlighter/org/apache/lucene/search/vectorhighlight/package-summary.html koji -- http://soleami.com/blog/lucene-4-is-super-convenient-for-developing-nlp-tools.html
Re: Confusion over Solr highlight hl.q parameter
(13/03/16 4:08), Van Tassell, Kristian wrote: Hello everyone, If I search for a term “baz” and tell it to highlight it, it highlights just fine. If, however, I search for “foo bar” using the q parameter, which appears in that same document/same field, and use the hl.q parameter to search and highlight “baz”, I get no highlighting results for “baz”. ?q=パーツにおける機能強化&qf=text_ja_JP&defType=edismax&hl=true&hl.simple.pre=<em>&hl.simple.post=</em>&hl.fl=text_ja_JP The above highlights the query term just fine. ?q=1234&hl.q=パーツにおける機能強化&qf=id&defType=edismax&hl=true&hl.simple.pre=<em>&hl.simple.post=</em>&hl.fl=text_ja_JP This one returns zero highlighting hits. I'm just guessing: the Solr highlighter tries to highlight パーツにおける機能強化 in your default search field? Can you try hl.q=text_ja_JP:パーツにおける機能強化 . koji -- http://soleami.com/blog/lucene-4-is-super-convenient-for-developing-nlp-tools.html
Re: how to overrride pre and post tags when usefastVectorHighlighter is set to true
Hi Alex, (13/02/23 10:53), alx...@aim.com wrote: Hello, I was unable to change the pre and post tags for highlighting when useFastVectorHighlighter is set to true. Changing the default tags in solrconfig.xml works for the standard highlighter, though. I searched the mailing list and the net with no success. I use solr-4.1.0. According to the Wiki: hl.simple.pre/hl.simple.post http://wiki.apache.org/solr/HighlightingParameters#hl.simple.pre.2BAC8-hl.simple.post ... Use hl.tag.pre and hl.tag.post for FastVectorHighlighter (see example under hl.fragmentsBuilder) And solrconfig.xml in the example:

<!-- multi-colored tag FragmentsBuilder -->
<fragmentsBuilder name="colored" class="solr.highlight.ScoreOrderFragmentsBuilder">
  <lst name="defaults">
    <str name="hl.tag.pre"><![CDATA[
      <b style="background:yellow">,<b style="background:lawgreen">,
      <b style="background:aquamarine">,<b style="background:magenta">,
      <b style="background:palegreen">,<b style="background:coral">,
      <b style="background:wheat">,<b style="background:khaki">,
      <b style="background:lime">,<b style="background:deepskyblue">]]></str>
    <str name="hl.tag.post"><![CDATA[</b>]]></str>
  </lst>
</fragmentsBuilder>

If you don't use multi-colored tags, you can simply set:

<fragmentsBuilder name="simpletag" class="solr.highlight.ScoreOrderFragmentsBuilder">
  <lst name="defaults">
    <str name="hl.tag.pre"><![CDATA[<b>]]></str>
    <str name="hl.tag.post"><![CDATA[</b>]]></str>
  </lst>
</fragmentsBuilder>

koji -- http://soleami.com/blog/lucene-4-is-super-convenient-for-developing-nlp-tools.html
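Selecting a named builder per request is then just a matter of adding, for example: &hl=true&hl.useFastVectorHighlighter=true&hl.fragmentsBuilder=simpletag (or colored for the multi-colored one).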
Re: Order by hl.snippets count
(12/11/20 1:50), Gabriel Croitoru wrote: Hello, I'm using Solr 1.3 with the http://wiki.apache.org/solr/HighlightingParameters options. The client just asked us to change the order from the default score to the number of hl.snippets per document. Is this possible from Solr configuration (without implementing a custom scoring algorithm)? I don't think it is possible. koji -- http://soleami.com/blog/lucene-4-is-super-convenient-for-developing-nlp-tools.html
Re: Patch Needed for Issue Solr-3790
(12/11/09 19:20), mechravi25 wrote: Hi All, I'm using Solr version 3.6.1. For the issue given in the following url, there is no patch file provided: https://issues.apache.org/jira/browse/SOLR-3790 Can you tell me if there is a patch file for the same? Also, we noticed that the below url has the changes that had to be done to resolve this issue. In it, only one file, SolrIndexSearcher.java, was changed, by including synchronized(this) above the line 'if (storedHighlightFieldNames == null) {' inside the 'public Collection<String> getStoredHighlightFieldNames()' method: http://svn.apache.org/viewvc/lucene/dev/trunk/solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java?r1=1229401&r2=1231606&diff_format=h Can anyone confirm whether this is the only change needed to resolve the same? Yes, it is the only change to resolve the problem, I think. koji -- http://soleami.com/blog/lucene-4-is-super-convenient-for-developing-nlp-tools.html
Re: SLOR And OpenNlp integration
(12/10/11 20:40), ahmed wrote: Hi, thanks for the reply. In fact I tried this tutorial, but when I execute 'ant compile' I have a problem that a class is not found, despite the classes being there. I don't know what the problem is. I think attaching the error you got would help us understand your problem. Also, before that, what do you want to do with the Solr and OpenNLP integration? koji -- http://soleami.com/blog/starting-lab-work.html
Re: Regarding delta-import and full-import
(12/09/27 22:45), darshan wrote: Hi All, Can anyone refer me to a few blogs that explain both imports in a little bit more detail and with examples? Thanks, Darshan Asking Google, I got: http://www.arunchinnachamy.com/apache-solr-mysql-data-import/ http://www.andornot.com/blog/post/Sample-Solr-DataImportHandler-for-XML-Files.aspx http://pooteeweet.org/blog/1827 : koji -- http://soleami.com/blog/starting-lab-work.html
Re: solr binary protocol
(12/09/27 9:29), Radim Kolar wrote: Is it possible to use the Solr binary protocol instead of XML for talking TO Solr? I know that it can be used in the Solr reply. Have you looked at javabin? http://wiki.apache.org/solr/javabin koji -- http://soleami.com/blog/starting-lab-work.html
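To actually switch the update traffic to javabin from SolrJ, a small sketch, assuming a 4.x-era SolrJ (on older versions the binary update handler may need to be enabled in solrconfig.xml; recent ones accept javabin on /update out of the box):

import org.apache.solr.client.solrj.impl.BinaryRequestWriter;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");
// requests are now serialized as javabin instead of XML
server.setRequestWriter(new BinaryRequestWriter());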
Re: Broken highlight truncation for hl.alternateField
Hi Arcadius, I think it is a feature. If no matching terms are found in the hl.fl fields, the hl.alternateField function is triggered, and if you set hl.maxAlternateFieldLength=[LENGTH], the highlighter extracts the first [LENGTH] characters of the stored data of the hl.fl field. As this is a common feature of both the highlighter and FVH, it doesn't take hl.bs.type into account (that is a special param for the boundary scanner). For now, implement boundary scanning in your client if you want it. koji -- http://soleami.com/blog/starting-lab-work.html (12/09/15 0:13), Arcadius Ahouansou wrote: Hello. I am using the fastVectorHighlighter in Solr 3.5 to highlight and truncate the summary of my results. The standard breakIterator is being used with hl.bs.type=WORD as per http://lucidworks.lucidimagination.com/display/solr/Highlighting Search is being performed on the document title and summary. In my edismax requesthandler, I have as defaults:

<str name="hl.useFastVectorHighlighter">true</str>
<str name="hl.fl">summary</str>
<str name="f.summary.hl.alternateField">summary</str>

A simplified query looks like this: /solr/search?q=help&hl=true&f.summary.hl.fragsize=250&f.summary.hl.maxAlternateFieldLength=250 So, I am truncating only the summary. 1- When a search term is found in the description, everything works as expected: the summary is truncated and contains whole words only (the breakIterator is being applied properly). 2- However, when there is no match in the summary, then f.summary.hl.alternateField kicks in and the summary returned is often truncated in the middle of a word (i.e. we may get peo instead of people). This lets me suppose that the breakIterator is not applied to f.summary.hl.alternateField. My question is: how to get full-word truncation when the summary is fetched from f.summary.hl.alternateField? (i.e. no match in summary) Or is there any other way I could get proper truncation when there is no match in the summary? Thank you very much. Arcadius
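A client-side sketch of that boundary scanning with the JDK's BreakIterator (class and method names here are mine, not Solr's):

import java.text.BreakIterator;

public class SnippetTrimmer {
  // cut an alternate-field snippet back to the last word boundary
  public static String trim(String snippet, int maxLen) {
    if (snippet.length() <= maxLen) return snippet;
    BreakIterator bi = BreakIterator.getWordInstance();
    bi.setText(snippet);
    int end = bi.preceding(maxLen); // nearest boundary before maxLen, or DONE (-1)
    return snippet.substring(0, end > 0 ? end : maxLen);
  }
}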
Re: Doubts in PathHierarchyTokenizer
Use the delimiter option instead of pattern for PathHierarchyTokenizerFactory: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.PathHierarchyTokenizerFactory koji -- http://soleami.com/blog/starting-lab-work.html (12/09/12 22:22), mechravi25 wrote: Hi, I'm using Solr version 3.6.1 and I have a field which has values like A|B|C B|C|D|EE A|C|B A|B|D ..etc.. So, when I search for A|B, I should get documents starting with A and A|B. To implement this, I've used the PathHierarchyTokenizer for the above field as:

<fieldType name="filep" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.PathHierarchyTokenizerFactory" pattern="|"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>

But when I use the Solr analysis page to check whether it's being split on the pipe symbol (|) at indexing time, I see that the input is taken as one entire token and is not split on the delimiter (i.e. the search is done only for A|B in the above case). I also tried using \| as the delimiter, but that's not working either. Am I missing anything here? Or will the PathHierarchyTokenizer not accept the pipe symbol (|) as a delimiter? Can anyone guide me on this? Thanks a lot -- View this message in context: http://lucene.472066.n3.nabble.com/Doubts-in-PathHierarchyTokenizer-tp4007216.html Sent from the Solr - User mailing list archive at Nabble.com.
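In other words, the index analyzer above would become something like this (untested, just swapping the attribute name):

<tokenizer class="solr.PathHierarchyTokenizerFactory" delimiter="|"/>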
Re: PathHierarchyTokenizerFactory behavior
(12/07/09 19:41), Alok Bhandari wrote: Hello, this is how the field is declared in schema.xml:

<fieldType name="text_path" class="solr.TextField" stored="true" indexed="true" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.PathHierarchyTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

When I query this field with the input M:/Users/User/AppData/Local/test/abc.txt, it searches for documents containing any of the generated tokens (M, Users, User, etc.), but I want to search for the exact file with the given input as a value. Please let me know how I can achieve that. I am using Solr 3.6. Thanks. Can you try KeywordTokenizerFactory instead of PathHierarchyTokenizerFactory? koji -- http://soleami.com/blog/starting-lab-work.html
Re: using Carrot2 custom ITokenizerFactory
My problem was gone. Thanks Staszek and Dawid! koji -- Query Log Visualizer for Apache Solr http://soleami.com/ (12/05/21 18:11), Stanislaw Osinski wrote: Hi Koji, Dawid came up with a simple fix for this, it's committed to trunk and 3.6 branch. Staszek
using Carrot2 custom ITokenizerFactory
Hello, As I'd like to use a custom ITokenizerFactory, I set the following Carrot2 key in solrconfig.xml:

<searchComponent name="clustering" enable="${solr.clustering.enabled:true}" class="solr.clustering.ClusteringComponent">
  <lst name="engine">
    <str name="name">default</str>
    :
    <str name="PreprocessingPipeline.tokenizerFactory">my.own.TokenizerFactory</str>
  </lst>
</searchComponent>

But it seems that CarrotClusteringEngine overwrites it with LuceneCarrot2TokenizerFactory in its init() method:

BasicPreprocessingPipelineDescriptor.attributeBuilder(initAttributes)
    .stemmerFactory(LuceneCarrot2StemmerFactory.class)
    .tokenizerFactory(LuceneCarrot2TokenizerFactory.class)
    .lexicalDataFactory(SolrStopwordsCarrot2LexicalDataFactory.class);

Am I missing something? koji -- Query Log Visualizer for Apache Solr http://soleami.com/
Re: using Carrot2 custom ITokenizerFactory
Hi Staszek, I'll wait your fix. Thank you! Koji Sekiguchi from iPad2 On 2012/05/20, at 18:18, Stanislaw Osinski stanis...@osinski.name wrote: Hi Koji, You're right, the current code overwrites the custom tokenizer though it shouldn't. LuceneCarrot2TokenizerFactory is there to avoid circular dependencies (Carrot2 default tokenizer depends on Lucene), but it shouldn't be an issue with custom tokenizers. I'll try to commit a fix later today. Meanwhile, if you have a chance to recompile the code, a temporary solution would be to hardcode your tokenizer class into the fragment you pasted: BasicPreprocessingPipelineDescriptor.attributeBuilder(initAttributes) .stemmerFactory(LuceneCarrot2StemmerFactory.class) .tokenizerFactory(YourCustomTokenizer.class) .lexicalDataFactory(SolrStopwordsCarrot2LexicalDataFactory.class); Staszek On Sun, May 20, 2012 at 9:40 AM, Koji Sekiguchi k...@r.email.ne.jp wrote: Hello, As I'd like to use custom ITokenizerFactory, I set the following Carrot2 key in solrconfig.xml: searchComponent name=clustering enable=${solr.clustering.enabled:true} class=solr.clustering.ClusteringComponent lst name=engine str name=namedefault/str : str name=PreprocessingPipeline.tokenizerFactorymy.own.TokenizerFactory/str /lst /searchComponent But seems that CarrotClusteringEngine overwrites it with LuceneCarrot2TokenizerFactory in init() method: BasicPreprocessingPipelineDescriptor.attributeBuilder(initAttributes) .stemmerFactory(LuceneCarrot2StemmerFactory.class) .tokenizerFactory(LuceneCarrot2TokenizerFactory.class) .lexicalDataFactory(SolrStopwordsCarrot2LexicalDataFactory.class); Am I missing something? koji -- Query Log Visualizer for Apache Solr http://soleami.com/
Re: Newbie with Carrot2?
(12/05/20 23:21), Xue-Feng Yang wrote: Hi Staszek, I haven't found a way for inputting data into solr in the wiki. Does that mean docs can be inputted in a normal solr way after configuration? for example, DIH or solrj. Thanks, Xue-Feng Right, because Carrot2 clustering is for search time. koji -- Query Log Visualizer for Apache Solr http://soleami.com/
Re: using Carrot2 custom ITokenizerFactory
.util.attribute.AttributeBinder.set(AttributeBinder.java:129) at org.carrot2.core.ControllerUtils.init(ControllerUtils.java:50) at org.carrot2.core.PoolingProcessingComponentManager$ComponentInstantiationListener.objectInstantiated(PoolingProcessingComponentManager.java:189) ... 30 more Caused by: java.lang.IllegalArgumentException: Can not set org.carrot2.text.linguistic.ITokenizerFactory field org.carrot2.text.preprocessing.pipeline.BasicPreprocessingPipeline.tokenizerFactory to java.lang.String at sun.reflect.UnsafeFieldAccessorImpl.throwSetIllegalArgumentException(UnsafeFieldAccessorImpl.java:146) at sun.reflect.UnsafeFieldAccessorImpl.throwSetIllegalArgumentException(UnsafeFieldAccessorImpl.java:150) at sun.reflect.UnsafeObjectFieldAccessorImpl.set(UnsafeObjectFieldAccessorImpl.java:63) at java.lang.reflect.Field.set(Field.java:657) at org.carrot2.util.attribute.AttributeBinder$AttributeBinderActionBind.performAction(AttributeBinder.java:610) ... 37 more I should dig in, but if you have any clue, it would be appreciated. I'm using 3.6 branch. koji -- Query Log Visualizer for Apache Solr http://soleami.com/ (12/05/20 21:11), Stanislaw Osinski wrote: Hi Koji, It's fixed in trunk and 3.6.1 branch now. If you hit any other issues with this, let me know. Staszek On Sun, May 20, 2012 at 1:02 PM, Koji Sekiguchik...@r.email.ne.jp wrote: Hi Staszek, I'll wait your fix. Thank you! Koji Sekiguchi from iPad2 On 2012/05/20, at 18:18, Stanislaw Osinskistanis...@osinski.name wrote: Hi Koji, You're right, the current code overwrites the custom tokenizer though it shouldn't. LuceneCarrot2TokenizerFactory is there to avoid circular dependencies (Carrot2 default tokenizer depends on Lucene), but it shouldn't be an issue with custom tokenizers. I'll try to commit a fix later today. Meanwhile, if you have a chance to recompile the code, a temporary solution would be to hardcode your tokenizer class into the fragment you pasted: BasicPreprocessingPipelineDescriptor.attributeBuilder(initAttributes) .stemmerFactory(LuceneCarrot2StemmerFactory.class) .tokenizerFactory(YourCustomTokenizer.class) .lexicalDataFactory(SolrStopwordsCarrot2LexicalDataFactory.class); Staszek On Sun, May 20, 2012 at 9:40 AM, Koji Sekiguchik...@r.email.ne.jp wrote: Hello, As I'd like to use custom ITokenizerFactory, I set the following Carrot2 key in solrconfig.xml: searchComponent name=clustering enable=${solr.clustering.enabled:true} class=solr.clustering.ClusteringComponent lst name=engine str name=namedefault/str : str name=PreprocessingPipeline.tokenizerFactorymy.own.TokenizerFactory/str /lst /searchComponent But seems that CarrotClusteringEngine overwrites it with LuceneCarrot2TokenizerFactory in init() method: BasicPreprocessingPipelineDescriptor.attributeBuilder(initAttributes) .stemmerFactory(LuceneCarrot2StemmerFactory.class) .tokenizerFactory(LuceneCarrot2TokenizerFactory.class) .lexicalDataFactory(SolrStopwordsCarrot2LexicalDataFactory.class); Am I missing something? koji -- Query Log Visualizer for Apache Solr http://soleami.com/
Re: Is it possible to limit the bandwidth of replication
(12/05/07 15:38), James wrote: I notice that index replication utilizes the full bandwidth, so normal queries stall. Is there any method to control the bandwidth of replication? I don't know the status of the Java based replication, but there is a bwlimit option for your problem in the script based replication. https://issues.apache.org/jira/browse/SOLR-2099 koji -- Query Log Visualizer for Apache Solr http://soleami.com/
Re: Solr 3.5 - Elevate.xml causing issues when placed under /data directory
(12/05/03 1:39), Noordeen, Roxy wrote: Hello, I just started using elevation for Solr. I am on Solr 3.5, running with Drupal 7, Linux. 1. I updated my solrconfig.xml from <dataDir>${solr.data.dir:./solr/data}</dataDir> to <dataDir>/usr/local/tomcat2/data/solr/dev_d7/data</dataDir> 2. I placed my elevate.xml in my Solr data directory. Based on forum answers, I thought placing elevate.xml under the data directory would pick up my latest change. I restarted tomcat. 3. When I placed my elevate.xml under the conf directory, elevation was working with the url: http://mysolr.www.com:8181/solr/elevate?q=games&wt=xml&sort=score+desc&fl=id,bundle_name But when I moved it to the data directory, I am not seeing any results. NOTE: I can see catalina.out printing that Solr reads the file from the data directory. I tried to put invalid entries in; I noticed Solr errors parsing elevate.xml from the data directory. I even tried to send some documents to index, thinking a commit might help to read the elevate config file. But nothing helped. I don't understand why the below url does not work anymore. There are no errors in the log files. http://mysolr.www.com:8181/solr/elevate?q=games&wt=xml&sort=score+desc&fl=id,bundle_name Any help on this topic is appreciated. Hi Noordeen, What do you mean by "I am not seeing any results"? Is it no docs in the response (numFound=0)? And have you tried the original ${solr.data.dir:./solr/data} for the dataDir? Isn't that working for you either? koji -- Query Log Visualizer for Apache Solr http://soleami.com/
Re: How to integrate sen and lucene-ja in SOLR 3.x
(12/05/02 1:47), Shanmugavel SRD wrote: Hi, Can anyone help me on how to integrate sen and lucene-ja.jar in Solr 3.4, 3.5 or 3.6? I think lucene-ja.jar no longer exists on the Internet and doesn't work with Lucene/Solr 3.x because the interface doesn't match (lucene-ja doesn't know AttributeSource). Use lucene-gosen, which is the descendant project of sen/lucene-ja, instead. koji -- Query Log Visualizer for Apache Solr http://soleami.com/
Re: Solr: Highlighting word parts in excerpt does not work
(12/04/05 15:34), Thomas Werthmüller wrote: Hi, I configured Solr so that word parts are also found. When I search Monday or Mond, the right document is found. This is done with the following configuration in schema.xml: <filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="30"/>. Now, when I add hl=true to the query string, the excerpt for Monday looks good and the word is highlighted. When I search only with Mond, the document is found but no excerpt is returned, because the query string is not the whole word. I hope someone can give me a hint so that excerpts are also returned for word parts. Thanks! Thomas Hi Thomas, The (standard) Highlighter doesn't support N-gram fields, I think. (Or does it support N-gram fields recently?) FastVectorHighlighter does support such fields, but with fixed gram size only, e.g. minGramSize="3" maxGramSize="3". koji -- Query Log Visualizer for Apache Solr http://soleami.com/
Re: Why my highlights are wrong(one character offset)?
What does your sequence field look like in schema.xml (fieldType and field)? And what version are you using? koji -- Query Log Visualizer for Apache Solr http://soleami.com/ (12/03/27 13:06), neosky wrote: All of my highlights have a one-character mistake in the offset. Some fragments from my response (thanks!):

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">259</int>
    <lst name="params">
      <str name="explainOther"/>
      <str name="indent">on</str>
      <str name="hl.fl">sequence</str>
      <str name="wt"/>
      <str name="hl">true</str>
      <str name="rows">10</str>
      <str name="version">2.2</str>
      <str name="fl">*,score</str>
      <str name="hl.useFastVectorHighlighter">true</str>
      <str name="start">0</str>
      <str name="q">sequence:NGNFN</str>
      <str name="qt"/>
      <str name="fq"/>
    </lst>
  </lst>
  <lst name="highlighting">
    <lst name="B9SUS0">
      <arr name="sequence">
        <str>TSQSEL<em>SNGNF</em>NRRPKIELSNFDGNHPKTWIRKC</str>
      </arr>
    </lst>
    <lst name="Q01GW2">
      <arr name="sequence">
        <str>GENTRE<em>RNGNF</em>NSLTRERSFAELENHPPKVRRNGSEG</str>
      </arr>
    </lst>
    <lst name="C5L0V0">
      <arr name="sequence">
        <str>EGRYPC<em>NNGNF</em>NLTTGRCVCEKNYVHLIYEDRI</str>
      </arr>
    </lst>
    <lst name="C4JX93">
      <arr name="sequence">
        <str>YAEENY<em>INGNF</em>NEEPY</str>
      </arr>
    </lst>
    <lst name="D7CK80">
      <arr name="sequence">
        <str>KEVADD<em>CNGNF</em>NQPTGVRI</str>
      </arr>
    </lst>
  </lst>
</response>

-- View this message in context: http://lucene.472066.n3.nabble.com/Why-my-highlights-are-wrong-one-character-offset-tp3860283p3860283.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Reporting tools
(12/03/09 12:35), Donald Organ wrote: Are there any reporting tools out there? So I can analyze search term frequency, filter frequency, etc.? You may be interested in: Free Query Log Visualizer for Apache Solr http://soleami.com/ koji -- Query Log Visualizer for Apache Solr http://soleami.com/
Re: Help with Synonyms
(12/03/06 0:11), Donald Organ wrote: Try to remove tokenizerFactory="KeywordTokenizerFactory" in your synonym filter definition, because I think you would want to tokenize the synonym settings in synonyms.txt as floor / locker => storage / locker. But if you set it to KeywordTokenizer, it will be a map of floor locker => storage locker, and as you are using WhitespaceTokenizer for your <tokenizer/> in <analyzer/>, then if you try to index floor locker, it will be floor / locker (not floor locker), and as a result it will not match your synonym map. As an aside, I recommend that you set the <charFilter/> - <tokenizer/> - <filter/> chain in the natural order in <analyzer/>, though even if those are out of order it won't be the cause of the problem at all. OK so I have updated my schema.xml to the following:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100" omitNorms="false">
  <analyzer type="index">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

I am still getting results for storage locker and no results for floor locker. synonyms.txt still looks like this: floor locker=>storage locker Hi Donald, Do you use the same SynonymFilter setting in the query analyzer part (<analyzer type="query">)? koji -- Query Log Visualizer for Apache Solr http://soleami.com/
Re: Help with Synonyms
(12/03/06 11:07), Donald Organ wrote: No I do synonyms at index time. : I am still getting results for storage locker and no results for floor locker. synonyms.txt still looks like this: floor locker=>storage locker So that's the cause of the problem. Due to the definition floor locker=>storage locker in index time analysis, you got storage / locker in your index, and no floor terms in your index at all. In general, if you use the => method in your synonyms.txt, you should apply the same rule at both index and query time. koji -- Query Log Visualizer for Apache Solr http://soleami.com/
Re: Help with Synonyms
(12/03/06 11:23), Donald Organ wrote: Ok so do I need to use a different format in my synonyms.txt file in order to do this at index time? Right, if you want to apply synonym rules at index time only. Use ',' like this: floor locker, storage locker And don't forget to set expand="true" in your index time synonym definition. This means that if you have floor locker in a document, it will be expanded to not only floor locker but also storage locker in the index, so you can find the document with either q=floor locker or q=storage locker. koji -- Query Log Visualizer for Apache Solr http://soleami.com/
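Putting the thread's advice together, the index-time analyzer would carry the synonym filter and the query-time analyzer would not; a sketch based on Donald's field type (remaining filters unchanged):

<analyzer type="index">
  <charFilter class="solr.HTMLStripCharFilterFactory"/>
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
  <!-- WordDelimiter, Stop, EnglishPorter, LowerCase, RemoveDuplicates filters as before -->
</analyzer>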
Re: nutch log
(12/03/03 20:32), alessio crisantemi wrote: this is my nutch log after configuring it for the solr index: : org.apache.solr.common.SolrException: Internal Server Error Internal Server Error request: http://localhost:8983/solr/update?wt=javabin&version=2 at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:430) : suggestions? thanks alessio Hi alessio, I have no idea about nutch, but I think you can look for the cause of the internal server error in the Solr log, not in the nutch log. koji -- Query Log Visualizer for Apache Solr http://soleami.com/
Re: nutch log
(12/03/04 0:09), alessio crisantemi wrote: it is true. this is the Solr problem: mar 03, 2012 12:08:04 PM org.apache.solr.common.SolrException log Grave: org.apache.solr.common.SolrException: invalid boolean value: Solr said that there was an erroneous boolean value in your solrconfig.xml. Check the values of <bool>...</bool> of your Solr plugins in solrconfig.xml. Those should be one of true/false/on/off/... koji -- Query Log Visualizer for Apache Solr http://soleami.com/
Re: nutch log
It is not solr error. Consult nutch/hadoop mailing list. koji -- Query Log Visualizer for Apache Solr http://soleami.com/ (12/03/04 2:38), alessio crisantemi wrote: now, I solve the boolean problem. but my indexing don't works now also.. But this time, I don't have error in tomcat log and not error in nutch log. I see only this code on cygwin window: Exception in thread main org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/C:/temp/apache-nutch-1.4-bin/runtime/local/crawl/segments/20120303171628/parse_data at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:190) at org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:44) at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:201) at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:810) at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:781) at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730) at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1249) at org.apache.nutch.crawl.LinkDb.invert(LinkDb.java:175) at org.apache.nutch.crawl.LinkDb.invert(LinkDb.java:149) at org.apache.nutch.crawl.Crawl.run(Crawl.java:143) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65) at org.apache.nutch.crawl.Crawl.main(Crawl.java:55) why, in your opinion? thanks again alessio Il giorno 03 marzo 2012 16:43, Koji Sekiguchik...@r.email.ne.jp ha scritto: (12/03/04 0:09), alessio crisantemi wrote: is true. this is the slr problem: mar 03, 2012 12:08:04 PM org.apache.solr.common.**SolrException log Grave: org.apache.solr.common.**SolrException: invalid boolean value: Solr said that there was an erroneous boolean value in your solrconfig.xml. Check the values ofbool.../bool of your solr plugins in solrconfig.xml. Those should be one of true/false/on/off/... koji -- Query Log Visualizer for Apache Solr http://soleami.com/
Re: Help with Synonyms
(12/03/03 1:39), Donald Organ wrote: I am trying to get synonyms working correctly. I want to map floor locker to storage locker; currently searching for storage locker produces results, whereas searching for floor locker does not produce any results. I have the following setup for index time synonyms:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100" omitNorms="false">
  <analyzer type="index">
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true" tokenizerFactory="KeywordTokenizerFactory"/>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>

And my synonyms.txt looks like this: floor locker=>storage locker What am I doing wrong? Hi Donald, Try to remove tokenizerFactory="KeywordTokenizerFactory" in your synonym filter definition, because I think you would want to tokenize the synonym settings in synonyms.txt as floor / locker => storage / locker. But if you set it to KeywordTokenizer, it will be a map of floor locker => storage locker, and as you are using WhitespaceTokenizer for your <tokenizer/> in <analyzer/>, then if you try to index floor locker, it will be floor / locker (not floor locker), and as a result it will not match your synonym map. As an aside, I recommend that you set the <charFilter/> - <tokenizer/> - <filter/> chain in the natural order in <analyzer/>, though even if those are out of order it won't be the cause of the problem at all. koji -- Query Log Visualizer for Apache Solr http://soleami.com/