Re: Tokenizing managed synonyms

2020-07-06 Thread Koji Sekiguchi
I think the question makes sense: since SynonymGraphFilterFactory accepts tokenizerFactory, he asked whether the managed version of SynonymGraphFilter could accept it as well. https://lucene.apache.org/solr/guide/8_5/filter-descriptions.html#synonym-graph-filter The answer seems to be NO. Koji On

per field mm

2018-12-14 Thread Koji Sekiguchi
Hi, I have a use case that one of our customers wants to set different mm parameter per field, as in some fields of qf, unexpectedly many terms are produced because they are N-gram fields while in other fields, few terms are produced because they are normal text fields. If it is reasonable,

Re: Implementing NeuralNetworkModel RankNet in Solr LTR

2018-09-19 Thread Koji Sekiguchi
eck, is this supported in Solr 7.4.0? Regards, Edwin On Wed, 19 Sep 2018 at 11:02, Koji Sekiguchi wrote: Hi, > https://github.com/airalcorn2/Solr-LTR#RankNet > > Has anyone tried on this before? And what is the format of the training > data that this model requires? I haven't tried i

Re: Implementing NeuralNetworkModel RankNet in Solr LTR

2018-09-18 Thread Koji Sekiguchi
Hi, > https://github.com/airalcorn2/Solr-LTR#RankNet > > Has anyone tried on this before? And what is the format of the training > data that this model requires? I haven't tried it, but I'd like to let you know that there is another LTR project we've been developing:

Re: Return only matched multi-valued field

2017-08-21 Thread Koji Sekiguchi
Hi, I don't think Lucene/Solr can know which field matches the query you posted. You should usually use Highlighter to know it. Koji On 2017/08/22 2:46, ruby wrote: Is there a way to return only the matched field from a multivalued field using filtering? -- View this message in context:

Re: Issues trying to boost phrase containing stop word

2017-07-19 Thread Koji Sekiguchi
Hi Shamik, I'm sorry but I don't understand why you use KeywordRepeatFilter. I think it's normal to create separate fields to solve this kind of problems. Why don't you have another separate field which has ShingleFilter as I mentioned in the previous reply? Koji On 2017/07/20 12:13, shamik

Re: Issues trying to boost phrase containing stop word

2017-07-19 Thread Koji Sekiguchi
Hi Shamik, How about using ShingleFilter which constructs token n-grams from a token stream? http://lucene.apache.org/core/6_6_0/analyzers-common/org/apache/lucene/analysis/shingle/ShingleFilter.html As for "about dynamic block", ShingleFilter produces "about dynamic" and "dynamic block".
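As a rough illustration of the shingling described above (this is not Lucene's ShingleFilter itself), the following Python sketch builds two-word shingles from a whitespace-tokenized phrase. The function name `shingles` is made up for this example, and the sketch returns only the two-word shingles, whereas the real filter's default configuration also emits the single tokens.

```python
def shingles(tokens, size=2):
    """Return word n-grams ("shingles") of the given size, in order."""
    return [" ".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]


# The phrase from the thread, split on whitespace (a simplifying assumption).
print(shingles("about dynamic block".split()))
```

Indexing such shingles into a separate field means the phrase parts containing the stop word "about" survive as single terms that can be matched and boosted.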

Re: Is there any particular reason why ExternalFileField is read from data directory

2017-06-29 Thread Koji Sekiguchi
Hi, ExternalFileField was introduced via SOLR-351. https://issues.apache.org/jira/browse/SOLR-351 The author thought values could optionally be updated often... I think it describes why it is read from not config, but datadir. Koji On 2017/06/29 17:17, apoorvqwerty wrote: Hi, As per the

Re: Filtering results by minimum relevancy score

2017-04-12 Thread Koji Sekiguchi
Hi Walter, May I ask a tangential question? I'm curious the following line you wrote: > Solr is a vector-space engine. Some early engines (Verity VDK) were probabilistic engines. Those do give an absolute estimate of the relevance of each hit. Unfortunately, the relevance of results is just

Re: Classify document using bag of words

2017-03-26 Thread Koji Sekiguchi
Hi, I'm not sure that it can help you but I'd like to show you the link of an article which I wrote about document classification years ago: Comparing Document Classification Functions of Lucene and Mahout

Re: Query/Field Index Analysis corrected but return no docs in search

2017-02-05 Thread Koji Sekiguchi
Hi Peter, I'm not sure if I can correctly see the result you attached, I think it sounds reasonable to me that you couldn't get search result, because your query 均匀肤色 is used as it is without being analyzed whereas the same string 均匀肤色 is tokenized as 均匀 匀肤 肤色 in the index. So it is obvious

Re: How to train the model using user clicks when use ltr(learning to rank) module?

2017-02-02 Thread Koji Sekiguchi
Hi, NLP4L[1] has not only Learning-to-Rank module but also a module which calculates click model and converts it into pointwise annotation data. NLP4L has a comprehensive manual[2], but you may want to read "Click Log Analysis" section[3] first to see if it suits your requirements. Hope this

Re: I cannot get phrases highlighted correctly without using the Fast Vector highlighter

2016-09-20 Thread Koji Sekiguchi
Hello Panagiotis, I'm sorry but it's a feature. As for the hl.usePhraseHighlighter parameter, when you turn it off, you may get only foo or bar highlighted in your snippets. Koji On 2016/09/18 15:55, Panagiotis T wrote: I'm using Solr 6.2 (tried with 6.1 also) I created a new core and the only

Re: Query Elevation

2016-07-11 Thread Koji Sekiguchi
Hello, I'm curious, why do you want the particular document to place second, not top, of the result for a particular query? Sorry this isn't the answer for your question, but I think you can implement it rather easy if you study the existing query elevation. Koji On 2016/07/08 19:59,

Re: FW: Difference Between Tokenizer and filter

2016-03-02 Thread Koji Sekiguchi
Hi, <analyzer>...</analyzer> must have one and only one <tokenizer>, and it can have zero or more <filter>s. By those rules, your ... is not correct because it has more than one <tokenizer>, and ... is not correct either because it has no <tokenizer>. Koji On 2016/03/02 20:25, G, Rajesh wrote: Hi Team, Can you please clarify the

Re: Help With Phrase Highlighting

2015-12-01 Thread Koji Sekiguchi
Hi Teague, I couldn't understand the part of "document size" in your question, but if you'd like Solr to return snippet My search phrase instead of My search phrase you should use FastVectorHighlighter. When using FVH, your highlight field (hl.fl=text) needs to be indexed with options

Re: Tokenize ShingleFilterFactory results and apply filters to tokens

2015-10-15 Thread Koji Sekiguchi
Hi Vitaly, I'm not sure I understand you correctly, why don't you put EdgeNGramFilter just after ShingleFilter? That is: Koji On 2015/10/15 22:47, vitaly bulgakov wrote: I want to rephrase my question I asked in another post. As far as I understand filter ShingleFilterFactory creates

Re: highlighting

2015-10-01 Thread Koji Sekiguchi
Hi Mark, I think I saw similar requirement recently in mailing list. The feature sounds reasonable to me. > If not, how do I go about posting this as a feature request? JIRA can be used for the purpose, but there is no guarantee that the feature is implemented. :( Koji On 2015/10/01 20:07,

Re: solr.SynonymFilterFactory

2015-09-17 Thread Koji Sekiguchi
Hi Vincenzo, By intuition, regardless of what value you set for attributes such as expand or ignoreCase, I think synonym records that LHS==RHS are meaningless. That is, you can remove these lines. Koji On 2015/09/17 16:51, Vincenzo D'Amore wrote: Hello, this may be a silly question. I

Re: How to export the list of terms indexed in Solr?

2015-04-29 Thread Koji Sekiguchi
Hi brent3600, You can use NLP4L for this purpose. NLP4L is good at counting the number of words not only in whole index but also in a set of documents. There is a tutorial for this function. Count the number of words http://nlp4l.github.io/tutorial_ja.html#useNLP Sorry but the tutorial is

Re: Sorting and Rerank

2015-03-25 Thread Koji Sekiguchi
Hi, You're right. The two result sets are the same; only the order of the documents differs. Koji On 2015/03/26 0:53, innoculou wrote: If I do an initial search without any field sorting; and then do the exact same query but also sort one field will I get the same result set in the subsequent query

Re: Lucene cosine similarity score for more like this query

2015-02-03 Thread Koji Sekiguchi
Lucene uses TFIDFSimilarity class to calculate the similarity. It is implemented on the idea of cosine measurement but it modifies the cosine formula. Please take a look at Lucene Practical Scoring Function in the following Javadoc:

[ANN] word2vec for Lucene

2014-11-20 Thread Koji Sekiguchi
Hello, It's my pleasure to share that I have an interesting tool word2vec for Lucene available at https://github.com/kojisekig/word2vec-lucene . As you can imagine, you can use word2vec for Lucene to extract word vectors from Lucene index. Thank you, Koji --

Re: [ANN] word2vec for Lucene

2014-11-20 Thread Koji Sekiguchi
') + vector('woman') is close to vector('queen') Thanks, Koji (2014/11/20 20:01), Paul Libbrecht wrote: Hello Koji, how would you compare that to SemanticVectors? paul On 20 nov. 2014, at 10:10, Koji Sekiguchi k...@r.email.ne.jp wrote: Hello, It's my pleasure to share that I have

Re: [ANN] word2vec for Lucene

2014-11-20 Thread Koji Sekiguchi
more transparent math in the web-page. Maybe this helps a bit? SemanticVectors has always been rather pleasant for the LSI/LSA-like approach, but precisely this is mathematically opaque. Maybe it's more a question of presentation. Paul On 20 nov. 2014, at 16:24, Koji Sekiguchi k

Re: [ANN] word2vec for Lucene

2014-11-20 Thread Koji Sekiguchi
has always been rather pleasant for the LSI/LSA-like approach, but precisely this is mathematically opaque. Maybe it's more a question of presentation. Paul On 20 nov. 2014, at 16:24, Koji Sekiguchi k...@r.email.ne.jp wrote: Hi Paul, I cannot compare it to SemanticVectors as I don't know

Re: boosting words from specific list

2014-09-29 Thread Koji Sekiguchi
Hi Ali, I don't think Solr has such function OOTB. One way I can think of is that you can implement UpdateRequestProcessor. In processAdd() method of the UpdateRequestProcessor, as you can read field values, you can calculate the total score and copy the total score to a field e.g. total_score.
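The computation such a processor would perform can be sketched as follows. This is only a Python illustration of the logic, not Solr's UpdateRequestProcessor API (a real implementation would be Java code overriding processAdd()); the function name, the word list, the sample document, and the whitespace tokenization are all assumptions made for the example.

```python
def total_score(field_values, word_scores):
    """Sum the scores of every listed word that occurs in the field values."""
    total = 0.0
    for value in field_values:
        for word in value.lower().split():
            total += word_scores.get(word, 0.0)
    return total


# Hypothetical document and hypothetical per-word scores.
doc = {"title": "Cheap red shoes", "body": "red shoes on sale"}
scores = {"red": 2.0, "sale": 1.5}

# Copy the computed value into a field, as the reply suggests (e.g. total_score).
doc["total_score"] = total_score([doc["title"], doc["body"]], scores)
print(doc["total_score"])
```

Once the value is stored in a numeric field at index time, it can be used for boosting or sorting at query time without recomputing it per request.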

Re: statuscode list

2014-09-07 Thread Koji Sekiguchi
Hi Jan, (2014/09/05 21:01), Jan Verweij - Reeleez wrote: Hi, If I'm correct you will get a statuscode=0 in the response if you use XML messages for updating the solr index. I think by statuscode=0 you mean status=0 here. <?xml version="1.0" encoding="UTF-8"?> <response> <lst

Re: ExternalFileFieldReloader and commit

2014-08-05 Thread Koji Sekiguchi
Hi Peter, It seems like a bug to me, too. Please file a JIRA ticket if you can so that someone can take it. Koji -- http://soleami.com/blog/comparing-document-classification-functions-of-lucene-and-mahout.html (2014/08/05 22:34), Peter Keegan wrote: When there are multiple 'external file

Re: Understanding the Debug explanations for Query Result Scoring/Ranking

2014-07-24 Thread Koji Sekiguchi
Hi, In addition, this might be useful: Fundamentals of Information Retrieval, Illustration with Apache Lucene https://www.youtube.com/watch?v=SCsS5ePGmCs This video is about 40 minutes long, but you can fast forward to 24:00 to learn scoring based on vector space model and how Lucene customize

Re: Contiguous Phrase Highlighting Example

2014-07-17 Thread Koji Sekiguchi
Hi Teague, If you want phrase-unit tagging for highlighter, you need to use FastVectorHighlighter instead of the ordinary Highlighter. To turn on FVH, set hl.useFastVectorHighlighter=on when querying. In addition, when indexing, you need to set termVectors=on, termPositions=on and

Re: OCR - Saving multi-term position

2014-07-02 Thread Koji Sekiguchi
Hi Manuel, I think OCR error correction is one of well-known NLP tasks. I'd thought it could be implemented in the past by using Lucene. This is a brief idea: 1. You have got a Lucene index. This existing index is made from correct (i.e. error free) documents that are same domain of OCR

Re: Restriction on type of uniqueKey field?

2014-07-01 Thread Koji Sekiguchi
In addition, KeywordTokenizer may seem usable, but it should be avoided for the unique key field. One of my customers used it and got an OOM during long-running indexing. As the problem was difficult to find, I'd like to share my experience. Koji --

Re: Multiple highlight snippet for single field

2014-05-16 Thread Koji Sekiguchi
Hi Bijan, Have you tried to set hl.maxAnalyzedChars parameter to larger number? hl.maxAnalyzedChars http://wiki.apache.org/solr/HighlightingParameters#hl.maxAnalyzedChars As the default value of the parameter is 51200, if the second Andy is at the end paragraph of your large stored field, the

Re: AND not as a boolean operator in Phrase

2014-03-25 Thread Koji Sekiguchi
(2014/03/26 2:29), abhishek jain wrote: hi friends, when i search for "A and B" it gives me result for A , B , i am not sure why? Please guide how can i exact match when it is within phrase/quotes. Generally speaking (w/ LuceneQParser), if you want phrase match results, use quotes, i.e. q="A B".
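To make the quoting concrete: the phrase quotes have to be part of the q value itself, and they are URL-encoded when the request goes over HTTP. A minimal Python sketch using only the standard library follows; the terms A and B are placeholders from the thread, and no particular Solr host or core name is assumed.

```python
from urllib.parse import urlencode

# The double quotes belong to the query value; they mark a phrase query.
params = {"q": '"A B"'}
query_string = urlencode(params)
print(query_string)
```

The resulting string can be appended to a select URL; without the quotes, the query parser treats A and B as separate terms.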

Re: Solr Nutch

2014-01-28 Thread Koji Sekiguchi
1. Nutch follows the links within HTML web pages to crawl the full graph of a web of pages. In addition, I think Nutch has PageRank-like scoring function as opposed to Lucene/Solr, those are based on vector space model scoring. koji --

Re: document contained more than 100000 characters

2013-12-25 Thread Koji Sekiguchi
Hi, I'm not sure, but you probably hit a Tika exception. Have you checked the Apache Tika mailing list? Hmm, just now I googled "Your document contained more than 100000 characters" and found a page on Stack Overflow. According to it, there is an API to change the limit. But I don't know whether Solr can

Re: indexing from bowser

2013-12-16 Thread Koji Sekiguchi
Hi, (13/12/16 19:46), Nutan wrote: how to index pdf,doc files from browser? I think you can index from browser. If you said that this query is used for indexing : curl http://localhost:8080/solr/document/update/extract?literal.id=12&commit=true

Re: Passing a Parameter to a Custom Processor

2013-12-13 Thread Koji Sekiguchi
Hi Dileepa, The stanbolInterceptor processor chain will be used in multiple request handlers. Then I will have to pass the stanbol.enhancer.url param in each of those request handler which will cause redundant configurations. Therefore I need to pass the param to the processor directly. But

Re: SOLRJ API to do similar CURL command execution

2013-11-13 Thread Koji Sekiguchi
(13/11/13 22:25), Anupam Bhattacharya wrote: How can I post the whole XML string to SOLR using its SOLRJ API ? The source code of SimplePostTool would be of some help: http://lucene.apache.org/solr/4_5_1/solr-core/org/apache/solr/util/SimplePostTool.html koji --

Re: count links pointing to id

2013-11-10 Thread Koji Sekiguchi
(13/11/10 3:43), Andreas Owen wrote: I have a multivalue field with links pointing to ids of solrdocuments. I would like calculate how many links are pointing to each document und put that number into the field links2me. How can I do this, I would prefer to do it with a query and the updater so

Re: solr sort facets by name

2013-11-05 Thread Koji Sekiguchi
(13/11/06 9:00), PeterKerk wrote: By default solr sorts facets by the amount of hits for each result. However, I want to sort by facetnames alphabetically. Earlier I sorted the facets on the client or via my .NET code, however, this time I need solr to return the results with alphabetically

Re: Unable to add mahout classifier

2013-10-31 Thread Koji Sekiguchi
Caused by: java.lang.ClassCastException: class com.mahout.solr.classifier.CategorizeDocumentFactory at java.lang.Class.asSubclass(Unknown Source) at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:433) at

Re: Unable to add mahout classifier

2013-10-30 Thread Koji Sekiguchi
(13/10/30 22:09), lovely kasi wrote: Hi, I made a few changes to the solrconfig.xml, created a jar file, added it to the lib folder of the solr and tried to start it. The changes in the solrconfig.xml are <updateRequestProcessorChain name="mahoutclassifier" default="true"> <processor

Re: Return the synonyms as part of Solr response

2013-10-30 Thread Koji Sekiguchi
Hi Siva, (13/10/30 18:12), sivaprasad wrote: Hi, We have a requirement where we need to send the matched synonyms as part of Solr response. I don't think that Solr has such function. Do we need to customize the Solr response handler to do this? So the answer is yes. koji --

Re: Help on solr more like this functionality

2013-10-26 Thread Koji Sekiguchi
Hi Suren, (13/10/25 23:36), Suren Raju wrote: Hi, We are trying to solve a business problem by performing solr more like this query. We are able to perform the more like this search. We have a specific use case that requires different boost on different match fields. Say i do more like this

Re: how to debug my own analyzer in solr

2013-10-21 Thread Koji Sekiguchi
Hi Mingz, If you use Eclipse, you can debug Solr with your plugin like this: # go to Solr install directory $ cd $SOLR $ ant run-example -Dexample.debug=true Then connect the JVM from Eclipse via remote debug port 5005. Good luck! koji (13/10/21 18:58), Mingzhu Gao wrote: More information

Re: ExtractRequestHandler, skipping errors

2013-10-18 Thread Koji Sekiguchi
) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1904) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:659) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:362) ... 16 more On Thu, Oct 17, 2013 at 5:19 PM, Koji Sekiguchi k...@r.email.ne.jp wrote: Hi

Re: ExtractRequestHandler, skipping errors

2013-10-17 Thread Koji Sekiguchi
Hi Roland, (13/10/17 20:44), Roland Everaert wrote: Hi, I helped a customer to deployed solr+manifoldCF and everything is going quite smoothly, but every time solr is raising an exception, the manifoldcfjob feeding solr aborts. I would like to know if it is possible to configure the

Re: req info : SOLRJ and TermVector

2013-10-16 Thread Koji Sekiguchi
(13/10/16 17:47), elfu wrote: hi, can i access TermVector information using solrj ? There is TermVectorComponent to get termVector info: http://wiki.apache.org/solr/TermVectorComponent So yes, you can access it using solrj. koji --

Re: fq caching question

2013-10-14 Thread Koji Sekiguchi
Hi Tim, (13/10/15 5:22), Tim Vaillancourt wrote: Hey guys, Sorry for such a simple question, but I am curious as to the differences in caching between a combined filter query, and many separate filter queries. Here are 2 example queries, one with combined fq, one separate: 1)

Re: Please help!, Highlighting exact phrases with solr

2013-10-10 Thread Koji Sekiguchi
(13/10/10 18:17), Silvia Suárez wrote: I am using solrj as client for indexing documents on the solr server I am new to solr, And I am having problem with the highlighting in solr. Highlighting exact phrases with solr does not work. For example if the search keyword is: dulce hogar it returns:

Re: defType

2013-08-10 Thread Koji Sekiguchi
See line 33 to 50 at http://svn.apache.org/viewvc/lucene/dev/trunk/solr/core/src/java/org/apache/solr/search/QParserPlugin.java?view=markup koji -- http://soleami.com/blog/automatically-acquiring-synonym-knowledge-from-wikipedia.html (13/08/11 8:05), William Bell wrote: Can you list them out?

Re: Proximity and highliting

2013-08-03 Thread Koji Sekiguchi
(13/08/04 14:36), Alex Cougarman wrote: Hi all. I'm having some issues with highlighting and proximity searching in Solr 4.x. Matching words in the query are sometimes highlighted even if they are not within proximity and in some cases, matching words in the query are not highlighted at all.

Re: ICUTransformFilterFactory

2013-08-02 Thread Koji Sekiguchi
(13/08/02 17:53), Jochen Lienhard wrote: Hello, we have a problem with some special characters: for example æ We are using the ICUTransformFilterFactory for indexing and searching. We have some documents with urianae and with urianæ If I search urainae so I find only the versions with

Re: Sort by document similarity counts

2013-07-18 Thread Koji Sekiguchi
I have tried doing this via custom SearchComponent, where I can find all similar documents for each document in current search result, then add a new field into document hoping to use sort parameter (q=*&sort=similarityCount). I don't understand this part very well, but: But this will not

Re: Find related words

2013-07-04 Thread Koji Sekiguchi
You may want collocations a given word? I've implemented LUCENE-474 for Solr a while ago and I found it worked pretty well. https://issues.apache.org/jira/browse/LUCENE-474 Hope this helps. koji -- http://soleami.com/blog/automatically-acquiring-synonym-knowledge-from-wikipedia.html (13/07/04

Re: Find related words

2013-07-04 Thread Koji Sekiguchi
Hi Dotan, (13/07/04 23:51), Dotan Cohen wrote: Thank you Jack and Koji. I will take a look at MLT and also at the .zip files from LUCENE-474. Koji, did you have to modify the code for the latest Solr? Yes. As the Lucene APIs for accessing index have been changed, I had to modify the code.

Re: [blog post] Automatically Acquiring Synonym Knowledge from Wikipedia

2013-05-28 Thread Koji Sekiguchi
shared source code / jar for the same so at it could be used ? Thanks, Rajesh On Mon, May 27, 2013 at 8:44 PM, Koji Sekiguchi k...@r.email.ne.jp wrote: Hello, Sorry for cross post. I just wanted to announce that I've written a blog post on how to create synonyms.txt file automatically from

Re: Note on The Book

2013-05-27 Thread Koji Sekiguchi
Hi Jack, I'd like to ask as a person who contributed a case study article about Automatically acquiring synonym knowledge from Wikipedia to the book. (13/05/24 8:14), Jack Krupansky wrote: To those of you who may have heard about the Lucene/Solr book that I and two others are writing on

[blog post] Automatically Acquiring Synonym Knowledge from Wikipedia

2013-05-27 Thread Koji Sekiguchi
Hello, Sorry for cross post. I just wanted to announce that I've written a blog post on how to create synonyms.txt file automatically from Wikipedia: http://soleami.com/blog/automatically-acquiring-synonym-knowledge-from-wikipedia.html Hope that the article gives someone a good experience!

Re: Note on The Book

2013-05-27 Thread Koji Sekiguchi
contribution, that would be great. The focus of the book will be hard-core Solr. -- Jack Krupansky -Original Message- From: Koji Sekiguchi Sent: Monday, May 27, 2013 8:07 AM To: solr-user@lucene.apache.org Subject: Re: Note on The Book Hi Jack, I'd like to ask as a person who contributed a case

Re: cache disable through solrJ

2013-05-20 Thread Koji Sekiguchi
(13/05/20 20:53), J Mohamed Zahoor wrote: Hi How do i disable cache (Solr FieldValueCache) for certain queries... using HTTP it can be done using {!cache=false}... how can i do it from solrj? ./zahoor How about using facet.method=enum? koji --

Re: Solr 3.6.1: changing a field from stored to not stored

2013-04-23 Thread Koji Sekiguchi
(13/04/24 7:09), Petersen, Robert wrote: Hi guys, What would happen if I changed a field definition on an existing field in an existing index from stored to not stored? Would solr just party on ignoring the fact that this field's data is stored in the current index? I noticed I am

Re: Returning similarity values for more like this search

2013-04-19 Thread Koji Sekiguchi
(13/04/19 23:24), Achim Domma wrote: Hi, I'm executing a search including a search for similar documents (mlt=true&mlt.fl=) which works fine so far. I would like to get the similarity value for each document. I expected this to be quite common and simple, but I could not find a hint how

Re: conditional queries?

2013-04-09 Thread Koji Sekiguchi
Hi Mark, Is it possible to do a conditional query if another query has no results? For example, say I want to search against a given field for: - Search for car. If there are results, return them. - Else, search for car* . If there are results, return them. - Else, search for car~ . If

Re: Flow Chart of Solr

2013-04-02 Thread Koji Sekiguchi
(13/04/02 21:45), Furkan KAMACI wrote: Is there any documentation something like flow chart of Solr. i.e. Documents comes into Solr(maybe indicating which classes get documents) and goes to parsing process (i.e. stemming processes etc.) and then reverse indexes are get so on so forth? There

Re: Confusion over Solr highlight hl.q parameter

2013-04-02 Thread Koji Sekiguchi
(13/04/03 5:27), Van Tassell, Kristian wrote: Thanks Koji, this helped with some of our problems, but it is still not perfect. This query, for example, returns no highlighting: ?q=id:abc123&hl.q=text_it_IT:l'assieme&hl.fl=text_it_IT&hl=true&defType=edismax But this one does (when it is, in

Re: Getting back highlights almost always works...

2013-03-19 Thread Koji Sekiguchi
(13/03/20 6:14), Van Tassell, Kristian wrote: ...but I'm finding some examples where the stored text is so big (14,000 words) that Solr fails to highlight anything. But the data is definitely in the text field and is returning due to that hit. Does anyone have any ideas why this happens?

Re: Retrieving Term vectors

2013-03-19 Thread Koji Sekiguchi
Hi Sarita, I've not dug into your code detail but my first impression is that you are missing store term positions? FieldType fieldType = new FieldType(); IndexOptions indexOptions = IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS; fieldType.setIndexOptions(indexOptions);

Re: Incorrect snippets using FastVectorHighlighter

2013-03-18 Thread Koji Sekiguchi
Hi Jochen, There is a restriction in FVH. FVH cannot deal with variable gram size. That is, minGramSize == maxGramSize in your NGramFilterFactory setting. koji -- http://soleami.com/blog/lucene-4-is-super-convenient-for-developing-nlp-tools.html (13/03/18 22:17), Jochen Just wrote:

Re: Incorrect snippets using FastVectorHighlighter

2013-03-18 Thread Koji Sekiguchi
So just to be clear: There is no possibility to highlight results, if I use variable gram size. Neither the original highlighter nor FVH do the job. Or am I missing something? I don't know the latest original highlighter has such restriction or not today, but when FVH came in 2.9, at that time,

Re: Confusion over Solr highlight hl.q parameter

2013-03-16 Thread Koji Sekiguchi
(13/03/16 4:08), Van Tassell, Kristian wrote: Hello everyone, If I search for a term “baz” and tell it to highlight it, it highlights just fine. If, however, I search for “foo bar” using the q parameter, which appears in that same document/same field, and use the hl.q parameter to

Re: how to overrride pre and post tags when usefastVectorHighlighter is set to true

2013-02-23 Thread Koji Sekiguchi
Hi Alex, (13/02/23 10:53), alx...@aim.com wrote: Hello, I was unable to change pre and post tags for highlighting when usefastVectorHighlighter is set to true. Changing default tags in solrconfig.xml works for standard highlighter though. I searched mailing list and the net with no success.

Re: Order by hl.snippets count

2012-11-19 Thread Koji Sekiguchi
(12/11/20 1:50), Gabriel Croitoru wrote: Hello, I'm using Solr 1.3 with http://wiki.apache.org/solr/HighlightingParameters options. The client just asked us to change the order from the default score to the number of hl.snippets per document. It's this posibble from Solr configuration?

Re: Patch Needed for Issue Solr-3790

2012-11-09 Thread Koji Sekiguchi
(12/11/09 19:20), mechravi25 wrote: Hi All, Im using Solr 3.6.1 version. For the issue given in the following url, there is no patch file provided https://issues.apache.org/jira/browse/SOLR-3790 Can you tell me if there is patch file for the same? Also, We noticed that the below url had the

Re: SLOR And OpenNlp integration

2012-10-11 Thread Koji Sekiguchi
(12/10/11 20:40), ahmed wrote: Hi, Thanks for the reply. In fact I tried this tutorial, but when I execute 'ant compile' I get a class-not-found problem even though the classes are there. I don't know what the problem is. I think attaching the error you got would help us understand your problem. Also

Re: Regarding delta-import and full-import

2012-09-27 Thread Koji Sekiguchi
(12/09/27 22:45), darshan wrote: Hi All, Can anyone refer me few number blogs that explains both imports in little bit more detail and with examples. Thanks, Darshan Asking Google, I got: http://www.arunchinnachamy.com/apache-solr-mysql-data-import/

Re: solr binary protocol

2012-09-26 Thread Koji Sekiguchi
(12/09/27 9:29), Radim Kolar wrote: Is it possible to use the SOLR binary protocol instead of xml for talking TO SOLR? I know that it can be used in Solr reply. Have you looked at javabin? http://wiki.apache.org/solr/javabin koji -- http://soleami.com/blog/starting-lab-work.html

Re: Broken highlight truncation for hl.alternateField

2012-09-14 Thread Koji Sekiguchi
Hi Arcadius, I think it is a feature. If no match terms found on hl.fl fields then it triggers hl.alternateField function, and if you set hl.maxAlternateFieldLength=[LENGTH], the highlighter extracts the first [LENGTH] characters of stored data of the hl.fl field. As this is the common feature

Re: Doubts in PathHierarchyTokenizer

2012-09-12 Thread Koji Sekiguchi
Use delimiter option instead of pattern for PathHierarchyTokenizerFactory: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.PathHierarchyTokenizerFactory koji -- http://soleami.com/blog/starting-lab-work.html (12/09/12 22:22), mechravi25 wrote: Hi, Im Using Solr 3.6.1 version

Re: PathHierarchyTokenizerFactory behavior

2012-07-09 Thread Koji Sekiguchi
(12/07/09 19:41), Alok Bhandari wrote: Hello, this is how the field is declared in schema.xml: <fieldType name="text_path" class="solr.TextField" stored="true" indexed="true" positionIncrementGap="100"> <analyzer> <tokenizer class="solr.PathHierarchyTokenizerFactory"/> <filter

Re: using Carrot2 custom ITokenizerFactory

2012-05-21 Thread Koji Sekiguchi
My problem was gone. Thanks Staszek and Dawid! koji -- Query Log Visualizer for Apache Solr http://soleami.com/ (12/05/21 18:11), Stanislaw Osinski wrote: Hi Koji, Dawid came up with a simple fix for this, it's committed to trunk and 3.6 branch. Staszek

using Carrot2 custom ITokenizerFactory

2012-05-20 Thread Koji Sekiguchi
Hello, As I'd like to use custom ITokenizerFactory, I set the following Carrot2 key in solrconfig.xml: <searchComponent name="clustering" enable="${solr.clustering.enabled:true}" class="solr.clustering.ClusteringComponent"> <lst name="engine"> <str

Re: using Carrot2 custom ITokenizerFactory

2012-05-20 Thread Koji Sekiguchi
Hi Staszek, I'll wait your fix. Thank you! Koji Sekiguchi from iPad2 On 2012/05/20, at 18:18, Stanislaw Osinski stanis...@osinski.name wrote: Hi Koji, You're right, the current code overwrites the custom tokenizer though it shouldn't. LuceneCarrot2TokenizerFactory is there to avoid

Re: Newbie with Carrot2?

2012-05-20 Thread Koji Sekiguchi
(12/05/20 23:21), Xue-Feng Yang wrote: Hi Staszek, I haven't found a way for inputting data into solr in the wiki. Does that mean docs can be inputted in a normal solr way after configuration? for example, DIH or solrj. Thanks, Xue-Feng Right, because Carrot2 clustering is for search

Re: using Carrot2 custom ITokenizerFactory

2012-05-20 Thread Koji Sekiguchi
with this, let me know. Staszek On Sun, May 20, 2012 at 1:02 PM, Koji Sekiguchi k...@r.email.ne.jp wrote: Hi Staszek, I'll wait your fix. Thank you! Koji Sekiguchi from iPad2 On 2012/05/20, at 18:18, Stanislaw Osinski stanis...@osinski.name wrote: Hi Koji, You're right, the current code overwrites

Re: Is it possible to limit the bandwidth of replication

2012-05-07 Thread Koji Sekiguchi
(12/05/07 15:38), James wrote: I notice the index replication utilize the full bandwidth. So the normal query stalled. Is there any method to control the bandwidth of replication? I don't know the status of Java based replication, but there is bwlimit option for your problem for script based

Re: Solr 3.5 - Elevate.xml causing issues when placed under /data directory

2012-05-02 Thread Koji Sekiguchi
(12/05/03 1:39), Noordeen, Roxy wrote: Hello, I just started using elevation for solr. I am on solr 3.5, running with Drupal 7, Linux. 1. I updated my solrconfig.xml from <dataDir>${solr.data.dir:./solr/data}</dataDir> To <dataDir>/usr/local/tomcat2/data/solr/dev_d7/data</dataDir> 2. I placed my

Re: How to integrate sen and lucene-ja in SOLR 3.x

2012-05-01 Thread Koji Sekiguchi
(12/05/02 1:47), Shanmugavel SRD wrote: Hi, Can anyone help me on how to integrate sen and lucene-ja.jar in SOLR 3.4 or 3.5 or 3.6 version? I think lucene-ja.jar no longer exists in Internet and doesn't work with Lucene/Solr 3.x because interface doesn't match (lucene-ja doesn't know

Re: Solr: Highlighting word parts in excerpt does not work

2012-04-05 Thread Koji Sekiguchi
(12/04/05 15:34), Thomas Werthmüller wrote: Hi I configured solr so that word parts are also found. When I search Monday or Mond the right document is found. This is done with the following configuration in the schema.xml: <filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="30"/>.

Re: Why my highlights are wrong(one character offset)?

2012-03-27 Thread Koji Sekiguchi
How does your sequence field look like in schema.xml, fieldType and field? And what version are you using? koji -- Query Log Visualizer for Apache Solr http://soleami.com/ (12/03/27 13:06), neosky wrote: all of my highlights has one character mistake in the offset,some fragments from my

Re: Reporting tools

2012-03-09 Thread Koji Sekiguchi
(12/03/09 12:35), Donald Organ wrote: Are there any reporting tools out there? So I can analyzer search term frequency, filter frequency, etc? You may be interested in: Free Query Log Visualizer for Apache Solr http://soleami.com/ koji -- Query Log Visualizer for Apache Solr

Re: Help with Synonyms

2012-03-05 Thread Koji Sekiguchi
(12/03/06 0:11), Donald Organ wrote: Try to remove tokenizerFactory=KeywordTokenizerFactory in your synonym filter definition because I think you would want to tokenize the synonym settings in synonyms.txt as floor / locker => storage / locker. But if you set it to KeywordTokenizer, it will be

Re: Help with Synonyms

2012-03-05 Thread Koji Sekiguchi
(12/03/06 11:07), Donald Organ wrote: No I do synonyms at index time. : I am still getting results for storage locker and no results for floor locker synonyms.txt still looks like this: floor locker=>storage locker So that's the cause of the problem. Due to the definition floor

Re: Help with Synonyms

2012-03-05 Thread Koji Sekiguchi
(12/03/06 11:23), Donald Organ wrote: Ok so do I need to use a different format in my synonyms.txt file in order to do this at index time? Right, if you want to apply synonym rules to only index time. Use , like this: floor locker, storage locker And don't forget to set expand=true in your

Re: nutch log

2012-03-03 Thread Koji Sekiguchi
(12/03/03 20:32), alessio crisantemi wrote: this is my nutch log after configured it for solr index: : org.apache.solr.common.SolrException: Internal Server Error Internal Server Error request: http://localhost:8983/solr/update?wt=javabin&version=2 at

Re: nutch log

2012-03-03 Thread Koji Sekiguchi
(12/03/04 0:09), alessio crisantemi wrote: is true. this is the slr problem: mar 03, 2012 12:08:04 PM org.apache.solr.common.SolrException log Grave: org.apache.solr.common.SolrException: invalid boolean value: Solr said that there was an erroneous boolean value in your solrconfig.xml. Check

Re: nutch log

2012-03-03 Thread Koji Sekiguchi
It is not solr error. Consult nutch/hadoop mailing list. koji -- Query Log Visualizer for Apache Solr http://soleami.com/ (12/03/04 2:38), alessio crisantemi wrote: now, I solve the boolean problem. but my indexing don't works now also.. But this time, I don't have error in tomcat log and

Re: Help with Synonyms

2012-03-02 Thread Koji Sekiguchi
(12/03/03 1:39), Donald Organ wrote: I am trying to get synonyms working correctly. I want to map floor locker to storage locker. Currently, searching for storage locker produces results, whereas searching for floor locker does not produce any results. I have the following setup for index
