Re: PDF extraction leads to reversed words

2010-03-09 Thread Dominique Bejean
Hi, The problem comes from PDFBox (http://brutus.apache.org/jira/browse/PDFBOX-377) and is fixed now. However, Tika doesn't yet use this version of PDFBox. So for PDF text extraction, I don't use Tika but pdftotext. Dominique On 09/03/10 06:00, Robert Muir wrote: it is an optional

Re: Choosing tokenizer based on language of document

2012-04-06 Thread Dominique Bejean
Hi, Yes, I agree it is not an easy issue. Indexing all languages with the appropriate char filter, tokenizer and filters for each language is not possible without developing a new text type and a new analyzer. If you plan to index up to 10 different languages, I suggest one text field per

Analyzers and ReuseStrategy in Solr 4

2012-04-21 Thread Dominique Bejean
Hi, I developed a custom analyzer. This analyzer needs to be polymorphous according to the first 4 characters of the text to be analyzed. In order to do this I implemented my own ReuseStrategy class (NoReuseStrategy) and in the constructor, I do this: super(new NoReuseStrategy()); At Lucene

Re: Using remote Nutch Server to crawl, then merging results into local index

2010-12-23 Thread Dominique Bejean
Hi, In order to crawl and index your web sites, maybe you can have a look at www.crawl-anywhere.com. It includes a web crawler, a document processing pipeline and a Solr indexer. Dominique On 23/12/10 16:27, Dietrich wrote: I want to use Solr to index two types of documents: - local

Re: SOLR 1.4 and Lucene 3.0.3 index problem

2011-02-03 Thread Dominique Bejean
Hi, I would not try to change the Lucene version in Solr 1.4.1 from 2.9.x to 3.0.x. As Koji said, the best solution is to get the 3.x branch or the trunk and build it. You need svn and ant. 1. Create a working directory $ mkdir ~/solr 2. Get the source $ cd ~/solr $ svn co

[ANNOUNCE] Web Crawler

2011-03-01 Thread Dominique Bejean
Hi, I would like to announce Crawl Anywhere. Crawl-Anywhere is a Java Web Crawler. It includes : * a crawler * a document processing pipeline * a solr indexer The crawler has a web administration in order to manage web sites to be crawled. Each web site crawl is configured with a

Re: [ANNOUNCE] Web Crawler

2011-03-02 Thread Dominique Bejean
Dominique On 02/03/11 09:36, Rosa (Anuncios) wrote: Nice job! It would be good to be able to extract specific data from a given page via XPATH though. Regards, On 02/03/2011 01:25, Dominique Bejean wrote: Hi, I would like to announce Crawl Anywhere. Crawl-Anywhere is a Java Web

Re: [ANNOUNCE] Web Crawler

2011-03-02 Thread Dominique Bejean
Java into search field, it returned a lot of references to coldfusion error pages. Maybe a recrawl would help? On Wed, Mar 2, 2011 at 1:25 AM, Dominique Bejean dominique.bej...@eolya.fr wrote: Hi, I would like to announce Crawl Anywhere. Crawl

Re: [ANNOUNCE] Web Crawler

2011-03-02 Thread Dominique Bejean
Aditya, The crawler is not open source and won't be in the near future. Anyway, I have to change the license because it can be used for any personal or commercial projects. Sincerely, Dominique On 02/03/11 10:02, findbestopensource wrote: Hello Dominique Bejean, Good job. We

Re: [ANNOUNCE] Web Crawler

2011-03-02 Thread Dominique Bejean
looking for crawlers that incorporate this technology but without success. Any plans on incorporating this? Cheers, Geert-Jan 2011/3/2 Dominique Bejean dominique.bej...@eolya.fr Rosa, In the pipeline, there is a stage that extracts the text from

Re: [ANNOUNCE] Web Crawler

2011-03-02 Thread Dominique Bejean
and that is posing challenges with Nutch? -Original Message- From: Dominique Bejean [mailto:dominique.bej...@eolya.fr] Sent: Wednesday, March 02, 2011 6:22 AM To: solr-user@lucene.apache.org Subject: Re: [ANNOUNCE] Web Crawler Aditya, The crawler is not open source and won't

core isolation

2012-06-28 Thread Dominique Bejean
Hi, In Solr 3.x the parameter abortOnConfigurationError=false allows cores to continue to work even if another core fails due to a configuration error. This parameter doesn't exist anymore in Solr 4.0 but after some tests, it looks like cores are isolated from each other. By isolated, I mean

Upgrade solr 3.4 to solr 3.6.1 without rebuilding the existing index ?

2012-08-20 Thread Dominique Bejean
Hi, I think the response is yes, but I need to check. Is it possible to upgrade from solr 3.4 to solr 3.6.1 without rebuilding the existing index ? Thank you. Dominique

Re: Upgrade solr 3.4 to solr 3.6.1 without rebuilding the existing index ?

2012-08-20 Thread Dominique Bejean
, Dominique Bejean dominique.bej...@eolya.fr wrote: Hi, I think the response is yes, but I need to check. Is it possible to upgrade from solr 3.4 to solr 3.6.1 without rebuilding the existing index ? Thank you. Dominique

RessourceLoader in Solr 4 beta

2012-08-27 Thread Dominique Bejean
Hi, I wrote a custom fieldtype that needs to read a configuration file in the conf directory of the core and also get the absolute path of the conf directory. In Solr 4 alpha, my code was something like : import org.apache.solr.core.SolrResourceLoader; ... public class MultilingualField

Re: RessourceLoader in Solr 4 beta

2012-08-30 Thread Dominique Bejean
(); if (path.lastIndexOf(f.separatorChar)!=-1) { return path.substring(0, path.lastIndexOf(f.separatorChar)); } return null; } return null; } Not sure it is the best way, but it works :) Dominique Le 27/08/12 23:40, Dominique Bejean a écrit

Re: Website (crawler for) indexing

2012-09-07 Thread Dominique Bejean
Maybe you can take a look at Crawl-Anywhere, which has an administration web interface, a Solr indexer and a search web application. www.crawl-anywhere.com Regards. Dominique On 05/09/12 17:05, Lochschmied, Alexander wrote: This may be a bit off topic: How do you index an existing website and

Re: Question about MoreLikeThis query with solrj

2012-10-11 Thread Dominique Bejean
Hi, Are you using a correct stopword file for the French language? It is very important for the MLT component to work properly. You should also take a look at this document. http://cephas.net/blog/2008/03/30/how-morelikethis-works-in-lucene/ MLT support in SolrJ is an old story. Maybe

Re: Crawl Anywhere -

2013-05-22 Thread Dominique Bejean
Hi, I didn't see this question. Yes, I confirm Crawl-Anywhere can crawl in a distributed environment. If you have several huge web sites to crawl, you can dispatch crawling across several crawler engines. However, one single web site can only be crawled by one crawler engine at a time. This

Re: [ANNOUNCE] Web Crawler

2013-05-22 Thread Dominique Bejean
Hi, Crawl-Anywhere is now open-source - https://github.com/bejean/crawl-anywhere Best regards. On 02/03/11 10:02, findbestopensource wrote: Hello Dominique Bejean, Good job. We identified almost 8 open source web crawlers http://www.findbestopensource.com/tagged/webcrawler I don't

Re: [ANNOUNCE] Web Crawler

2013-05-22 Thread Dominique Bejean
Hi, I did see this message (again). Please use the new dedicated Crawl-Anywhere forum for your next questions. https://groups.google.com/forum/#!forum/crawl-anywhere Did you solve your problem? Thank you Dominique On 29/01/13 09:28, SivaKarthik wrote: Hi, i resolved the issue

Re: Crawl Anywhere -

2013-05-22 Thread Dominique Bejean
Hi, Crawl-Anywhere includes a customizable document processing pipeline. Crawl-Anywhere can also cache original crawled pages and documents in a MongoDB database. Best regards. Dominique On 11/02/13 06:16, SivaKarthik wrote: Dear Erick, Thanks for ur reply.. ya..nutch can meet

Re: [ANNOUNCE] Web Crawler

2013-05-23 Thread Dominique Bejean
of these required software ? Is there updated installation guide available ? Thanks Rajesh On Wed, May 22, 2013 at 6:48 PM, Dominique Bejean dominique.bej...@eolya.fr wrote: Hi, Crawl-Anywhere is now open-source - https://github.com/bejean/crawl-anywhere

Re: Solr 4.3.0 - SolrCloud lost all documents when leaders got rebuilt

2013-07-24 Thread Dominique Bejean
With 6 ZooKeeper instances you need at least 4 instances running at the same time. How can you decide to stop 4 instances and have only 2 instances running? ZooKeeper can't work anymore in these conditions. Dominique On 25 Jul 2013 at 00:16, Joshi, Shital shital.jo...@gs.com wrote: We
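The quorum rule in the reply above can be checked with a short calculation. This is only a minimal sketch, assuming ZooKeeper's usual majority quorum (strictly more than half of the ensemble must be running); the function names are illustrative and do not come from the thread.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of running instances that forms a strict majority."""
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be at least 1")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many instances can be down while a majority is still running."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    for n in (3, 5, 6):
        print(n, quorum_size(n), tolerated_failures(n))
```

Under this assumption, a 6-instance ensemble needs 4 running instances and tolerates 2 failures, the same number of failures as a 5-instance ensemble.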

Accent insensitive multi-words suggester

2013-10-01 Thread Dominique Bejean
Hi, Up to now, the best solution I found in order to implement a multi-words suggester was to use ShingleFilterFactory filter at index time and the termsComponent. At index time the analyzer was : <analyzer type="index"> <tokenizer class="solr.WhitespaceTokenizerFactory"/>

Re: Accent insensitive multi-words suggester

2013-10-08 Thread Dominique Bejean
send something to the suggester you send just eco or éco you fold them to eco too and get back these tokens. Then the app layer breaks them up and displays them pleasingly. Best Erick On Tue, Oct 1, 2013 at 5:45 PM, Dominique Bejean dominique.bej...@eolya.fr wrote: Hi, Up to now, the best

Re: Solr server becomes non-responsive.

2014-12-23 Thread Dominique Bejean
Hi, I agree with Erick, it would be a good thing to have more details about your configuration and collection. Your heap size is 32 GB. How much RAM is on each server? By « 4 shard Solr cluster », do you mean 4 Solr nodes or a collection with 4 shards? So, how many nodes in the cluster? How

Re: Solr Cloud and relative paths in solrconfig.xml lib directives

2014-12-23 Thread Dominique Bejean
Hi, I usually put all dependency jar files (DIH, JDBC driver, …) in a lib directory in the Solr home directory where your shards are created. something like this solr/ solr.xml cloudcollection1_shard2_replica2/ lib/ In solrconfig.xml, I remove all the lib … directives except this

Re: Solr server becomes non-responsive.

2014-12-23 Thread Dominique Bejean
in solrconfig.xml ? Filter cache, query result cache and document cache are enabled. Auto-warming is also done. Can you provide all other JVM parameters ? -Xms20g -Xmx24g -XX:+UseConcMarkSweepGC Thanks again, Modassar On Wed, Dec 24, 2014 at 2:29 AM, Dominique Bejean dominique.bej...@eolya.fr wrote

Re: Solr server becomes non-responsive.

2014-12-24 Thread Dominique Bejean
And you didn't say how much RAM is on each server? 2014-12-24 8:17 GMT+01:00 Dominique Bejean dominique.bej...@eolya.fr: Modassar, How many items in the collection? I mean how many documents per collection? 1 million, 10 million, …? How are the caches configured in solrconfig.xml? What

Re: Core deletion

2015-01-19 Thread Dominique Bejean
for core inytapdf0 Philippe - Original message - From: Dominique Bejean dominique.bej...@eolya.fr To: solr-user@lucene.apache.org Sent: Thursday, 15 January 2015 11:46:43 Subject: Re: Core deletion Hi, Is there something in the Solr logs at startup that can explain the deletion? How were

Re: Core deletion

2015-01-15 Thread Dominique Bejean
Hi, Is there something in the Solr logs at startup that can explain the deletion? How were the cores created? Using the Cores API? Dominique http://www.eolya.fr 2015-01-14 17:43 GMT+01:00 phi...@free.fr: Hello, I am running SOLR 4.10.0 on Tomcat 8. The solr.xml file in

Solrcloud sizing

2015-02-17 Thread Dominique Bejean
One of our customers needs to index 15 billion documents in a collection. As this volume is not usual for me, I need some advice about SolrCloud sizing (how many servers, nodes, shards, replicas, how much memory, ...) Some inputs : - Collection size : 15 billion documents - Collection update : 8

Re: Solrcloud sizing

2015-02-17 Thread Dominique Bejean
, Feb 17, 2015 at 4:40 PM, Dominique Bejean dominique.bej...@eolya.fr wrote: One of our customers needs to index 15 billion documents in a collection. As this volume is not usual for me, I need some advice about SolrCloud sizing (how many servers, nodes, shards, replicas, memory

Re: Solrcloud sizing

2015-02-18 Thread Dominique Bejean
, last week, week before, ...) Regards Dominique 2015-02-18 10:35 GMT+01:00 Toke Eskildsen t...@statsbiblioteket.dk: On Wed, 2015-02-18 at 01:40 +0100, Dominique Bejean wrote: (I reordered the requirements) - Collection size : 15 billion documents - Document size is nearly 300 bytes - 1

Re: How to place whole indexed data on cache

2015-02-18 Thread Dominique Bejean
Hi, As Shawn said, install enough memory so that all free direct memory (non-heap memory) can be used as disk cache. Use at most 40% of the available memory for the heap (Xmx JVM parameter), but never more than 32 GB. And prevent your server from swapping. For most Linux systems, this is configured
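The heap sizing rule of thumb above can be written down as a small calculation. This is a hedged sketch only: the 40% fraction and the 32 GB cap are the figures quoted in the message, while the function name and its parameters are hypothetical and not part of Solr or the JVM.

```python
def suggested_max_heap_gb(total_ram_gb: float,
                          fraction: float = 0.4,
                          cap_gb: float = 32.0) -> float:
    """Upper bound for -Xmx: a fraction of physical RAM, never above cap_gb."""
    if total_ram_gb <= 0:
        raise ValueError("total_ram_gb must be positive")
    return min(total_ram_gb * fraction, cap_gb)


if __name__ == "__main__":
    for ram in (16, 64, 128):
        heap = suggested_max_heap_gb(ram)
        # Whatever is not given to the heap stays available for the OS disk cache.
        print(f"{ram} GB RAM: heap <= {heap:.1f} GB, "
              f"{ram - heap:.1f} GB left for OS and disk cache")
```

The result is a ceiling rather than a target: a smaller heap that still avoids memory pressure leaves more room for the disk cache.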

Re: Solrcloud, no puts anymore and tons of “update?update.distrib=TOLEADER”

2015-02-18 Thread Dominique Bejean
Hi, When you say I renamed some cores, cleaned other unused ones that we don't need anymore etc, how did you do this ? With Cores or Collections API or by deleting core's directories in Solr Home ? Dominique http://www.eolya.fr 2015-02-18 17:04 GMT+01:00 Abdelali AHBIB alifar...@gmail.com:

Solrcloud with map-reduce indexing and document routing

2015-02-18 Thread Dominique Bejean
Hi, I never used map-reduce indexing. My understanding is that map-reduce tasks generate one or more Solr indices, then the golive tool is used in order to merge these indices at core level to one or more shards (the shard's leaders) in a Solrcloud collection. After merge occurs in leaders the

Solr startup script in version 4.10.3

2015-01-06 Thread Dominique Bejean
Hi, In release 4.10.3, the following lines were removed from solr starting script (bin/solr) # TODO: see SOLR-3619, need to support server or example # depending on the version of Solr if [ -e $SOLR_TIP/server/start.jar ]; then DEFAULT_SERVER_DIR=$SOLR_TIP/server else

Re: Vertical search Engine

2015-01-06 Thread Dominique Bejean
Hi, You can have a look at www.crawl-anywhere.com, a web crawler on top of Solr. It is used for the following vertical search engines : http://www.hurisearch.org/ http://www.searchamnesty.org/ Regards Dominique 2015-01-06 15:22 GMT+01:00 Ahmet Arslan iori...@yahoo.com.invalid: Hi,

Re: Solr startup script in version 4.10.3

2015-01-13 Thread Dominique Bejean
and is still work in progress but should give you more information. Hope that helps. On Tue, Jan 6, 2015 at 1:29 AM, Dominique Bejean dominique.bej...@eolya.fr wrote: Hi, In release 4.10.3, the following lines were removed from solr starting script (bin/solr) # TODO

Re: Facet pivot sorting while combining Stats Component With Pivots in Solr 5

2015-03-13 Thread Dominique Bejean
Thank you for the response. This is something Heliosearch can do. Yonik Seeley created a JIRA ticket to backport this feature to Solr 5. https://issues.apache.org/jira/browse/SOLR-7214 But in order to be available in Solr 5 this ticket should cover both http://heliosearch.org/json-facet-api/

Facet pivot sorting while combining Stats Component With Pivots in Solr 5

2015-03-13 Thread Dominique Bejean
Hi, Here is a query with a sample result set. http://localhost:8983/solr/myindex/select?q=*%3A*&wt=json&indent=true&stats=true&stats.field={!tag=piv1}size&facet=true&facet.limit=10&facet.pivot={!stats=piv1}object&rows=0 facet_counts:{ facet_queries:{}, facet_fields:{}, facet_dates:{},

solr 4.10.3 and index.xxxxxxxxxxx directory

2015-04-01 Thread Dominique Bejean
Hi, Is it normal with Solr 4.10.3 that the data directory of replicas still contains directories like index.3636365667474747 index.999080980976 and files index.properties replica.properties If yes, why and in which circumstances ? Regards Dominique

Test of MapReduceIndexerTool with Solr 5.0.0 and Hadoop 2.6.0

2015-03-23 Thread Dominique Bejean
Hi, I try to adapt Mark Miller's solr-map-reduce-example scripts in order to try to use MapReduceIndexerTool with Solr 5.0.0 and Hadoop 2.6.0. I use the same twitter sample data with the same avro configuration, ... I had to change the set-map-reduce-classpath.sh file provided with Solr 5 under

Re: solr 4.10.3 and index.xxxxxxxxxxx directory

2015-04-01 Thread Dominique Bejean
commits (after DIH import) and when nodes restart. So, I will have more precise log messages tomorrow. Thank you for your response. Dominique 2015-04-01 18:29 GMT+02:00 Shawn Heisey apa...@elyograg.org: On 4/1/2015 6:35 AM, Dominique Bejean wrote: Is it normal with Solr 4.10.3 that the data

Sold 5.3 with Basic Authentication in a not SolrCloud environment

2015-09-16 Thread Dominique Bejean
Hi, The wiki explains how to upload the security.json file to ZK ( https://cwiki.apache.org/confluence/display/solr/Authentication+and+Authorization+Plugins ). However, is it possible to use the authentication and authorization plugins in a non-SolrCloud environment? If yes, where has to be located

List current active Solr settings

2016-07-31 Thread Dominique Bejean
Hi, Is there a way to list all the current Solr settings? Something similar to the MySQL « show variables » command? For instance, if I configure the « transientCacheSize » parameter in the solr.xml file, how can I be sure this setting was taken into account? Regards Dominique

Stemming and accents

2017-02-10 Thread Dominique Bejean
Hi, Is the SnowballPorterFilter sensitive to the accents for French for instance ? If I use both SnowballPorterFilter and ASCIIFoldingFilter, do I have to configure ASCIIFoldingFilter after SnowballPorterFilter ? Regards. Dominique -- Dominique Béjean 06 08 46 12 43

Re: Stemming and accents

2017-02-11 Thread Dominique Bejean
06457315001053 > > Ahmet > > > > On Friday, February 10, 2017 11:27 AM, Dominique Bejean < > dominique.bej...@eolya.fr> wrote: > Hi, > > Is the SnowballPorterFilter sensitive to the accents for French for > instance ? > > If I use both SnowballPorterFilte

Re: Innerjoin streaming expressions - Invalid JoinStream error

2017-04-18 Thread Dominique Bejean
is? > > Joel Bernstein > http://joelsolr.blogspot.com/ > > On Tue, Apr 18, 2017 at 6:33 AM, Dominique Bejean < > dominique.bej...@eolya.fr > > wrote: > > > Hi, > > > > I do not understand what I am doing wrong in this

Re: Innerjoin streaming expressions - Invalid JoinStream error

2017-04-18 Thread Dominique Bejean
"id": "book2", "id_book_s": "book2", "review_dt": "1994-03-15T12:00:00Z" }, { "title_s": "Friends", "pubyear_i": 1994, "stars_i": 4,

Re: Innerjoin streaming expressions - Invalid JoinStream error

2017-04-18 Thread Dominique Bejean
collection). Dominique On Tue, 18 Apr 2017 at 15:28, Dominique Bejean <dominique.bej...@eolya.fr> wrote: > Hi, > > I reply to myself > > I just had to invert the "on" clause to make it work > > curl --data-urlencode 'expr=innerJoin(

Innerjoin streaming expressions - Invalid JoinStream error

2017-04-18 Thread Dominique Bejean
Hi, I do not understand what I am doing wrong in this simple query. curl --data-urlencode 'expr=innerJoin( search(books, q="*:*", fl="id", sort="id asc"),

Re: JVM GC Issue

2017-12-02 Thread Dominique Bejean
10,000 then that return > packet is obviously 1,000 times as large and must be assembled in > memory. > > I rather doubt the phonetic filter is to blame. But you can test this > by just omitting the field containing the phonetic filter in the > search query. I've certainly been wrong befor

Re: Solr JVM best pratices

2017-12-04 Thread Dominique Bejean
Thank you Shawn for replying to each item. I am starting to better understand all this tricky JVM stuff. Dominique On Sun, 3 Dec 2017 at 01:30, Shawn Heisey <apa...@elyograg.org> wrote: > On 12/2/2017 8:43 AM, Dominique Bejean wrote: > > I would like to have some advice on best pr

Solr JVM best pratices

2017-12-02 Thread Dominique Bejean
Hi, I would like to have some advice on best practices related to Heap Size, MMap, direct memory, GC algorithm and OS Swap. This is a vast subject and sorry for this long question but all these items are linked in order to have a stable Solr environment. My understanding and questions. About

Re: JVM GC Issue

2017-12-02 Thread Dominique Bejean
:NON^203+size_facet_boost_exact:"velo"^299+size_facet_boost:velo^296+size_facet_relative_boost:velo^292+marque_boost_exact:"velo"^359+marque_boost:velo^356+marque_relative_boost:velo^352+=velo=200=velo=edismax=textSearch=true=1=true=json=EUR_0_price_decimal=sort_EUR_0_special_p

Re: Solr JVM best pratices

2017-12-02 Thread Dominique Bejean
; This has been solid in production with a 32 node Solr Cloud cluster. We do > not do faceting. > > wunder > Walter Underwood > wun...@wunderwood.org > http://observer.wunderwood.org/ (my blog) > > > > On Dec 2, 2017, at 7:43 AM, Dominique Bejean <dominique.bej...@eol

JVM GC Issue

2017-12-01 Thread Dominique Bejean
Hi, We are encountering an issue with GC. Randomly, nearly once a day, there are consecutive full GCs with no memory reclaimed. So the old generation heap usage grows up to the limit. Solr stops responding and we need to force a restart. We are using Solr 6.6.1 with Oracle 1.8 JVM. The JVM settings

Re: JVM GC Issue

2017-12-01 Thread Dominique Bejean
ssure by roughly the same amount so it's a win in this > situation. > > Have you attached a memory profiler to the running Solr instance? I'd > be curious where the memory is being allocated. > > Best, > Erick > > On Fri, Dec 1, 2017 at 8:31 AM, Toke Eskildsen <t..

Re: Howto disable PrintGCTimeStamps in Solr

2018-05-07 Thread Dominique Bejean
Hi, Which version of Solr are you using? Regards Dominique On Fri, 4 May 2018 at 09:13, Bernd Fehling wrote: > Hi list, > > this sounds simple but I can't disable PrintGCTimeStamps in solr_gc > logging. > I tried with GC_LOG_OPTS in start scripts and

Removed nodes still visible as gone in Solrcloud graph

2018-05-29 Thread Dominique Bejean
Hi, On a node, I accidentally changed the SOLR_HOST value from uppercase to lowercase and I restarted the node. After I fixed the error, I restarted the node again but the node name in lowercase is still visible as "gone". How can I definitively remove a gone node from the SolrCloud graph?

Re: Removed nodes still visible as gone in Solrcloud graph

2018-05-29 Thread Dominique Bejean
to ZK # server/scripts/cloud-scripts/zkcli.sh -z "xxx.xxx.xxx.xxx:2181" -cmd putfile /collections/xx/state.json /tmp/-state-local.json - Start all Solr nodes Dominique On Tue, 29 May 2018 at 14:19, Dominique Bejean wrote: > Hi, > > On a node, I accide

Multi words query time synonyms

2018-02-09 Thread Dominique Bejean
Hi, I am trying multi words query time synonyms with Solr 6.6.2 and the SynonymGraphFilterFactory filter as explained in this article https://lucidworks.com/2017/04/18/multi-word-synonyms-solr-adds-query-time-support/ My field type is :

Re: Multi words query time synonyms

2018-02-11 Thread Dominique Bejean
text_gp:maillot) (((+name_text_gp:olympiqu +name_text_gp:de > +name_text_gp:marseil) name_text_gp:om))) > > (btw my stop list only has “de” on it) > > Thanks, > > -- > Steve > www.lucidworks.com > > > On Feb 10, 2018, at 2:12 AM, Dominique Bejean <domin

Re: Multi words query time synonyms

2018-02-11 Thread Dominique Bejean
"parsedquery_toString":"+(((name_text_gp:maillot) ((name_text_gp:om (+name_text_gp:olympiqu +name_text_gp:marseil~1)", The query results are the same for all queries. It looks like this could be an acceptable workaround. Thank you Dominique On Sun, 11 Feb 2018 at 10:31, Dominiqu

Re: Multi words query time synonyms

2018-02-10 Thread Dominique Bejean
) olympiqu om marseil maillot So, I suspect an issue with the edismax query parser. Regards. Dominique On Fri, 9 Feb 2018 at 18:25, Dominique Bejean <dominique.bej...@eolya.fr> wrote: > Hi, > > I am trying multi words query time synonyms with Solr 6.6.2 and > SynonymGraphFi

Re: SOLR zookeeper connection timeout during startup is hardcoded to 10000ms

2018-08-27 Thread Dominique Bejean
Hi, We are also experiencing timeout issues from time to time. I sent this message one month ago, by mistake, to the dev list. Why use hardcoded values just in the ZkClientClusterStateProvider.java file while there are existing parameters for these timeouts? Regards Dominique

Re: Silk from LucidWorks

2018-07-16 Thread Dominique Bejean
Hi, Using Grafana with Solr, starting with version 7, is very easy and well documented. https://lucene.apache.org/solr/guide/7_3/monitoring-solr-with-prometheus-and-grafana.html Dominique On Mon, 16 Jul 2018 at 06:56, Aroop Ganguly wrote: > How do you use Grafana with Solr ? Did you build a http

Solr and ZK timeout issues

2018-07-17 Thread Dominique Bejean
Hi, We are experiencing an issue related to a ZK timeout. The stack trace is : ERROR 19 juin 2018 06:24:07,152 - h.concurrent.ConcurrentService:67 - Erreur dans l'attente de la fin de l'exécution d'un thread ERROR 19 juin 2018 06:24:07,152 - h.concurrent.ConcurrentService:68 -

Re: What are descent disk I/O for Solr and Zookeeper ?

2018-03-11 Thread Dominique Bejean
”. Regards Dominique On Fri, 9 Mar 2018 at 00:40, Shawn Heisey <apa...@elyograg.org> wrote: > On 3/8/2018 2:55 PM, Dominique Bejean wrote: > > Disk I/O are critical for high performance Solrcloud. > > This statement has truth to it, but if your system is correctly size

What are descent disk I/O for Solr and Zookeeper ?

2018-03-08 Thread Dominique Bejean
Hi, Disk I/O is critical for high performance SolrCloud. I am looking for relevant disk I/O tests for both Solr nodes and ZooKeeper instances and, with these tests, what bad, correct or good results are. For instance how to know if these results with basic dd utility reports correct disk

Re: Index size issue in SOLR-6.5.1

2018-10-08 Thread Dominique Bejean
Hi, In the Solr Admin console, you can access the "Segment info" page for each core. You can see if there are more deleted documents in segments on server X. Dominique On Mon, 8 Oct 2018 at 07:29, SOLR4189 wrote: > About which details do you ask? Yesterday we restarted all our solr >

Re: CMS GC - Old Generation collection never finishes (due to GC Allocation Failure?)

2018-10-12 Thread Dominique Bejean
Hi, 1/ As others have already said, my first action would be to understand why you need so much heap. The first step is to cap your heap size at 31 GB (or obviously less if possible). https://blog.codecentric.de/en/2014/02/35gb-heap-less-32gb-java-jvm-memory-oddities/ Can you

Re: Index size issue in SOLR-6.5.1

2018-10-07 Thread Dominique Bejean
Hi, What about the core segment details in the admin UI? More deleted documents? Regards Dominique On Sun, 7 Oct 2018 at 08:22, SOLR4189 wrote: > Hi all, > > We use SOLR-6.5.1 and we have very strange issue. In our collection index > size is very different from server to server

Re: RuleBasedAuthorizationPlugin configuration

2018-12-31 Thread Dominique Bejean
in Solr standalone mode, only authentication is fully functional, not authorization! Regards. Dominique On Sun, 30 Dec 2018 at 13:40, Dominique Bejean wrote: > Hi, > > After reading more carefully the log file, here is my understanding. > > The request > > http://2:

Re: RuleBasedAuthorizationPlugin configuration

2018-12-30 Thread Dominique Bejean
? Regards Dominique On Fri, 21 Dec 2018 at 10:46, Dominique Bejean wrote: > Hi, > > I am trying to configure security.json file, in order to define the > following users and permissions : > >- user "admin" with all permissions on all collections >- u

Re: RuleBasedAuthorizationPlugin configuration

2019-01-01 Thread Dominique Bejean
Hi, I created a Jira issue https://issues.apache.org/jira/browse/SOLR-13097 Regards. Dominique On Mon, 31 Dec 2018 at 11:26, Dominique Bejean wrote: > Hi, > > In debugging mode, I discovered that only in SolrCloud mode the collection > name is extracted from the request path

RuleBasedAuthorizationPlugin configuration

2018-12-21 Thread Dominique Bejean
Hi, I am trying to configure security.json file, in order to define the following users and permissions : - user "admin" with all permissions on all collections - user "read" with read permissions on all collections - user "1" with only read permissions on biblio collection -

Re: ZooKeeper for Solr 7.6

2018-12-21 Thread Dominique Bejean
Hi, This is a Solr side issue, not a ZooKeeper side issue. ZooKeeper 3.4.13 is a 5-month-old version, so you can use it on the server side with the ZooKeeper client 3.4.11 provided by Solr. Dominique On Thu, 20 Dec 2018 at 01:53, Yasufumi Mizoguchi wrote: > Hi, > > I searched JIRA and found

Re: Zookeeper timeout issue -

2018-12-21 Thread Dominique Bejean
Hi, What is the scenario? High query activity? High update activity? Regards. Dominique On Wed, 19 Dec 2018 at 13:44, AshB wrote: > Hi, > > We are facing issue with solr/zookeeper where zookeeper timeouts after > 1ms. Error below. > > *SolrException:

Re: Is there a common tool for SOLR benckmark?

2018-12-21 Thread Dominique Bejean
Hi, There is the powerful JMeter, obviously, and also SolrMeter ( https://github.com/tflobbe/solrmeter). Regards Dominique On Thu, 20 Dec 2018 at 03:17, zhenyuan wei wrote: > Hi all, >Is there a common tool for SOLR benckmark? YCSB is not very > suitable for SOLR. Currently, Is

Re: Docker and Solr Indexing

2018-09-12 Thread Dominique Bejean
Hi, Are you aware of the issues with Java applications in Docker if the Java version is not 10? https://blog.docker.com/2018/04/improved-docker-container-integration-with-java-10/ Regards. Dominique On Wed, 12 Sep 2018 at 05:42, Shawn Heisey wrote: > On 9/11/2018 9:20 PM, solrnoobie wrote: >

Re: [ZOOKEEPER] - Error - HEAP MEMORY

2019-07-30 Thread Dominique Bejean
Hi, I can't find any documentation about the parameter zookeeper_server_java_heaps in zoo.cfg. The way to control the Java heap size is either the java.env file or the zookeeper-env.sh file. In zookeeper-env.sh SERVER_JVMFLAGS="-Xmx512m" How much RAM is on your server? Regards Dominique On Mon.

Field value different over replicas

2019-07-26 Thread Dominique Bejean
Hi, We have a date field with default set to “now”. For this field, some documents of the collection don’t have the same value in all replicas. The difference can be 3 or 4 minutes ! The collection has 1 shard and 2 NRT replicas. Solr version is 7.5. Collection is populated with DIH. Any ideas

Re: Synonym filters memory usage

2019-10-02 Thread Dominique Bejean
SynonymMaps. > >>>> > >>>> Regards > >>>> Bernd > >>>> > >>>> > >>>> On 30.09.19 at 08:41, Andrea Gazzarini wrote: > >>>>> Hi, > >>>>> looking at the stateful nature of

solr.log explanations for update handler

2019-10-02 Thread Dominique Bejean
Hi, I can't find an explanation of what the 2 numeric values at the end of these log lines mean. Regards. Dominique 2019-09-30 09:19:17.474 INFO (qtp2051853139-9577) [c:maCollection3s3r s:shard1 r:core_node11 x:maCollection3s3r_shard1_replica_t2] o.a.s.u.p.LogUpdateProcessorFactory

Synonym filters memory usage

2019-09-29 Thread Dominique Bejean
Hi, My concern is about memory used by synonym filter, especially if synonyms resources files are large. If in my schema, there are two field types "TypeSyno1" and "TypeSyno2" using synonym filter with the same synonyms files. For each of these two field types there are two fields Field1 type

Re: NRT vs TLOG bulk indexing performances

2019-10-30 Thread Dominique Bejean
s replicating changed segments and that’s slowing down > ingestion? > > It’d be interesting to index to NRT, leader-only and also a single TLOG > collection. > > > Best, > Erick > > > On Oct 25, 2019, at 8:28 AM, Dominique Bejean > wrote: > > > > Shawn

Re: When does Solr write in Zookeeper ?

2019-11-18 Thread Dominique Bejean
thout _either_ reading or writing to ZK. > > One rather obscure cause for ZK writes is when using “schemaless” mode. > When a new field is detected, the schema (and thus the collection’s > configuration) is changed, which generates writes.. > > Best, > Erick > > > > On

When does Solr write in Zookeeper ?

2019-11-15 Thread Dominique Bejean
Hi, I would like to be certain I understand how Solr uses ZooKeeper and, more precisely, when Solr writes into ZooKeeper. Solr stores various information in ZK - global configuration (autoscaling, security.json) - collection configuration (configs) - collections state (state.json,

Re: When does Solr write in Zookeeper ?

2019-11-15 Thread Dominique Bejean
g or writing to ZK. > > One rather obscure cause for ZK writes is when using “schemaless” mode. > When a new field is detected, the schema (and thus the collection’s > configuration) is changed, which generates writes.. > > Best, > Erick > > > > On Nov 15, 2019, at 12

Re: $deleteDocByQuery and $deleteDocByID

2019-11-15 Thread Dominique Bejean
Hi Paresh, Because of the impact of deleteDocByQuery on commits and searcher reopening, if a lot of deletions are done it is preferable, when possible, to delete by ID. Regards Dominique On Tue, 12 Nov 2019 at 07:03, Paresh wrote: > Hi Erik, > > I am also looking for some example of deleteDocByQuery.

Re: Convert TLOG collection to NRT

2019-12-10 Thread Dominique Bejean
Thank you Shawn. You're right! It is better to read the right version of the Collections API documentation. On Tue, 10 Dec 2019 at 19:49, Shawn Heisey wrote: > On 12/10/2019 11:25 AM, Dominique Bejean wrote: > > I would like to convert a collection (3 shards x 3 replicas)

Convert TLOG collection to NRT

2019-12-10 Thread Dominique Bejean
Hi, I would like to convert a collection (3 shards x 3 replicas) from TLOG to NRT. The only solution I imagine is something like : * with collection API, remove replicas in order to keep only 1 replica for each of the 3 shards * update the collection state.json in ZooKeeper * with collection API, reload the

Re: NRT vs TLOG bulk indexing performances

2019-10-25 Thread Dominique Bejean
est? > > > On 25.10.2019 at 09:16, Dominique Bejean < > dominique.bej...@eolya.fr> wrote: > > > > Hi, > > > > I made some benchmarks for bulk indexing in order to compare performance > > and resource usage for NRT versus TLOG replica. > > >

Re: NRT vs TLOG bulk indexing performances

2019-10-25 Thread Dominique Bejean
10/25/2019 1:16 AM, Dominique Bejean wrote: > > For collection created with all replicas as NRT > > > > * Indexing time : 22 minutes > > > > > For collection created with all replicas as TLOG > > > > * Indexing time : 34 minutes > > NRT indexes sim
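The two indexing times quoted above (22 minutes with NRT replicas, 34 minutes with TLOG replicas) can be compared with simple arithmetic. A minimal sketch; the function name is illustrative and only the two durations come from the thread.

```python
def slowdown_percent(baseline_minutes: float, other_minutes: float) -> float:
    """Extra time taken by 'other' relative to the baseline, in percent."""
    if baseline_minutes <= 0:
        raise ValueError("baseline_minutes must be positive")
    return (other_minutes - baseline_minutes) / baseline_minutes * 100.0


if __name__ == "__main__":
    nrt_minutes = 22.0   # all replicas NRT, figure from the benchmark
    tlog_minutes = 34.0  # all replicas TLOG, figure from the benchmark
    print(f"TLOG run took {slowdown_percent(nrt_minutes, tlog_minutes):.1f}% longer")
```

In this single benchmark the TLOG run took roughly 55% longer than the NRT run; the figure says nothing about other hardware or data sets.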

Re: Minimum Tomcat version that supports latest Solr version

2019-10-15 Thread Dominique Bejean
Hi, Solr has not been tested with Tomcat since version 4. Why not use the embedded Jetty server? Regards Dominique On Tue, 15 Oct 2019 at 10:44, vikas shinde wrote: > Dear Solr team, > > Which is the latest Tomcat version that supports the latest Solr version > 8.2.0? > > Also provide

NRT vs TLOG bulk indexing performances

2019-10-25 Thread Dominique Bejean
Hi, I made some benchmarks for bulk indexing in order to compare performance and resource usage for NRT versus TLOG replicas. Environment : * SolrCloud with 4 Solr nodes (8 GB RAM, 4 GB heap) * 1 collection with 2 shards x 2 replicas (all NRT or all TLOG) * 1 core per Solr server Indexing of
