Re: Unique() metrics not supported in Solr Streaming facet stream source

2017-07-05 Thread Zheng Lin Edwin Yeo
Thanks for your help, Joel and Susheel. Regards, Edwin

Placing different collections on different hard disk/folder

2017-07-05 Thread Zheng Lin Edwin Yeo
Hi, I would like to check: how can we place the index files of different collections on different hard disks/folders while they are on the same node? For example, I want collection1 to be placed on the C: drive, collection2 on the D: drive, and collection3 on the E: drive. I am using

Re: High disk write usage

2017-07-05 Thread Antonio De Miguel
Hi Erick. What I wanted to say is that we have enough memory to store the shards and, furthermore, the JVM heap spaces. The machine has 400 GB of RAM; I think we have enough. We have 10 JVMs running on the machine, each one using 16 GB. Shard size is about 8 GB. When we have query or indexing peaks our

Re: Unique() metrics not supported in Solr Streaming facet stream source

2017-07-05 Thread Susheel Kumar
Hello Joel, Opened the ticket https://issues.apache.org/jira/browse/SOLR-11017 Thanks, Susheel

Re: Allow Join over two sharded collection

2017-07-05 Thread Susheel Kumar
How are you planning to route manually? What key(s) are you thinking of using? Second, the link I shared was about collection aliasing, and if you use that you will end up with multiple collections. I just want to clarify, as you said above "...manual routing and creating alias". Again, until the join feature is

Re: solr alias not working on streaming query search

2017-07-05 Thread Joel Bernstein
This should be fixed in Solr 6.4: https://issues.apache.org/jira/browse/SOLR-9077 Joel Bernstein http://joelsolr.blogspot.com/

Re: Unique() metrics not supported in Solr Streaming facet stream source

2017-07-05 Thread Joel Bernstein
There are a number of functions that are currently being held up because of conflicting duplicate function names. We haven't come to an agreement yet on the best way forward for this. I think we should open a separate ticket to discuss how best to handle this issue. Joel Bernstein

solr alias not working on streaming query search

2017-07-05 Thread Lewin Joy (TMS)
** PROTECTED (Confidential: for authorized parties only) Has anyone faced a similar issue? I have a collection named “solr_test”. I created an alias to it named “solr_alias”. This alias works well when I do a simple search: http://localhost:8983/solr/solr_alias/select?indent=on&q=*:*&wt=json But this will not work when used in a

Best way to split text

2017-07-05 Thread tstusr
We are working on a search application for large PDFs (~10-100 MB), which have been correctly indexed. However, we want to do some training in the pipeline, so we are implementing some Spark MLlib algorithms. But now, some requirements are to split documents into either paragraphs or pages.

Re: High disk write usage

2017-07-05 Thread Erick Erickson
bq: We have enough physical RAM to store full collection and 16Gb for each JVM. That's not quite what I was asking for. Lucene uses MMapDirectory to map part of the index into the OS memory space. If you've over-allocated the JVM space relative to your physical memory that space can start
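To put numbers on Erick's question, here is a back-of-the-envelope Python sketch using figures Antonio gives in this thread (400 GB of RAM, 10 JVMs with 16 GB heaps, shards of about 8 GB). It assumes, hypothetically, one ~8 GB core per JVM, since the thread does not say how replicas are spread across machines, and it ignores memory used by the OS itself and other processes.

```python
# Back-of-the-envelope check: how much physical RAM is left for the OS page
# cache (which MMapDirectory relies on) once the JVM heaps are carved out?
# RAM, JVM count and heap size come from the thread; one ~8 GB core per JVM
# is an ASSUMPTION. OS and non-Solr memory use are ignored.


def os_cache_headroom(ram_gb, jvms, heap_gb, index_per_jvm_gb):
    """Return (total_heap_gb, left_for_os_gb, total_index_gb)."""
    total_heap = jvms * heap_gb
    left_for_os = ram_gb - total_heap
    total_index = jvms * index_per_jvm_gb
    return total_heap, left_for_os, total_index


heap, os_left, index = os_cache_headroom(ram_gb=400, jvms=10, heap_gb=16,
                                         index_per_jvm_gb=8)
print(f"JVM heaps: {heap} GB, left for OS: {os_left} GB, index: {index} GB")
print("Index could fit in the OS cache:", index <= os_left)
```

With these inputs it reports 160 GB of heap and 240 GB left over, so under that assumption the index could be held entirely in the OS cache.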

Re: High disk write usage

2017-07-05 Thread Antonio De Miguel
Hi Erick! Thanks for your response! Our soft commit interval is 5 seconds. Why does a soft commit generate I/O? That's news to me. We have enough physical RAM to store the full collection plus 16 GB for each JVM. The collection is relatively small. I've tried (for testing purposes) disabling the transaction log

Re: help on implicit routing

2017-07-05 Thread Erick Erickson
Use the _route_ field and put in "day_1" or "day_2". You've presumably named the shards (the "shard" parameter) when you added them with the CREATESHARD command so use the value you specified there. Best, Erick
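A minimal Python sketch of sending documents to a named shard of a collection that uses the implicit router. It uses the _route_ request parameter, which the Solr Reference Guide documents for naming the target shard; the base URL and the collection name mycollection are hypothetical placeholders, and "day_1" must match a shard name created with CREATESHARD.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# HYPOTHETICAL placeholders: adjust the base URL and collection name.
SOLR_BASE = "http://localhost:8983/solr"
COLLECTION = "mycollection"


def build_routed_update(docs, shard):
    """Return (url, body) for a JSON update aimed at one named shard.

    With the implicit router, the _route_ request parameter names the target
    shard; it must match the name given to CREATESHARD (e.g. "day_1").
    """
    url = f"{SOLR_BASE}/{COLLECTION}/update?{urlencode({'_route_': shard})}"
    return url, json.dumps(docs)


def send_update(url, body):
    """POST the JSON body to Solr and return the parsed response."""
    req = Request(url, data=body.encode("utf-8"),
                  headers={"Content-Type": "application/json"}, method="POST")
    with urlopen(req) as resp:
        return json.load(resp)


url, body = build_routed_update([{"id": "doc1", "title": "example"}], "day_1")
print(url)  # http://localhost:8983/solr/mycollection/update?_route_=day_1
```

Calling send_update(url, body) against a running cluster would post the documents; nothing is sent when the block is only run as shown.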

Re: Optimization/Merging space

2017-07-05 Thread Erick Erickson
Bad Things Can Happen. Solr (well, Lucene in this case) tries very hard to keep disk full operations from having repercussions, but it's kind of like OOMs. What happens next? It's not so much the merge/optimize, but what happens in the future when the _next_ segment is written... The merge or

Re: Solr dynamic "on the fly fields"

2017-07-05 Thread Erick Erickson
Some aggregations are supported by combining stats with pivot facets. See: https://lucidworks.com/2015/01/29/you-got-stats-in-my-facets/ I don't quite think that works for your use case, though. The other thing that _might_ help is all the Streaming Expression/Streaming Aggregation work. Best,

Optimization/Merging space

2017-07-05 Thread David Hastings
Hi all, I am curious to know what happens when Solr begins a merge/optimize operation but then runs out of physical disk space. I haven't had the chance to try this out yet, but I was wondering if anyone knows what the underlying code's response to the situation would be if it happened. Thanks

Re: High disk write usage

2017-07-05 Thread Erick Erickson
What is your soft commit interval? That'll cause I/O as well. How much physical RAM is there, and how much is dedicated to _all_ the JVMs on a machine? One cause here is that Lucene uses MMapDirectory, which can be starved for OS memory if you give the JVMs too much; my rule of thumb is that _at least_ half of

Re: index new discovered fields of different types

2017-07-05 Thread Erick Erickson
I really have no idea what "to ignore the prefix and check of the type" means. When? How? Can you give an example of inputs and outputs? You might want to review: https://wiki.apache.org/solr/UsingMailingLists And to add to what Furkan mentioned, in addition to schemaless you can use "managed

Re: index new discovered fields of different types

2017-07-05 Thread Thaer Sammar
Hi Furkan, No. In the schema we have also defined some static fields, such as a uri field and a geo field.

Re: High disk write usage

2017-07-05 Thread Antonio De Miguel
Thanks Markus! We already have SSDs. About changing the topology: we tried 10 shards yesterday, but the system became more inconsistent than with the current topology (5x10). I don't know why... too much traffic, perhaps? About the merge factor: we used the default configuration for some days... but when a

Re: index new discovered fileds of different types

2017-07-05 Thread Furkan KAMACI
Hi Thaer, Do you use schemaless mode [1]? Kind Regards, Furkan KAMACI [1] https://cwiki.apache.org/confluence/display/solr/Schemaless+Mode

RE: High disk write usage

2017-07-05 Thread Markus Jelsma
Try a mergeFactor of 10 (the default), which should be fine in most cases. If you have an extreme case, either create more shards or consider better hardware (SSDs).
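To illustrate the trade-off behind Markus's advice, here is a toy Python model of logarithmic merging. It is a simplification: Solr's default TieredMergePolicy picks merges differently, and the model ignores deletes and assumes a merged segment is exactly the sum of its inputs. It counts how much data ends up written to disk, flushes plus merges, for a given mergeFactor.

```python
# Toy LogMergePolicy-style model (NOT Solr's default TieredMergePolicy):
# every flush writes one unit-sized segment; whenever merge_factor segments
# of the same level exist, they are merged into one segment of the next
# level, rewriting all of their data. Deletes are ignored.


def units_written(num_flushes: int, merge_factor: int) -> int:
    """Total data units written to disk by flushes plus merges."""
    segments_per_level = {}
    written = 0
    for _ in range(num_flushes):
        written += 1  # the flush itself
        segments_per_level[0] = segments_per_level.get(0, 0) + 1
        level = 0
        while segments_per_level.get(level, 0) >= merge_factor:
            segments_per_level[level] -= merge_factor
            segments_per_level[level + 1] = segments_per_level.get(level + 1, 0) + 1
            written += merge_factor ** (level + 1)  # size of the merged segment
            level += 1
    return written


for mf in (2, 10):
    total = units_written(1024, mf)
    print(f"mergeFactor={mf}: {total} units written for 1024 flushed units")
```

For 1024 flushed units it reports 11264 units written with mergeFactor=2 versus 4044 with mergeFactor=10: in this model a lower merge factor means more rewriting (more write I/O) in exchange for fewer segments, while a higher one writes less but leaves more segments for searches to visit.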

Re: High disk write usage

2017-07-05 Thread Antonio De Miguel
Thanks a lot, Alessandro! Yes, we have very big dedicated physical machines, with a topology of 5 shards and 10 replicas per shard. 1. Transaction log files are growing, but not at this rate. 2. We've tried values between 300 and 2000 MB... without any visible results. 3. We don't

Re: cursorMark / Deep Paging and SEO

2017-07-05 Thread Shawn Heisey
On 6/30/2017 1:30 AM, Jacques du Rand wrote: > I'm not quite sure I understand the deep paging / cursorMark internals > > We have implemented it on our search pages like so: > > http://mysite.com/search?foobar=1 > http://mysite.com/search?foobar=2=djkldskljsdsa >
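A minimal Python sketch of the client side of the cursorMark protocol: send cursorMark=* on the first request, pass back the returned nextCursorMark on each following request, and stop when the mark comes back unchanged. In a real Solr request the sort must include the uniqueKey field (for example sort=score desc,id asc) and start must be 0. The fetch function here is a hypothetical in-memory stand-in for Solr, and its "after:<id>" marks are invented; real marks are opaque strings.

```python
from typing import Callable, Dict, Iterator, List

# Client-side cursorMark loop. fetch_page stands in for a real Solr request
# (q=..., sort=<...>,id asc, rows=N, cursorMark=<cursor>); it is HYPOTHETICAL.


def iterate_with_cursor(fetch_page: Callable[[str], Dict]) -> Iterator[Dict]:
    cursor = "*"  # the initial mark is always "*"
    while True:
        resp = fetch_page(cursor)
        yield from resp["response"]["docs"]
        next_cursor = resp["nextCursorMark"]
        if next_cursor == cursor:  # an unchanged mark means no more results
            return
        cursor = next_cursor


def make_fake_solr(ids: List[str], rows: int) -> Callable[[str], Dict]:
    """In-memory stand-in: docs sorted by id; the mark encodes the last id seen."""
    ordered = sorted(ids)

    def fetch_page(cursor: str) -> Dict:
        start = 0 if cursor == "*" else ordered.index(cursor.split(":", 1)[1]) + 1
        page = ordered[start:start + rows]
        next_mark = f"after:{page[-1]}" if page else cursor
        return {"response": {"docs": [{"id": i} for i in page]},
                "nextCursorMark": next_mark}

    return fetch_page


docs = list(iterate_with_cursor(make_fake_solr(["c", "a", "b", "d", "e"], rows=2)))
print([d["id"] for d in docs])
```

Because each mark depends on the previous page and the sort, this suits sequential crawling rather than jumping to an arbitrary page number.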

Re: Unique() metrics not supported in Solr Streaming facet stream source

2017-07-05 Thread Susheel Kumar
Does "uniq" sound good as the expression name for the UniqueMetric class? Thanks, Susheel On Tue, Jul 4, 2017 at 5:45 PM, Susheel Kumar wrote: > Hello Joel, > > I tried to create a patch to add UniqueMetric and it works, but soon > realized, we have UniqueStream as well and

Re: High disk write usage

2017-07-05 Thread alessandro.benedetti
Point 2 was the RAM buffer size: *ramBufferSizeMB* sets the amount of RAM that may be used by Lucene indexing for buffering added documents and deletions before they are flushed to the Directory. maxBufferedDocs sets a limit on the number of documents buffered
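A rough Python sketch of why raising ramBufferSizeMB may show no effect in this setup. It assumes that each soft commit, by opening a new near-real-time searcher, flushes the buffered documents into a new segment, and it uses the raw ~4 KB document size as a stand-in for buffered RAM per document, which Lucene's actual accounting will not match exactly. The rates (600 docs/s, 1500 docs/s peak) and the 5-second soft commit come from the thread.

```python
# Rough estimate: how much data accumulates in the indexing RAM buffer
# between soft commits? ASSUMPTION: raw document size (~4 KB) stands in for
# buffered RAM per document; Lucene's real usage differs.


def buffered_mb_per_interval(docs_per_sec, doc_kb, interval_sec):
    """MB of documents arriving between two soft commits (cluster-wide)."""
    return docs_per_sec * doc_kb * interval_sec / 1024


for rate in (600, 1500):  # steady-state and peak rates from the thread
    mb = buffered_mb_per_interval(rate, doc_kb=4, interval_sec=5)
    print(f"{rate} docs/s -> ~{mb:.1f} MB per soft-commit interval")
```

Even cluster-wide, before the load is split across 5 shards, this comes to roughly 12 MB at the steady rate and about 29 MB at peak per interval, well under the 300 to 2000 MB buffer sizes tried, so under these assumptions the soft commit rather than the RAM buffer decides when segments are flushed.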

Re: High disk write usage

2017-07-05 Thread alessandro.benedetti
Is the physical machine dedicated? Is it a dedicated VM on shared metal? Apart from these operational checks, I will assume the machine is dedicated. In Solr, writes to disk do not happen only on commit; I can think of other scenarios: 1) *Transaction log* [1] 2) 3) Spellcheck

Re: Solr dynamic "on the fly fields"

2017-07-05 Thread Pablo Anzorena
Thanks for the answer, Erick. Function queries are great, but for my use case what I really need is to run aggregations (using JSON Facet, for example) with these functions. I have tried using function queries with JSON Facet, but it does not support them. Any other idea you can think of? 2017-07-03

index new discovered fields of different types

2017-07-05 Thread Thaer Sammar
Hi, We are trying to index documents of different types. Documents have different fields, and the fields are known at indexing time. We run a query against a database and index the results, using the query variables as field names in Solr. Our current solution: we use dynamic fields with a prefix, for example
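One possible way to derive dynamic-field names from the types of the values returned by the database query is sketched below in Python. The prefixes (i_, s_, ...) are hypothetical and are not the convention elided from Thaer's message; each would need a matching prefix-style dynamicField declaration (for example name="i_*") in the schema. Static fields such as id and uri are passed through unchanged.

```python
from datetime import datetime

# HYPOTHETICAL prefix convention; each prefix needs a matching
# <dynamicField name="<prefix>*" ...> declaration in the schema.
PREFIX_BY_TYPE = [
    (bool, "b_"),  # must precede int: bool is a subclass of int in Python
    (int, "i_"),
    (float, "d_"),
    (datetime, "dt_"),
    (str, "s_"),
]


def dynamic_field_name(name, value):
    """Prefix a column name according to the Python type of its value."""
    for py_type, prefix in PREFIX_BY_TYPE:
        if isinstance(value, py_type):
            return prefix + name
    raise TypeError(f"no dynamic-field prefix for {type(value).__name__}")


def to_solr_doc(row, static_fields=frozenset({"id", "uri"})):
    """Map one result row to a Solr document, leaving static fields as-is."""
    return {
        key if key in static_fields else dynamic_field_name(key, value): value
        for key, value in row.items()
    }


print(to_solr_doc({"id": "1", "uri": "http://example.org/a",
                   "population": 3500000, "name": "Berlin"}))
```

Datetime values would still need converting to Solr's ISO-8601 format before being serialized to JSON.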

help on implicit routing

2017-07-05 Thread imran
I am trying out the document routing feature in Solr 6.4.1. I am unable to comprehend the documentation where it states that “The 'implicit' router does not automatically route documents to different shards. Whichever shard you indicate on the indexing request (or within each document) will be

Re: Solr Prod Issue | KeeperErrorCode = ConnectionLoss for /overseer_elect/leader

2017-07-05 Thread Ere Maijala
From the fact that someone has tried to access the /etc/passwd file via your Solr (see all those WARN messages), it seems you have it exposed to the world, unless of course it's a security scanner you use internally. The Internet is a hostile place, and the very first thing I would do is shield Solr

RE: Solr Prod Issue | KeeperErrorCode = ConnectionLoss for /overseer_elect/leader

2017-07-05 Thread Bhalla, Rahat
Hi, I'm not sure whether any of you have had a chance to see this email yet. We had a recurrence of the issue today, and I'm attaching today's logs inline below as well. Please let me know if any of you have seen this issue before, as this would really help me get to the root of the

High disk write usage

2017-07-05 Thread Antonio De Miguel
Hi, We are implementing a SolrCloud cluster (version 6.6) with NRT requirements. We are indexing 600 docs/sec with peaks of 1500 docs/sec, and we are serving about 1500 qps. Our documents have 300 fields, some with doc values, are about 4 KB each, and we have 3 million documents. HardCommit is set to 15

Re: Did /export use to emit tuples and now does not?

2017-07-05 Thread Ronald Wood
Thanks, Joel. I just wanted to confirm, as I was having trouble tracking down when the change occurred. -R On 04/07/2017, 23:51, "Joel Bernstein" wrote: In the very early releases (5x) the /export handler had a different format than the /search handler. Later the

Re: Strange boolean query behaviour on 5.5.4

2017-07-05 Thread Bram Van Dam
On 04/07/17 18:10, Erick Erickson wrote: > I think you'll get what you expect by something like: > (*:* -someField:Foo) AND (otherField: (Bar OR Baz)) Yeah that's what I figured. It's not a big deal since we generate Solr syntax using a parser/generator on top of our own query syntax. Still a
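Erick's rewrite works because, with the standard query parser, a parenthesized clause made only of negative terms matches nothing in Lucene; Solr adds the implicit "all documents" set only when the whole top-level query is negative. A small Python sketch of how a query generator could emit the same shape; the helper names are hypothetical and no escaping of special characters is done.

```python
# A sub-clause containing only negative terms matches no documents, so it is
# anchored with *:* ("all documents, minus ...") before being combined.


def exclude_clause(field: str, value: str) -> str:
    """Build '(*:* -field:value)', a sub-clause safe to combine with AND/OR."""
    return f"(*:* -{field}:{value})"


def any_of_clause(field: str, values: list) -> str:
    """Build 'field:(v1 OR v2 ...)'."""
    return f"{field}:({' OR '.join(values)})"


query = f"{exclude_clause('someField', 'Foo')} AND ({any_of_clause('otherField', ['Bar', 'Baz'])})"
print(query)  # (*:* -someField:Foo) AND (otherField:(Bar OR Baz))
```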

Re: xml indexing

2017-07-05 Thread txlap786
Thanks for your reply, but it only works when I get no response. As I said, I'm working on arrays. As soon as I get an array, it doesn't matter whether the array's length is 1 or 105; it returns what I got earlier. #1 JSON response "detailComment", [ "100.01", null, "102.01", null ] return