Re: Split words with period in between into separate tokens

2016-10-12 Thread Derek Poh
Why didn't I think of that? That's another alternative. Thank you for your suggestion. Appreciate it. On 10/13/2016 5:41 AM, Georg Sorst wrote: You could use a PatternReplaceCharFilter before your tokenizer to replace the dot with a space character. Derek Poh
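A minimal sketch of that suggestion, assuming a schema.xml fieldType (the type name and the rest of the chain are illustrative, not from the thread):

    <fieldType name="text_dot_split" class="solr.TextField">
      <analyzer>
        <!-- turn every "." into a space before the tokenizer sees the text -->
        <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="\." replacement=" "/>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

With this chain "Co.Ltd" is indexed as "co" and "ltd".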

Re: London Lucene Hackday is now running

2016-10-12 Thread Charlie Hull
On the Flax blog (www.flax.co.uk/blog) eventually, but for now I've made some notes at https://github.com/flaxsearch/london-hackday-2016 . We had 20 people on Friday in London and 15 in Boston on Tuesday, everyone seemed to enjoy themselves - and we made some real progress on a number of issues.

[Solr 5.1.0] - Ignoring Whitespaces as delimiters

2016-10-12 Thread deniz
Hello, Are there any built-in tokenizers which will do something like StandardTokenizer, but will not tokenize on whitespace? E.g. a field with the value "abc cde-rfg" will be tokenized as "abc cde" and "rfg", not "abc", "cde", "rfg". I have checked the existing tokenizers/analyzers and it seems like there is no other

Re: qf boosts with MoreLikeThis query parser

2016-10-12 Thread Ere Maijala
Answering myself... I did some digging and found out that boosts work if qf is repeated in the local params, at least in Solr 6.2, like this: {!mlt qf=title^100 qf=author^50}recordid However, it doesn't work properly with CloudMLTQParser used in SolrCloud mode. I'm working on a proposed

Re: How to retrieve 200K documents from Solr 4.10.2

2016-10-12 Thread Nick Vasilyev
Check out cursorMark, it should be available in your release. There is some good information on this page: https://cwiki.apache.org/confluence/display/solr/Pagination+of+Results On Wed, Oct 12, 2016 at 5:46 PM, Salikeen, Obaid < obaid.salik...@iacpublishinglabs.com> wrote: > Hi, > > I am using
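A rough sketch of cursor paging, assuming "id" is the uniqueKey and a page size of 1000 (both illustrative):

    First request:
      q=*:*&fl=id&sort=id asc&rows=1000&cursorMark=*
    Follow-up requests:
      q=*:*&fl=id&sort=id asc&rows=1000&cursorMark=<nextCursorMark from previous response>

The sort must include the uniqueKey field so the cursor is stable; stop when the returned nextCursorMark is the same as the one you sent.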

How to retrieve 200K documents from Solr 4.10.2

2016-10-12 Thread Salikeen, Obaid
Hi, I am using Solr 4.10.2. I have 200K documents sitting on a Solr cluster (it has 3 nodes), and let me first state that I am new to Solr. I want to retrieve all documents from Solr (essentially just one field from each document). What is the best way of fetching this much data without overloading

Re: Split words with period in between into separate tokens

2016-10-12 Thread Georg Sorst
You could use a PatternReplaceCharFilter before your tokenizer to replace the dot with a space character. Derek Poh wrote on Wed., 12 Oct. 2016, 11:38: > Seems like LetterTokenizerFactory tokenises on/discards numbers as well. The > field does have values with numbers in

questions about shard key

2016-10-12 Thread Huang, Daniel
Hi, I was reading about document routing with CompositeId (https://cwiki.apache.org/confluence/display/solr/Shards+and+Indexing+Data+in+SolrCloud). The document says that I could prefix a shard key to a document ID like “IBM!12345”. It further mentioned that I could specify the number of bits
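For illustration (these exact IDs are not from the message), the compositeId router hashes the prefix before the "!", and an optional /bits suffix on the prefix controls how many bits of the hash come from it:

    IBM!12345       all documents with the "IBM" prefix route to the same hash range
    IBM/2!12345     only 2 bits of the hash come from "IBM", spreading its documents over roughly 1/4 of the collection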

Re: Split words with period in between into separate tokens

2016-10-12 Thread Derek Poh
Seems like LetterTokenizerFactory tokenises on/discards numbers as well. The field does have values with numbers in them, therefore it is not applicable. Thank you. On 10/12/2016 4:22 PM, Dheerendra Kulkarni wrote: You can use LetterTokenizerFactory instead. Regards, Dheerendra Kulkarni On
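To illustrate the problem (the sample value is made up, not from the thread), LetterTokenizer only emits maximal runs of letters, so digits are dropped entirely:

    Input:   "Co.Ltd 88 Sensors"
    Output:  "Co", "Ltd", "Sensors"   (the "88" is discarded)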

multivalued coordinate for geospatial search

2016-10-12 Thread Chris Chris
Hello solr users! I am trying to use geospatial to do some basic distance search in Solr 4.10. At the moment, I got it working if I have just one set of coordinates (latitude,longitude) per document. However, I need to get it to work when I have an unknown number of sets of coordinates per

Re: Split words with period in between ("Co.Ltd") into separate tokens

2016-10-12 Thread Derek Poh
Thank you for pointing out the flags. I set generateWordParts=1 and the term is split up. On 10/12/2016 3:26 PM, Modassar Ather wrote: Hi, The flags set in your WordDelimiterFilterFactory definition are all 0. You can try with generateWordParts=1 and splitOnCaseChange=1 and see if it breaks as per

Re: Split words with period in between into separate tokens

2016-10-12 Thread Dheerendra Kulkarni
You can use LetterTokenizerFactory instead. Regards, Dheerendra Kulkarni On Wed, Oct 12, 2016 at 6:24 AM, Derek Poh wrote: > Hi > > How can I split words with a period in between into separate tokens? > E.g. "Co.Ltd" => "Co" "Ltd". > > I am using StandardTokenizerFactory
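A minimal sketch of that suggestion, assuming a schema.xml fieldType (names are illustrative):

    <fieldType name="text_letters" class="solr.TextField">
      <analyzer>
        <!-- emits maximal runs of letters, so "Co.Ltd" becomes "Co", "Ltd" -->
        <tokenizer class="solr.LetterTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

As noted elsewhere in the thread, this tokenizer also discards digits, which ruled it out for this particular field.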

Re: Split words with period in between ("Co.Ltd") into separate tokens

2016-10-12 Thread Modassar Ather
Hi, The flags set in your WordDelimiterFilterFactory definition are all 0. You can try with generateWordParts=1 and splitOnCaseChange=1 and see if it breaks as per your requirement. You can also try with other available flags enabled. Best, Modassar On Wed, Oct 12, 2016 at 12:44 PM, Derek Poh
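A hedged sketch of the suggested change; the remaining attribute values are carried over from the original definition quoted in this thread:

    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" splitOnCaseChange="1"
            generateNumberParts="0" catenateWords="0"
            catenateNumbers="0" catenateAll="0"/>

With generateWordParts=1 the filter splits "Co.Ltd" into "Co" and "Ltd", which the original poster confirmed.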

Re: Split words with period in between ("Co.Ltd") into separate tokens

2016-10-12 Thread Derek Poh
I tried adding the Word Delimiter Filter to the field, but it either does not process the term "Co.Ltd" or truncates it away. <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/> On 10/12/2016 8:54 AM, Derek Poh wrote: Hi How can I split words with

Re: Re: Config for massive inserts into Solr master

2016-10-12 Thread Reinhard Budenstecher
> That is not correct as of version 4.0. > > The only kind of update I've run into that cannot proceed at the same > time as an optimize is a deleteByQuery operation. If you do that, then > it will block until the optimize is done, and I think it will also block > > any update you do after it. >

Re: Config for massive inserts into Solr master

2016-10-12 Thread Shawn Heisey
On 10/12/2016 12:18 AM, Reinhard Budenstecher wrote: > Is my assumption correct that an OPTIMIZE of index would block all > inserts? So that all processes have to pause when I will start an > hour-running OPTIMIZE? If so, this would also be no option for the moment. That is not correct as of

Re: Predicting query execution time.

2016-10-12 Thread Modassar Ather
Thanks Shawn for your suggestions. Best, Modassar On Wed, Oct 12, 2016 at 11:44 AM, Shawn Heisey wrote: > On 10/11/2016 11:46 PM, Modassar Ather wrote: > > We see queries executing in less than a second and taking minutes to > > execute as well. We need to predict the

Re: Re: Config for massive inserts into Solr master

2016-10-12 Thread Reinhard Budenstecher
> > That's considerably larger than you initially indicated. In just one > index, you've got almost 300 million docs taking up well over 200GB. > About half of them have been deleted, but they are still there. Those > deleted docs *DO* affect operation and memory usage. > > Getting rid of

Re: Predicting query execution time.

2016-10-12 Thread Shawn Heisey
On 10/11/2016 11:46 PM, Modassar Ather wrote: > We see queries executing in less than a second and taking minutes to > execute as well. We need to predict the approximate time a query might > take to execute. Need your help in finding the factors to be > considered and calculating an approximate