Hi,
Can you please grant me the privilege to edit Wiki pages?
My Wiki username is Dikshant.
Thanks,
Dikshant
I added you to the Solr Wiki; if you need Lucene Wiki access, let us know.
Erick
On Wed, Jul 15, 2015 at 7:59 PM, Dikshant Shahi contacts...@gmail.com wrote:
Hi,
Can you please grant me the privilege to edit Wiki pages?
My Wiki username is Dikshant.
Thanks,
Dikshant
Thanks Erick! This is good for now.
On Thu, Jul 16, 2015 at 9:54 AM, Erick Erickson erickerick...@gmail.com
wrote:
I added you to the Solr Wiki; if you need Lucene Wiki access, let us know.
Erick
On Wed, Jul 15, 2015 at 7:59 PM, Dikshant Shahi contacts...@gmail.com
wrote:
Hi,
Can you
As you've seen, RankQueries currently have no effect on grouping
queries.
A RankQuery can be combined with Collapse and Expand though. You may want
to review Collapse and Expand and see if it meets your use case.
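To make that concrete, here is a rough sketch of the request parameters such a combination might use (the field names, query terms, and rerank settings below are made up for illustration, not taken from this thread):

```python
from urllib.parse import urlencode

# Hypothetical request: collapse to one top document per group,
# return the collapsed members via expand, and apply a rerank
# query (RankQuery) on top of the collapsed result set.
params = {
    "q": "laptop",
    "fq": "{!collapse field=product_group}",  # one top doc per group
    "expand": "true",                          # also return collapsed members
    "rq": "{!rerank reRankQuery=$rqq reRankDocs=100}",
    "rqq": "popularity:[10 TO *]",
}
query_string = urlencode(params)
print(query_string)
```

Unlike grouping, the collapse happens in a PostFilter, so the rerank still sees (and reorders) the collapsed head documents.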
Joel Bernstein
http://joelsolr.blogspot.com/
On Wed, Jul 15, 2015 at 2:36
I'm doing some testing on long-running, huge indexes.
Therefore I need a clean state after some days of running.
My idea was to open a new searcher with commit command:
INFO - org.apache.solr.update.DirectUpdateHandler2;
start
What do you mean by "clean state"? A searcher is a view over a given
index (let's say) state... if the state didn't change, why do you want
another (identical) view?
On 15 Jul 2015 02:30, Bernd Fehling bernd.fehl...@uni-bielefeld.de
wrote:
I'm doing some testing on long-running, huge indexes.
I also feel that having dataDir configurable makes enterprise deployments
easier. Generally, software is installed on the root disk, e.g. /opt/solr, and
if the data folder is within it, the root drive will need to be expanded as
the Solr index grows, or when it needs to be optimized, etc. Having the data
folder
Hi,
I'm using Solr 4.10.3, and I'm trying to update a doc field using atomic updates
(http://wiki.apache.org/solr/Atomic_Updates).
My schema.xml is like this:
<!-- Fields -->
<field name="id" type="string" indexed="true" stored="true" required="true" />
<field name="name" type="string" indexed="true" stored="true" />
<field
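In case it helps anyone following along, here is a minimal sketch of the JSON body an atomic update against a schema like this might use (the id value and the new name are invented for the example):

```python
import json

# Atomic update: only "name" is modified, via the "set" modifier.
# Solr rebuilds the whole document from stored fields, so every
# regular (non-copyField-destination) field must be stored="true"
# for atomic updates to be safe.
doc = {
    "id": "1",                    # hypothetical uniqueKey value
    "name": {"set": "new name"},  # replace the field's value
}
payload = json.dumps([doc])
print(payload)
```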
On 14/07/2015 17:04, Erick Erickson wrote:
Well, Shawn I for one am in your corner.
Schemaless is great for getting things running, but it's
not an AI. And it can get into trouble guessing. Say
it guesses a field should be an int because the first value
it sees is "123", but it's really a part
On top of that, sorry, I didn't answer your question because I don't know
if that is possible.
Best,
Andrea
On 15 Jul 2015 02:51, Andrea Gazzarini a.gazzar...@gmail.com wrote:
What do you mean by "clean state"? A searcher is a view over a given
index (let's say) state... if the state didn't
Hi Erick,
Thanks for pointing out the main problem of my system.
Trung.
On Fri, Jul 10, 2015 at 11:47 PM, Erick Erickson erickerick...@gmail.com
wrote:
In a word, no. If you don't store the data it is completely gone
with no chance of retrieval.
There are a couple of things to think about
I can sort the parent documents with the ScoreMode function; you can take a
look here:
http://lucene.472066.n3.nabble.com/Sorting-documents-by-nested-child-docs-with-FunctionQueries-tp4209940.html
Thanks, Upaya, for sharing. I am looking to deploy Solr in a Windows 64-bit
server environment. Some people do say Jetty works optimally in a Linux-based
environment. Having said that, I believe Solr will have improved its stability
within a Windows environment.
I agree with you on the advice.
Use Jetty. Or rather, just use bin/solr or bin\solr.cmd to interact with
Solr.
In the past, Solr shipped as a WAR which could be deployed in any
servlet container. Since 5.0, it is to be considered a self-contained
application that just happens to use Jetty underneath.
If you used something
Hi Adrian,
since version 5.0, Solr ships with Jetty. But I think a more
interesting question is whether the default Jetty configuration
can be used as-is in a production environment.
On Wed, Jul 15, 2015 at 8:43 AM, Adrian Liew adrian.l...@avanade.com
wrote:
Hi all,
Will
What are your cache sizes? Max doc?
Also, what GC settings are you using? 6GB isn't all that much for a
memory-intensive app like Solr, esp. given the number of facet fields
you have. Lastly, are you using docvalues for your facet fields? That
should help reduce the amount of heap needed to
Hello,
I've run into quite the snag and I'm wondering if anyone can help me out
here. So, the situation:
I am using the DataImportHandler to pull from a database and a Linux file
system. The database has the metadata; the file system has the document text. I
thought it had indexed all the files I
I have a handler configured in solrconfig.xml with shards.tolerant=true, which
means unavailable shards are ignored when returning results.
Sometimes shards are not really down, but are doing GC or a heavy commit.
Is it possible to ignore them too, and how? I prefer to get a partial result
instead of a timeout
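For what it's worth, a sketch of a tolerant distributed request (the host and collection name are placeholders; the parameter spelling in the ref guide is shards.tolerant):

```python
from urllib.parse import urlencode

# Ask Solr to return partial results instead of failing the whole
# request when a shard is unavailable; timeAllowed can additionally
# truncate searches that run too long.
params = {
    "q": "*:*",
    "shards.tolerant": "true",
    "timeAllowed": "2000",  # milliseconds; optional
}
url = "http://localhost:8983/solr/collection1/select?" + urlencode(params)
print(url)
```

When a shard is skipped, the response header should flag the result as partial rather than returning an error.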
From here:
https://cwiki.apache.org/confluence/display/solr/Read+and+Write+Side+Fault+Tolerance
we can learn that the transaction log is needed when replicas are used in
SolrCloud.
Do I need it if I am not using replicas?
Could it be disabled for a performance improvement?
What are the negative
Hi Everyone,
I need to use a RankQuery within a grouping [1].
I did some experiments with the RerankQuery [2] and Solr 4.10.2, and it seems
that
if you group on a field, the reranking query is completely ignored
(in cloud mode and on a single instance).
I would expect to see the results in each group
That should be authors 280 and 281. Sorry.
--
View this message in context:
http://lucene.472066.n3.nabble.com/DIH-Not-Indexing-Two-Documents-tp4217546p4217547.html
Sent from the Solr - User mailing list archive at Nabble.com.
Mikhail
We do add new nodes with our custom results in some cases... just curious-
does that preclude us from doing what we're trying to do above? FWIW, we
can avoid the custom nodes if we had to.
Chetan
On Wed, Jul 15, 2015 at 12:39 PM, Mikhail Khludnev
mkhlud...@griddynamics.com wrote:
Talking about performance, you should take a look at the difference in
performance between:
- disjunction of k sorted arrays (n*k*log(k)) in Lucene, where *k* is
the number of disjunction clauses and *n* the average posting list size (just
learned today from an expert Lucene committer)
- conjunction
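The disjunction cost above can be sketched with a toy k-way heap merge over invented postings lists:

```python
import heapq

# Disjunction over k sorted postings lists: a k-way heap merge visits
# each of the roughly n*k entries with an O(log k) heap operation,
# giving the O(n*k*log(k)) cost mentioned above.
postings = [
    [1, 4, 7],   # hypothetical sorted doc-id lists, one per clause
    [2, 4, 8],
    [3, 7, 9],
]
merged = sorted(set(heapq.merge(*postings)))  # dedupe repeated doc ids
print(merged)  # -> [1, 2, 3, 4, 7, 8, 9]
```

A conjunction, by contrast, can leapfrog through the shortest list and skip most entries, which is why it is typically much cheaper.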
bq: does that preclude us from doing what we're trying to do above?
Not at all. You just have to process each response and perhaps
combine them.
In this case, you might be able to get away with just specifying the
shards parameter to the query and having the app layer deal with
the responses.
This is kinda weird and looks a lot like a bug.
Let me try to reproduce it locally!
I'll let you know soon!
Cheers
2015-07-15 10:01 GMT+01:00 Martínez López, Alfonso almlo...@indra.es:
Hi,
I'm using Solr 4.10.3, and I'm trying to update a doc field using atomic
update
Just tried on Solr 5.1, and I get the proper behaviour.
Actually, where is the value for dinamic_desc coming from?
I cannot see it in the updates, and actually it is not in my index.
Are you sure you haven't forgotten any detail?
Cheers
2015-07-15 11:48 GMT+01:00 Alessandro Benedetti
Hi Everyone,
Out of the box, Solr (Lucene?) is set to use OR as the default Boolean
operator. Can someone tell me the advantages / disadvantages of using OR
or AND as the default?
I'm leaning toward AND as the default because the more words a user types,
the narrower the result set should be.
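As a toy illustration with invented doc ids, AND behaves like set intersection and OR like set union:

```python
# Toy model of q.op: AND narrows (intersection), OR widens (union).
# The terms and doc-id postings below are made up for the example.
postings = {
    "apples":  {1, 2, 3},
    "oranges": {2, 3, 4},
}
and_hits = postings["apples"] & postings["oranges"]  # q.op=AND
or_hits = postings["apples"] | postings["oranges"]   # q.op=OR
print(sorted(and_hits), sorted(or_hits))  # -> [2, 3] [1, 2, 3, 4]
```

So with AND, every extra term can only shrink the result set; with OR it can only grow it, and ranking is left to decide which hits float to the top.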
Triggering a commit implies a new searcher being opened, in a soft-commit
scenario.
With a hard commit, you can decide whether or not to open a new searcher.
But this is probably an X/Y problem.
Can you describe your real problem, rather than the way you were trying
to solve it?
Cheers
Ohhh!
I didn't read it completely, so I missed the copyField.
OK, now.
This is the explanation:
copy fields are added at indexing time, when the document arrives at the
RunUpdateRequestProcessor.
If I remember well, at this point, before we start indexing, the content
of the source field is added
Whatever you name the problem, I just wanted to open a new searcher
after several days of heavy load/searching on one of my slaves
to do some testing with empty field/document/filter caches.
Sure, I could first add, then delete a document and do a commit.
Or maybe only do a fake update of a
Going through the code in the RunUpdateRequestProcessor we call at one
point :
…
Document luceneDocument = cmd.getLuceneDocument();
// SolrCore.verbose(updateDocument,updateTerm,luceneDocument,writer);
writer.updateDocument(updateTerm, luceneDocument);
..
Inside that method we call :
public
Well yes, a simple empty commit won't do the trick, the searcher is not going
to reload on recent versions. Reloading the core will.
-Original message-
From:Bernd Fehling bernd.fehl...@uni-bielefeld.de
Sent: Wednesday 15th July 2015 13:42
To: solr-user@lucene.apache.org
Subject:
Thank you all for helping on this topic. I'm going to play with this and
might come back with more questions.
Steve
On Tue, Jul 14, 2015 at 1:57 PM, Erick Erickson erickerick...@gmail.com
wrote:
Steve:
Simplest solution:
remove WordDelimiterFilterFactory.
Use something like
OK, so effectively use the core product as it was in Solr 4, running a
schema.xml file to control doc structures and validation. In Solr 5, does
anyone have a clear link or some pointers as to the options for bin/solr
create_core to boot up the instance I need?
Thanks for all the help.
Hi Markus,
excellent, reloading the core did it.
Best regards
Bernd
Am 15.07.2015 um 13:44 schrieb Markus Jelsma:
Well yes, a simple empty commit won't do the trick, the searcher is not going
to reload on recent versions. Reloading the core will.
-Original message-
From:Bernd
2015-07-15 12:44 GMT+01:00 Markus Jelsma markus.jel...@openindex.io:
Well yes, a simple empty commit won't do the trick, the searcher is not
going to reload on recent versions. Reloading the core will.
mmm Markus, let's assume we trigger a soft commit, even empty, if open
searcher is equal
After playing with SolrCloud I answered my own question: multiple collections
can live on the same node. Following the how-to in the solr-ref-guide was
getting me confused.
My first guess is that somehow these two documents have
the same uniqueKey as some other documents, so later
docs are replacing earlier docs. Although not conclusive,
looking at the admin page for the cores in question may
show numDocs=278 and maxDoc=280 or some such in
which case that would be
You were 100 percent right. I went back and checked the metadata looking for
multiple instances of the same file path. Both of the files had an extra set
of metadata with the same filepath. Thank you very much.
On Wed, Jul 15, 2015 at 12:00 PM, Chetan Vora chetanv...@gmail.com wrote:
Mikhail
We do add new nodes with our custom results in some cases... just curious-
does that preclude us from doing what we're trying to do above? FWIW, we
can avoid the custom nodes if we had to.
If your custom
On 7/15/2015 12:42 PM, SolrUser1543 wrote:
from here :
https://cwiki.apache.org/confluence/display/solr/Read+and+Write+Side+Fault+Tolerance
we can learn that the transaction log is needed when replicas are used in
SolrCloud.
Do I need it if I am not using replicas?
Could it be disabled
bq: Do I need it if I am not using a replicas
Yes. The other function of transaction logs is
to recover documents indexed to segments
that haven't been closed in the event of
abnormal termination (i.e. somebody pulls
the plug).
Here's some info you might find useful:
Mikhail -
This worked great.
http://localhost:8983/solr/demo/select?q={!parent
which='type:parent'}image_uri_s:somevalue&fl=*,[child
parentFilter=type:parent
childFilter=-type:parent]&indent=true
Thank you.
bq: now if a user types apples oranges (without quotes) will
the ranking be the same as when I had AND?
You haven't defined "same". But at root I think this is a red
herring: you haven't stated why you care. They're different queries,
so I think the question is really which is more or less
Sorry in advance if I am beating a dead horse here ...
Here is an article by Mark Miller that gives some background and examples:
http://blog.cloudera.com/blog/2013/10/collection-aliasing-near-real-time-search-for-really-big-data/
In particular, see the section entitled "Update Alias".
Thanks Mikhail, the post is really useful!
I will study it in detail.
A slight change in the syntax changes the parsed query.
Anyway, I just tried the q=(image_uri_s:somevalue) OR (-image_uri_s:*)
query approach again.
And it is actually working as expected:
q=(name:nome) OR (-name:*) (gives me all
Hi, thanks for your help!
The value for the 'dinamic_desc' field comes from the 'src_desc' field. I copy
the value with:
<copyField source="src_*" dest="dinamic_*"/>
It seems that when I update a different field (the 'name' field) via atomic
update, the copyField directive copies the value again from 'src_desc' to
It is simply precision (AND) vs. recall (OR) - the former tries to limit
the total result count, while the latter tries to focus on relevancy of the
top results even if the total result count is higher.
Recall is good for discovery and browsing, where you sort of know what you
generally want, but
On 7/15/2015 3:01 AM, Martínez López, Alfonso wrote:
<!-- Fields -->
<field name="id" type="string" indexed="true" stored="true" required="true" />
<field name="name" type="string" indexed="true" stored="true" />
<field name="src_desc" type="string" indexed="true" stored="true" />
<field name="_version_" type="long" indexed="true"
The AND default has one big problem. If the user misspells a single word, they
get no results. About 10% of queries are misspelled, so that means a lot more
failures.
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
On Jul 15, 2015, at 7:21 AM, Jack
Hey Shawn, I was debugging a little bit; this is the problem:
when adding a field from the Solr document to the Lucene one, even if this
field was previously added to the Lucene document by the execution of the
copyField instruction, this check is carried out:
Hi all,
I would like to ask your opinion on whether it is recommended to use the
default Jetty servlet container as a service to run Solr in a multi-server
production environment. I hear some places recommend using Tomcat as the
servlet container. Can anyone share some thoughts about this?
Sorry Erick, I completely agree with you; I didn't specify in detail what
I was thinking:
copy fields must not be executed if the updated field is not a source
field (in a copyField pair).
Furthermore, I agree again with you: copyField should try to give a
different analysis to the new
2015-07-15 16:01 GMT+01:00 Mikhail Khludnev mkhlud...@griddynamics.com:
1. I can't get your explanation.
2. childFilter=(image_uri_s:somevalue) OR (-image_uri_s:*)
is not correct: it lacks quotes, and it is pointless (selecting some term and
negating all terms gives nothing).
Not considering
Charles:
bq: My understanding is that this is actually somewhat slower than
the standard indexing path...
Yes and no. If you just use a single thread, you're right it'll be
slower since it has to copy a
bunch of stuff around. Then at the end, the --go-live step copies the
built index to Solr
Hi Charles,
Thank you for the response. We will be using aliasing. Looking into ways
to avoid ingestion into each of the collections as you have mentioned: "For
example, would it be faster to make a file system copy of the most recent
collection ..."
MapReduceIndexerTool is not an option at this
Thank you all. Looks like OR is a better choice vs. AND.
Charles: I don't understand what you mean by the spellcheck component.
Do you mean OR works best with spell checker?
Steve
On Wed, Jul 15, 2015 at 11:07 AM, Reitzel, Charles
charles.reit...@tiaa-cref.org wrote:
A common approach to
bq: Is there a way to invoke IndexSearcher.search(Query, Collector)
The problem is that this question doesn't make a lot of sense to me.
IndexSearcher is, by definition, local to a single Lucene
instance. Distributed requests are a whole different beast. If you're going
to try to use custom request
OK, thank you very much.
It's when I try a second atomic update that I get the exception you mentioned,
"multiple values encountered for non multiValued copy field". The first time
there is no exception, but the non-multiValued field gets indexed with 2 values.
Cheers.
See SOLR-5783.
-Original message-
From:Alessandro Benedetti benedetti.ale...@gmail.com
Sent: Wednesday 15th July 2015 14:48
To: solr-user@lucene.apache.org
Subject: Re: To the experts: howto force opening a new searcher?
2015-07-15 12:44 GMT+01:00 Markus Jelsma
And, to answer your other question, yes, you can turn off auto-warming. If
your instance is dedicated to this client task, it may serve no purpose or be
actually counter-productive.
In the past, I worked on a Solr-based application that committed frequently
under application control (vs.
Am 15.07.2015 um 14:47 schrieb Alessandro Benedetti:
...
Whatever you name the problem, I just wanted to open a new searcher
after several days of heavy load/searching on one of my slaves
to do some testing with empty field-/document-/filter-caches.
Aren't you warming your caches on commits
Just to reiterate Charles' response with an example: we have a system
which needs to be as near-real-time as we can make it. So we have
application-level commitWithin set to 250 ms. Yes, we have to turn off a lot
of caching, auto-warming, etc., but it was necessary to make the index as real
time as we
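For anyone curious, a sketch of what such an update request might look like (the URL and document are illustrative, not from our production setup):

```python
import json
from urllib.parse import urlencode

# Ask Solr to make this document searchable within 250 ms without
# the client issuing an explicit commit.
doc = {"id": "42", "name": "widget"}  # hypothetical document
body = json.dumps([doc])
params = urlencode({"commitWithin": "250"})  # milliseconds
url = "http://localhost:8983/solr/collection1/update?" + params
print(url)
print(body)
```

Each commitWithin deadline triggers a (soft) commit, which is why cache auto-warming becomes mostly wasted work at this commit rate.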
Since they explicitly want to search within a given version of the data, this
seems like a textbook application for collection aliases.
You could have N public collection names: current_stuff, previous_stuff_1,
previous_stuff_2, ... At any given time, these will be aliased to reference
the
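A sketch of repointing an alias with the Collections API (the host and collection names below are examples):

```python
from urllib.parse import urlencode

# Repoint the public "current_stuff" alias at a newly built collection.
# Clients keep querying the alias; only the admin call changes.
params = {
    "action": "CREATEALIAS",
    "name": "current_stuff",         # public alias queried by clients
    "collections": "stuff_2015_07",  # concrete collection behind it
}
url = "http://localhost:8983/solr/admin/collections?" + urlencode(params)
print(url)
```

CREATEALIAS on an existing alias name replaces it, so the switch is atomic from the clients' point of view.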
Alfonso:
Haven't worked with this myself, but could field aliasing handle your use-case
_without_ the need for a copyField at all?
See: https://issues.apache.org/jira/browse/SOLR-1205
Again I need to emphasize that I HAVE NOT worked with this so it may be
a really bad suggestion. Or it may not
The OP asked about MapReduceIndexerTool. My understanding is that this is
actually somewhat slower than the standard indexing path and is recommended
only if the site is already invested in the Hadoop infrastructure. E.g. input
files are already distributed on the Hadoop/Search cluster via
If you inlined the query rather than referenced the thread, it would be
easy to understand the problem.
Once again, what doesn't meet your expectations: the order of returned parents
or order of children attached to a parent doc?
On Wed, Jul 15, 2015 at 1:56 AM, DorZion dorz...@gmail.com wrote:
I
If you're running in cloud mode, move to using collections with
the configs kept in Zookeeper.
Assuming you're not, you can use the create_core stuff; I'm
not sure what's unclear about it. Did you try
bin/solr create_core -help? If that's not clear, please make some
suggestions for making it more
Hi all
I asked a related question before but couldn't get any response (see
SolrQueryRequest in SolrCloud vs Standalone Solr), asking it differently
here.
Is there a way to invoke
IndexSearcher.search(Query, Collector) over a SolrCloud collection so that
it invokes the search/collect implicitly
On 7/15/2015 8:55 AM, Martínez López, Alfonso wrote:
in some cases it can be necessary to have the copy field stored. My Solr
instance is used by some legacy applications that need to retrieve fields by
some specific field names. That's why I need to maintain 2 copies of the same
field: one
We are building an admin for our inventory. Using Solr's faceting,
searching, and stats functionality, it provides different ways an admin can
look at the inventory.
The admin can also do some updates on the items and they need to see the
updates almost real time.
Our public facing website is
1. I can't get your explanation.
2. childFilter=(image_uri_s:somevalue) OR (-image_uri_s:*)
is not correct: it lacks quotes, and it is pointless (selecting some term and
negating all terms gives nothing). Thus, the only reasonable syntax is
childFilter=other_field:somevalue -image_uri_s:*
3. I
Hi,
in some cases it can be necessary to have the copy field stored. My Solr
instance is used by some legacy applications that need to retrieve fields by
some specific field names. That's why I need to maintain 2 copies of the same
field: one with the old name and the other for the new name (that is
A common approach to this problem is to include the spellcheck component and,
if there are corrections, include a "Did you mean ...?" link on the results page.
-Original Message-
From: Walter Underwood [mailto:wun...@wunderwood.org]
Sent: Wednesday, July 15, 2015 10:36 AM
To:
bq: The admin can also do some updates on the items and they need to see the
updates almost real time.
Why not give the admin control over commits and default the other commits to
something reasonable? So make your defaults, say, 15 seconds (or 30 seconds
or longer). If the admin really needs the
By the way, using OR as the default, other than returning more results as
more words are entered, the ranking and performance of the search remain
the same, right?
Steve
On Wed, Jul 15, 2015 at 12:12 PM, Steven White swhite4...@gmail.com wrote:
Thank you all. Looks like OR is a better choice
This is really an apples/oranges comparison. They're essentially different
queries, and scores aren't comparable across different queries.
If you're asking "if doc 1 and doc 2 are returned by defaulting to AND or OR,
are they in the same position relative to each other?" then I'm pretty sure the
Erick
Thanks for your response and for the pointers! This will be a good starting
point; I will go through these.
The good news is that in our use case, we don't really care about the two
passes. In fact, our results are ConstantScore, so we only need to aggregate
(i.e. sum) the results from each
Hi Erick,
I understand there are variables that will impact ranking. However, if I
leave my edismax setting as is and simply switch from AND to OR as the
default Boolean, now if a user types apples oranges (without quotes) will
the ranking be the same as when I had AND? Will the performance be
OK, I checked with my data:
color:orlean = numFound: 1,
-color:[* TO *] = numFound: 602096 (it used to return 0 until 'pure
negational' (sic) queries were delivered)
color:orlean -color:[* TO *] = numFound: 0,
color:orlean (*:* -color:[* TO *]) = numFound: 602097,
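Those four counts can be modeled with plain sets (a toy corpus sized to match the numbers above):

```python
# Toy model of the four queries: one doc matches color:orlean, and the
# remaining 602,096 docs have no "color" value at all.
all_docs = set(range(602097))  # corpus sized to match the counts above
has_color = {0}                # docs with any color value
orlean = {0}                   # docs matching color:orlean
no_color = all_docs - has_color

q1 = orlean                # color:orlean
q2 = no_color              # -color:[* TO *]
q3 = orlean & no_color     # color:orlean -color:[* TO *]  (contradictory)
q4 = orlean | no_color     # color:orlean (*:* -color:[* TO *])
print(len(q1), len(q2), len(q3), len(q4))  # -> 1 602096 0 602097
```

The third query is empty because a doc cannot both have a color value and have none; the fourth query unions the two disjoint sets, hence 602,097.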
fyi
On Wed, Jul 15, 2015 at 10:46 AM, Chetan Vora chetanv...@gmail.com wrote:
Hi all
I asked a related question before but couldn't get any response (see
SolrQueryRequest in SolrCloud vs Standalone Solr), asking it differently
here.
Is there a way to invoke
IndexSearcher.search(Query,