Hello,
I'm using solr and indexing vectors into fields.
In some cases when I search for documents I want to do it based on the
fields containing the vector.
I have an algorithm which assigns a score to each document. The document's
score is based on vector multiplication with an input vector.
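The scoring described above can be sketched outside Solr as a plain dot product of the stored document vector with the query's input vector (a minimal illustration; the class and method names are mine, not a Solr API):

```java
/**
 * Minimal sketch of vector-based document scoring: the document's score
 * is the dot product of its indexed vector and the query's input vector.
 * Illustrative only; not a Solr or Lucene API.
 */
public class VectorScore {
    static double dotProduct(double[] docVector, double[] inputVector) {
        double score = 0.0;
        for (int i = 0; i < docVector.length; i++) {
            score += docVector[i] * inputVector[i];
        }
        return score;
    }

    public static void main(String[] args) {
        double[] doc = {0.5, 1.0, 0.0};
        double[] query = {2.0, 1.0, 4.0};
        // 0.5*2 + 1*1 + 0*4 = 2.0
        System.out.println(dotProduct(doc, query));
    }
}
```

Doing this computation client-side (e.g. in SolrJ, after fetching the stored vectors) is one way to keep the heavy work off the Solr node, in line with the concern below.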
There's quite a discussion here: https://issues.apache.org/jira/browse/SOLR-7137
But, I personally am not a huge fan of pushing all the work onto Solr; in a
production environment the Solr server is responsible for indexing, parsing the
docs through Tika, perhaps searching, etc. This doesn't
Hi,
I'm trying to gather information on how mlt works or is supposed to work
with SolrCloud and a sharded collection. I've read issues SOLR-6248,
SOLR-5480 and SOLR-4414, and docs at
https://wiki.apache.org/solr/MoreLikeThis, but I'm still struggling
with multiple issues. I've been testing
Hi all,
can I change solrconfig.xml configuration when solrcloud is up and running?
Best regards,
Vincenzo
--
Vincenzo D'Amore
email: v.dam...@gmail.com
skype: free.dev
mobile: +39 349 8513251
Hi all,
I have a SolrCloud cluster with 3 servers. Is it possible to have a new
standalone Solr server always in sync with the cluster? I would like to have
something like a replica or a slave.
As far as I have seen, it is not possible to do this with SolrCloud, so I
have written a batch program that
On 4/15/2015 2:02 AM, Pedro Figueiredo wrote:
My solr installation is in cloud mode... so the basic solr stop and start
does not update the configuration right?
I started solr using:
solr -c -Dbootstrap_confdir=C:\solr-5.0.0\server\solr\patientsCollection\conf
Hi,
I am trying to index various binary file types into Solr. However, some
file types seems to be ignored and not getting indexed, though the metadata
is being extracted successfully for all the types.
Specifically, zip files and jpg files are not getting indexed, whereas
pdf, MS office
Thanks everyone for the responses. Now I am able to index PDF documents
successfully. I have implemented manual extraction using Tika's AutoParser
and PDF functionality is working fine. However, the error with some MS
Office Word documents still persists.
The error message is
Thanks Andrea. For image files and zip files, even metadata is not
available. Just to explain further, I have indexed a total of 10 files, out
of which a .jpg file and .zip file are present.
After the indexing process is complete, no information about either of
these files is present in the solr
Hi Vijay,
here you can find all supported formats by Tika, which is internally
used by SolrCell:
* https://tika.apache.org/1.4/formats.html
* https://tika.apache.org/1.5/formats.html
* https://tika.apache.org/1.6/formats.html
* https://tika.apache.org/1.7/formats.html
Best,
Andrea
We recently upgraded from 4.5.0 to 4.10.4. I tried getting a list of our
synonyms like this:
http://localhost/solr/default-collection/schema/analysis/synonyms/english
I got a not found error. I found this page on new features in 4.8
http://yonik.com/solr-4-8-features/
Do we have to do
Thanks Andrea. I can see that Tika 1.5 supports both compressed (ZIP) and
image (JPG) formats. If that's the case, why couldn't SolrCell index the
.zip and .jpg documents? Am I missing something here? No error is
thrown in the overall process and the java program completes successfully.
But
Sorry, attachments are not supported here :(
Anyway, I believe the misunderstanding resides in what you think image
indexing should mean: actually, AFAIK, Tika extracts only a) the
textual content of a given resource and b) its metadata.
So
- for a JPG file (or in general, an image) you will
Hi,
I just install solr and try it.
The indexer ignores text files with extensions like php and py. Is there any way
to add types so solr will index them?
Thanks.
Solr uses Tika to try to process semi-structured documents. You can
see all the supported document types here:
https://tika.apache.org/1.4/formats.html
I assume you're using the Extracting Request Handler to do this?
Best,
Erick
On Wed, Apr 15, 2015 at 7:31 AM, Shlomit Afgin
Yes, but you must then push the changes up to Zookeeper (usually via
zkcli -cmd upconfig ) then reload the collection to get the
changes to take effect on all the replicas.
Best,
Erick
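The two steps Erick describes can be sketched as commands (paths, ZooKeeper host, and collection/config names below are illustrative and vary by install; zkcli ships under server/scripts/cloud-scripts in Solr 5.x):

```shell
# Push the edited config directory up to ZooKeeper
server/scripts/cloud-scripts/zkcli.sh -zkhost localhost:2181 \
  -cmd upconfig -confdir /path/to/myconf -confname myconf

# Reload the collection so every replica picks up the change
curl "http://localhost:8983/solr/admin/collections?action=RELOAD&name=mycollection"
```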
On Wed, Apr 15, 2015 at 6:12 AM, Vincenzo D'Amore v.dam...@gmail.com wrote:
Hi all,
can I change
When customizing scoring beyond what's available in the Query API, there are
a couple of layers you can work in:
1. Create a Solr query parser -- not too hard, just requires very light
Java/Lucene skills. This involves taking a query string and query params
from Solr and digesting them into Lucene
Thanks, it works :)
On Wed, Apr 15, 2015 at 4:38 PM, Erick Erickson erickerick...@gmail.com
wrote:
Yes, but you must then push the changes up to Zookeeper (usually via
zkcli -cmd upconfig ) then reload the collection to get the
changes to take effect on all the replicas.
Best,
Erick
Check to see if there are any errors in the Solr log for jpg and zip files.
Solr should do something for them - if not, file a Jira to suggest that it
should, as an improvement. Zip should give a list of the enclosed files.
Images should at least give the metadata.
-- Jack Krupansky
On Wed, Apr
Hi
I am trying to port my non-SolrCloud custom search handler to a SolrCloud one.
I have read the WritingDistributedSearchComponents wiki page and looked at the
TermsComponent and QueryComponent code, but the control flow of execution is
still fuzzy (even given the "distributed algorithm" description).
Hi folks,
What is the best practice to manage and update Solr's schema.xml?
I need to deploy Solr dynamically based on customer configuration (they
will pick fields to be indexed or not, they will want to customize the
analyzer (WordDelimiterFilterFactory, etc.) and specify the language to use.
I use Solr to index different kinds of database tables. I have a Solr index
containing a field named category. I make sure that the category field in Solr
gets populated with the right value depending on the table. This I can use to
build facet queries which works fine.
The problem I have is
the short answer is that you need something to re-open the searcher -- but
i'm not going to go into specifics on how to do that because...
You are dealing with a VERY low level layer of the lucene/solr code stack
-- w/o more details on why you've written this particular bit of code (and
where
I also tried the 4.10.4 default example and set up the synonym list like
this:
{
  "responseHeader": {
    "status": 0,
    "QTime": 2},
  "synonymMappings": {
    "initArgs": {
      "ignoreCase": true,
      "format": "solr"},
    "initializedOn": "2015-04-15T20:26:02.072Z",
    "managedMap": {
      "Battery": ["Deadweight"],
Hi All,
Consider this scenario: I have around 100K content items and I want to
launch 5 sites with that content. For example, around 50K items for site1,
40K for site2, 30K for site3, 20K for site4, and 10K for site5.
As seen from this example, these sites have some overlapping content
I just tried this quickly on trunk and it still works.
/opt/code/lusolr_trunk$ curl
http://localhost:8983/solr/techproducts/schema/analysis/synonyms/english
{
  "responseHeader": {
    "status": 0,
    "QTime": 234},
  "synonymMappings": {
    "initArgs": {
      "ignoreCase": true,
      "format": "solr"},
Hello all,
my Bitnami/Solr-5.0.0 installation is not able to index any type of
file (found in the provided examples folders or anywhere else) except HTML.
Tested on the files in the exampledocs folder
(books.csv, books.json, ..., utf8-example.xml, vidcard.xml) I get:
for .csv files I get the response
Dear Chris,
Hi,
Thank you for your response. Actually I implemented a small code for the
purpose of extracting article keywords out of Lucene index on commit,
optimize or calling the specific query. I did implement that using search
component. I know that the searchComponent is not for the purpose
Hi All,
In the interests of minimizing round-trips to the database, is there any way to
get the added/changed _version_ values returned from /update? Or do you
always have to do a fresh get?
Yes, I am using optimistic concurrency. No, I am not using atomic updates
(yet).
Has anyone tried
Dear Jack,
Hi,
The q parameter is *:* since I just wanted to filter the documents.
Regards.
On Tue, Apr 14, 2015 at 8:07 PM, Jack Krupansky jack.krupan...@gmail.com
wrote:
What does your main query look like? Normally we don't speak of searching
with the fq parameter - it filters the results,
Thanks. It turned out to be caused by me not using the
ManagedSynonymFilterFactory.
I added the dummy managed_en field:
<fieldType name="managed_en" class="solr.TextField"
    positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter
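Once a field type uses the managed synonym resource, entries can be maintained over the REST endpoint mentioned earlier in the thread (collection name here is illustrative):

```shell
# Add a mapping to the managed synonym set, then reload the
# collection/core for it to take effect in analysis
curl -X PUT -H 'Content-Type: application/json' \
  --data-binary '{"Battery":["Deadweight"]}' \
  'http://localhost:8983/solr/default-collection/schema/analysis/synonyms/english'
```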
My standard answer when you want to really customize how stuff like
this works is to do the Tika processing in SolrJ. That lets you
ignore/modify/whatever anything you want. It also moves the parsing
load off the Solr node, which scales much better. Here's an example:
On 4/15/2015 3:54 PM, Steven White wrote:
Hi folks,
If a user types in the search box (without quotes): {!q.op=AND df=text
solr sys and I take that text and build the URL like so:
http://localhost:8983/solr/db/select?q={!q.op=AND%20df=text%20solr%20sys&fl=id%2Cscore%2Ctitle&wt=xml&indent=true
Thanks, this is exactly what I was looking for!!
Steve
On Wed, Apr 15, 2015 at 5:48 PM, Erick Erickson erickerick...@gmail.com
wrote:
Have you looked at the managed schema stuff?
see:
https://cwiki.apache.org/confluence/display/solr/Managed+Schema+Definition+in+SolrConfig
There's also some
Have you looked at the managed schema stuff?
see:
https://cwiki.apache.org/confluence/display/solr/Managed+Schema+Definition+in+SolrConfig
There's also some work being done to update at least parts of
solrconfig.xml, see:
https://issues.apache.org/jira/browse/SOLR-6533
Best,
Erick
On Wed, Apr
Using edismax, supplying a rq= param, like {!rerank ...} is causing an
UnsupportedOperationException because the Query doesn't implement
createWeight. This is for WildcardQuery in particular. From some
preliminary debugging it looks like without rq, somehow the qf Queries
might turn into
Hi folks,
If a user types in the search box (without quotes): {!q.op=AND df=text
solr sys and I take that text and build the URL like so:
http://localhost:8983/solr/db/select?q={!q.op=AND%20df=text%20solr%20sys&fl=id%2Cscore%2Ctitle&wt=xml&indent=true
This will fail with Expected identifier
Hey, that's great! I'll give it a try.
File under, never hurts to ask ... :-)
-Original Message-
From: Chris Hostetter [mailto:hossman_luc...@fucit.org]
Sent: Wednesday, April 15, 2015 5:15 PM
To: solr-user@lucene.apache.org
Subject: Re: _version_ returned from /update?
: In the
You're going to have to provide a lot more details (solr version, sample
data, full queries, details about configs, etc...) in order for anyone to
offer you meaningful assistance...
https://wiki.apache.org/solr/UsingMailingLists
I attempted to reproduce the steps you describe using Solr 5.1
Hi Eric,
Thanks for your response. I was planning to do the same: to store the data
in a single collection with a site parameter differentiating duplicated
content for different sites. But my use case is that in the future the content
would run into millions, and potentially there could be a large number
At this data size, don't worry at _all_ about duplicating content. A
single Solr node easily holds 20M docs. 50M is common and 250M is not
unheard of.
My bold claim is: you can freely duplicate the data to your heart's
content and you'll never notice it.
In fact, you can put it all in a single
: In the interests of minimizing round-trips to the database, is there any
: way to get the added/changed _version_ values returned from /update?
: Or do you always have to do a fresh get?
there is a versions=true param you can specify on updates to get the
version# back for each doc added
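A quick sketch of that parameter in use (collection name and document are illustrative):

```shell
# versions=true asks /update to echo back the new _version_ for each doc,
# so a follow-up real-time get is unnecessary
curl 'http://localhost:8983/solr/mycollection/update?versions=true&commit=true' \
  -H 'Content-Type: application/json' \
  -d '[{"id":"doc1"}]'
# the response's "adds" list pairs each id with its newly assigned _version_
```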
Trying again since I don't have an answer yet.
Thanks!
Eyal Naamati
Alma Developer
Tel: +972-2-6499313
Mobile: +972-547915255
eyal.naam...@exlibrisgroup.commailto:eyal.naam...@exlibrisgroup.com
Hello,
http://localhost:8983/solr/patientsCollection/select?q=*%3A*&sort=name_sort+asc&wt=json&indent=true&_=1429082874881
I am using the solr console admin and in the query option I just define the
field sort with name_sort asc.
Pedro Figueiredo
Senior Engineer
pjlfigueir...@criticalsoftware.com
Hello,
Yes I restart solr and re-index after the change.
The request is:
http://localhost:8983/solr/patientsCollection/select?q=*%3A*&sort=name_sort+asc&wt=json&indent=true&_=1429082874881
I am using the solr console admin and in the query option I just define the
field sort with name_sort asc.
That said, it might be nice to have a wiki page (or something) explaining
how it can be done, including maybe concrete cases about exactly how it
has been done on different installations around the world using Solr
On 14/04/15 14:03, Per Steffensen wrote:
Hi
I might misunderstand you, but if
Folks, just a quick heads-up that apparently Solr 5.1 introduced a
change in bin/solr that overrides SOLR_JAVA_MEM setting from solr.in.sh
or environment. I just filed
https://issues.apache.org/jira/browse/SOLR-7392. The problem can be
circumvented by using SOLR_HEAP setting, e.g.
Really strange to me: the cause should be what Shawn already pointed
out, because that error is raised when:
SchemaField sf = req.getSchema().getFieldOrNull(field);
is null:
if (null == sf) {
  ...
  throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, "sort
param field can't
Hi,
I am looking for a way to validate a document that is about to be
inserted against the schema, to check whether the addition would fail or
not without actually making the insert. Is there a way to do that? I'm doing
the update from inside a Solr plugin, so there is access to the API if that
Ok... my bad...
My solr installation is in cloud mode... so the basic solr stop and start does
not update the configuration right?
I started solr using:
solr -c -Dbootstrap_confdir=C:\solr-5.0.0\server\solr\patientsCollection\conf
-Dcollection.configName=myconf
and the error was solved.
Hi,
I want to implement a custom TFIDF similarity scoring function. I read the
code for org.apache.lucene.search.similarities.DefaultSimilarity. I could
not find a way to get the query that user provided.
In my case, I would want to allow the user to upload some binary content to
my search
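For orientation, the two per-term factors that DefaultSimilarity computes can be sketched in plain Java (these formulas match Lucene 4.x's DefaultSimilarity; the class here is illustrative, not a drop-in Similarity, and note that a Similarity never sees the full user query, which is why per-query logic usually belongs in a custom query or query parser instead):

```java
/**
 * Plain-Java sketch of the tf and idf factors used by Lucene 4.x's
 * DefaultSimilarity. Illustrative only; not a Similarity implementation.
 */
public class TfIdfSketch {
    // Term frequency factor: square root of the raw within-doc frequency.
    static float tf(float freq) {
        return (float) Math.sqrt(freq);
    }

    // Inverse document frequency: 1 + ln(numDocs / (docFreq + 1)).
    static float idf(long docFreq, long numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    public static void main(String[] args) {
        // sqrt(4) = 2.0
        System.out.println(tf(4));
        // ln(1/1) + 1 = 1.0
        System.out.println(idf(0, 1));
    }
}
```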