Change scoring method

2015-04-15 Thread מאיה גלעד
Hello, I'm using solr and indexing vectors into fields. In some cases when I search for documents I want to do it based on the fields containing the vector. I have an algorithm which defines a score to the document. The document's score is based on vector multiplication and on an input vector
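The kind of scoring described here can be illustrated outside Solr. The following is a minimal sketch, assuming that "vector multiplication" means a dot product between each document's stored vector and the input vector; the function names and sample vectors are hypothetical and not taken from the thread.

```python
from typing import Dict, List, Sequence, Tuple


def dot_score(doc_vector: Sequence[float], query_vector: Sequence[float]) -> float:
    """Score one document as the dot product of its vector and the input vector."""
    if len(doc_vector) != len(query_vector):
        raise ValueError("vectors must have the same length")
    return sum(d * q for d, q in zip(doc_vector, query_vector))


def rank(docs: Dict[str, Sequence[float]],
         query_vector: Sequence[float]) -> List[Tuple[str, float]]:
    """Return (doc_id, score) pairs, highest score first."""
    scored = [(doc_id, dot_score(vec, query_vector)) for doc_id, vec in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical sample data: three documents with three-dimensional vectors.
docs = {
    "doc1": [1.0, 2.0, 3.0],
    "doc2": [0.0, 1.0, 0.0],
    "doc3": [2.0, 0.0, 1.0],
}
ranking = rank(docs, [0.5, 0.0, 2.0])
print(ranking)
```

Inside Solr the same arithmetic would have to run in a custom query or scoring plugin; this sketch only shows the computation itself.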

Re: Indexing PDF and MS Office files

2015-04-15 Thread Erick Erickson
There's quite a discussion here: https://issues.apache.org/jira/browse/SOLR-7137 But I personally am not a huge fan of pushing all the work onto Solr. In a production environment the Solr server is responsible for indexing, parsing the docs through Tika, perhaps searching, etc. This doesn't

MoreLikeThis (mlt) in sharded SolrCloud

2015-04-15 Thread Ere Maijala
Hi, I'm trying to gather information on how mlt works or is supposed to work with SolrCloud and a sharded collection. I've read issues SOLR-6248, SOLR-5480 and SOLR-4414, and docs at https://wiki.apache.org/solr/MoreLikeThis, but I'm still struggling with multiple issues. I've been testing

SolrCloud 4.8 - solrconfig.xml hot changes

2015-04-15 Thread Vincenzo D'Amore
Hi all, can I change solrconfig.xml configuration when solrcloud is up and running? Best regards, Vincenzo -- Vincenzo D'Amore email: v.dam...@gmail.com skype: free.dev mobile: +39 349 8513251

SolrCloud servers sync/replica

2015-04-15 Thread Vincenzo D'Amore
Hi all, I have a solrcloud cluster with 3 servers. Is it possible to have a new standalone solrcloud server that is always in sync with the cluster? I would like to have something like a replica or a slave. As far as I have seen, it is not possible to do this with solrcloud, so I have written a batch program that

Re: sort by a copy field error

2015-04-15 Thread Shawn Heisey
On 4/15/2015 2:02 AM, Pedro Figueiredo wrote: My solr installation is in cloud mode... so the basic solr stop and start does not update the configuration right? I started solr using: solr -c -Dbootstrap_confdir=C:\solr-5.0.0\server\solr\patientsCollection\conf

ContentTypes supported by Solr to index

2015-04-15 Thread Vijaya Narayana Reddy Bhoomi Reddy
Hi, I am trying to index various binary file types into Solr. However, some file types seem to be ignored and are not getting indexed, though the metadata is being extracted successfully for all the types. Specifically, zip files and jpg files are not getting indexed, whereas pdf, MS office

Re: Indexing PDF and MS Office files

2015-04-15 Thread Vijaya Narayana Reddy Bhoomi Reddy
Thanks everyone for the responses. Now I am able to index PDF documents successfully. I have implemented manual extraction using Tika's AutoParser and the PDF functionality is working fine. However, the error with some MS Office Word documents still persists. The error message is

Re: ContentTypes supported by Solr to index

2015-04-15 Thread Vijaya Narayana Reddy Bhoomi Reddy
Thanks Andrea. For image files and zip files, even metadata is not available. Just to explain further, I have indexed a total of 10 files, out of which a .jpg file and .zip file are present. After the indexing process is complete, no information about either of these files is present in the solr

Re: ContentTypes supported by Solr to index

2015-04-15 Thread Andrea Gazzarini
Hi Vijay, here you can find all supported formats by Tika, which is internally used by SolrCell: * https://tika.apache.org/1.4/formats.html * https://tika.apache.org/1.5/formats.html * https://tika.apache.org/1.6/formats.html * https://tika.apache.org/1.7/formats.html Best, Andrea

Using synonyms API

2015-04-15 Thread Mike Thomsen
We recently upgraded from 4.5.0 to 4.10.4. I tried getting a list of our synonyms like this: http://localhost/solr/default-collection/schema/analysis/synonyms/english I got a not found error. I found this page on new features in 4.8 http://yonik.com/solr-4-8-features/ Do we have to do

Re: ContentTypes supported by Solr to index

2015-04-15 Thread Vijaya Narayana Reddy Bhoomi Reddy
Thanks Andrea. I can see that Tika 1.5 supports both compressed (ZIP) and image (JPG) formats. If that's the case, why could SolrCell not index the .zip and .jpg documents? Am I missing something here? No error is thrown in the overall process and the Java program completes successfully. But

Re: ContentTypes supported by Solr to index

2015-04-15 Thread Andrea Gazzarini
Sorry, attachments are not supported here :( Anyway, I believe the misunderstanding lies in what you mean by image indexing: actually, AFAIK, Tika indexes only a) the textual content of a given resource b) its metadata. So - for a JPG file (or in general, an image) you will

file index format

2015-04-15 Thread Shlomit Afgin
Hi, I just installed Solr and tried it. The index ignores text files with extensions like php and py. Is there any way to add types so Solr will index them? Thanks.

Re: file index format

2015-04-15 Thread Erick Erickson
Solr uses Tika to try to process semi-structured documents. You can see all the supported document types here: https://tika.apache.org/1.4/formats.html I assume you're using the Extracting Request Handler to do this? Best, Erick On Wed, Apr 15, 2015 at 7:31 AM, Shlomit Afgin

Re: SolrCloud 4.8 - solrconfig.xml hot changes

2015-04-15 Thread Erick Erickson
Yes, but you must then push the changes up to Zookeeper (usually via zkcli -cmd upconfig ) then reload the collection to get the changes to take effect on all the replicas. Best, Erick On Wed, Apr 15, 2015 at 6:12 AM, Vincenzo D'Amore v.dam...@gmail.com wrote: Hi all, can I change
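The reload step mentioned above can be sketched as a Collections API call. This is only an illustration under stated assumptions: the host, port, and collection name `collection1` are hypothetical, and the configuration upload to ZooKeeper (the `zkcli -cmd upconfig` step) is assumed to have been done separately beforehand.

```python
from urllib.parse import urlencode


def reload_collection_url(base_url: str, collection: str) -> str:
    """Build a Collections API RELOAD request URL for one collection."""
    params = urlencode({"action": "RELOAD", "name": collection, "wt": "json"})
    return f"{base_url.rstrip('/')}/admin/collections?{params}"


# Hypothetical Solr base URL and collection name.
url = reload_collection_url("http://localhost:8983/solr", "collection1")
print(url)
```

Requesting this URL (for example with curl or a browser) asks SolrCloud to reload the collection on all replicas so that the uploaded configuration takes effect.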

Re: Change scoring method

2015-04-15 Thread Doug Turnbull
When customizing scoring beyond what's available in the Query API, there are a couple of layers you can work in: 1. Create a Solr query parser -- not too hard, just requires very light Java/Lucene skills. This involves taking a query string and query params from Solr and digesting them into Lucene

Re: SolrCloud 4.8 - solrconfig.xml hot changes

2015-04-15 Thread Vincenzo D'Amore
Thanks, it works :) On Wed, Apr 15, 2015 at 4:38 PM, Erick Erickson erickerick...@gmail.com wrote: Yes, but you must then push the changes up to Zookeeper (usually via zkcli -cmd upconfig ) then reload the collection to get the changes to take effect on all the replicas. Best, Erick

Re: ContentTypes supported by Solr to index

2015-04-15 Thread Jack Krupansky
Check to see if there are any errors in the Solr log for jpg and zip files. Solr should do something for them - if not, file a Jira to suggest that it should, as an improvement. Zip should give a list of the enclosed files. Images should at least give the metadata. -- Jack Krupansky On Wed, Apr

custom search component on solrcloud

2015-04-15 Thread Peyman Faratin
Hi, I am trying to port my non-SolrCloud custom search handler to a SolrCloud one. I have read the WritingDistributedSearchComponents wiki page and looked at the Terms and QueryComponent code, but the control flow of execution is still fuzzy (even given the “distributed algorithm” description).

How do you manage / update schema.xml file

2015-04-15 Thread Steven White
Hi folks, What is the best practice to manage and update Solr's schema.xml? I need to deploy Solr dynamically based on customer configuration (they will pick fields to be indexed or not, they will want to customize the analyzer (WordDelimiterFilterFactory, etc.) and specify the language to use).

How do I tell Tika to not complement a field's value defined in my Solr schema when indexing a binary document?

2015-04-15 Thread Patrick Savelberg
I use Solr to index different kinds of database tables. I have a Solr index containing a field named category. I make sure that the category field in Solr gets populated with the right value depending on the table. I can use this to build facet queries, which works fine. The problem I have is

Re: Lucene updateDocument does not affect index until restarting solr

2015-04-15 Thread Chris Hostetter
the short answer is that you need something to re-open the searcher -- but i'm not going to go into specifics on how to do that because... You are dealing with a VERY low level layer of the lucene/solr code stack -- w/o more details on why you've written this particular bit of code (and where

Re: Using synonyms API

2015-04-15 Thread Mike Thomsen
I also tried the 4.10.4 default example and set up the synonym list like this: { responseHeader:{ status:0, QTime:2}, synonymMappings:{ initArgs:{ ignoreCase:true, format:solr}, initializedOn:2015-04-15T20:26:02.072Z, managedMap:{ Battery:[Deadweight],

solr index design for this use case?

2015-04-15 Thread vsriram30
Hi All, Consider this scenario: I have around 100K content items and I want to launch 5 sites with that content. For example, around 50K content items for site1, 40K for site2, 30K for site3, 20K for site4, and 10K for site5. As seen from this example, these sites have some overlapping content

Re: Using synonyms API

2015-04-15 Thread Yonik Seeley
I just tried this quickly on trunk and it still works. /opt/code/lusolr_trunk$ curl http://localhost:8983/solr/techproducts/schema/analysis/synonyms/english { responseHeader:{ status:0, QTime:234}, synonymMappings:{ initArgs:{ ignoreCase:true, format:solr},

(possible)SimplePostTool problem --(Windows, Bitnami distribution)

2015-04-15 Thread kenadian
Hello all, my Bitnami/*Solr-5.0.0* installation is not able to index any type of file (found in the provided examples folders or anywhere else) except HTML. Tested on the files in the exampledocs folder (books.csv, books.json, ..., utf8-example.xml, vidcard.xml) I get: for *.csv* files I get the response

Re: Lucene updateDocument does not affect index until restarting solr

2015-04-15 Thread Ali Nazemian
Dear Chris, Hi, Thank you for your response. Actually I implemented a small piece of code for the purpose of extracting article keywords out of the Lucene index on commit, optimize or calling the specific query. I implemented that using a search component. I know that the searchComponent is not for the purpose

_version_ returned from /update?

2015-04-15 Thread Reitzel, Charles
Hi All, In the interests of minimizing round-trips to the database, is there any way to get the added/changed _version_ values returned from /update? Or do you always have to do a fresh get? Yes, I am using optimistic concurrency. No, I am not using atomic updates (yet). Has anyone tried

Re: Problem related to filter on Zero value for DateField

2015-04-15 Thread Ali Nazemian
Dear Jack, Hi, The q parameter is *:* since I just wanted to filter the documents. Regards. On Tue, Apr 14, 2015 at 8:07 PM, Jack Krupansky jack.krupan...@gmail.com wrote: What does your main query look like? Normally we don't speak of searching with the fq parameter - it filters the results,

Re: Using synonyms API

2015-04-15 Thread Mike Thomsen
Thanks. It turned out to be caused by me not using the ManagedSynonymFilterFactory. I added the dummy managed_en field: fieldType name=managed_en class=solr.TextField positionIncrementGap=100 analyzer tokenizer class=solr.StandardTokenizerFactory/ filter

Re: How do I tell Tika to not complement a field's value defined in my Solr schema when indexing a binary document?

2015-04-15 Thread Erick Erickson
My standard answer when you want to really customize how stuff like this works is to do the Tika processing in SolrJ. That lets you ignore/modify/whatever anything you want. It also moves the parsing load off of the Solr node which scales much better. Here's an example:

Re: Differentiating user search term in Solr

2015-04-15 Thread Shawn Heisey
On 4/15/2015 3:54 PM, Steven White wrote: Hi folks, If a user types in the search box (without quotes): {!q.op=AND df=text solr sys and I take that text and build the URL like so: http://localhost:8983/solr/db/select?q={!q.op=AND%20df=text%20solr%20sys&fl=id%2Cscore%2Ctitle&wt=xml&indent=true

Re: How do you manage / update schema.xml file

2015-04-15 Thread Steven White
Thanks, this is exactly what I was looking for!! Steve On Wed, Apr 15, 2015 at 5:48 PM, Erick Erickson erickerick...@gmail.com wrote: Have you looked at the managed schema stuff? see: https://cwiki.apache.org/confluence/display/solr/Managed+Schema+Definition+in+SolrConfig There's also some

Re: How do you manage / update schema.xml file

2015-04-15 Thread Erick Erickson
Have you looked at the managed schema stuff? see: https://cwiki.apache.org/confluence/display/solr/Managed+Schema+Definition+in+SolrConfig There's also some work being done to update at least parts of solrconfig.xml, see: https://issues.apache.org/jira/browse/SOLR-6533 Best, Erick On Wed, Apr

rq breaks wildcard search?

2015-04-15 Thread Ryan Josal
Using edismax, supplying a rq= param, like {!rerank ...} is causing an UnsupportedOperationException because the Query doesn't implement createWeight. This is for WildcardQuery in particular. From some preliminary debugging it looks like without rq, somehow the qf Queries might turn into

Differentiating user search term in Solr

2015-04-15 Thread Steven White
Hi folks, If a user types in the search box (without quotes): {!q.op=AND df=text solr sys and I take that text and build the URL like so: http://localhost:8983/solr/db/select?q={!q.op=AND%20df=text%20solr%20sys&fl=id%2Cscore%2Ctitle&wt=xml&indent=true This will fail with Expected identifier
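One possible way to keep raw user text from being read as query syntax is to backslash-escape the parser's special characters before building the URL. The sketch below is an assumption on my part, not the answer given in the thread: the helper names are hypothetical, and the character set is modeled loosely on SolrJ's `ClientUtils.escapeQueryChars`, except that whitespace is deliberately left unescaped so that the words stay separate terms.

```python
from urllib.parse import urlencode

# Characters treated as syntax by the Lucene/Solr query parsers.
SPECIAL_CHARS = set('\\+-!():^[]"{}~*?|&;/')


def escape_query_chars(text: str) -> str:
    """Prefix every special character with a backslash; leave the rest as-is."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in text)


def select_url(base_url: str, user_input: str) -> str:
    """Build a /select URL whose q parameter carries the escaped user text."""
    params = urlencode({
        "q": escape_query_chars(user_input),
        "fl": "id,score,title",
        "wt": "xml",
        "indent": "true",
    })
    return f"{base_url}/select?{params}"


escaped = escape_query_chars("{!q.op=AND df=text solr sys")
url = select_url("http://localhost:8983/solr/db", "{!q.op=AND df=text solr sys")
print(escaped)
print(url)
```

Because the escaped text no longer begins with `{!`, it should not be interpreted as local params, and `urlencode` takes care of percent-encoding and of the `&` separators between parameters.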

RE: _version_ returned from /update?

2015-04-15 Thread Reitzel, Charles
Hey, that's great! I'll give it a try. File under, never hurts to ask ... :-) -Original Message- From: Chris Hostetter [mailto:hossman_luc...@fucit.org] Sent: Wednesday, April 15, 2015 5:15 PM To: solr-user@lucene.apache.org Subject: Re: _version_ returned from /update? : In the

Re: Problem related to filter on Zero value for DateField

2015-04-15 Thread Chris Hostetter
You're going to have to provide a lot more details (solr version, sample data, full queries, details about configs, etc...) in order for anyone to offer you meaningful assistance... https://wiki.apache.org/solr/UsingMailingLists I attempted to reproduce the steps you describe using Solr 5.1

Re: solr index design for this use case?

2015-04-15 Thread vsriram30
Hi Eric, Thanks for your response. I was planning to do the same, to store the data in a single collection with site parameter differentiating duplicated content for different sites. But my use case is that in future the content would run into millions and potentially there could be large number

Re: solr index design for this use case?

2015-04-15 Thread Erick Erickson
At this data size, don't worry at _all_ about duplicating content. A single Solr node easily holds 20M docs. 50M is common and 250M is not unheard of. My bold claim is: you can freely duplicate the data to your heart's content and you'll never notice it. In fact, you can put it all in a single

Re: _version_ returned from /update?

2015-04-15 Thread Chris Hostetter
: In the interests of minimizing round-trips to the database, is there any : way to get the added/changed _version_ values returned from /update? : Or do you always have to do a fresh get? there is a versions=true param you can specify on updates to get the version# back for each doc added
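The `versions=true` parameter mentioned above can be attached to an update request as follows. This is a minimal sketch that only builds the request: the collection name, document fields, and helper name are hypothetical, and actually sending it (for example with `urllib.request.urlopen`) requires a running Solr instance.

```python
import json
from typing import Dict, List
from urllib.parse import urlencode
from urllib.request import Request


def build_update_request(base_url: str, docs: List[Dict]) -> Request:
    """Build a JSON update request that asks Solr to return document versions."""
    params = urlencode({"versions": "true", "wt": "json"})
    body = json.dumps(docs).encode("utf-8")
    return Request(
        f"{base_url}/update?{params}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical collection and document.
req = build_update_request(
    "http://localhost:8983/solr/collection1",
    [{"id": "doc1", "title": "example"}],
)
print(req.full_url)
```

The response format is not shown here because the snippet above does not describe it; inspect the actual response from your Solr version before relying on it for optimistic concurrency.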

RE: Korean script conversion

2015-04-15 Thread Eyal Naamati
Trying again since I don't have an answer yet. Thanks! Eyal Naamati Alma Developer Tel: +972-2-6499313 Mobile: +972-547915255 eyal.naam...@exlibrisgroup.com

RE: sort by a copy field error

2015-04-15 Thread Pedro Figueiredo
Hello, http://localhost:8983/solr/patientsCollection/select?q=*%3A*&sort=name_sort+asc&wt=json&indent=true&_=1429082874881 I am using the solr console admin and in the query option I just define the field sort with name_sort asc. Pedro Figueiredo Senior Engineer pjlfigueir...@criticalsoftware.com

RE: sort by a copy field error

2015-04-15 Thread Pedro Figueiredo
Hello, Yes, I restarted solr and re-indexed after the change. The request is: http://localhost:8983/solr/patientsCollection/select?q=*%3A*&sort=name_sort+asc&wt=json&indent=true&_=1429082874881 I am using the solr console admin and in the query option I just define the field sort with name_sort asc.

Re: Securing solr index

2015-04-15 Thread Per Steffensen
That said, it might be nice to have a wiki page (or something) explaining how it can be done, including maybe concrete cases showing exactly how it has been done on different installations around the world using Solr. On 14/04/15 14:03, Per Steffensen wrote: Hi I might misunderstand you, but if

Solr 5.1 ignores SOLR_JAVA_MEM setting

2015-04-15 Thread Ere Maijala
Folks, just a quick heads-up that apparently Solr 5.1 introduced a change in bin/solr that overrides SOLR_JAVA_MEM setting from solr.in.sh or environment. I just filed https://issues.apache.org/jira/browse/SOLR-7392. The problem can be circumvented by using SOLR_HEAP setting, e.g.

Re: sort by a copy field error

2015-04-15 Thread Andrea Gazzarini
Really strange to me: the cause should be what Shawn already pointed out, because that error is raised when: SchemaField sf = req.getSchema().getFieldOrNull(field); is null: if (null == sf) { ... throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, sort param field can't

Validate document against schema

2015-04-15 Thread Artem Karpenko
Hi, I am looking for a way to validate a document that is about to be inserted against the schema, to check whether adding the document will fail, without actually making an insert. Is there a way to do that? I'm doing the update from inside a Solr plugin, so there is access to the API if that

RE: sort by a copy field error

2015-04-15 Thread Pedro Figueiredo
Ok... my bad... My solr installation is in cloud mode... so the basic solr stop and start does not update the configuration right? I started solr using: solr -c -Dbootstrap_confdir=C:\solr-5.0.0\server\solr\patientsCollection\conf -Dcollection.configName=myconf and the error was solved.

How to get the query content in DefaultSimilarity class?

2015-04-15 Thread Xi Shen
Hi, I want to implement a custom TFIDF similarity scoring function. I read the code for org.apache.lucene.search.similarities.DefaultSimilarity. I could not find a way to get the query that the user provided. In my case, I would want to allow the user to upload some binary content to my search