On 01/25/2014 07:21 PM, christopher palm wrote:
The problem I am trying to solve is that the order of these updates isn't
guaranteed once the multi-threaded SolrJ client starts sending them to
Solr, and older updates are overlaying the newer updates on the same
document.
Don't do that. There
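One way to guard against stale overwrites in a situation like this is Solr 4's
optimistic concurrency: when a document is sent with the built-in _version_
field set, Solr rejects the update with an HTTP 409 version conflict unless the
indexed version still matches, rather than silently overwriting newer data. A
minimal SolrJ sketch of that idea follows; the core URL, document id and field
values are assumptions, and this is only one option, not necessarily the one
being recommended here:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrInputDocument;

public class OptimisticUpdate {
    public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");

        // Read the current version of the document we intend to update.
        SolrDocument current = server.query(new SolrQuery("id:doc-1")).getResults().get(0);
        Long version = (Long) current.getFieldValue("_version_");

        // Resend the document carrying that _version_. If another thread has
        // updated it in the meantime, Solr answers with a 409 version conflict
        // instead of letting the older data overwrite the newer document.
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc-1");
        doc.addField("title", "updated title");
        doc.addField("_version_", version);
        server.add(doc);
        server.commit();

        server.shutdown();
    }
}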
You have to create your own parser which extends the current query parser.
You have to override the protected newFuzzyQuery function to call the
FuzzyQuery constructor with a configured maximum-expansions value, or
something like that.
Ludovic.
-
Jouve
France.
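A rough Lucene-level sketch of the override described above, assuming the
Lucene/Solr 4.x classic query parser API; the class name and expansion cap are
made up, and wiring this into a Solr QParserPlugin is left out:

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.FuzzyQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.util.Version;

public class CappedFuzzyQueryParser extends QueryParser {
    private final int maxExpansions;

    public CappedFuzzyQueryParser(Version matchVersion, String field,
                                  Analyzer analyzer, int maxExpansions) {
        super(matchVersion, field, analyzer);
        this.maxExpansions = maxExpansions;
    }

    @Override
    protected Query newFuzzyQuery(Term term, float minimumSimilarity, int prefixLength) {
        // Translate the legacy similarity value into an edit distance, then
        // build the FuzzyQuery with a configured expansion cap instead of the
        // default (50).
        int termLen = term.text().codePointCount(0, term.text().length());
        int maxEdits = FuzzyQuery.floatToEdits(minimumSimilarity, termLen);
        return new FuzzyQuery(term, maxEdits, prefixLength, maxExpansions, true);
    }
}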
Query Recommendations using Query Logs in Search Engines
http://personales.dcc.uchile.cl/~churtado/clustwebLNCS.pdf
Very interesting paper and section 2.1 covers related work plus references.
In our first attempt we did it even simpler, by finding for each query other
top queries by inspecting
On Mon, Jan 27, 2014 at 9:30 AM, William Bell billnb...@gmail.com wrote:
OK,
In order to do boosting, we often create a dynamic field in Solr. For
example: professionals hire out for work, and I want to boost those who do
woodworking.
George Smith - builds chairs and builds desks. He
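A minimal SolrJ sketch of that kind of boosting, assuming a Solr 4.x core at
http://localhost:8983/solr/collection1 whose schema has a boolean *_b dynamic
field and a text "description" field (as the example schema does); the
document values and boost are made up:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class DynamicFieldBoost {
    public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");

        // Index a professional with a dynamic skill flag.
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "prof-1");
        doc.addField("name", "George Smith");
        doc.addField("description", "builds chairs and builds desks");
        doc.addField("skill_woodworking_b", true);
        server.add(doc);
        server.commit();

        // Lift woodworkers in the results with an edismax boost query.
        SolrQuery q = new SolrQuery("chairs desks");
        q.set("defType", "edismax");
        q.set("qf", "description");
        q.set("bq", "skill_woodworking_b:true^10");
        server.query(q);

        server.shutdown();
    }
}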
Hello,
I am writing a mini-client app and realized that I can't find a
canonical example of how to write a SolrJ-based client. Given that it
is the recommended API, I thought we had one in the distribution. But
I can't seem to find it.
I also checked
On 1/28/2014 8:01 AM, Alexandre Rafalovitch wrote:
What am I missing? Should I be looking at test classes:
https://github.com/apache/lucene-solr/tree/trunk/solr/solrj/src/test/org/apache/solr/client/solrj
? They seem to be a little too specialized to be reverse-engineering
basic examples from.
On Tue, Jan 28, 2014 at 10:27 PM, Shawn Heisey s...@elyograg.org wrote:
What would you like to see in an example? I know a fair amount about
SolrJ, and we have access to lots of smart people here, so how about we
discuss what you think is missing? We can come up with something that
will
Hi,
How to get the most relevant items on top of search results using Solr search?
--
Rashmi
Be the change that you want to see in this world!
That's a catch-22 question and the most difficult one. How do you know
they are relevant? Do you have a current baseline for comparison?
Usually, you do the most basic thing. Then, you watch for what users
are searching for and what they get. You figure out why something is
not right, look at your
Hi,
Question 1 -- When Solr can parse HTML and documents like doc, Excel, PDF
etc., why do we need Nutch to parse HTML files? What is different?
Question 2: When do we use multiple cores in Solr? Any practical business
case where we need multiple cores?
Question 3: When do we go for cloud? What is
Hi,
Question 1) Why do we use the spellings file under the Solr core conf
folder? What spellings do we enter in it?
Question 2): Implementing all synonyms is a tough thing. Where could I get
a list of as many synonyms as we see in Google search?
--
Rashmi
Be the change that you want
There are tests that are always current in the JUnit tests. Admittedly,
they aren't nice, stand-alone examples, but they do the things that you are
referencing. And they're always up to date since the tests are always
running.
Look particularly at SolrExampleTests.java and SolrQueryTest.java.
Thanks Saurish.
My office *intranet* is a SharePoint website. When I am crawling it using
Nutch, I am getting an Unauthorized access (404) error. An NTLM realm is used
on this website.
I checked on one nutch JIRA link that sharepoint could be accessed using
nutch. Nutch has below properties in
Erick,
I think we are looking at this from different points of view. You seem
to be saying that the tests work, therefore somebody could reverse-engineer
the correct usage from them.
I am saying that - for a beginner - it would be useful to have a
minimum working example that shows a critical path
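For what it's worth, a minimal sketch of such a critical path with the SolrJ
4.x API (index one document, commit, query it back), assuming a core at
http://localhost:8983/solr/collection1 and made-up field values:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrInputDocument;

public class MinimalSolrJExample {
    public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");

        // Index one document and make it visible to searches.
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc-1");
        doc.addField("title", "Hello SolrJ");
        server.add(doc);
        server.commit();

        // Query it back and print the matching ids.
        QueryResponse rsp = server.query(new SolrQuery("title:hello"));
        for (SolrDocument d : rsp.getResults()) {
            System.out.println(d.getFieldValue("id"));
        }

        server.shutdown();
    }
}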
1. Nutch follows the links within HTML web pages to crawl the full graph of
a web of pages.
2. Think of a core as an SQL table - each table/core has a different type of
data.
3. SolrCloud is all about scaling and availability - multiple shards for
larger collections and multiple replicas
Q1: Nutch doesn't only handle the parsing of HTML files; it also uses Hadoop to
achieve large-scale crawling using multiple nodes. It fetches the content of the
HTML file, and yes, it also parses its content.
Q2: In our case we use Nutch to crawl some websites, store the content in one
"main" Solr
1) Plus, those files are binaries, sometimes with metadata; specific
crawlers need to understand them. HTML is plain text.
2) Yes, different data schemas. Sometimes I replicate the same core and
run some A/B tests with different weights, filters etc., and some people
like to create CoreA and
You may also want to look here
http://wiki.apache.org/solr/SolrRelevancyFAQ
2) There are some synonym lists on the web, they aren't always complete but
I keep analyzing fields and tokens in order to polish my synonyms. And I
like to use tools like http://www.visualthesaurus.com/ to aid me.
Hope this helps :-)
alexei martchenko
Facebook
** Using solrcloud 4.4.0 **
I had to kill a running solrcloud node. There is still a replica for that
shard, so everything is functional. We've done some indexing while the
node was killed.
I'd like to bring back up the downed node and have it resync from the other
replica. But when I restart
Thanks all for the quick response.
Today I crawled a webpage using Nutch. This page has many links, but all
anchor tags have href="#" and JavaScript is written on the onClick event of
each anchor tag to open a new page.
So the crawler didn't crawl any of those links which were opened using the
onClick event, and
Thanks for the quick response, Alexei.
I will check this link to prepare a synonym list.
On Tue, Jan 28, 2014 at 11:00 PM, Alexei Martchenko
ale...@martchenko.com.br wrote:
2) There are some synonym lists on the web, they aren't always complete but
I keep analyzing fields and tokens in order to
On 1/28/2014 10:31 AM, Greg Preston wrote:
** Using solrcloud 4.4.0 **
I had to kill a running solrcloud node. There is still a replica for that
shard, so everything is functional. We've done some indexing while the
node was killed.
I'd like to bring back up the downed node and have it
Thanks for the idea. I tried it, and the state for the bad node, even
after an orderly shutdown, is still active in clusterstate.json. I see
this in the logs on restart:
[28 Jan 2014 18:25:29] [RecoveryThread] ERROR
(org.apache.solr.common.SolrException) - Error while trying to recover.
Short answer, you can't.

rashmi maheshwari maheshwari.ras...@gmail.com wrote:
Thanks all for the quick response.
Today I crawled a webpage using Nutch. This page has many links, but all
anchor tags have href="#" and JavaScript is written on the onClick event of
each anchor tag to open a new page.
So
Well, not even Google parses those. I'm not sure about Nutch, but in some
crawlers (jsoup, I believe) there's an option to try to get full URLs from
plain text, so you can capture some URLs in the form of someClickFunction('
http://www.someurl.com/whatever') or even if they are in the middle of some
Hi Folks,
I would like to know what is the best way to integrate PHP and Apache Solr.
Until now I've found two options:
1) http://www.php.net/manual/en/intro.solr.php
2) http://www.solarium-project.org/
What do you guys say?
Cheers,
Felipe
Thanks Markus and Alexei.
On Wed, Jan 29, 2014 at 12:08 AM, Alexei Martchenko
ale...@martchenko.com.br wrote:
Well, not even Google parses those. I'm not sure about Nutch, but in some
crawlers (jsoup, I believe) there's an option to try to get full URLs from
plain text, so you can capture some
Sounds like a bug. 4.6.1 is out any minute - you might try that. There was a
replication bug that may be involved.
If you remove the tlog and index and restart, it should resync, or something
is really crazy.
The clusterstate.json is a red herring. You have to merge the live nodes info
with
I have the same problem, please look at the image:
http://lucene.472066.n3.nabble.com/file/n4114026/Screenshot_733.png
And this is while idle. Index size is about 90 GB. Solr 4.4.0. Memory is not
an issue; there's a lot. RAID 10 (15,000 RPM rapid HDD).
January 2014, Apache Solr™ 4.6.1 available. The Lucene PMC is pleased
to announce the release of Apache Solr 4.6.1. Solr is the popular,
blazing fast, open source NoSQL search platform from the Apache Lucene
project. Its major features include powerful full-text search, hit
highlighting, faceted
I've some experience using Solarium and it has been great so far. In particular
we use the NelmioSolariumBundle to integrate with Symfony2.
Greetings!
On Jan 28, 2014, at 1:54 PM, Felipe Dantas de Souza Paiva
cad_fpa...@uolinc.com wrote:
Hi Folks,
I would like to know what is the best way
: The initial results seem to be kinda promising... of course there are many
: more optimizations I could do like decay user ratings over time to indicate
: that preferences decay over time so a 5 rating a year ago doesn't count as
: much as a 5 rating today.
:
: Hope this helps others. I'll
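A sketch of how such a decay can be expressed at query time, as a
multiplicative edismax boost over the age of a hypothetical rating_date_dt
field (the core URL and query text are also assumptions); recip(x,m,a,b) =
a/(m*x+b), so with m roughly 1 / (one year in milliseconds) a year-old rating
contributes about half the weight of a fresh one:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class DecayBoostQuery {
    public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");

        SolrQuery q = new SolrQuery("camera");
        q.set("defType", "edismax");
        // Multiplicative boost that decays with the age of the rating date:
        // 3.16e-11 is roughly 1 / (one year in milliseconds).
        q.set("boost", "recip(ms(NOW,rating_date_dt),3.16e-11,1,1)");
        server.query(q);

        server.shutdown();
    }
}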
I have no experience with either as we have our own PHP layer to interface
with Solr. If I started again from scratch today I would surely consider
Solarium.
The PHP extension seems outdated and no longer maintained. It says it works
with Solr 3.1, and the last update is from 2011.
André
2014-01-28
Hi,
Show us more graphs. Is the GC working hard? Any of the JVM mem pools at
or near 100%? SPM for Solr is your friend for long term
monitoring/alerting/trends, jconsole and visualvm for a quick look.
Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr Elasticsearch
Hi Rashmi,
Relevancy needs some kind of training data, which can lead to a
chicken-and-egg problem. If you don't have that training set, then you need to
come up with it or train manually (provide some seed).
Our existing search had 2 years' worth of clickstream data, i.e. we know if
someone searches
Hi Felipe,
I would advise going with Solarium here.
The PHP PECL integration is outdated to a degree that renders it almost
useless by now.
It has a couple of bugs when fetching large result sets.
Good luck,
Alex
-----Original Message-----
From: Felipe Dantas de Souza Paiva
It was my hope that storing solr.xml would mean I could spin up a Solr node
pointing it to a properly configured ZooKeeper ensemble, and that no further
local configuration or knowledge would be necessary.
However, I'm beginning to wonder if that's sufficient. It's looking like I may
also
Hi, I just configured HDFS + Solr. When I am starting Solr I am getting an
error message like this:
ERROR org.apache.solr.core.CoreContainer – Unable to create core:
collection1
org.apache.solr.common.SolrException: Problem creating directory:
Hi guys
With the new version of Solr (4.6), can I add a field to the index, knowing
that this field doesn't appear (isn't predefined) in the schema?
I ask this question because I've seen an issue (on JIRA) related to this.
Thanks!
Hi All,
I have a scenario where for example my document titles are
1. battle of the zords
2. joes battle
3. who was in that crazy battle.
and if the search term is [battle]
I want to penalize the scores of the second and third documents because battle
was matched further down than in the first
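One Lucene-level way to get that effect is to boost early matches rather than
penalize late ones, using a SpanFirstQuery as an optional clause. A sketch
against the Lucene 4.x API, assuming term positions are indexed on the title
field; the field name, terms, window and boost are made up, and wiring this
into Solr is left out:

import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.spans.SpanFirstQuery;
import org.apache.lucene.search.spans.SpanTermQuery;

public class EarlyTitleMatchBoost {
    // Builds a query that matches the term anywhere in the field, but lifts
    // documents where it appears within the first `end` positions above the
    // rest. Example: build("title", "battle", 2, 5.0f).
    public static Query build(String field, String term, int end, float boost) {
        TermQuery anywhere = new TermQuery(new Term(field, term));

        SpanFirstQuery early = new SpanFirstQuery(
                new SpanTermQuery(new Term(field, term)), end);
        early.setBoost(boost);

        BooleanQuery q = new BooleanQuery();
        q.add(anywhere, BooleanClause.Occur.MUST);
        q.add(early, BooleanClause.Occur.SHOULD);
        return q;
    }
}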
On 01/28/2014 11:55 AM, Alexandre Rafalovitch wrote:
As to ESS, like I mentioned, the classpath issue seems to be quite a
challenge. Again, perhaps not something that shows up during the
testing because the directory layout during testing is rather
different from the end-user's layout.
I'm not
Hi Hakim,
Check out the section of the Solr Reference Guide on modifying the schema via
REST API:
https://cwiki.apache.org/confluence/display/solr/Schema+API#SchemaAPI-Modifytheschema
Steve
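A minimal sketch of adding a field through that REST endpoint from plain Java,
assuming Solr 4.4+ with the managed schema factory configured as mutable in
solrconfig.xml; the core name, field name and field properties below are made
up:

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class AddSchemaField {
    public static void main(String[] args) throws Exception {
        // PUT a new field definition to the Schema API of the collection1 core.
        URL url = new URL("http://localhost:8983/solr/collection1/schema/fields/description_txt");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("PUT");
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setDoOutput(true);

        String body = "{\"type\":\"text_general\",\"stored\":true,\"indexed\":true}";
        try (OutputStream os = conn.getOutputStream()) {
            os.write(body.getBytes(StandardCharsets.UTF_8));
        }

        // Print the HTTP status Solr returns (2xx on success).
        System.out.println("HTTP " + conn.getResponseCode());
    }
}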
On Jan 28, 2014, at 5:00 PM, Hakim Benoudjit h.benoud...@gmail.com wrote:
Hi guys
With the new
Thanks Mike,
Sounds like the Maven approach worked; I haven't tried that. But without
Maven, it is much harder. Like I said (in the second email of the
thread), the relevant jars are all over the distribution, including some
inside the .war file. And the only way to figure it out is to run the
client
There is an example in the distribution that shows how new fields are
auto-defined. I think it is example-schemaless. The secret is in the
UpdateRequestProcessor chain that does cleanup and auto-mapping. Plus
- I guess - automatically generated schema.
Just remember that once the field is added
1. Nutch follows the links within HTML web pages to crawl the full graph of a
web of pages.
In addition, I think Nutch has a PageRank-like scoring function, as opposed to
Lucene/Solr, which are based on vector space model scoring.
koji
--
Lucene and Solr work best if the full index can be cached in OS memory.
Sure, Lucene/Solr does work properly once the index no longer fits, but
performance will drop off.
I would say that you could fit 100 million moderate-size documents on a
single Solr server - provided that you give the OS
Maybe I'm missing something, but everything you are describing sounds
correct and working properly -- the disconnect between what I think is
supposed to happen and what you seem to be expecting seems to be right
around here
: essentially used empty directories for the solr home. Although
Thanks, Jack. That helps.
-Original Message-
From: Jack Krupansky [mailto:j...@basetechnology.com]
Sent: Tuesday, January 28, 2014 8:01 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr server requirements for 100+ million documents
Lucene and Solr work best if the full index can be
Previously on the list a spreadsheet has been mentioned. Taking into account
that you already have documents in an index, you could extract the needed
information from your index and feed it into the spreadsheet, and it will
probably give you a rough approximation of the hardware you'll need