Re: SOLR + Nutch set up (UNCLASSIFIED)

2016-08-03 Thread Walter Underwood
emental values per mime-type. > The algorithms are pluggable and overridable at any point of interest. You > can go all the way. > > -Original message- >> From:Walter Underwood <wun...@wunderwood.org> >> Sent: Wednesday 3rd August 2016 20:03 >> To: solr-u

RE: [Non-DoD Source] Re: SOLR + Nutch set up (UNCLASSIFIED)

2016-08-03 Thread Markus Jelsma
gust 2016 20:08 > To: solr-user@lucene.apache.org > Subject: RE: [Non-DoD Source] Re: SOLR + Nutch set up (UNCLASSIFIED) > > CLASSIFICATION: UNCLASSIFIED > > Shall I assume that, even though nutch has adaptive capability, I would still > have to figure out how to trigger it to g

RE: SOLR + Nutch set up (UNCLASSIFIED)

2016-08-03 Thread Markus Jelsma
lt;wun...@wunderwood.org> > Sent: Wednesday 3rd August 2016 20:03 > To: solr-user@lucene.apache.org > Subject: Re: SOLR + Nutch set up (UNCLASSIFIED) > > That’s good news. > > It should reset the interval estimate on page change instead of slowly > shortening it. > &
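The adaptive re-fetch behaviour discussed in this thread can be sketched as follows. This is a simplified model, not Nutch's own schedule code. The rate and bound constants are assumed values in the spirit of Nutch's `db.fetch.schedule.adaptive.*` settings (check `nutch-default.xml` for the real names and defaults). `reset_on_change` is a hypothetical illustration of Walter's suggestion, interpreted here as jumping straight to the observed gap between changes instead of shrinking the interval step by step.

```python
# Simplified model of an adaptive re-fetch interval (all constants are assumptions).
INC_RATE = 0.4                  # grow interval by 40% when the page is unchanged
DEC_RATE = 0.2                  # shrink interval by 20% when the page changed
MIN_INTERVAL = 60.0             # seconds
MAX_INTERVAL = 365 * 86400.0    # seconds


def clamp(interval: float) -> float:
    """Keep the interval within [MIN_INTERVAL, MAX_INTERVAL]."""
    return max(MIN_INTERVAL, min(MAX_INTERVAL, interval))


def adaptive_interval(interval: float, changed: bool) -> float:
    """Gradual adaptation: shorten slowly on change, lengthen when unchanged."""
    factor = (1.0 - DEC_RATE) if changed else (1.0 + INC_RATE)
    return clamp(interval * factor)


def reset_on_change(interval: float, changed: bool, gap_since_last_change: float) -> float:
    """Hypothetical variant: on a change, reset the estimate to the observed gap."""
    if changed:
        return clamp(gap_since_last_change)
    return clamp(interval * (1.0 + INC_RATE))


if __name__ == "__main__":
    daily = 86400.0
    interval = 7 * daily
    # A page that was quiet for a week starts changing every day.
    for _ in range(3):
        interval = adaptive_interval(interval, changed=True)
    print(f"gradual: {interval / daily:.2f} days")
    print(f"reset:   {reset_on_change(7 * daily, True, daily) / daily:.2f} days")
```

With gradual adaptation the interval is still about 3.6 days after three daily changes, while the reset variant drops straight to one day; that difference is the trade-off Walter was pointing at.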

RE: [Non-DoD Source] Re: SOLR + Nutch set up (UNCLASSIFIED)

2016-08-03 Thread Musshorn, Kris T CTR USARMY RDECOM ARL (US)
3 PM To: solr-user@lucene.apache.org Subject: [Non-DoD Source] Re: SOLR + Nutch set up (UNCLASSIFIED)

Re: SOLR + Nutch set up (UNCLASSIFIED)

2016-08-03 Thread Walter Underwood
rincipal Engineer >> wun...@wunderwood.org >> http://observer.wunderwood.org/ (my blog) >> >> >>> On Aug 3, 2016, at 10:12 AM, Musshorn, Kris T CTR USARMY RDECOM ARL (US) >> <kris.t.musshorn@mail.mil> wrote: >>> >>> CLASSIFICAT

Re: SOLR + Nutch set up (UNCLASSIFIED)

2016-08-03 Thread Marco Scalone
ncipal Engineer > wun...@wunderwood.org > http://observer.wunderwood.org/ (my blog) > > > > On Aug 3, 2016, at 10:12 AM, Musshorn, Kris T CTR USARMY RDECOM ARL (US) > <kris.t.musshorn@mail.mil> wrote: > > > > CLASSIFICATION: UNCLASSIFIED > > > >

Re: SOLR + Nutch set up (UNCLASSIFIED)

2016-08-03 Thread Walter Underwood
016, at 10:12 AM, Musshorn, Kris T CTR USARMY RDECOM ARL (US) > <kris.t.musshorn@mail.mil> wrote: > > CLASSIFICATION: UNCLASSIFIED > > We are currently using ultraseek and looking to deprecate it in favor of > solr/nutch. > Ultraseek runs all the time and auto d

SOLR + Nutch set up (UNCLASSIFIED)

2016-08-03 Thread Musshorn, Kris T CTR USARMY RDECOM ARL (US)
CLASSIFICATION: UNCLASSIFIED We are currently using Ultraseek and are looking to deprecate it in favor of Solr/Nutch. Ultraseek runs all the time, automatically detects when pages have changed, and reindexes them. Is this possible with Solr/Nutch? Thanks, Kris

Solr Nutch

2014-01-28 Thread rashmi maheshwari
Hi, Question 1: When Solr can parse HTML and documents like DOC, Excel, PDF, etc., why do we need Nutch to parse HTML files? What is different? Question 2: When do we use multiple cores in Solr? Is there any practical business case where we need multiple cores? Question 3: When do we go for cloud? What is

Re: Solr Nutch

2014-01-28 Thread Jack Krupansky
for both scaling of query response and availability if nodes go down. -- Jack Krupansky -Original Message- From: rashmi maheshwari Sent: Tuesday, January 28, 2014 11:36 AM To: solr-user@lucene.apache.org Subject: Solr Nutch Hi, Question1 -- When Solr could parse html, documents like
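Jack's point about multiple collections and replicas maps onto the SolrCloud Collections API, whose `CREATE` action takes `numShards` and `replicationFactor`. A minimal sketch that only builds the request URL; the base URL, collection name and counts are placeholders, and nothing is sent to a server.

```python
from urllib.parse import urlencode


def create_collection_url(solr_base: str, name: str, num_shards: int, replicas: int) -> str:
    """Build a Collections API CREATE request (URL only; nothing is sent)."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,          # split the index across nodes to scale queries
        "replicationFactor": replicas,    # copies of each shard for availability
    }
    return f"{solr_base}/admin/collections?{urlencode(params)}"


print(create_collection_url("http://localhost:8983/solr", "crawl", 2, 2))
```

In a real cluster you would usually also name the configset to use via `collection.configName`.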

Re: Solr Nutch

2014-01-28 Thread Jorge Luis Betancourt Gonzalez
Q1: Nutch doesn’t only handle the parsing of HTML files; it also uses Hadoop to achieve large-scale crawling across multiple nodes. It fetches the content of the HTML files, and yes, it also parses that content. Q2: In our case we use Solr to crawl some website, store the content in one “main” solr

Re: Solr Nutch

2014-01-28 Thread Alexei Martchenko
down. -- Jack Krupansky -Original Message- From: rashmi maheshwari Sent: Tuesday, January 28, 2014 11:36 AM To: solr-user@lucene.apache.org Subject: Solr Nutch Hi, Question1 -- When Solr could parse html, documents like doc, excel pdf etc, why do we need nutch to parse html

Re: Solr Nutch

2014-01-28 Thread rashmi maheshwari
To: solr-user@lucene.apache.org Subject: Solr Nutch Hi, Question1 -- When Solr could parse html, documents like doc, excel pdf etc, why do we need nutch to parse html files? what is different? Questions 2: When do we use multiple core in solar? any practical business case when

Re: Solr Nutch

2014-01-28 Thread Markus Jelsma
Message- From: rashmi maheshwari Sent: Tuesday, January 28, 2014 11:36 AM To: solr-user@lucene.apache.org Subject: Solr Nutch Hi, Question1 -- When Solr could parse html, documents like doc, excel pdf etc, why do we need nutch to parse html files? what is different? Questions

Re: Solr Nutch

2014-01-28 Thread Alexei Martchenko
collections and multiple replicas for both scaling of query response and availability if nodes go down. -- Jack Krupansky -Original Message- From: rashmi maheshwari Sent: Tuesday, January 28, 2014 11:36 AM To: solr-user@lucene.apache.org Subject: Solr Nutch Hi

Re: Solr Nutch

2014-01-28 Thread rashmi maheshwari
Krupansky -Original Message- From: rashmi maheshwari Sent: Tuesday, January 28, 2014 11:36 AM To: solr-user@lucene.apache.org Subject: Solr Nutch Hi, Question1 -- When Solr could parse html, documents like doc, excel pdf etc, why do we need nutch

Re: Solr Nutch

2014-01-28 Thread Koji Sekiguchi
1. Nutch follows the links within HTML web pages to crawl the full graph of a web of pages. In addition, I think Nutch has a PageRank-like scoring function, as opposed to Lucene/Solr, whose scoring is based on the vector space model. Koji --
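Koji's contrast between link-based and vector-space scoring can be illustrated with textbook PageRank computed by power iteration over a tiny link graph. This is not Nutch's scoring code (Nutch uses its own scoring plugins, such as OPIC-based scoring); the three-page graph and the damping factor of 0.85 are illustrative assumptions.

```python
from typing import Dict, List


def pagerank(links: Dict[str, List[str]], damping: float = 0.85, iters: int = 50) -> Dict[str, float]:
    """Textbook PageRank by power iteration; links maps page -> outgoing links."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        # Every page first gets the "random jump" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                # Split this page's rank evenly across its outgoing links.
                share = damping * rank[p] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page (no out-links): spread its rank over all pages.
                for t in pages:
                    new_rank[t] += damping * rank[p] / n
        rank = new_rank
    return rank


ranks = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

Page `c`, linked from both `a` and `b`, comes out on top regardless of what the pages contain, which is exactly the signal a pure vector-space score does not see.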

AjaxSolr + Solr + Nutch question

2012-07-14 Thread praful
?? Thanks, Regards, Praful Bagai

Re: Spellcheck in solr-nutch integration

2011-02-05 Thread 666
Hello Anurag, I'm facing the same problem. Would you please elaborate on how you solved it? It would be great if you could give me a step-by-step description, as I'm new to Solr.

Re: Spellcheck in solr-nutch integration

2011-02-05 Thread Anurag
on how you solved the problem? It would be great if you could give me a step-by-step description, as I'm new to Solr.

Re: Spellcheck in solr-nutch integration

2010-11-29 Thread Anurag
I solved the problem. All we need is to modify the schema file. Also, the spellcheck index is first created when spellcheck.build=true is passed. - Kumar Anurag
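Anurag's point that the spellcheck index is only built when `spellcheck.build=true` is passed can be made concrete with a small request builder. A minimal sketch, assuming a request handler at a placeholder URL that already has Solr's SpellCheckComponent attached; the build request is typically sent once after (re)indexing, and ordinary queries omit the flag.

```python
from urllib.parse import urlencode


def spellcheck_url(select_url: str, query: str, build: bool = False) -> str:
    """Build a spellcheck request; set build=True only to (re)build the spellcheck index."""
    params = {"q": query, "spellcheck": "true"}
    if build:
        params["spellcheck.build"] = "true"  # expensive: rebuilds the dictionary
    return f"{select_url}?{urlencode(params)}"


# First request after indexing builds the dictionary; later ones just query it.
print(spellcheck_url("http://localhost:8983/solr/select", "nutch crawlr", build=True))
print(spellcheck_url("http://localhost:8983/solr/select", "nutch crawlr"))
```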

Spellcheck in solr-nutch integration

2010-11-23 Thread Anurag
in this Solr-Nutch integration. I have a separate Solr 1.4 where there are options available for spellcheck. What I want to ask is: 1. Is indexing for spellcheck done at the same time as indexing the content? What are the steps to follow? 2. How can I implement spellcheck in the Solr-Nutch

Seeking Solr/Nutch consultant in San Jose, CA

2009-09-30 Thread Leann Pereira
Hi, I am working with a SaaS vendor who is integrated with Nutch 0.9 and SOLR. We are looking for some help to migrate this to Nutch 1.0. The work involves: 1) We made changes to Nutch 0.9; these need to be ported to Nutch 1.0. 2) Configure SOLR integration with Nutch 1.0 3)

Re: solr nutch url indexing

2009-08-26 Thread last...@gmail.com
Uri Boness wrote: Well... yes, it's a tool that Nutch ships with. It also ships with an example Solr schema which you can use. Hi, is there any documentation that explains what is going on in the schema? <requestHandler name="/nutch" class="solr.SearchHandler"> <lst name="defaults"> <str

Re: solr nutch url indexing

2009-08-26 Thread Uri Boness
Do you mean the schema or the solrconfig.xml? The request handler is configured in the solrconfig.xml and you can find out more about this particular configuration in http://wiki.apache.org/solr/DisMaxRequestHandler?highlight=(CategorySolrRequestHandler)|((CategorySolrRequestHandler)). To

Re: solr nutch url indexing

2009-08-25 Thread Thibaut Lassalle
Thanks for your help. I use the default Nutch configuration and I use solrindex to pass the Nutch results to Solr. I get results when I query, so Nutch works properly (it gives a URL, title, content ...). I would like to query Solr so that it emphasizes the title field rather than the content field.

Re: solr nutch url indexing

2009-08-25 Thread Uri Boness
It seems to me that this configuration actually does what you want: it queries mostly on the title. The default search field doesn't influence a dismax query. I would suggest including the debugQuery=true parameter; it will help you figure out how the matching is performed. You can read more
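Uri's suggestion can be tried with a dismax request that boosts the title through `qf` and turns on `debugQuery`, so the score explanation shows which field matched. A minimal sketch: the select URL, the field names `title` and `content`, and the boost values are assumptions to adapt to the Nutch schema in use.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def dismax_url(select_url: str, query: str, qf: str, debug: bool = True) -> str:
    """Build a dismax query; qf lists fields with optional ^boost factors."""
    params = {"q": query, "defType": "dismax", "qf": qf}
    if debug:
        params["debugQuery"] = "true"  # adds per-document score explanations
    return f"{select_url}?{urlencode(params)}"


url = dismax_url("http://localhost:8983/solr/select", "intranet policy", "title^4.0 content^0.5")
print(url)
# Round-trip the query string to confirm the boosts survived URL encoding.
print(parse_qs(urlsplit(url).query)["qf"])
```

Alternatively, the request can target the `/nutch` handler quoted earlier in the thread, whose defaults may already set some of these parameters.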

RE: solr nutch url indexing

2009-08-25 Thread Fuad Efendi
Thanks for the link. So SolrIndex is NOT a plugin, it is an application... I use a similar approach... -Original Message- From: Uri Boness Hi, Nutch comes with support for Solr out of the box. I suggest you follow the steps as described here:

Re: solr nutch url indexing

2009-08-25 Thread Uri Boness
Well... yes, it's a tool that Nutch ships with. It also ships with an example Solr schema which you can use. Fuad Efendi wrote: Thanks for the link, so, SolrIndex is NOT a plugin, it is an application... I use a similar approach... -Original Message- From: Uri Boness Hi, Nutch comes

solr nutch url indexing

2009-08-24 Thread Lassalle, Thibaut
Hi, I would like to crawl intranets with Nutch and index them with Solr. I would like to search mostly on the title of the pages (the one in <title>This is a title</title>). I tried to tweak the schema.xml to do that but nothing is working. I just have the content indexed. How do I

Re: solr nutch url indexing

2009-08-24 Thread Uri Boness
How did you configure Nutch? Make sure you have parse-html and index-basic configured. The HtmlParser should by default extract the page title and add it to the parsed data, and the BasicIndexingFilter by default adds this title to the NutchDocument and stores it in the title field. All the
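Nutch decides which plugins are active from the `plugin.includes` regular expression in its configuration. The sketch below checks a candidate value against the plugin IDs Uri mentions. The sample value is hypothetical, and the check assumes each plugin ID must match the whole expression (full-match semantics, as with Java's `String.matches`).

```python
import re
from typing import Iterable, List


def enabled_plugins(plugin_includes: str, plugin_ids: Iterable[str]) -> List[str]:
    """Return the plugin IDs that fully match the plugin.includes regex."""
    pattern = re.compile(plugin_includes)
    return [pid for pid in plugin_ids if pattern.fullmatch(pid)]


# Hypothetical plugin.includes value; compare with the one in your nutch-site.xml.
includes = r"protocol-http|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)|scoring-opic"
required = ["parse-html", "index-basic"]
active = enabled_plugins(includes, required + ["parse-pdf"])
missing = [pid for pid in required if pid not in active]
print("active:", active, "missing:", missing)
```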

RE: solr nutch url indexing

2009-08-24 Thread Fuad Efendi
Is SolrIndex a plugin for Nutch? Thanks! -Original Message- From: Uri Boness [mailto:ubon...@gmail.com] Sent: August-24-09 4:42 PM To: solr-user@lucene.apache.org Subject: Re: solr nutch url indexing How did you configure nutch? Make sure you have the parse-html and index-basic

NYC Apache Lucene/Solr/Nutch/etc. Meetup

2009-07-03 Thread Grant Ingersoll
Hi All, (sorry for the cross-post) For those in NYC, there will be a Lucene ecosystem (Lucene/Solr/Mahout/ Nutch/Tika/Droids/Lucene ports) Meetup on July 22, hosted by MTV Networks and co-sponsored with Lucid Imagination. For more info and to RSVP, see

Re: Snipets Solr/nutch

2008-04-15 Thread khirb7
to Solr. Thank you, Mike.

Re: Snipets Solr/nutch

2008-04-15 Thread Mike Klaas
On 15-Apr-08, at 1:37 PM, khirb7 wrote: Thank you a lot, you are helpful. Concerning my Solr, I am using version 1.2.0, which I downloaded from the Apache download mirror http://www.apache.org/dyn/closer.cgi/lucene/solr/ . I didn't quite understand what you meant when you said: you're trying to apply a

Re: Snipets Solr/nutch

2008-04-13 Thread khirb7
="org.apache.solr.highlight.GapFragmenter" default="true" still uses fragsize=100, but I am using <int name="hl.fragsize">400</int> as shown above. Thank you.

Re: Snipets Solr/nutch

2008-04-10 Thread khirb7
to modify it. All of that is so that it does not return the first highlighted word encountered but another one, because of the problem I explained in my previous messages. Cheers

Re: Snipets Solr/nutch(maxFragSize?)

2008-04-10 Thread khirb7
to highlight not only the first occurrence of a searched word but up to 1 occurrence of the same word. Cheers
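The highlighting knobs discussed in this thread are request parameters: `hl.fragsize` sets the approximate fragment length in characters, and `hl.snippets` sets the maximum number of snippets per field, so raising it returns more than the first matching fragment. A minimal sketch that builds such a request; the URL and the `content` field name are assumptions. Per-field overrides such as `f.content.hl.fragsize` also exist, and a parameter sent on the request overrides the same parameter in a handler's `defaults` section.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def highlight_url(select_url: str, query: str, field: str, fragsize: int, snippets: int) -> str:
    """Build a query that asks for several highlighted fragments of one field."""
    params = {
        "q": query,
        "hl": "true",
        "hl.fl": field,             # field(s) to highlight
        "hl.fragsize": fragsize,    # approximate fragment length in characters
        "hl.snippets": snippets,    # max fragments returned per field
    }
    return f"{select_url}?{urlencode(params)}"


url = highlight_url("http://localhost:8983/solr/select", "solr", "content", 400, 3)
print(parse_qs(urlsplit(url).query))
```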

Re: Snipets Solr/nutch

2008-04-10 Thread Mike Klaas
On 10-Apr-08, at 12:26 AM, khirb7 wrote: Hello everybody, just one more question: to analyse and modify Solr's snippets, I want to know whether org.apache.solr.util.HighlightingUtils is the class that generates the snippets and which method generates them, and could you please explain to me how are

Snipets Solr/nutch

2008-04-07 Thread khirb7
them to my Solr. Thank you in advance.

Re: Snipets Solr/nutch

2008-04-07 Thread khirb7
attention to the punctuation (the comma or the capital letter). Thank you in advance.

Re: Snipets Solr/nutch

2008-04-07 Thread Mike Klaas
On 7-Apr-08, at 7:12 AM, khirb7 wrote: khirb7 wrote: Hello everybody, I am using Solr in my project, and I want to use the Solr snippets generated by highlighting. The problem is that these snippets aren't displayed well; they are truncated and not really meaningful. I heard that
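One way to make snippets end on natural boundaries, rather than at a fixed character count, is to grow the fragment out to the surrounding sentence. The sketch below does this on plain text in Python; it is a standalone illustration, not Solr's highlighter (inside Solr, a regex-based fragmenter configured through `hl.fragmenter` and `hl.regex.pattern` is the closest built-in option). The sentence-end rule is a simplifying assumption.

```python
import re
from typing import Optional

# Assumed sentence end: ".", "!" or "?" followed by whitespace or end of text.
SENTENCE_END = re.compile(r"[.!?](?=\s|$)")


def sentence_snippet(text: str, term: str, max_chars: int = 200) -> Optional[str]:
    """Return the sentence containing the first match of term, with the term marked."""
    idx = text.lower().find(term.lower())
    if idx < 0:
        return None
    start = 0
    for m in SENTENCE_END.finditer(text, 0, idx):  # last sentence end before the match
        start = m.end()
    end_match = SENTENCE_END.search(text, idx)      # first sentence end after the match
    end = end_match.end() if end_match else len(text)
    snippet = text[start:end].strip()
    if len(snippet) > max_chars:
        snippet = snippet[:max_chars].rstrip() + "..."
    # Wrap every case-insensitive occurrence of the term in <em> tags.
    return re.sub(re.escape(term), lambda mm: f"<em>{mm.group(0)}</em>", snippet, flags=re.IGNORECASE)


print(sentence_snippet("Foo bar. Solr is great, really. Nutch crawls.", "solr"))
```

Because the comma after "great" is not treated as a sentence end, the snippet keeps the whole clause instead of stopping mid-sentence.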