[jira] [Resolved] (NUTCH-1471) make explicit which datastore urls are injected to

2012-11-22 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-1471. Resolution: Implemented Assignee: Lewis John McGibbney Committed @revision

[jira] [Updated] (NUTCH-1245) URL gone with 404 after db.fetch.interval.max stays db_unfetched in CrawlDb and is generated over and over again

2012-11-22 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-1245: Patch Info: Patch Available URL gone with 404 after db.fetch.interval.max

[jira] [Commented] (NUTCH-1370) Expose exact number of urls injected @runtime

2012-11-22 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13502813#comment-13502813 ] Hudson commented on NUTCH-1370: Integrated in nutch-trunk-maven #503 (See

GeneratorMapper - how to re-generate webpage?

2012-11-22 Thread vetus
Hello, I have a question. I have crawled my whole website using a Java program that runs the five steps inject/generate/fetch/parse/update using InjectorJob, GeneratorJob, etc. Once it has indexed the whole website, I want to re-crawl some pages (because they have changed), but as I understand,

What means an * at the beginning of batch mark?

2012-11-22 Thread vetus
Today I noticed that in the MySQL database used by Nutch, some webpages have an * in the markers field. What does it mean? e.g. dist2_updmrk_*1353585885-1120331892__prsmrk__*1353585885-1120331892_gnmrk_*1353585885-1120331892_ftcmrk_*1353585885-1120331892
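
For anyone who wants to inspect such rows, here is a minimal sketch that splits a markers value like the one quoted above into mark names and batch IDs, and reports which marks carry the leading *. The pattern is an assumption inferred only from this one sample row, and `parse_markers` is a hypothetical helper, not part of Nutch. It does not explain what the * means; it only makes the affected marks easier to spot.

```python
import re

# Assumed layout, taken only from the sample row in this thread:
# a mark name wrapped in underscores, an optional '*', then a batch ID.
MARK_RE = re.compile(r"(_+[a-z]+mrk_+)(\*?)(\d+-\d+)")


def parse_markers(raw):
    """Return a list of (mark_name, starred, batch_id) tuples."""
    return [
        (name.strip("_"), star == "*", batch_id)
        for name, star, batch_id in MARK_RE.findall(raw)
    ]


sample = (
    "dist2_updmrk_*1353585885-1120331892__prsmrk__*1353585885-1120331892"
    "_gnmrk_*1353585885-1120331892_ftcmrk_*1353585885-1120331892"
)
marks = parse_markers(sample)
for name, starred, batch_id in marks:
    print(name, starred, batch_id)
```

Running the same helper over rows without the * would show whether the starred rows differ in any other way, for example in batch ID.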

Re: [DISCUSS] trunk release?

2012-11-22 Thread Mattmann, Chris A (388J)
Release early, release often :) I'd say I'd be happy to try and spin it, but you'd beat me to it so I just will say I'll be happy to test the RC and voice my VOTE when you roll it Lewis :) Happy Thanksgiving (even though you're not in the States yet)! Cheers, Chris On Nov 22, 2012, at 7:15

Re: [DISCUSS] trunk release?

2012-11-22 Thread Lewis John Mcgibbney
Hi Chris, On Thu, Nov 22, 2012 at 5:43 PM, Mattmann, Chris A (388J) chris.a.mattm...@jpl.nasa.gov wrote: Release early, release often :) +1 Happy Thanksgiving (even though you're not in the States yet)! Aye, it should have been the other way around. Don't worry it will be Burns Night on

Re: What means an * at the beginning of batch mark?

2012-11-22 Thread Lewis John Mcgibbney
Hi Vetus, On Thu, Nov 22, 2012 at 12:15 PM, vetus ve...@isac.cat wrote: dist 2 _updmrk_*1353585885-1120331892 __prsmrk__*1353585885-1120331892 _gnmrk_*1353585885-1120331892 _ftcmrk_*1353585885-1120331892 Absolutely no idea... Does this matter to you? Have you got some problems? It seems

Re: GeneratorMapper - how to re-generate webpage?

2012-11-22 Thread Lewis John Mcgibbney
Hi Vetus, On Thu, Nov 22, 2012 at 12:00 PM, vetus ve...@isac.cat wrote: When it has indexed all the website, then I want to re-crawl some pages again (Because it has changed), For starters I would advise you to use the adaptive fetch schedule [0], this can be configured from within the
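
To illustrate the general idea behind an adaptive fetch schedule, here is a simplified sketch: pages that changed since the last fetch get a shorter re-fetch interval, and unchanged pages get a longer one, within fixed bounds. This is not Nutch's actual implementation. The function name, parameter names, and default values are all assumptions chosen for illustration.

```python
def next_interval(interval, modified, inc_rate=0.4, dec_rate=0.2,
                  min_interval=60.0, max_interval=31536000.0):
    """Return the next re-fetch interval in seconds.

    All rates and bounds here are illustrative assumptions,
    not values read from any Nutch configuration.
    """
    if modified:
        # Page changed: fetch it sooner next time.
        interval *= (1.0 - dec_rate)
    else:
        # Page unchanged: back off and fetch it later.
        interval *= (1.0 + inc_rate)
    # Keep the result inside the allowed range.
    return max(min_interval, min(max_interval, interval))


one_day = 86400.0
print(next_interval(one_day, modified=True))
print(next_interval(one_day, modified=False))
```

With a schedule like this, frequently changing pages drift toward the minimum interval and become due for generation again without being re-injected by hand.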

RE: [DISCUSS] trunk release?

2012-11-22 Thread Markus Jelsma
-Original message- From:Lewis John Mcgibbney lewis.mcgibb...@gmail.com Sent: Thu 22-Nov-2012 19:23 To: dev@nutch.apache.org Subject: Re: [DISCUSS] trunk release? Hi Chris, On Thu, Nov 22, 2012 at 5:43 PM, Mattmann, Chris A (388J) chris.a.mattm...@jpl.nasa.gov wrote:

Re: [DISCUSS] trunk release?

2012-11-22 Thread Sebastian Nagel
+1 to release Now we can hold the 6-month cycle. Chris is right: If we manage to address a couple of the critical issues early next year, we can release earlier. Sebastian On 11/22/2012 06:43 PM, Mattmann, Chris A (388J) wrote: Release early, release often :) I'd say I'd be happy to try and

Re: [DISCUSS] trunk release?

2012-11-22 Thread Lewis John Mcgibbney
OK I will pop an RC today Thank you for the input Lewis On Thu, Nov 22, 2012 at 8:43 PM, Sebastian Nagel wastl.na...@googlemail.com wrote: +1 to release Now we can hold the 6-month cycle. Chris is right: If we manage to address a couple of the critical issues early next year, we can

[jira] [Commented] (NUTCH-1370) Expose exact number of urls injected @runtime

2012-11-22 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13503012#comment-13503012 ] Hudson commented on NUTCH-1370: Integrated in Nutch-nutchgora #412 (See

[jira] [Commented] (NUTCH-1370) Expose exact number of urls injected @runtime

2012-11-22 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13503013#comment-13503013 ] Hudson commented on NUTCH-1370: Integrated in Nutch-trunk #2026 (See