[Nutch-dev] [jira] Commented: (NUTCH-61) Adaptive re-fetch interval. Detecting umodified content

2007-01-18 Thread Armel Nene (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-61?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12465700
 ] 

Armel Nene commented on NUTCH-61:
-

I have attached a new patch, as the old one needed updating before it could be 
used with Nutch 0.8.1. It would be great if more people could test the feature, 
as I have encountered some issues with plugins such as parse-xml when used with 
this patch. Over the HTTP protocol the patch works well when indexing 
text/xml/html. When used with a plugin such as parse-xml, the fetcher throws a 
Java IllegalStateException. If anybody has seen this error and knows how to fix 
it, please share the fix with the rest of us. For now, I'm working on fixing 
this issue and hopefully porting the feature to the 0.9.0 version. 

 Adaptive re-fetch interval. Detecting umodified content
 ---

 Key: NUTCH-61
 URL: https://issues.apache.org/jira/browse/NUTCH-61
 Project: Nutch
  Issue Type: New Feature
  Components: fetcher
Reporter: Andrzej Bialecki 
 Assigned To: Andrzej Bialecki 
 Attachments: 20050606.diff, 20051230.txt, 20060227.txt, 
 nutch-61-417287.patch, nutch-61-492176.patch


 Currently Nutch doesn't automatically adjust its re-fetch period, regardless 
 of whether individual pages change seldom or frequently. The goal of these 
 changes is to extend the current codebase to support various possible 
 adjustments to re-fetch times and intervals, and specifically a re-fetch 
 schedule which tries to adapt the period between consecutive fetches to the 
 period of content changes.
 Also, these patches implement checking whether the content has changed since 
 the last fetch; protocol plugins are also changed to make use of this 
 information, so that if content is unmodified it doesn't have to be fetched 
 and processed.
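The basic idea of an adaptive re-fetch schedule can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in any of the attached patches: the class name `AdaptiveIntervalSketch`, the increase/decrease rates and the minimum/maximum bounds are all hypothetical. When a page is found unmodified the interval grows; when it has changed the interval shrinks; the result is clamped to a configured range, so the interval drifts toward the page's actual change period.

```java
/**
 * Hypothetical sketch of an adaptive re-fetch interval (not the NUTCH-61 patch code).
 * The interval grows by INC_RATE when a page is unmodified and shrinks by DEC_RATE
 * when it changed, and is then clamped to [MIN_INTERVAL_SEC, MAX_INTERVAL_SEC].
 */
final class AdaptiveIntervalSketch {
    // Hypothetical tuning parameters; these are assumptions, not Nutch settings.
    static final double INC_RATE = 0.4;
    static final double DEC_RATE = 0.2;
    static final long MIN_INTERVAL_SEC = 60L * 60;            // 1 hour
    static final long MAX_INTERVAL_SEC = 365L * 24 * 60 * 60; // 365 days

    private AdaptiveIntervalSketch() {}

    /** Returns the next re-fetch interval in seconds. */
    static long nextInterval(long currentIntervalSec, boolean modified) {
        double next = modified
            ? currentIntervalSec * (1.0 - DEC_RATE)  // page changed: check again sooner
            : currentIntervalSec * (1.0 + INC_RATE); // page unchanged: check again later
        long rounded = Math.round(next);
        return Math.max(MIN_INTERVAL_SEC, Math.min(MAX_INTERVAL_SEC, rounded));
    }

    /** Returns the time (ms since the epoch) at which the page becomes due again. */
    static long nextFetchTime(long fetchTimeMs, long intervalSec) {
        return fetchTimeMs + intervalSec * 1000L;
    }
}
```

With these illustrative rates, a one-day interval (86400 s) becomes 120960 s after an unmodified fetch and 69120 s after a modified one. Multiplicative steps are just one possible choice; an implementation could equally move the interval toward the observed time between changes.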

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



___
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers


[Nutch-dev] [jira] Commented: (NUTCH-61) Adaptive re-fetch interval. Detecting umodified content

2007-01-17 Thread Sami Siren (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-61?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12465493
 ] 

Sami Siren commented on NUTCH-61:
-

Haven't looked at the patch (tm)

How would one manage segments after something like this gets included? I mean, 
right now it's more or less safe to delete segments older than the configured 
refetch interval plus some margin, but once the lifetime of a page can vary, 
there's no longer such a simple way to manage fetched data.


 Adaptive re-fetch interval. Detecting umodified content
 ---

 Key: NUTCH-61
 URL: https://issues.apache.org/jira/browse/NUTCH-61
 Project: Nutch
  Issue Type: New Feature
  Components: fetcher
Reporter: Andrzej Bialecki 
 Assigned To: Andrzej Bialecki 
 Attachments: 20050606.diff, 20051230.txt, 20060227.txt, 
 nutch-61-417287.patch




[Nutch-dev] [jira] Commented: (NUTCH-61) Adaptive re-fetch interval. Detecting umodified content

2007-01-17 Thread Andrzej Bialecki (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-61?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12465517
 ] 

Andrzej Bialecki  commented on NUTCH-61:


Actually, there is a way to do this, and this patch implements it.

We define a maximum time to live for _any_ page, no matter when it was last 
fetched or what its re-fetch interval is. This is a system-wide setting. If the 
re-fetch interval is longer than this value, or if the page somehow wasn't 
re-fetched for at least that long for other reasons (e.g. because it was 
unmodified, and we don't fetch unmodified content), such pages will be 
forcibly included in the fetchlist candidates as if they had DB_UNFETCHED status.

This means we can be sure that any pages still present in segments older than 
this maximum TTL will have been re-fetched since, so we can safely discard all 
segments older than the TTL.
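The maximum-TTL rule described above can be sketched in a few lines. Everything here is hypothetical: `MaxTtlSketch` is an illustrative name, `maxTtlMs` stands in for the system-wide maximum TTL setting, and the `Status` enum only loosely mirrors the DB_UNFETCHED status mentioned in the comment. A page becomes a fetchlist candidate when it has never been fetched, when it has gone unfetched for longer than the TTL, or when its scheduled time has arrived.

```java
/** Hypothetical sketch of the "maximum TTL" rule; names are illustrative, not Nutch's API. */
final class MaxTtlSketch {
    enum Status { DB_UNFETCHED, DB_FETCHED }

    private MaxTtlSketch() {}

    /** Decides whether a page should be a fetchlist candidate at time nowMs. */
    static boolean isFetchCandidate(Status status, long lastFetchMs, long nextFetchMs,
                                    long nowMs, long maxTtlMs) {
        if (status == Status.DB_UNFETCHED) {
            return true;                 // never fetched: always a candidate
        }
        if (nowMs - lastFetchMs > maxTtlMs) {
            return true;                 // TTL exceeded: force it in, as if unfetched
        }
        return nextFetchMs <= nowMs;     // otherwise follow the (possibly adaptive) schedule
    }

    /**
     * Following the argument in the comment, a segment older than the TTL is
     * assumed to hold only pages that have been re-fetched since, so it may be discarded.
     */
    static boolean isSegmentDeletable(long segmentCreatedMs, long nowMs, long maxTtlMs) {
        return nowMs - segmentCreatedMs > maxTtlMs;
    }
}
```

Note that the segment rule only holds if forced candidates are actually fetched within the TTL; a crawl that generates fewer URLs than are due would need to prioritise them.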

 Adaptive re-fetch interval. Detecting umodified content
 ---

 Key: NUTCH-61
 URL: https://issues.apache.org/jira/browse/NUTCH-61
 Project: Nutch
  Issue Type: New Feature
  Components: fetcher
Reporter: Andrzej Bialecki 
 Assigned To: Andrzej Bialecki 
 Attachments: 20050606.diff, 20051230.txt, 20060227.txt, 
 nutch-61-417287.patch




[Nutch-dev] [jira] Commented: (NUTCH-61) Adaptive re-fetch interval. Detecting umodified content

2007-01-17 Thread Sami Siren (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-61?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12465540
 ] 

Sami Siren commented on NUTCH-61:
-

OK, so in my usual use case, where there are far more URLs than I can fetch, 
this shouldn't have any effect at all, negative or positive.

 Adaptive re-fetch interval. Detecting umodified content
 ---

 Key: NUTCH-61
 URL: https://issues.apache.org/jira/browse/NUTCH-61
 Project: Nutch
  Issue Type: New Feature
  Components: fetcher
Reporter: Andrzej Bialecki 
 Assigned To: Andrzej Bialecki 
 Attachments: 20050606.diff, 20051230.txt, 20060227.txt, 
 nutch-61-417287.patch




[Nutch-dev] [jira] Commented: (NUTCH-61) Adaptive re-fetch interval. Detecting umodified content

2006-11-12 Thread Armel Nene (JIRA)
[ 
http://issues.apache.org/jira/browse/NUTCH-61?page=comments#action_12449128 ] 

Armel Nene commented on NUTCH-61:
-

Has this patch by any chance been included in a newer release of Nutch, or is 
anyone using it, as Otis asked? The reason I ask is that I am about to build a 
similar patch, but if this patch already works, I can just adapt it to my 
context. Or is Nutch planning to provide this feature out of the box in the future? 

 Adaptive re-fetch interval. Detecting umodified content
 ---

 Key: NUTCH-61
 URL: http://issues.apache.org/jira/browse/NUTCH-61
 Project: Nutch
  Issue Type: New Feature
  Components: fetcher
Reporter: Andrzej Bialecki 
 Assigned To: Andrzej Bialecki 
 Attachments: 20050606.diff, 20051230.txt, 20060227.txt, 
 nutch-61-417287.patch




[Nutch-dev] [jira] Commented: (NUTCH-61) Adaptive re-fetch interval. Detecting umodified content

2006-11-12 Thread Andrzej Bialecki (JIRA)
[ 
http://issues.apache.org/jira/browse/NUTCH-61?page=comments#action_12449170 ] 

Andrzej Bialecki  commented on NUTCH-61:


Unfortunately, this patch hasn't been applied yet, due to its complexity and 
lack of testing.

But it will be, sooner or later, because this functionality is required for any 
serious use.

I'm planning to bring this patch to the latest trunk, and then apply it 
piece-wise over the next couple of weeks.

 Adaptive re-fetch interval. Detecting umodified content
 ---

 Key: NUTCH-61
 URL: http://issues.apache.org/jira/browse/NUTCH-61
 Project: Nutch
  Issue Type: New Feature
  Components: fetcher
Reporter: Andrzej Bialecki 
 Assigned To: Andrzej Bialecki 
 Attachments: 20050606.diff, 20051230.txt, 20060227.txt, 
 nutch-61-417287.patch




Re: [Nutch-dev] [jira] Commented: (NUTCH-61) Adaptive re-fetch interval. Detecting umodified content

2006-11-12 Thread Armel T. Nene
Andrzej, the feature that I am after can be implemented with this patch if I
adapt it correctly. I am not sure, but the patch seems a little too old to
apply to the latest release, Nutch 0.8.1. 

I want to implement a feature where the fetcher will fetch files but only
add them if they have been modified after the latest fetch time. I want to
implement that for the filesystem first and extend it later to network
fetching. If possible, I would like to look at the full source code of your
patch in a zip file. Once the feature is implemented, I will post it back
here. I'd like to start working from your code first. You can either make the
source code available here or mail it to me at armel dot
nene @ idna-solutions dot com.


-Original Message-
From: Andrzej Bialecki (JIRA) [mailto:[EMAIL PROTECTED] 
Sent: 12 November 2006 19:39
To: nutch-dev@lucene.apache.org
Subject: [jira] Commented: (NUTCH-61) Adaptive re-fetch interval. Detecting
umodified content

[
http://issues.apache.org/jira/browse/NUTCH-61?page=comments#action_12449170
] 

Andrzej Bialecki  commented on NUTCH-61:


Unfortunately, this patch hasn't been applied yet, due to its complexity and
lack of testing.

But it will be, sooner or later, because this functionality is required for
any serious use.

I'm planning to bring this patch to the latest trunk, and then apply it
piece-wise over the next couple of weeks.

 Adaptive re-fetch interval. Detecting umodified content
 ---

 Key: NUTCH-61
 URL: http://issues.apache.org/jira/browse/NUTCH-61
 Project: Nutch
  Issue Type: New Feature
  Components: fetcher
Reporter: Andrzej Bialecki 
 Assigned To: Andrzej Bialecki 
 Attachments: 20050606.diff, 20051230.txt, 20060227.txt,
nutch-61-417287.patch




Re: [Nutch-dev] [jira] Commented: (NUTCH-61) Adaptive re-fetch interval. Detecting umodified content

2006-11-12 Thread Andrzej Bialecki
Armel T. Nene wrote:
 Andrzej, the feature that I am after can be implemented by this patch if I
 just adapt it right. I am not sure of this but the patch seems a little bit
 old to be implemented in the latest release of Nutch 0.8.1. 
   

Right, that's why I wrote that it needs to be brought up to date with the 
current trunk/.

 I want to implement a feature where the fetcher will fetch files but only
 add them if there have been modified after the latest fetch time. Now, I
 want to implement that on a filesystem first and then update later for
 network fetching. I would like to have a look at your full source code for
 your patch in a zip file if possible. Once the feature implemented, I will
 post it back here. I'd like to start working from your code first. You can
 either make the source code available here or mail them to me at armel dot
 nene @ idna-solutions dot com.
   

Patches attached to the JIRA issue already support this. Please bear in 
mind that the notion of change is dependent on how you compare the 
content of old and new pages, especially if you lack the Last-Modified 
header from the server.
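The point about comparing old and new content can be illustrated with a small sketch. It is hypothetical and not taken from the patches: a `304 Not Modified` answer is treated as unchanged; when both the stored and the new response carry a Last-Modified time, those are compared; otherwise the sketch falls back to comparing a digest of the content (MD5 here, chosen only for illustration). The `ifModifiedSince` helper formats the HTTP date used in an `If-Modified-Since` request header, which lets a cooperating server skip sending the body at all.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Arrays;
import java.util.Locale;

/** Hypothetical change-detection sketch; not the logic in the NUTCH-61 patches. */
final class ChangeDetectionSketch {
    static final int HTTP_NOT_MODIFIED = 304;

    // HTTP date format, e.g. "Thu, 01 Jan 1970 00:00:00 GMT".
    private static final DateTimeFormatter HTTP_DATE =
        DateTimeFormatter.ofPattern("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.US);

    private ChangeDetectionSketch() {}

    /** Formats an If-Modified-Since header value for the given fetch time (ms since epoch). */
    static String ifModifiedSince(long lastFetchMs) {
        return HTTP_DATE.format(Instant.ofEpochMilli(lastFetchMs).atZone(ZoneOffset.UTC));
    }

    /** MD5 digest of the raw content, used as a content signature. */
    static byte[] signature(byte[] content) {
        try {
            return MessageDigest.getInstance("MD5").digest(content);
        } catch (NoSuchAlgorithmException e) {
            // MD5 support is required on every Java platform, so this should not happen.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /** Decides whether content changed; a lastModified value of 0 means the header was absent. */
    static boolean isModified(int httpStatus, long oldLastModified, long newLastModified,
                              byte[] oldSignature, byte[] newContent) {
        if (httpStatus == HTTP_NOT_MODIFIED) {
            return false;                               // the server says it is unchanged
        }
        if (oldLastModified > 0 && newLastModified > 0) {
            return newLastModified > oldLastModified;   // trust the headers when both exist
        }
        // No usable Last-Modified header: compare content signatures instead.
        return !Arrays.equals(oldSignature, signature(newContent));
    }
}
```

An exact digest treats any byte change (a timestamp, a rotating ad) as a modification; a fuzzier signature would trade precision for fewer spurious re-fetches.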


-- 
Best regards,
Andrzej Bialecki 
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com





[Nutch-dev] [jira] Commented: (NUTCH-61) Adaptive re-fetch interval. Detecting umodified content

2006-10-24 Thread Otis Gospodnetic (JIRA)
[ 
http://issues.apache.org/jira/browse/NUTCH-61?page=comments#action_12444514 ] 

Otis Gospodnetic commented on NUTCH-61:
---

Has anyone been using the code with this patch applied? Just wondering whether 
it works, and how well.

 Adaptive re-fetch interval. Detecting umodified content
 ---

 Key: NUTCH-61
 URL: http://issues.apache.org/jira/browse/NUTCH-61
 Project: Nutch
  Issue Type: New Feature
  Components: fetcher
Reporter: Andrzej Bialecki 
 Assigned To: Andrzej Bialecki 
 Attachments: 20050606.diff, 20051230.txt, 20060227.txt, 
 nutch-61-417287.patch




[Nutch-dev] [jira] Commented: (NUTCH-61) Adaptive re-fetch interval. Detecting umodified content

2006-02-27 Thread Jerome Charron (JIRA)
[ 
http://issues.apache.org/jira/browse/NUTCH-61?page=comments#action_12368050 ] 

Jerome Charron commented on NUTCH-61:
-

Not an objection, but a simple comment.
Why not make FetchSchedule a new ExtensionPoint, and DefaultFetchSchedule and 
AdaptiveFetchSchedule fetch schedule plugins?


 Adaptive re-fetch interval. Detecting umodified content
 ---

  Key: NUTCH-61
  URL: http://issues.apache.org/jira/browse/NUTCH-61
  Project: Nutch
 Type: New Feature
   Components: fetcher
 Reporter: Andrzej Bialecki 
 Assignee: Andrzej Bialecki 
  Attachments: 20050606.diff, 20051230.txt, 20060227.txt



[Nutch-dev] [jira] Commented: (NUTCH-61) Adaptive re-fetch interval. Detecting umodified content

2006-02-27 Thread Andrzej Bialecki (JIRA)
[ 
http://issues.apache.org/jira/browse/NUTCH-61?page=comments#action_12368051 ] 

Andrzej Bialecki  commented on NUTCH-61:


I contemplated this for a while, and then decided against it.

The main reason was that currently most of the pluggable extensions that 
result in running a single selected plugin are handled using a simple Factory 
pattern, as opposed to the ChainedFilter pattern, where we use extension points. 

I guess the original reason was that implementations would almost always 
consist of a single class, so it didn't make sense to complicate things and 
require the whole plugin infrastructure... The same would apply in this case 
(just a single class), so I followed the same pattern.

It's easy to change this to use an extension point, if people prefer it this 
way.
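To make the design choice concrete, here is a minimal sketch of the simple-Factory approach, as opposed to a plugin extension point. The `FetchSchedule` interface, its method, the two implementations' behaviour and the `"default"`/`"adaptive"` names are hypothetical stand-ins (only the class names echo the comments above); a small registry of constructors keyed by a configuration value is used instead of reflection to keep the sketch self-contained.

```java
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical schedule contract; not Nutch's actual FetchSchedule API. */
interface FetchSchedule {
    /** Next interval in seconds, given the current interval and whether the page changed. */
    long nextInterval(long currentIntervalSec, boolean modified);
}

/** Fixed schedule: always keeps the same interval. */
final class DefaultFetchSchedule implements FetchSchedule {
    public long nextInterval(long currentIntervalSec, boolean modified) {
        return currentIntervalSec;
    }
}

/** Adaptive schedule: halves the interval on change, doubles it otherwise (illustrative). */
final class AdaptiveFetchSchedule implements FetchSchedule {
    public long nextInterval(long currentIntervalSec, boolean modified) {
        return modified ? Math.max(1L, currentIntervalSec / 2) : currentIntervalSec * 2;
    }
}

/** Simple factory: exactly one implementation is selected by a configuration value. */
final class FetchScheduleFactory {
    private static final Map<String, Supplier<FetchSchedule>> REGISTRY = Map.of(
        "default", DefaultFetchSchedule::new,
        "adaptive", AdaptiveFetchSchedule::new);

    private FetchScheduleFactory() {}

    static FetchSchedule get(String name) {
        Supplier<FetchSchedule> supplier = REGISTRY.get(name);
        if (supplier == null) {
            throw new IllegalArgumentException("Unknown fetch schedule: " + name);
        }
        return supplier.get();
    }
}
```

Because only one schedule runs at a time and each is a single class, the factory avoids plugin descriptors and class loaders; an extension point would mainly pay off if third parties were expected to ship schedules as separate plugins.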

 Adaptive re-fetch interval. Detecting umodified content
 ---

  Key: NUTCH-61
  URL: http://issues.apache.org/jira/browse/NUTCH-61
  Project: Nutch
 Type: New Feature
   Components: fetcher
 Reporter: Andrzej Bialecki 
 Assignee: Andrzej Bialecki 
  Attachments: 20050606.diff, 20051230.txt, 20060227.txt



[Nutch-dev] [jira] Commented: (NUTCH-61) Adaptive re-fetch interval. Detecting umodified content

2005-12-28 Thread byron miller (JIRA)
[ 
http://issues.apache.org/jira/browse/NUTCH-61?page=comments#action_12361346 ] 

byron miller commented on NUTCH-61:
---

Most definitely! I'll be happy to give it a whirl!

 Adaptive re-fetch interval. Detecting umodified content
 ---

  Key: NUTCH-61
  URL: http://issues.apache.org/jira/browse/NUTCH-61
  Project: Nutch
 Type: New Feature
   Components: fetcher
 Reporter: Andrzej Bialecki 
 Assignee: Andrzej Bialecki 
  Attachments: 20050606.diff



[Nutch-dev] [jira] Commented: (NUTCH-61) Adaptive re-fetch interval. Detecting umodified content

2005-12-27 Thread byron miller (JIRA)
[ 
http://issues.apache.org/jira/browse/NUTCH-61?page=comments#action_12361302 ] 

byron miller commented on NUTCH-61:
---

Is there a patch updated for the current branch, or should I take a stab at 
this?

 Adaptive re-fetch interval. Detecting umodified content
 ---

  Key: NUTCH-61
  URL: http://issues.apache.org/jira/browse/NUTCH-61
  Project: Nutch
 Type: New Feature
   Components: fetcher
 Reporter: Andrzej Bialecki 
 Assignee: Andrzej Bialecki 
  Attachments: 20050606.diff



[Nutch-dev] [jira] Commented: (NUTCH-61) Adaptive re-fetch interval. Detecting umodified content

2005-12-27 Thread Andrzej Bialecki (JIRA)
[ 
http://issues.apache.org/jira/browse/NUTCH-61?page=comments#action_12361311 ] 

Andrzej Bialecki  commented on NUTCH-61:


I'm working on this; the patch will be available in a couple of days. I could 
then use your help with review and testing... ;-)

 Adaptive re-fetch interval. Detecting umodified content
 ---

  Key: NUTCH-61
  URL: http://issues.apache.org/jira/browse/NUTCH-61
  Project: Nutch
 Type: New Feature
   Components: fetcher
 Reporter: Andrzej Bialecki 
 Assignee: Andrzej Bialecki 
  Attachments: 20050606.diff



[Nutch-dev] [jira] Commented: (NUTCH-61) Adaptive re-fetch interval. Detecting umodified content

2005-12-22 Thread raghavendra prabhu (JIRA)
[ 
http://issues.apache.org/jira/browse/NUTCH-61?page=comments#action_12361131 ] 

raghavendra prabhu commented on NUTCH-61:
-

Will the same thing work for a file system?

For a file system, we can directly get the modification date and store it in the db.

The plugins will look at the content date, and if it is different they 
will index it.

Otherwise, they will not fetch it.

This could be a solution for file-based content.

(The point is that it does away entirely with the fetch interval and decides 
based only on the file modification date.)
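The proposal above amounts to a simple check. The Python sketch below only illustrates that idea and is not code from the patch. A plain dict stands in for the crawl db, and the function name `fetch_if_modified` is hypothetical. The function reads a file only when its modification time differs from the one recorded at the previous fetch.

```python
import os
from typing import Dict, Optional

# Hypothetical sketch: the dict stands in for the crawl db and maps each file
# path to the modification time recorded when it was last fetched.


def fetch_if_modified(path: str, db: Dict[str, float]) -> Optional[bytes]:
    """Return the file's content if it is new or changed, otherwise None."""
    mtime = os.path.getmtime(path)
    if db.get(path) == mtime:
        # Same modification date as last time: skip fetching and indexing.
        return None
    with open(path, "rb") as f:
        content = f.read()
    # Remember the date we saw, so the next run can skip an unchanged file.
    db[path] = mtime
    return content
```

Because the decision rests only on the stored date, this check ignores any fetch interval, as the comment suggests. It would miss a change if a file were rewritten with its old modification time preserved.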



[Nutch-dev] [jira] Commented: (NUTCH-61) Adaptive re-fetch interval. Detecting unmodified content

2005-12-22 Thread Andrzej Bialecki (JIRA)
[ 
http://issues.apache.org/jira/browse/NUTCH-61?page=comments#action_12361133 ] 

Andrzej Bialecki  commented on NUTCH-61:


This patch already supports this. However, it needs to be significantly 
reworked to fit into the current development version.
