Re: [Nutch-dev] how to prune unmatched url??

2007-04-24 Thread franklinb4u
Hi, I've downloaded apache-ant-1.7.0. The idea is to compile the Nutch source code, and I've placed it in my Nutch directory. Does this mean the installation of Ant is complete, or are there further steps to follow? If so, kindly tell me the steps I have to follow to compile the

Re: [Nutch-dev] how to prune unmatched url??

2007-04-24 Thread Ratnesh,V2Solutions India
Why don't you compile Nutch in Eclipse if you are working in a Windows environment? Then you don't need to download Ant. If you can proceed with that, I can explain the rest. On Linux I have only worked up to deployment and have not done any testing or running of the Nutch source code. Thanks

Re: [Nutch-dev] how to prune unmatched url??

2007-04-24 Thread franklinb4u
I guess a Java program can be compiled once and then run anywhere, so if it is compiled on Windows and that package can be used on Unix, then explain the further steps to me. So if it's possible to compile the code in Eclipse, please tell me how to do it. I don't have any idea about

[Nutch-dev] [jira] Created: (NUTCH-470) Adding optional terms to a query

2007-04-24 Thread Trond Andersen (JIRA)
Adding optional terms to a query Key: NUTCH-470 URL: https://issues.apache.org/jira/browse/NUTCH-470 Project: Nutch Issue Type: Wish Components: searcher Affects Versions: 0.9.0

[Nutch-dev] [jira] Updated: (NUTCH-470) Adding optional terms to a query

2007-04-24 Thread Trond Andersen (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Trond Andersen updated NUTCH-470: Attachment: optional.patch A small patch making it possible to add optional terms to the Query

Re: [Nutch-dev] Perfomance problems and segmenting

2007-04-24 Thread JoostRuiter
Ok thanks for all your input guys! I'll discuss this with my co-worker. Dennis, what more information do you need? Thanks everyone! Briggs wrote: One more thing... Are you using a distributed index? If this is so, you do not want to do this; indexes should be local to the machine that

[Nutch-dev] [jira] Created: (NUTCH-471) Fix synchronization in NutchBean creation

2007-04-24 Thread Enis Soztutar (JIRA)
Fix synchronization in NutchBean creation Key: NUTCH-471 URL: https://issues.apache.org/jira/browse/NUTCH-471 Project: Nutch Issue Type: Bug Components: searcher Affects Versions:

[Nutch-dev] [jira] Updated: (NUTCH-471) Fix synchronization in NutchBean creation

2007-04-24 Thread Enis Soztutar (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enis Soztutar updated NUTCH-471: Attachment: NutchBeanCreationSync_v1.patch this patch synchronizes NutchBean.get((ServletContext

Re: [Nutch-dev] Perfomance problems and segmenting

2007-04-24 Thread JoostRuiter
Hey guys, one more addition, we're not using DFS. We have a single XP box with NTFS (so no distributed index). Hope this helps, greetings.. JoostRuiter wrote: Ok thanks for all your input guys! I'll discuss this with my co-worker. Dennis, what more information do you need? Thanks

[Nutch-dev] Fetcher2's delay between successive requests

2007-04-24 Thread Doğacan Güney
Hi all, I have been working on Fetcher2 code lately and I came across this particular code (in FetchItemQueue.getFetchItem) that I didn't quite understand: public FetchItem getFetchItem() { ... long last = endTime.get() + (maxThreads > 1 ? crawlDelay : minCrawlDelay); ... } Now, the

Re: [Nutch-dev] Perfomance problems and segmenting

2007-04-24 Thread JoostRuiter
I got some additional info from our developer: I never had much luck with the merge tools but you might post this snippet from your log to the board: 2007-04-23 20:01:56,656 INFO segment.SegmentMerger - Slice size: 5 URLs. 2007-04-23 20:01:56,656 INFO segment.SegmentMerger - Slice size:

[Nutch-dev] [jira] Created: (NUTCH-472) NullPointerException in ZipTextExtractor if no MIME type for zipped file

2007-04-24 Thread Antony Bowesman (JIRA)
NullPointerException in ZipTextExtractor if no MIME type for zipped file Key: NUTCH-472 URL: https://issues.apache.org/jira/browse/NUTCH-472 Project: Nutch Issue Type:

[Nutch-dev] [jira] Created: (NUTCH-473) ExcepExtractor performance bad due to String concatenation

2007-04-24 Thread Antony Bowesman (JIRA)
ExcepExtractor performance bad due to String concatenation Key: NUTCH-473 URL: https://issues.apache.org/jira/browse/NUTCH-473 Project: Nutch Issue Type: Improvement

[Nutch-dev] [jira] Updated: (NUTCH-473) ExcelExtractor performance bad due to String concatenation

2007-04-24 Thread Antony Bowesman (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antony Bowesman updated NUTCH-473: Summary: ExcelExtractor performance bad due to String concatenation (was: ExcepExtractor
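
The title of NUTCH-473 attributes the slowdown to String concatenation; the extractor's code is not shown in this archive. As a general illustration only (the class and method names below are hypothetical, not ExcelExtractor's), repeated `+=` in a loop copies the accumulated text on every pass, while a `StringBuilder` appends in place:

```java
final class CellTextJoin {
    // Hypothetical slow path: each += allocates a new String and copies all text so far.
    static String joinWithConcat(String[] cells) {
        String text = "";
        for (String cell : cells) {
            text += cell + " ";
        }
        return text;
    }

    // Same result, but appends into one growable buffer.
    static String joinWithBuilder(String[] cells) {
        StringBuilder text = new StringBuilder();
        for (String cell : cells) {
            text.append(cell).append(' ');
        }
        return text.toString();
    }

    public static void main(String[] args) {
        String[] cells = {"A1", "B1", "C1"};
        // Both methods produce identical output; only the cost differs.
        System.out.println(joinWithConcat(cells).equals(joinWithBuilder(cells)));
    }
}
```

Both methods return the same string, so the builder version is a drop-in change wherever the text is accumulated in a loop.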

Re: [Nutch-dev] Fetcher2's delay between successive requests

2007-04-24 Thread Doğacan Güney
I have discovered another bug in Fetcher2. Plugin lib-http checks Protocol.CHECK_{BLOCKING,ROBOTS} (which resolve to strings protocol.plugin.check.{blocking,robots}) to see if it should handle blocking or not. But Fetcher2 sets http.plugin.check.{blocking,robots} (notice the protocol/http
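
The effect of such a key mismatch can be sketched without any Nutch classes. In the sketch below the two key strings come from the message; the map-based lookup, the helper `getBoolean`, and the default value `true` are assumptions made for illustration, not the lib-http or Fetcher2 code:

```java
import java.util.HashMap;
import java.util.Map;

final class ConfigKeyMismatch {
    // Key the reader consults, per the message.
    static final String READ_KEY = "protocol.plugin.check.blocking";
    // Key the writer sets, per the message.
    static final String WRITTEN_KEY = "http.plugin.check.blocking";

    // Hypothetical stand-in for a configuration lookup with a default.
    static boolean getBoolean(Map<String, String> conf, String key, boolean defaultValue) {
        String value = conf.get(key);
        return value == null ? defaultValue : Boolean.parseBoolean(value);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(WRITTEN_KEY, "false");                  // writer tries to disable the check
        boolean seen = getBoolean(conf, READ_KEY, true); // reader looks under a different key
        // The reader never sees the write and falls back to the assumed default.
        System.out.println("reader sees check.blocking = " + seen);
    }
}
```

Because the reader silently falls back to its default, a mismatch like this produces no error; the setting simply has no effect.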

Re: [Nutch-dev] Fetcher2's delay between successive requests

2007-04-24 Thread Andrzej Bialecki
Doğacan Güney wrote: Hi all, I have been working on Fetcher2 code lately and I came across this particular code (in FetchItemQueue.getFetchItem) that I didn't quite understand: public FetchItem getFetchItem() { ... long last = endTime.get() + (maxThreads > 1 ? crawlDelay :

Re: [Nutch-dev] Fetcher2's delay between successive requests

2007-04-24 Thread Andrzej Bialecki
Doğacan Güney wrote: I have discovered another bug in Fetcher2. Plugin lib-http checks Protocol.CHECK_{BLOCKING,ROBOTS} (which resolve to strings protocol.plugin.check.{blocking,robots}) to see if it should handle blocking or not. But Fetcher2 sets http.plugin.check.{blocking,robots}

[Nutch-dev] [jira] Commented: (NUTCH-471) Fix synchronization in NutchBean creation

2007-04-24 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12491290 ] Andrzej Bialecki commented on NUTCH-471: +1. Nice trick with the unsynchronized check. :) Fix

Re: [Nutch-dev] Fetcher2's delay between successive requests

2007-04-24 Thread Doğacan Güney
On 4/24/07, Andrzej Bialecki [EMAIL PROTECTED] wrote: Doğacan Güney wrote: Hi all, I have been working on Fetcher2 code lately and I came across this particular code (in FetchItemQueue.getFetchItem) that I didn't quite understand: public FetchItem getFetchItem() { ... long

[Nutch-dev] [jira] Created: (NUTCH-474) Fetcher2 sets server-delay and blocking checks incorrectly

2007-04-24 Thread JIRA
Fetcher2 sets server-delay and blocking checks incorrectly Key: NUTCH-474 URL: https://issues.apache.org/jira/browse/NUTCH-474 Project: Nutch Issue Type: Bug Components:

[Nutch-dev] [jira] Updated: (NUTCH-474) Fetcher2 sets server-delay and blocking checks incorrectly

2007-04-24 Thread JIRA
[ https://issues.apache.org/jira/browse/NUTCH-474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doğacan Güney updated NUTCH-474: Attachment: fetcher2.patch Fetcher2 sets server-delay and blocking checks incorrectly

[Nutch-dev] [jira] Commented: (NUTCH-471) Fix synchronization in NutchBean creation

2007-04-24 Thread Sami Siren (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12491305 ] Sami Siren commented on NUTCH-471: Isn't the DCL declared to be broken? We could perhaps instead instantiate
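
For context on the DCL (double-checked locking) concern: the idiom is unsafe when the shared field is a plain field, and it is safe under the Java 5 and later memory model when the field is declared `volatile`. The sketch below is a generic illustration of the volatile form; the class `LazyHolder` is hypothetical and this is not the NutchBean patch, whose contents are not shown in this archive:

```java
final class LazyHolder {
    // volatile is what makes the unsynchronized first check safe (Java 5+).
    private static volatile LazyHolder instance;
    // Only modified inside the constructor, which runs under the class lock.
    private static int constructions = 0;

    private LazyHolder() {
        constructions++;
    }

    static LazyHolder get() {
        LazyHolder local = instance;           // first check, no lock taken
        if (local == null) {
            synchronized (LazyHolder.class) {
                local = instance;              // second check, under the lock
                if (local == null) {
                    local = new LazyHolder();
                    instance = local;
                }
            }
        }
        return local;
    }

    static int constructions() {
        synchronized (LazyHolder.class) {
            return constructions;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread[] threads = new Thread[8];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(LazyHolder::get);
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        System.out.println("constructions = " + constructions());
    }
}
```

If the extra read through a `volatile` field is unwanted, plain synchronization of the whole accessor or eager initialization are simpler alternatives.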

[Nutch-dev] [jira] Resolved: (NUTCH-473) ExcelExtractor performance bad due to String concatenation

2007-04-24 Thread Sami Siren (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sami Siren resolved NUTCH-473. Resolution: Duplicate duplicate of NUTCH-456 ExcelExtractor performance bad due to String

Re: [Nutch-dev] modifications to geoPosition plugin to get it working on nutch 0.9

2007-04-24 Thread Sami Siren
Mike Schwartz wrote: I have modified the geoPosition plugin (http://wiki.apache.org/nutch/GeoPosition) code to work with nutch 0.9. (The code was built originally using nutch 0.7.) I'd like to contribute my changes to the nutch project. I already communicated with the code's author

[Nutch-dev] [jira] Commented: (NUTCH-471) Fix synchronization in NutchBean creation

2007-04-24 Thread Enis Soztutar (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12491313 ] Enis Soztutar commented on NUTCH-471: Nice trick with the unsynchronized check. :) Wow, indeed I have used a

Re: [Nutch-dev] Fetcher2's delay between successive requests

2007-04-24 Thread Andrzej Bialecki
Doğacan Güney wrote: I don't get it. The code seems to do exactly the opposite of what you are saying. If maxThreads == 1 then maxThreads > 1 is false thus the expression evaluates to minCrawlDelay not crawlDelay. Shouldn't the expression be (maxThreads > 1 ? minCrawlDelay : crawlDelay) ? Yep,
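
The two forms of the expression discussed in this thread can be compared side by side. This sketch assumes the comparison in the quoted condition is `maxThreads > 1` (the operator is what the "is false" reasoning for `maxThreads == 1` implies); the class name, method names, and sample values are hypothetical, and the real method works on fields of FetchItemQueue rather than parameters:

```java
final class CrawlDelayChoice {
    // Expression as quoted from FetchItemQueue.getFetchItem in the thread.
    static long asQuoted(long endTime, int maxThreads, long crawlDelay, long minCrawlDelay) {
        return endTime + (maxThreads > 1 ? crawlDelay : minCrawlDelay);
    }

    // Variant with the branches swapped, as proposed in the thread.
    static long asProposed(long endTime, int maxThreads, long crawlDelay, long minCrawlDelay) {
        return endTime + (maxThreads > 1 ? minCrawlDelay : crawlDelay);
    }

    public static void main(String[] args) {
        long endTime = 1000L;        // sample values, in milliseconds
        long crawlDelay = 5000L;
        long minCrawlDelay = 0L;

        // Single thread per queue: the quoted form picks minCrawlDelay.
        System.out.println(asQuoted(endTime, 1, crawlDelay, minCrawlDelay));   // prints 1000
        // Single thread per queue: the proposed form picks crawlDelay.
        System.out.println(asProposed(endTime, 1, crawlDelay, minCrawlDelay)); // prints 6000
    }
}
```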

Re: [Nutch-dev] Fetcher2's delay between successive requests

2007-04-24 Thread Doğacan Güney
On 4/24/07, Andrzej Bialecki [EMAIL PROTECTED] wrote: Doğacan Güney wrote: I don't get it. The code seems to do exactly the opposite of what you are saying. If maxThreads == 1 then maxThreads > 1 is false thus the expression evaluates to minCrawlDelay not crawlDelay. Shouldn't the

[Nutch-dev] [jira] Updated: (NUTCH-469) changes to geoPosition plugin to make it work on nutch 0.9

2007-04-24 Thread Mike Schwartz (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Schwartz updated NUTCH-469: Attachment: geoPosition0.6_cdiff.zip I've attached the context diff from geoPosition 0.5 that I'm

Re: [Nutch-dev] modifications to geoPosition plugin to get it working on nutch 0.9

2007-04-24 Thread Mike Schwartz
ok, thanks - I've attached the zipped context diff to the Jira ticket. Please let me know if you have any problems with this - Mike At 08:57 AM 4/24/2007, Sami Siren wrote: Mike Schwartz wrote: I have modified the geoPosition plugin (http://wiki.apache.org/nutch/GeoPosition) code to work

Re: [Nutch-dev] Creating a new scoring filter

2007-04-24 Thread Lorenzo
Very briefly, with an HtmlParseFilter and a list of weighted words. This filter examines the Parse text and adds a boost value if it finds one of the words in the list. This boost value is added to ParseData MetaData. Then, a ScoringPlugin reads this MetaData (passScoreAfterParsing) and updates
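
The word-matching step described above can be sketched independently of the Nutch plugin interfaces. Everything below is an assumption made for illustration: the class `WeightedWordBoost`, the method `boostFor`, the rule that each listed word counts once, and the sample weights. It does not use HtmlParseFilter, ParseData, or the scoring API:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

final class WeightedWordBoost {
    // Sums the weight of every listed word that occurs at least once in the text.
    // Keys of 'weights' are assumed to be lower case.
    static float boostFor(String parseText, Map<String, Float> weights) {
        String[] words = parseText.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+");
        Set<String> tokens = new HashSet<>(Arrays.asList(words));
        float boost = 0f;
        for (Map.Entry<String, Float> entry : weights.entrySet()) {
            if (tokens.contains(entry.getKey())) {
                boost += entry.getValue();
            }
        }
        return boost;
    }

    public static void main(String[] args) {
        Map<String, Float> weights = new LinkedHashMap<>();
        weights.put("nutch", 2.0f);   // sample weights, chosen arbitrarily
        weights.put("lucene", 0.5f);
        System.out.println(boostFor("Nutch builds on Lucene and Hadoop", weights)); // prints 2.5
    }
}
```

In a real plugin the returned value would be stored as a string in the parse metadata so that a later scoring step can read it back.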

[Nutch-dev] [jira] Closed: (NUTCH-474) Fetcher2 sets server-delay and blocking checks incorrectly

2007-04-24 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki closed NUTCH-474. Resolution: Fixed Assignee: Andrzej Bialecki Fixed in rev. 532088. Thanks! Fetcher2
