[Nutch-dev] parse-rss test problem

2007-01-25 Thread kauu
I can't test my parse-rss plugin in nutch-0.8.1. I can't even test the default rsstest.rss file. 2007-01-25 17:04:34,703 INFO conf.Configuration (Configuration.java:getConfResourceAsInputStream(340)) - found resource parse-plugins.xml at

Re: [Nutch-dev] Fetcher2

2007-01-25 Thread kauu
Please give us the URL, thanks. On 1/25/07, chee wu [EMAIL PROTECTED] wrote: Just appended the portion for .81 to NUTCH-339 - Original Message - From: Armel T. Nene [EMAIL PROTECTED] To: nutch-dev@lucene.apache.org Sent: Thursday, January 25, 2007 8:06 AM Subject: RE: Fetcher2 Chee,

Re: [Nutch-dev] Fetcher2

2007-01-25 Thread Armel T. Nene
Kauu, The URL for Fetcher2 is: https://issues.apache.org/jira/browse/NUTCH-339 Armel - Armel T. Nene iDNA Solutions Tel: +44 (207) 257 6124 Mobile: +44 (788) 695 0483 http://blog.idna-solutions.com -Original Message- From: kauu

[Nutch-dev] Modified date in crawldb

2007-01-25 Thread Armel T. Nene
Hi guys, I am using Nutch 0.8.2-dev. I have noticed that the crawldb does not actually save the last modified date of files. I have run a crawl on my local file system and on the web. When I dumped the content of the crawldb for both crawls, the modified dates of the files were set to 01-Jan-1970
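
A note on the 01-Jan-1970 value: that is how a millisecond timestamp of zero (the Unix epoch) appears when formatted as a date. One possible reading, which is an assumption and not something confirmed in this thread, is that the modified time is never populated and stays at 0. The minimal Java sketch below only illustrates the formatting; the class name `EpochDateDemo` and the variable `unsetModifiedTime` are hypothetical and are not Nutch code.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class EpochDateDemo {
    // Formats a millisecond timestamp in UTC with a dd-MMM-yyyy pattern.
    static String format(long millis) {
        SimpleDateFormat fmt = new SimpleDateFormat("dd-MMM-yyyy", Locale.ENGLISH);
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        return fmt.format(new Date(millis));
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for a modified-time field that was never set.
        long unsetModifiedTime = 0L;
        System.out.println(format(unsetModifiedTime));
    }
}
```

If a dump shows this date for every entry, a reasonable first check is whether the stored value is literally zero rather than a real but incorrect timestamp.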

Re: [Nutch-dev] Modified date in crawldb

2007-01-25 Thread Andrzej Bialecki
Armel T. Nene wrote: Hi guys, I am using Nutch 0.8.2-dev. I have noticed that the crawldb does not actually save the last modified date of files. I have run a crawl on my local file system and on the web. When I dumped the content of the crawldb for both crawls, the modified dates of the files

Re: [Nutch-dev] Modified date in crawldb

2007-01-25 Thread Armel T. Nene
Chee, Have you successfully applied NUTCH-61 to Nutch 0.8.1? I worked on that version and was able to apply the patch fully, but was not entirely successful in running it with the XML parser plugin. If you have applied it successfully, let me know. Regards, Armel - Armel

[Nutch-dev] thread-safe methods in Nutch

2007-01-25 Thread Armel T. Nene
Hi guys, I know it's me again. I have been testing Nutch thoroughly lately, and here are some thread issues that I found. I am running version 0.8.2-dev. When Nutch is initially run (either from the script or from Ant), it has a default of 10 threads for the fetcher. This is actually good for performance

[Nutch-dev] [jira] Commented: (NUTCH-433) java.io.EOFException in newer nightlies in mergesegs or indexing from hadoop.io.DataOutputBuffer

2007-01-25 Thread Brian Whitman (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12467471 ] Brian Whitman commented on NUTCH-433: - This is still not fixed in the latest nightly --

[Nutch-dev] [jira] Commented: (NUTCH-433) java.io.EOFException in newer nightlies in mergesegs or indexing from hadoop.io.DataOutputBuffer

2007-01-25 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12467478 ] Andrzej Bialecki commented on NUTCH-433: - Nutch and Hadoop are separate projects, with the latter evolving

[Nutch-dev] [jira] Commented: (NUTCH-433) java.io.EOFException in newer nightlies in mergesegs or indexing from hadoop.io.DataOutputBuffer

2007-01-25 Thread Brian Whitman (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12467486 ] Brian Whitman commented on NUTCH-433: - OK, understand. But the nutch nightly should at least include a version of

Re: [Nutch-dev] i18n in nutch home page is misnomer

2007-01-25 Thread Doug Cutting
Teruhiko Kurosaka wrote: I suggest i18n be renamed to l10n, short for localization. Can you please file an issue in Jira for this? Ideally you could even provide a patch. The source for the website is in subversion at: http://svn.apache.org/repos/asf/lucene/nutch/trunk/src/site Forrest is

Re: [Nutch-dev] [jira] Commented: (NUTCH-258) Once Nutch logs a SEVERE log item, Nutch fails forevermore

2007-01-25 Thread Doug Cutting
Scott Ganyo (JIRA) wrote: ... since Hadoop hijacks and reassigns all log formatters (also a bad practice!) in the org.apache.hadoop.util.LogFormatter static constructor ... FYI, Hadoop no longer does this. Doug

[Nutch-dev] [jira] Commented: (NUTCH-433) java.io.EOFException in newer nightlies in mergesegs or indexing from hadoop.io.DataOutputBuffer

2007-01-25 Thread Sami Siren (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12467491 ] Sami Siren commented on NUTCH-433: -- ok, now it is committed, sorry. java.io.EOFException in newer nightlies in

Re: [Nutch-dev] [jira] Commented: (NUTCH-258) Once Nutch logs a SEVERE log item, Nutch fails forevermore

2007-01-25 Thread Chris Mattmann
It's at least out-of-date and perhaps obsolete. A quick read of Fetcher.java looks like there might be a case where a fatal error is logged but the fetcher doesn't exit, in FetcherThread#output(). So this raises an interesting question: People (such as Scott G.) out there -- are you folks

Re: [Nutch-dev] [jira] Commented: (NUTCH-258) Once Nutch logs a SEVERE log item, Nutch fails forevermore

2007-01-25 Thread Chris Mattmann
Hi Doug, So, does this render the patch that I wrote obsolete? Cheers, Chris On 1/25/07 10:08 AM, Doug Cutting [EMAIL PROTECTED] wrote: Scott Ganyo (JIRA) wrote: ... since Hadoop hijacks and reassigns all log formatters (also a bad practice!) in the org.apache.hadoop.util.LogFormatter

Re: [Nutch-dev] [jira] Commented: (NUTCH-258) Once Nutch logs a SEVERE log item, Nutch fails forevermore

2007-01-25 Thread Doug Cutting
Chris Mattmann wrote: So, does this render the patch that I wrote obsolete? It's at least out-of-date and perhaps obsolete. A quick read of Fetcher.java looks like there might be a case where a fatal error is logged but the fetcher doesn't exit, in FetcherThread#output(). Doug

[Nutch-dev] Tax invoice agency service

2007-01-25 Thread 代办税票
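Dear Finance (Manager): Our company is a Grade A enterprise that pays its taxes normally. After years of hard work, the company has expanded to regions across the country. We can now help customers with business tax matters and can issue invoices on their behalf. Invoice categories include: I. National tax invoices: 1. Commercial sales (verifiable online) 2. Unified goods sales 3. Industrial (enterprise) sales 4. Scrap materials. II. Local tax invoices: 1. Transport (road and inland waterway transport, freight forwarding, loading and unloading, intermodal transport, ocean shipping, etc.) 2. Other services (advertising, accommodation, conference, and consulting fees, etc.) 3. Construction and installation, building materials, etc. 4. Processing, repair, and other special-purpose receipts. If needed, please call: Mobile: 13826592593 Contact: Mr. Liu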

Re: [Nutch-dev] Modified date in crawldb

2007-01-25 Thread chee wu
Armel, Sorry, I haven't tried this patch yet. - Original Message - From: Armel T. Nene [EMAIL PROTECTED] To: nutch-dev@lucene.apache.org Sent: Thursday, January 25, 2007 11:07 PM Subject: RE: Modified date in crawldb Chee, Have you successfully applied NUTCH-61 to Nutch 0.8.1? I

[Nutch-dev] parse-rss: make the items different pages

2007-01-25 Thread kauu
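ice and snow. On January 24, workers cleared snow from the runways at Munich Airport in southern Germany. According to reports, a belated snowstorm has swept for two consecutive days across ... </description> <link>http://news.sohu.com/20070125/n247833568.shtml</link> <category>Sohu Focus Photo News</category> <author>[EMAIL PROTECTED]</author> <pubDate>Thu, 25 Jan 2007 11:29:11 +0800</pubDate> <comments>http://comment.news.sohu.com/comment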

[Nutch-dev] Professional invoice issuing service

2007-01-25 Thread 张天思
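Dear manager / finance officer: Our company professionally issues invoices on behalf of clients, with a full range of types available. Phone inquiries are welcome. Phone: 15914363151 Fax: 020-39626241 Contact: Mr. Zhang [EMAIL PROTECTED] Company name: Guangzhou Tiansi Tax Agency Co., Ltd. Company address: No. 188 Tianshou Road, Tianhe District, Guangzhou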