Re: [Nutch-dev] IncrediBILL's Random Rants: How Much Nutch is TOO MUCH Nutch?

2006-06-15 Thread Michael Wechner
Doug Cutting wrote: http://incredibill.blogspot.com/2006/06/how-much-nutch-is-too-much-nutch.html Well, I think incrediBILL has a point: people might really start excluding bots from their servers if it becomes too much. What might help is if incrediBILL would offer an index of

[Nutch-dev] search speed

2006-06-15 Thread anton
I am using DFS. My index contains 3,706,249 documents. Currently a search takes 2 to 4 seconds (I tested with a query of 3 search terms). Tomcat runs on a box with dual Opteron 2.4 GHz CPUs and 16 GB RAM. I think search is very slow now. Can we make search faster? What factors influence

Re: [Nutch-dev] search speed

2006-06-15 Thread Gal Nitzan
Hi, DFS is too slow for search. What we did was copy the data from DFS to the local file system, i.e. to the hard disk. Each machine has 2 x 300 GB hard disks in RAID.
bin/hadoop dfs -get index /nutch/index
bin/hadoop dfs -get linkdb /nutch/linkdb
bin/hadoop dfs -get segments /nutch/segments
When we run out

Re: [Nutch-dev] IncrediBILL's Random Rants: How Much Nutch is TOO MUCH Nutch?

2006-06-15 Thread Gal Nitzan
In my company we changed the default, and many others probably did the same. However, we must not ignore the behavior of irresponsible Nutch users, and for that reason using the default must be blocked in code. Just my 2 cents. -----Original Message----- From: Michael Wechner
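One way to read "the use of the default must be blocked in code" is a startup check that refuses to crawl until the operator sets their own agent name. The sketch below assumes the setting lives in the http.agent.name property; the class name AgentNameCheck and the list of placeholder values are hypothetical, chosen for illustration rather than taken from any Nutch release.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Properties;
import java.util.Set;

// Hypothetical sketch: refuse to start a crawl while the agent name is
// empty or still a shipped placeholder.
class AgentNameCheck {
    // Assumption: these lowercase values stand in for whatever defaults a release ships with.
    static final Set<String> PLACEHOLDER_NAMES =
        new HashSet<>(Arrays.asList("", "nutchcvs", "nutch"));

    static void requireCustomAgentName(Properties conf) {
        // A missing property is treated the same as an empty one;
        // comparison ignores surrounding whitespace and case.
        String name = conf.getProperty("http.agent.name", "").trim().toLowerCase(Locale.ROOT);
        if (PLACEHOLDER_NAMES.contains(name)) {
            throw new IllegalStateException(
                "http.agent.name is empty or a default; set it to identify your crawler");
        }
    }
}
```

Failing hard at startup, rather than logging a warning, is the design choice that matches the suggestion: an operator who never edits the configuration simply cannot crawl.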

[Nutch-dev] Content

2006-06-15 Thread 江元集团
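Dear customer: This is a well-meant email; please forgive us if it disturbs you! We are Shenzhen Shangfa Industrial Co., Ltd. Our company has surplus invoice quota and can issue invoices on your behalf (VAT invoices, commodity sales invoices, construction and installation invoices, other service-industry invoices, advertising invoices, transport invoices). If you need this, please contact our company; inquiries by phone or email are welcome. Phone: 13530663132 Contact: Mr. Wang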

[Nutch-dev] [jira] Assigned: (NUTCH-306) DistributedSearch.Client liveAddresses concurrency problem

2006-06-15 Thread Sami Siren (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-306?page=all ] Sami Siren reassigned NUTCH-306: Assign To: Sami Siren DistributedSearch.Client liveAddresses concurrency problem -- Key:
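The preview names the problem but not its details. As a hedged illustration of one common way to avoid this class of bug (not the actual NUTCH-306 change), assume a background thread periodically rebuilds the list of live search servers while query threads read it concurrently. Publishing an immutable snapshot through a volatile field means a reader never sees a half-updated list. LiveAddresses, update and snapshot are hypothetical names.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: a checker thread replaces the live-server list
// while search threads read it.
class LiveAddresses {
    // The reference is volatile and the list it points to is never mutated
    // after publication, so readers always see a complete snapshot.
    private volatile List<InetSocketAddress> live = Collections.emptyList();

    // Called by the checker thread after probing every server.
    void update(List<InetSocketAddress> probedAlive) {
        // Defensive copy: later changes to the caller's list are not visible to readers.
        live = Collections.unmodifiableList(new ArrayList<>(probedAlive));
    }

    // Called by search threads: read the reference once and iterate that snapshot.
    List<InetSocketAddress> snapshot() {
        return live;
    }
}
```

A search thread should call snapshot() once per query and use that list throughout, instead of reading the field repeatedly, so a concurrent update cannot change the list mid-query.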

[Nutch-dev] [jira] Resolved: (NUTCH-122) block numbers need a better random number generator

2006-06-15 Thread Sami Siren (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-122?page=all ] Sami Siren resolved NUTCH-122: Resolution: Invalid. This is more related to Hadoop. block numbers need a better random number generator ---
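The issue was closed as belonging to Hadoop, so the following is only a hedged illustration of the concern in its title, not the code either project uses. The java.util.Random Javadoc notes that, because the class uses a 48-bit seed, nextLong() will not return all possible long values. A sketch that draws 64-bit block IDs from java.security.SecureRandom and retries on collision might look like this; BlockIdAllocator is a hypothetical name, and tracking used IDs in memory is an assumption made to keep the example small.

```java
import java.security.SecureRandom;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: hand out random 64-bit block IDs and retry in the
// unlikely event that an ID is already taken.
class BlockIdAllocator {
    private final SecureRandom rng = new SecureRandom();
    private final Set<Long> used = new HashSet<>();

    // synchronized so concurrent callers cannot both claim the same ID.
    synchronized long next() {
        long id;
        do {
            id = rng.nextLong();
        } while (!used.add(id)); // add() returns false if the ID was already in use
        return id;
    }
}
```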

[Nutch-dev] [jira] Closed: (NUTCH-187) Cannot start Nutch datanodes on Windows outside of a cygwin environment because of DF

2006-06-15 Thread Sami Siren (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-187?page=all ] Sami Siren closed NUTCH-187: Resolution: Won't Fix. Closed as requested. Cannot start Nutch datanodes on Windows outside of a cygwin environment because of DF
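DF in the issue title presumably refers to a helper that runs the Unix df command to measure disk space, which would explain why datanodes needed cygwin on Windows. Since Java 6 (released after this thread), java.io.File offers getTotalSpace(), getFreeSpace() and getUsableSpace(), a portable alternative that starts no external process. A minimal sketch; PortableDiskSpace is a hypothetical name, and this is not the Nutch or Hadoop code.

```java
import java.io.File;

// Hypothetical sketch: report disk capacity and usable space for the
// filesystem holding a data directory without running `df`.
class PortableDiskSpace {
    // File's space methods return 0 for a nonexistent path, so fail loudly instead.
    private static File requireExisting(File dir) {
        if (!dir.exists()) {
            throw new IllegalArgumentException("no such directory: " + dir);
        }
        return dir;
    }

    static long capacityBytes(File dir) {
        return requireExisting(dir).getTotalSpace();
    }

    // Usable space accounts for OS restrictions such as reserved blocks or quotas.
    static long usableBytes(File dir) {
        return requireExisting(dir).getUsableSpace();
    }
}
```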

[Nutch-dev] Hello:

2006-06-15 Thread sacd
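Dear manager / finance department: Shenzhen Dezhicheng Trading Co., Ltd. issues invoices on your behalf: national-tax general invoices, VAT invoices, transport invoices, local-tax invoices, advertising design, service industry, construction and installation, etc. Please call for details. Sorry for any disturbance, and many thanks. Contact: Mr. Liu. Best wishes for your business. Phone: 13926533593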

Re: [Nutch-dev] IncrediBILL's Random Rants: How Much Nutch is TOO MUCH Nutch?

2006-06-15 Thread Paul Sutter
I think that Nutch has to solve the problem: if you leave the problem to the websites, they're more likely to cut you off than they are to implement their own index storage scheme. Besides, they'd get it wrong, have stale data, etc. Maybe what is needed is brainstorming on a shared crawling

[Nutch-dev] meager

2006-06-15 Thread Dick Hartman
___ Nutch-developers mailing list Nutch-developers@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/nutch-developers

[Nutch-dev] ad dkfp

2006-06-15 Thread sziuyt
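Hello, our company enjoys preferential tax policies and has long cooperated with many enterprises in provinces and cities across China, building up extensive experience in tax filing and bookkeeping. In the spirit of mutual benefit, we now offer to issue invoices on your behalf. We cover a wide range of industries: general national tax, transport, construction, advertising, services, etc. Tax rates are extremely low and all invoices are absolutely genuine; the invoice can be issued first and paid for after you have verified it. (We sincerely hope to cooperate with you!) Mobile: 13824313182 Mr. Chen QQ: 372749963 [EMAIL PROTECTED]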

[Nutch-dev] peel mumble

2006-06-15 Thread Emmanuel Rich