Doug Cutting wrote:
http://incredibill.blogspot.com/2006/06/how-much-nutch-is-too-much-nutch.html
Well, I think incrediBILL has a point: people might really
start excluding bots from their servers if it
becomes too much. What might help is if incrediBILL offered an
index of
I am using DFS. My index contains 3706249 documents. At present, a search
takes from 2 to 4 seconds (I tested with a query of 3 search terms).
Tomcat runs on a box with dual Opteron 2.4 GHz CPUs and 16 GB RAM. I think
search is very slow now.
Can we make search faster?
What factors influence
Hi,
DFS is too slow for the search.
What we did was extract the segments to the local FS, i.e. to the hard
disk. Each machine has 2 x 300 GB HDs in RAID.
bin/hadoop dfs -get index /nutch/index
bin/hadoop dfs -get linkdb /nutch/linkdb
bin/hadoop dfs -get segments /nutch/segments
When we run out
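The three copy commands above could be wrapped in a small script, roughly as sketched here. This is only an illustration: the names HADOOP, LOCAL_ROOT, DRY_RUN, run and fetch_all are assumptions made for this example and are not part of Nutch or Hadoop. Only the `bin/hadoop dfs -get` invocations come from the message itself.

```shell
#!/bin/sh
# Sketch: copy the index, linkdb and segments from DFS to the local disk.
# HADOOP, LOCAL_ROOT and DRY_RUN are hypothetical names for this example.
HADOOP="${HADOOP:-bin/hadoop}"
LOCAL_ROOT="${LOCAL_ROOT:-/nutch}"

# Print the command when DRY_RUN is 1 (the default here), otherwise run it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$@"
  else
    "$@"
  fi
}

# Fetch each directory named in the message; stop at the first failure.
fetch_all() {
  for dir in index linkdb segments; do
    run "$HADOOP" dfs -get "$dir" "$LOCAL_ROOT/$dir" || return 1
  done
}
```

After sourcing the script, calling fetch_all only prints the commands; setting DRY_RUN=0 first makes it actually run them.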
In my company we changed the default, and many others probably did the same.
However, we must not ignore the behavior of the irresponsible users of
Nutch. And for that reason the use of the default must be blocked in code.
Just my 2 cents.
-Original Message-
From: Michael Wechner
[ http://issues.apache.org/jira/browse/NUTCH-306?page=all ]
Sami Siren reassigned NUTCH-306:
Assign To: Sami Siren
DistributedSearch.Client liveAddresses concurrency problem
--
Key:
[ http://issues.apache.org/jira/browse/NUTCH-122?page=all ]
Sami Siren resolved NUTCH-122:
--
Resolution: Invalid
this is more related to hadoop
block numbers need a better random number generator
---
[ http://issues.apache.org/jira/browse/NUTCH-187?page=all ]
Sami Siren closed NUTCH-187:
Resolution: Won't Fix
closed as requested
Cannot start Nutch datanodes on Windows outside of a cygwin environment
because of DF
I think that Nutch has to solve the problem: if you leave the problem to the
websites, they're more likely to cut you off than they are to implement
their own index storage scheme. Besides, they'd get it wrong, have stale
data, etc.
Maybe what is needed is brainstorming on a shared crawling
___
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers