___
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers
Hello!!!
-----Original Message-----
From: Stefan Groschupf [mailto:[EMAIL PROTECTED]
Sent: Sunday, June 04, 2006 9:15 PM
To: nutch-dev@lucene.apache.org
Subject: search engine spam detector
Hi,
An interesting tool:
http://tool.motoricerca.info/spam-detector/
Stefan
My Nutch processed pages
http://www.abc-internet.net/lavinia-lingerie/Lingerie.htm and
http://www.abc-internet.net/pamperedpassions-pampered_passions/Lingerie.htm.
When I search for the term "lingerie", Nutch brings up results
with a bad summary (... Lingerie, Lingerie, Lingerie,
[EMAIL PROTECTED] wrote:
My Nutch processed pages
http://www.abc-internet.net/lavinia-lingerie/Lingerie.htm and
http://www.abc-internet.net/pamperedpassions-pampered_passions/Lingerie.htm.
When I search for the term "lingerie", Nutch brings up results
with a bad summary (... Lingerie,
Hello,
In your pages, you will find the following text:
body topmargin=0
!-- Lingerie , Lingerie , Lingerie , Lingerie , Lingerie , Lingerie , Lingerie
, Lingerie , Lingerie , Lingerie , Lingerie , Lingerie , Lingerie , Lingerie ,
Lingerie , Lingerie , Lingerie , Lingerie , Lingerie , Lingerie , Lingerie
Hi Everyone,
I am using MapReduce and DFS for a crawl + index operation. When parsing
relatively small
segments (about 50,000 - 60,000 URLs), everything goes fine. But, when I try
to parse a larger segment
(600,000 - 700,000 URLs), my job is stopped by OutOfMemoryError at
tasktrackers during the
Stefan Groschupf wrote:
The idea to have
something like this as a Nutch module (dropping pages or ranking them
very low) might come up :-)
This will be a very long way.
I am collecting some thoughts and a list of web-spam-related papers in my blog.
[
http://issues.apache.org/jira/browse/NUTCH-258?page=comments#action_12414762 ]
Scott Ganyo commented on NUTCH-258:
---
For the record: I strongly object to closing this issue for the following
reasons:
1) Having a *side-effect* of the entire system stop
[
http://issues.apache.org/jira/browse/NUTCH-258?page=comments#action_12414763 ]
Stefan Groschupf commented on NUTCH-258:
Scott,
I agree with you. However, we need a clean patch to solve the problem; we
cannot just comment things out of the code.
It emulates a feature with the same name from the Google appliance.
http://www.google.com/enterprise/mini/end_user_features.html
--
Sami Siren
[EMAIL PROTECTED] wrote:
Hi,
What exactly does this plugin do? I haven't seen it mentioned and the
README.txt doesn't really describe it.
Thanks,
Otis
Sami Siren wrote:
It emulates a feature with the same name from the Google appliance.
http://www.google.com/enterprise/mini/end_user_features.html
Are you sure there is no trademark infringement here? Perhaps we should
call it something else, just to avoid any potential legal unpleasantries ...
--
Folks,
Before I (or someone else) reopens the issue, I think it's important to
understand the implications:
1) Having a *side-effect* of the entire system stop processing after merely
logging a message at a certain event level is a poor practice.
I'm not sure that the Fetcher quitting is a *
Clustering API improvements
---
Key: NUTCH-300
URL: http://issues.apache.org/jira/browse/NUTCH-300
Project: Nutch
Type: Improvement
Versions: 0.8-dev
Reporter: Andrzej Bialecki
Priority: Minor
This patch adds support for
hmm... didn't think about that, are there more opinions about this?
--
Sami Siren
Are you sure there is no trademark infringement here? Perhaps we
should call it something else, just to avoid any potential legal
unpleasantries ...
[ http://issues.apache.org/jira/browse/NUTCH-289?page=all ]
Stefan Groschupf updated NUTCH-289:
---
Attachment: ipInCrawlDatumDraftV1.patch
To keep the discussion alive, I attached a _first draft_ for storing the IP in the
crawlDatum for public discussion.
Chris Mattmann wrote:
Folks,
Before I (or someone else) reopens the issue, I think it's important to
understand the implications:
I vote for re-opening. See below.
1) Having a *side-effect* of the entire system stop processing after merely
logging a message at a certain event
Hi Andrzej,
The main problem, as Scott observed, is that the static flag affects all
instances of the task executing inside the same JVM. If there are
several Fetcher tasks (or any other tasks that check for SEVERE flag!),
belonging to different jobs, all of them will quit. This is
Chris Mattmann wrote:
+1
So, to summarize, the proposed resolution is:
* add flag field in Configuration instance to signify whether or not a
SEVERE error has been logged within a task's context
Yes, preferably define this as a public static final String-s in
NutchConfiguration, both
I have a proposal for a simple solution: set a flag in the current
Configuration instance, and check for this flag. The Configuration
instance provides a task-specific context persisting throughout the
lifetime of a task - but limited only to that task. Voila - problem
solved. We get
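
A minimal sketch of the per-task flag idea described above, in Java. It does not use the real Hadoop or Nutch classes: TaskConf is a hypothetical stand-in for a task's Configuration instance, and the key name SEVERE_FLAG_KEY and the helpers reportSevere/shouldStop are assumptions made only for illustration. The point it tries to show is that a flag stored in a task's own configuration object stops only that task, whereas a static flag would be seen by every task in the same JVM.

```java
import java.util.HashMap;
import java.util.Map;

public class SevereFlagSketch {
    // Hypothetical key name; the thread only suggests defining it as a
    // public static final String somewhere central.
    public static final String SEVERE_FLAG_KEY = "task.severe.logged";

    /** Hypothetical stand-in for a per-task configuration object. */
    static class TaskConf {
        private final Map<String, String> props = new HashMap<>();

        void setBoolean(String key, boolean value) {
            props.put(key, Boolean.toString(value));
        }

        boolean getBoolean(String key, boolean defaultValue) {
            String value = props.get(key);
            return value == null ? defaultValue : Boolean.parseBoolean(value);
        }
    }

    /** Records that a SEVERE event happened in the task owning this conf. */
    static void reportSevere(TaskConf conf) {
        conf.setBoolean(SEVERE_FLAG_KEY, true);
    }

    /** A task polls its own conf; other tasks' confs are never consulted. */
    static boolean shouldStop(TaskConf conf) {
        return conf.getBoolean(SEVERE_FLAG_KEY, false);
    }

    public static void main(String[] args) {
        TaskConf fetcherA = new TaskConf(); // task of job A
        TaskConf fetcherB = new TaskConf(); // task of job B, same JVM

        reportSevere(fetcherA);

        System.out.println("A stops: " + shouldStop(fetcherA)); // true
        System.out.println("B stops: " + shouldStop(fetcherB)); // false
    }
}
```

With a static boolean in place of the per-instance map, both lines would print true, which is the cross-job side effect the thread objects to.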
[ http://issues.apache.org/jira/browse/NUTCH-300?page=all ]
Andrzej Bialecki updated NUTCH-300:
Attachment: patch.txt
Clustering API improvements
---
Key: NUTCH-300
URL:
[ http://issues.apache.org/jira/browse/NUTCH-258?page=all ]
Chris A. Mattmann reopened NUTCH-258:
-
Assign To: Chris A. Mattmann
Issue found to be, in fact, a real issue with the Fetcher. Here's the proposed
solution:
* add flag field
[ http://issues.apache.org/jira/browse/NUTCH-201?page=all ]
Sami Siren resolved NUTCH-201:
--
Resolution: Fixed
just committed this
add support for subcollections
--
Key: NUTCH-201
URL:
[ http://issues.apache.org/jira/browse/NUTCH-298?page=all ]
Jerome Charron resolved NUTCH-298:
--
Resolution: Fixed
Committed + some unit tests to reproduce.
Thanks Stefan.
As you mentioned it in a previous mail, I agree that the RobotRulesParser