[Nutch-dev] [jira] Commented: (NUTCH-497) Extreme Nested Tags causes StackOverflowException in DomContentUtils...Spider Trap

2007-06-26 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508083 ] Hudson commented on NUTCH-497: -- Integrated in Nutch-Nightly #129 (See
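NUTCH-497's title attributes the stack overflow to extremely nested tags in DomContentUtils. A common way to make this kind of DOM walk independent of nesting depth is to replace recursion with an explicit stack. The sketch below is not the patch that was integrated. The class `IterativeTextExtractor` and its `nested` test helper are hypothetical, and the example uses only the JDK's `org.w3c.dom` API.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical helper, NOT Nutch's DomContentUtils: walks a DOM tree
// depth-first with an explicit stack instead of recursion.
class IterativeTextExtractor {

    // Concatenates all text nodes under root in document order. Nesting
    // depth is bounded by heap (the Deque), not by the thread's call stack.
    public static String getText(Node root) {
        StringBuilder sb = new StringBuilder();
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            Node node = stack.pop();
            if (node.getNodeType() == Node.TEXT_NODE) {
                sb.append(node.getNodeValue());
            }
            // Push children last-to-first so the first child is popped first.
            for (Node c = node.getLastChild(); c != null; c = c.getPreviousSibling()) {
                stack.push(c);
            }
        }
        return sb.toString();
    }

    // Hypothetical test helper: builds <b><b>...<b>text</b>...</b></b> with
    // `depth` nested elements. Built inside-out so each appendChild is cheap.
    public static Document nested(int depth, String text) {
        if (depth < 1) {
            throw new IllegalArgumentException("depth must be >= 1");
        }
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Node current = doc.createTextNode(text);
        for (int i = 0; i < depth; i++) {
            Element wrapper = doc.createElement("b");
            wrapper.appendChild(current);
            current = wrapper;
        }
        doc.appendChild(current);
        return doc;
    }
}
```

With this approach, `getText(nested(100_000, "deep"))` returns `"deep"` without growing the call stack. A recursive walk over the same tree would be likely to overflow a default-sized thread stack.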

[Nutch-dev] [jira] Commented: (NUTCH-499) Refactor LinkDb and LinkDbMerger to reuse code

2007-06-26 Thread JIRA
[ https://issues.apache.org/jira/browse/NUTCH-499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508162 ] Doğacan Güney commented on NUTCH-499: - Does anyone have any objections to the refactorings (Removal of

[Nutch-dev] [jira] Updated: (NUTCH-434) Replace usage of ObjectWritable with something based on GenericWritable

2007-06-26 Thread JIRA
[ https://issues.apache.org/jira/browse/NUTCH-434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doğacan Güney updated NUTCH-434: Attachment: NUTCH-434_v2.patch Patch updated for trunk. I also changed Fetcher and Fetcher2 to use

[Nutch-dev] Re-crawling Problem

2007-06-26 Thread Luca Rondanini
Hi all, I'm having some trouble trying to crawl and recrawl my local filesystem. I'm using the script posted at http://wiki.apache.org/nutch/IntranetRecrawl My filesystem is laid out like this: ../ ../first/ ../first/file1.pdf ../first/second/ ../first/second/file2.pdf ../first/second/third

[Nutch-dev] [jira] Commented: (NUTCH-434) Replace usage of ObjectWritable with something based on GenericWritable

2007-06-26 Thread Sami Siren (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508222 ] Sami Siren commented on NUTCH-434: -- You missed one ObjectWritable in Indexer (the one that hit my head too hard

[Nutch-dev] [jira] Commented: (NUTCH-434) Replace usage of ObjectWritable with something based on GenericWritable

2007-06-26 Thread Sami Siren (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508239 ] Sami Siren commented on NUTCH-434: -- Now there is a good chance that you knew all this :). If your point was that

[Nutch-dev] [jira] Updated: (NUTCH-434) Replace usage of ObjectWritable with something based on GenericWritable

2007-06-26 Thread JIRA
[ https://issues.apache.org/jira/browse/NUTCH-434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doğacan Güney updated NUTCH-434: Attachment: NUTCH-434_v3.patch New version. I added a simple LuceneDocumentWrapper as Sami Siren

Re: [Nutch-dev] [jira] Commented: (NUTCH-505) Outlink urls should be validated

2007-06-26 Thread Kai_testing Middleton
I can confirm that with NUTCH-505_draft_v2.patch I no longer get outlink urls that contain html mark-up as I was getting before on www.variety.com. --Kai Middleton - Original Message From: Doğacan Güney (JIRA) [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: Monday, June 25, 2007 1:09:26
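The report above describes outlink URLs that still contained HTML markup. The sketch below shows one generic way to validate candidate outlinks. It assumes that a usable outlink must parse as an absolute http(s) URI with a host, and it relies on `java.net.URI` rejecting characters such as `<`, `>`, `"` and spaces. It is not the logic in NUTCH-505_draft_v2.patch, and the class name `OutlinkValidator` is hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

// Hypothetical outlink check, NOT the NUTCH-505 patch: accept only strings
// that parse as absolute http/https URIs with a host component.
class OutlinkValidator {

    public static boolean isValid(String candidate) {
        if (candidate == null || candidate.isEmpty()) {
            return false;
        }
        final URI uri;
        try {
            // URI(String) throws on illegal characters such as '<', '>', '"'
            // or whitespace, which is how leftover markup usually shows up.
            uri = new URI(candidate);
        } catch (URISyntaxException e) {
            return false;
        }
        String scheme = uri.getScheme();
        if (scheme == null) {
            return false; // relative reference, not an absolute outlink
        }
        scheme = scheme.toLowerCase(Locale.ROOT);
        return (scheme.equals("http") || scheme.equals("https")) && uri.getHost() != null;
    }
}
```

For example, `isValid("http://www.variety.com/<b>news</b>")` is `false`, while `isValid("http://www.variety.com/index.asp")` is `true`.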

[Nutch-dev] NUTCH-119 :: how hard to fix

2007-06-26 Thread Kai_testing Middleton
I am evaluating nutch+lucene as a crawl and search solution. However, I am finding major bugs in nutch right off the bat. In particular, NUTCH-119: nutch is not crawling relative URLs. I have some discussion of it here: http://www.mail-archive.com/[EMAIL PROTECTED]/msg08644.html Most of the
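A crawler turns a relative link into a crawlable URL by resolving it against the URL of the page that contains it. The sketch below illustrates that resolution step with `java.net.URL(URL context, String spec)`. It is not Nutch's outlink extraction code and it does not claim to fix NUTCH-119. The class `RelativeLinkResolver` and the example URLs are hypothetical. Note also that `java.net.URL` is known to mishandle some edge cases, such as query-only references like `?q=1`.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical resolver, NOT Nutch code: resolves an href found on a page
// against that page's URL so relative links become absolute.
class RelativeLinkResolver {

    // Returns the absolute form of href, or null if either string cannot be
    // parsed (e.g. the page URL has no protocol).
    public static String resolve(String pageUrl, String href) {
        try {
            URL base = new URL(pageUrl);
            return new URL(base, href.trim()).toString();
        } catch (MalformedURLException e) {
            return null;
        }
    }
}
```

Given the page `http://www.example.com/dir/page.html`, the href `../img/a.png` resolves to `http://www.example.com/img/a.png` and `other.html` resolves to `http://www.example.com/dir/other.html`. Absolute hrefs are returned unchanged.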