Re: Nutch 2.0 Documentation

2011-08-09 Thread Markus Jelsma
Hi, Maybe a stupid question, but I don't see a trunk/docs? Cheers On Thursday 04 August 2011 12:47:54 lewis john mcgibbney wrote: Hi, Was mucking around on a totally separate personal issue with Gora today and couldn't help but like the /docs directory which is bundled when you svn co the

[jira] [Updated] (NUTCH-1028) Log parser keys

2011-08-09 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1028: - Attachment: NUTCH-1028-1.4-1.patch Patch for 1.4 Log parser keys ---

[jira] [Updated] (NUTCH-1028) Log parser keys

2011-08-09 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1028: - Patch Info: [Patch Available] Log parser keys --- Key:

[jira] [Commented] (NUTCH-1028) Log parser keys

2011-08-09 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081585#comment-13081585 ] Julien Nioche commented on NUTCH-1028: -- You can see the progression of the parsing on

[jira] [Assigned] (NUTCH-881) Good quality documentation for Nutch

2011-08-09 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney reassigned NUTCH-881: -- Assignee: Lewis John McGibbney Good quality documentation for Nutch

[jira] [Commented] (NUTCH-881) Good quality documentation for Nutch

2011-08-09 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081639#comment-13081639 ] Lewis John McGibbney commented on NUTCH-881: In Nutch trunk we currently only

[jira] [Assigned] (NUTCH-623) Change plugin source directory languageidentifier to language-identifier

2011-08-09 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney reassigned NUTCH-623: -- Assignee: Lewis John McGibbney Change plugin source directory

[jira] [Commented] (NUTCH-623) Change plugin source directory languageidentifier to language-identifier

2011-08-09 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081642#comment-13081642 ] Lewis John McGibbney commented on NUTCH-623: On second thoughts, and taking

[jira] [Issue Comment Edited] (NUTCH-623) Change plugin source directory languageidentifier to language-identifier

2011-08-09 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081663#comment-13081663 ] Julien Nioche edited comment on NUTCH-623 at 8/9/11 2:34 PM: -

[jira] [Commented] (NUTCH-623) Change plugin source directory languageidentifier to language-identifier

2011-08-09 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081663#comment-13081663 ] Julien Nioche commented on NUTCH-623: - The functionality being delegated to Tika does

[jira] [Commented] (NUTCH-623) Change plugin source directory languageidentifier to language-identifier

2011-08-09 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081677#comment-13081677 ] Lewis John McGibbney commented on NUTCH-623: If we wished to fix this, then it

Future of Nutch 2.0 [Was: Unresolved dependencies org.apache.gora#gora-hbase;0.1: not found in Nutch trunk]

2011-08-09 Thread Julien Nioche
Hi Kirby, Grumble, Grumble. (adding dev@nutch, as that is more than likely where this discussion really belongs)... am adding gora-...@incubator.apache.org as well It'd be really nice if folks could just follow the commands in the nightly build, and get a build pushed out. I've pointed

[jira] [Commented] (NUTCH-463) Nutch powerpoint parser plugin fails to parse ppt with images

2011-08-09 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081695#comment-13081695 ] Lewis John McGibbney commented on NUTCH-463: Can we close this issue? .ppt

[jira] [Closed] (NUTCH-463) Nutch powerpoint parser plugin fails to parse ppt with images

2011-08-09 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche closed NUTCH-463. --- Resolution: Won't Fix Parsing delegated to Tika Nutch powerpoint parser plugin fails to parse ppt

Re: Future of Nutch 2.0 [Was: Unresolved dependencies org.apache.gora#gora-hbase;0.1: not found in Nutch trunk]

2011-08-09 Thread Kirby Bohling
Julien, On Tue, Aug 9, 2011 at 10:10 AM, Julien Nioche lists.digitalpeb...@gmail.com wrote: Hi Kirby, Grumble, Grumble. (adding dev@nutch, as that is more than likely where this discussion really belongs)... am adding gora-...@incubator.apache.org as well It'd be really nice if

[jira] [Commented] (NUTCH-978) [GSoC 2011] A Plugin for extracting certain element of a web page on html page parsing.

2011-08-09 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081703#comment-13081703 ] Lewis John McGibbney commented on NUTCH-978: If there has been a plugin written

[jira] [Commented] (NUTCH-342) Nutch commands log to nutch/logs/hadoop.logs by default

2011-08-09 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081708#comment-13081708 ] Lewis John McGibbney commented on NUTCH-342: OK well I think that sets a

[jira] [Commented] (NUTCH-296) Image Search

2011-08-09 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081714#comment-13081714 ] Lewis John McGibbney commented on NUTCH-296: The parsing and extraction of

[jira] [Commented] (NUTCH-849) different versions of the same library in nutch-2.0-dev.job and local\lib directory

2011-08-09 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081731#comment-13081731 ] Lewis John McGibbney commented on NUTCH-849: I checked out the latest trunk 2.0

[Nutch Wiki] Trivial Update of GORA_HBase by LewisJohnMcgibbney

2011-08-09 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on Nutch Wiki for change notification. The GORA_HBase page has been changed by LewisJohnMcgibbney: http://wiki.apache.org/nutch/GORA_HBase?action=diff&rev1=6&rev2=7 /gora-orm }}} * Compile Nutch - ant runtime - * Make

[jira] [Commented] (NUTCH-296) Image Search

2011-08-09 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081750#comment-13081750 ] Markus Jelsma commented on NUTCH-296: - Would be a nice feature but no patches. +1

[jira] [Commented] (NUTCH-849) different versions of the same library in nutch-2.0-dev.job and local\lib directory

2011-08-09 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081755#comment-13081755 ] Markus Jelsma commented on NUTCH-849: - I see it in my 1.4-build too with several deps.

[jira] [Commented] (NUTCH-666) Analysis plugins for multiple language and new Language Identifier Tool

2011-08-09 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081757#comment-13081757 ] Lewis John McGibbney commented on NUTCH-666: Thank you Dennis for confirming.

RE: Future of Nutch 2.0 [Was: Unresolved dependencies org.apache.gora#gora-hbase;0.1: not found in Nutch trunk]

2011-08-09 Thread Tom Davidson
Hi All, I have been using Nutch 1.x for the last 9 months or so and it works well for large scale crawls up to around a billion pages. However, the inherent lack of random access in HDFS really starts to become a burden on our hadoop cluster when going through the whole generate/update/fetch

[Nutch Wiki] Trivial Update of SetupProxyForNutch by LewisJohnMcgibbney

2011-08-09 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on Nutch Wiki for change notification. The SetupProxyForNutch page has been changed by LewisJohnMcgibbney: http://wiki.apache.org/nutch/SetupProxyForNutch?action=diff&rev1=12&rev2=13 google.com apache.org }}} + for those

[jira] [Created] (NUTCH-1077) Nutch 2 DbUpdateMapper throws ArrayOutOfBoundsException when running update

2011-08-09 Thread Tom Davidson (JIRA)
Nutch 2 DbUpdateMapper throws ArrayOutOfBoundsException when running update --- Key: NUTCH-1077 URL: https://issues.apache.org/jira/browse/NUTCH-1077 Project: Nutch

[jira] [Closed] (NUTCH-296) Image Search

2011-08-09 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney closed NUTCH-296. -- Resolution: Won't Fix Assignee: Lewis John McGibbney As there has been no

[Nutch Wiki] Trivial Update of SetupProxyForNutch by LewisJohnMcgibbney

2011-08-09 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on Nutch Wiki for change notification. The SetupProxyForNutch page has been changed by LewisJohnMcgibbney: http://wiki.apache.org/nutch/SetupProxyForNutch?action=diff&rev1=14&rev2=15 Tinyproxy supports filtering of web sites

[Nutch Wiki] Trivial Update of SetupProxyForNutch by LewisJohnMcgibbney

2011-08-09 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on Nutch Wiki for change notification. The SetupProxyForNutch page has been changed by LewisJohnMcgibbney: http://wiki.apache.org/nutch/SetupProxyForNutch?action=diff&rev1=15&rev2=16 Tinyproxy supports filtering of web