Re: [DISCUSS] Board resolution for Nutch as TLP

2010-04-10 Thread Jukka Zitting
of open-source software related to a large-scale web crawling platform for distribution at no charge to the public. Would it make sense to simplify the scope to ... open-source software related to large-scale web crawling for distribution at no charge to the public? BR, Jukka Zitting

[jira] Commented: (NUTCH-797) parse-tika is not properly constructing URLs when the target begins with a ?

2010-03-18 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846865#action_12846865 ] Jukka Zitting commented on NUTCH-797: I guess we need to apply the same logic also

[jira] Commented: (NUTCH-797) parse-tika is not properly constructing URLs when the target begins with a ?

2010-03-17 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846521#action_12846521 ] Jukka Zitting commented on NUTCH-797: Wouldn't it be easier for Nutch to pass the base
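
For illustration only: the NUTCH-797 title describes a link target that begins with "?". Under RFC 3986 section 5.2.2, such a reference has an empty path, so the resolved URL keeps the base path and replaces only the query. The sketch below shows that behaviour with Python's standard urljoin. It is not Nutch or Tika code; resolve_link and the example.com URLs are names made up for this example.

```python
from urllib.parse import urljoin


def resolve_link(base_url: str, target: str) -> str:
    """Resolve a link target against the URL of the page it was found on.

    Hypothetical helper: it only wraps urljoin, which follows the
    RFC 3986 reference-resolution rules.
    """
    return urljoin(base_url, target)


# Assumed example page and link target, not taken from the issue.
base = "http://example.com/search/results.html?page=1"

# A query-only target keeps the base path and swaps the query string.
print(resolve_link(base, "?page=2"))
# http://example.com/search/results.html?page=2
```

A resolver that treats "?page=2" as a file name relative to the base directory would instead drop "results.html" from the path, which is the kind of wrong URL the issue title points at.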

Re: Update on Integration with Tika

2009-11-17 Thread Jukka Zitting
. This would make Tika more plugin friendly, but is not yet implemented. BR, Jukka Zitting

Re: Announce: New PMC member Dennis Kubes

2009-03-25 Thread Jukka Zitting
Hi, On Wed, Mar 25, 2009 at 11:24 AM, Andrzej Bialecki a...@getopt.org wrote: The Lucene Project Management Committee is happy to announce that Dennis Kubes has been voted in as a new PMC member. Hip, hip, hurray! Congratulations, Dennis! BR, Jukka Zitting

Re: Announce: New PMC member Dennis Kubes

2009-03-25 Thread Jukka Zitting
Hi, 2009/3/25 Doğacan Güney doga...@gmail.com: Btw, can Dennis be the 3rd +1 that we need so we can finally release 1.0 :D ? Yes. BR, Jukka Zitting

Re: [DISCUSS] contents of nutch release artifact

2009-03-21 Thread Jukka Zitting
that set of bits to build the release. BR, Jukka Zitting

Re: [DISCUSS] contents of nutch release artifact

2009-03-21 Thread Jukka Zitting
Hi, On Sat, Mar 21, 2009 at 12:28 PM, Jukka Zitting jukka.zitt...@gmail.com wrote: To be accurate, the source release *is* the collection of bits that the release manager is using to produce binaries and other release artifacts. It's just a packaged svn export of the release tag

Re: [VOTE] Release Apache Nutch 1.0

2009-03-19 Thread Jukka Zitting
: how am I to verify that the release came from the sources in our svn when it contains stuff that doesn't exist in the svn? BR, Jukka Zitting

[jira] Created: (NUTCH-724) Drop the JAI libraries

2009-03-19 Thread Jukka Zitting (JIRA)
Drop the JAI libraries
Key: NUTCH-724
URL: https://issues.apache.org/jira/browse/NUTCH-724
Project: Nutch
Issue Type: Bug
Reporter: Jukka Zitting
Priority: Blocker
Fix For: 1.0.0

Re: [VOTE] Release Apache Nutch 1.0

2009-03-19 Thread Jukka Zitting
Hi, On Thu, Mar 19, 2009 at 2:15 PM, Sami Siren ssi...@gmail.com wrote: Jukka Zitting wrote: -1 The release contains the Java Advanced Imaging libraries (jai_core.jar and jai_codec.jar) which are licensed under Sun's Binary Code License. We can't redistribute those libraries. ok, we need

[jira] Commented: (NUTCH-722) Nutch contains jars that we cannot redistribute

2009-03-19 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683473#action_12683473 ] Jukka Zitting commented on NUTCH-722: See PDFBOX-381 for how the JAI dependency issues

[jira] Commented: (NUTCH-722) Nutch contains jars that we cannot redistribute

2009-03-19 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683474#action_12683474 ] Jukka Zitting commented on NUTCH-722: One acceptable alternative for now is to drop

Re: [DISCUSS] contents of nutch release artifact

2009-03-19 Thread Jukka Zitting
Hi, On Thu, Mar 19, 2009 at 3:38 PM, Andrzej Bialecki a...@getopt.org wrote: (anyway, what's a measly 90MB nowadays .. ;) It's a pretty long download unless you have a fast connection and a nearby mirror. BR, Jukka Zitting

[jira] Commented: (NUTCH-725) NOTICE.txt is lacking info that should be there

2009-03-19 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683648#action_12683648 ] Jukka Zitting commented on NUTCH-725: Looks good. NOTICE.txt is lacking info

[jira] Commented: (NUTCH-723) LICENCE.txt is lacking info that should be there

2009-03-19 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683649#action_12683649 ] Jukka Zitting commented on NUTCH-723: Looks good to me. PS. There's not really a need

[jira] Commented: (NUTCH-621) Nutch needs to declare it's crypto usage

2008-09-28 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12635218#action_12635218 ] Jukka Zitting commented on NUTCH-621: [...] get back the email from the govt

[PROPOSAL] Tika, a content analysis toolkit

2007-03-07 Thread Jukka Zitting
Google and USPTO searches there doesn't seem to be anything that would cause trouble with the Tika name. BR, Jukka Zitting Tika, a content analysis toolkit Abstract Tika is a toolkit for detecting and extracting metadata

Content-type detection for Tika

2006-09-06 Thread Jukka Zitting
to be adopting the standard so the database should be available at least on those platforms without manual installation. [1] http://freedesktop.org/wiki/Standards_2fshared_2dmime_2dinfo_2dspec BR, Jukka Zitting -- Yukatan - http://yukatan.fi/ - [EMAIL PROTECTED] Software craftsmanship, JCR consulting

Tika update

2006-08-16 Thread Jukka Zitting
is planning to add some of his stuff. The source tree at Google Code should be considered just a playground for bringing things together and discussing ideas, before migrating back to ASF infrastructure. BR, Jukka Zitting

Thoughts on Parser design and dependencies

2006-08-16 Thread Jukka Zitting
or configuration files. Something like a TikaParser adapter class might be needed to achieve that. BR, Jukka Zitting

Re: Tika update

2006-08-16 Thread Jukka Zitting
in general Nutch development (not saying that Nutch isn't good, just that I don't have the itch that Nutch is scratching). If there are enough people like me, then I think it makes sense to start another project, but otherwise I'd be happy to hang around here as well. BR, Jukka Zitting

Re: Terminating slashes in URL normalization

2006-08-05 Thread Jukka Zitting
Hi, On 8/5/06, Jukka Zitting [EMAIL PROTECTED] wrote: Section 6.2.4 of RFC 3986 suggests that a crawler could do such a normalization if it detects that http://mail.python.org/mailman/listinfo redirects to http://mail.python.org/mailman/listinfo/. Which it of course doesn't... :-) Another

Re: Terminating slashes in URL normalization

2006-08-04 Thread Jukka Zitting
://mail.python.org/mailman/listinfo redirects to http://mail.python.org/mailman/listinfo/. I think just blindly adding the slash without knowing about the redirection is incorrect. BR, Jukka Zitting
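
A minimal sketch of the rule argued for in this thread: append the terminating slash only when the server has actually been seen redirecting to the slash-terminated URL, never blindly. This is an illustration in Python, not Nutch's URL normalizer; normalize_trailing_slash, its parameters, and the example.org URLs are assumptions made for the example, and the redirect target is passed in rather than fetched.

```python
from typing import Optional
from urllib.parse import urlsplit


def normalize_trailing_slash(url: str, redirect_target: Optional[str]) -> str:
    """Return the slash-terminated form only if a redirect confirmed it.

    redirect_target is the Location the crawler observed for url, or None
    if the server did not redirect. (Hypothetical helper for illustration.)
    """
    if redirect_target is None:
        return url  # no evidence of equivalence, so leave the URL alone

    src, dst = urlsplit(url), urlsplit(redirect_target)
    same_elsewhere = (src.scheme, src.netloc, src.query) == (
        dst.scheme, dst.netloc, dst.query)

    # Accept only a redirect whose sole change is the added trailing slash.
    if same_elsewhere and not src.path.endswith("/") and dst.path == src.path + "/":
        return redirect_target
    return url


# Redirect observed: the two URLs can be treated as equivalent.
print(normalize_trailing_slash("http://example.org/listinfo",
                               "http://example.org/listinfo/"))
# No redirect observed: the URL is kept exactly as found.
print(normalize_trailing_slash("http://example.org/listinfo", None))
```

Passing the observed redirect in as data keeps the decision separate from fetching, so the same check works whether the redirect information comes from a live request or from stored crawl results.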

Re: Library for extracting text content from binaries

2006-07-25 Thread Jukka Zitting
, and maybe we could get going on a proposal. OK. BR, Jukka Zitting

Library for extracting text content from binaries

2006-07-17 Thread Jukka Zitting
/jira/browse/JCR-415 BR, Jukka Zitting