of
open-source software related to a large-scale web crawling
platform for distribution at no charge to the public.
Would it make sense to simplify the scope to ... open-source software
related to large-scale web crawling for distribution at no charge to
the public?
BR,
Jukka Zitting
[
https://issues.apache.org/jira/browse/NUTCH-797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846865#action_12846865
]
Jukka Zitting commented on NUTCH-797:
-
I guess we need to apply the same logic also
[
https://issues.apache.org/jira/browse/NUTCH-797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846521#action_12846521
]
Jukka Zitting commented on NUTCH-797:
-
Wouldn't it be easier for Nutch to pass the base
. This would make Tika more plugin friendly,
but is not yet implemented.
BR,
Jukka Zitting
Hi,
On Wed, Mar 25, 2009 at 11:24 AM, Andrzej Bialecki a...@getopt.org wrote:
The Lucene Project Management Committee is happy to announce that Dennis
Kubes has been voted in as a new PMC member.
Hip, hip, hurray! Congratulations, Dennis!
BR,
Jukka Zitting
Hi,
2009/3/25 Doğacan Güney doga...@gmail.com:
Btw, can Dennis be the 3rd +1 that we need so we can finally release
1.0 :D ?
Yes.
BR,
Jukka Zitting
that set of bits to build
the release.
BR,
Jukka Zitting
Hi,
On Sat, Mar 21, 2009 at 12:28 PM, Jukka Zitting jukka.zitt...@gmail.com wrote:
To be accurate, the source release *is* the collection of bits that
the release manager is using to produce binaries and other release
artifacts. It's just a packaged svn export of the release tag
: how am I to verify that the
release came from the sources in our svn when it contains stuff that
doesn't exist in the svn?
BR,
Jukka Zitting
Drop the JAI libraries
--
Key: NUTCH-724
URL: https://issues.apache.org/jira/browse/NUTCH-724
Project: Nutch
Issue Type: Bug
Reporter: Jukka Zitting
Priority: Blocker
Fix For: 1.0.0
Hi,
On Thu, Mar 19, 2009 at 2:15 PM, Sami Siren ssi...@gmail.com wrote:
Jukka Zitting wrote:
-1 The release contains the Java Advanced Imaging libraries
(jai_core.jar and jai_codec.jar) which are licensed under Sun's Binary
Code License. We can't redistribute those libraries.
ok, we need
[
https://issues.apache.org/jira/browse/NUTCH-722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683473#action_12683473
]
Jukka Zitting commented on NUTCH-722:
-
See PDFBOX-381 for how the JAI dependency issues
[
https://issues.apache.org/jira/browse/NUTCH-722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683474#action_12683474
]
Jukka Zitting commented on NUTCH-722:
-
One acceptable alternative for now is to drop
Hi,
On Thu, Mar 19, 2009 at 3:38 PM, Andrzej Bialecki a...@getopt.org wrote:
(anyway, what's a measly 90MB nowadays .. ;)
It's a pretty long download unless you have a fast connection and a
nearby mirror.
BR,
Jukka Zitting
[
https://issues.apache.org/jira/browse/NUTCH-725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683648#action_12683648
]
Jukka Zitting commented on NUTCH-725:
-
Looks good.
NOTICE.txt is lacking info
[
https://issues.apache.org/jira/browse/NUTCH-723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683649#action_12683649
]
Jukka Zitting commented on NUTCH-723:
-
Looks good to me.
PS. There's not really a need
[
https://issues.apache.org/jira/browse/NUTCH-621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12635218#action_12635218
]
Jukka Zitting commented on NUTCH-621:
-
[...] get back the email from the govt
Google and USPTO searches there doesn't seem to be
anything that would cause trouble with the Tika name.
BR,
Jukka Zitting
Tika, a content analysis toolkit
Abstract
Tika is a toolkit for detecting and extracting metadata
to be adopting the
standard so the database should be available at least on those
platforms without manual installation.
[1] http://freedesktop.org/wiki/Standards_2fshared_2dmime_2dinfo_2dspec
BR,
Jukka Zitting
--
Yukatan - http://yukatan.fi/ - [EMAIL PROTECTED]
Software craftsmanship, JCR consulting
is
planning to add some of his stuff. The source tree at Google Code
should be considered just a playground for bringing things together
and discussing ideas, before migrating back to ASF infrastructure.
BR,
Jukka Zitting
or configuration files. Something like a
TikaParser adapter class might be needed to achieve that.
BR,
Jukka Zitting
in
general Nutch development (not saying that Nutch isn't good, just that
I don't have the itch that Nutch is scratching). If there are enough
people like me, then I think it makes sense to start another project,
but otherwise I'd be happy to hang around here as well.
BR,
Jukka Zitting
Hi,
On 8/5/06, Jukka Zitting [EMAIL PROTECTED] wrote:
Section 6.2.4 of RFC 3986 suggests that a crawler could do such a
normalization if it detects that
http://mail.python.org/mailman/listinfo redirects to
http://mail.python.org/mailman/listinfo/.
Which it of course doesn't... :-) Another
://mail.python.org/mailman/listinfo redirects to
http://mail.python.org/mailman/listinfo/. I think just blindly adding
the slash without knowing about the redirection is incorrect.
BR,
Jukka Zitting
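
The rule discussed above (add the trailing slash only when a redirect to the slashed URL has actually been observed) could be sketched roughly as follows. This is a minimal illustration and not Nutch code: the class TrailingSlashCheck and both method names are hypothetical, and the redirect target is assumed to come from a Location header the crawler has already fetched.

```java
import java.net.URI;
import java.util.Objects;

// Hypothetical helper, not part of Nutch: decides whether a URL may be
// rewritten to its trailing-slash form based on an observed redirect.
final class TrailingSlashCheck {

    // True only if the redirect target is the requested URL with a single
    // "/" appended to the path and nothing else changed.
    static boolean isTrailingSlashRedirect(URI requested, URI redirectTarget) {
        String path = requested.getRawPath();
        String targetPath = redirectTarget.getRawPath();
        if (path == null || targetPath == null || path.endsWith("/")) {
            return false;
        }
        return sameIgnoringCase(requested.getScheme(), redirectTarget.getScheme())
                && sameIgnoringCase(requested.getHost(), redirectTarget.getHost())
                && requested.getPort() == redirectTarget.getPort()
                && Objects.equals(requested.getRawQuery(), redirectTarget.getRawQuery())
                && targetPath.equals(path + "/");
    }

    // Returns the slashed form only when the redirect was observed;
    // otherwise the URL is left exactly as requested (no blind rewrite).
    static URI normalize(URI requested, URI observedRedirect) {
        if (observedRedirect != null
                && isTrailingSlashRedirect(requested, observedRedirect)) {
            return observedRedirect;
        }
        return requested;
    }

    // Scheme and host are case-insensitive; either may be absent.
    private static boolean sameIgnoringCase(String a, String b) {
        return a == null ? b == null : a.equalsIgnoreCase(b);
    }

    public static void main(String[] args) {
        URI requested = URI.create("http://mail.python.org/mailman/listinfo");
        URI redirect = URI.create("http://mail.python.org/mailman/listinfo/");
        System.out.println(normalize(requested, redirect));
        System.out.println(normalize(requested, null));
    }
}
```

Passing null as the observed redirect models the case where no redirection is known, in which case the URL is kept as is.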
, and maybe we could get going on a proposal.
OK.
BR,
Jukka Zitting
/jira/browse/JCR-415
BR,
Jukka Zitting