Ilia S. Yatsenko wrote:
Why does hits.getTotal() ignore hitsPerSite?
hits.getTotal() always returns the total number of hits, regardless of
site. hitsPerSite is a filter on hits as they are displayed. This is
the way Google and Yahoo handle this too. Search for NutchAnalysis
there. If you look
Fredrik Andersson wrote:
I just ported a lot of old 0.6 code to 0.7-dev/mapred. Lots of stuff
has changed I see! One thing I can't quite grasp though, is why the
Hit.getScore() has been removed in favour of the TopDocs-thingie
instead?
Hit.getScore() was generalized to Hit.getSortValue() in
Stefan Groschupf wrote:
http://wiki.apache.org/nutch/Presentations
Can you explain what this means on page 20:
- scheduling is the bottleneck, not disk, network or CPU?
I mean that none of the CPUs, disks or network is at 100% of capacity.
Disks are running around 50% busy, CPUs a bit higher, and
Jay Pound wrote:
Doug, I also ran into this when I was testing NDFS: the system would have to
wait for the namenode to tell the datanodes what data to receive and which
data to replicate.
When did you test this? Which version of Nutch? How many nodes? My
benchmark results from just a few days
Piotr Kosiorowski wrote:
Looking around in JIRA I found out I cannot resolve an issue. I am not
sure how it works but I suspect I lack some rights to do so. Am I right?
I have added you to the nutch-developers Jira group. Now you should be
able to resolve issues, etc.
Doug
Piotr Kosiorowski wrote:
So I have installed forrest and modified
src/site/src/documentation/content/xdocs.
Then run 'forrest'. And it generated content in src/site/build/site.
And now the questions:
Should I copy src/site/build/site to site and commit it?
Yes. I'm impressed that you got
Jay Pound wrote:
1.) we need to split up chunks of data into sub-folders so as not to run the
filesystem into its physical limit on concurrent files in a single
directory, like the way squid splits up its data into directories.
I agree. I am currently using reiser with NDFS so this is
+1
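As an illustration of the squid-style fan-out being proposed (the class,
method, and bucket sizes here are hypothetical, not NDFS code), a block id
can be hashed into two directory levels so that no single directory
accumulates too many files:

    import java.io.File;

    public class BlockPath {
      /** Map a 64-bit block id to <root>/xx/yy/blk_<id> (256x256 buckets). */
      public static File pathFor(File root, long blockId) {
        String d1 = String.format("%02x", (int) (blockId & 0xFF));
        String d2 = String.format("%02x", (int) ((blockId >>> 8) & 0xFF));
        File dir = new File(new File(root, d1), d2);
        dir.mkdirs();                        // ensure the bucket exists
        return new File(dir, "blk_" + blockId);
      }
    }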
Piotr Kosiorowski wrote:
Hello,
We should probably change user agent string in nutch-default.xml to
point to Apache site. The only question is http.agent.version - should
we set it to 0.07 for release and 0.08-dev for future work? I do not
know how it was used previously.
Current
[EMAIL PROTECTED] wrote:
- <value>http://www.nutch.org/docs/en/bot.html</value>
+ <value>http://lucene.apache.org/nutch/bot.html</value>
I think this should now be:
http://lucene.apache.org/nutch/bot.html
The docs/en pages have mostly been reduced to the about page, whose
translations I hate to
Stefan Groschupf wrote:
Can someone please tell me what the technical difference is between
org.apache.nutch.io.Writable and java.io.Externalizable?
For me that looks very similar and Externalizable is available since
jdk 1.1.
What am I missing?
You don't miss much!
I avoided using Java's
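For context, a minimal Writable implementation looks like the sketch below
(the UrlDatum class is made up; the write/readFields shape matches the
0.7-era interface). Unlike Externalizable, no class metadata is written at
all: the reader must already know the concrete class.

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import org.apache.nutch.io.Writable;

    public class UrlDatum implements Writable {
      private long fetchTime;

      public void write(DataOutput out) throws IOException {
        out.writeLong(fetchTime);   // raw fields only, no class header
      }

      public void readFields(DataInput in) throws IOException {
        fetchTime = in.readLong();  // caller must know this is a UrlDatum
      }
    }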
Piotr Kosiorowski wrote:
I read your email ten times and still I am not sure
what the problem is.
The problem is with me.
Doug Cutting wrote:
[EMAIL PROTECTED] wrote:
- valuehttp://www.nutch.org/docs/en/bot.html/value
+ valuehttp://lucene.apache.org/nutch/bot.html/value
I clicked
Piotr Kosiorowski wrote:
Will do it tomorrow - I wanted to put down a kind of release checklist
in the Wiki - starting with where to change numbers. I would also like to
cover a release howto - but in fact I am not sure how to make a release
yet. I will try to gather this information.
A
Piotr Kosiorowski wrote:
I think we all refer to 0.7 as the next number (and 0.6 as current), so
nutch-default.xml contains the wrong format. In fact it should still contain
the -dev suffix.
To make the undocumented convention documented, I would also like to suggest
naming releases with the X.Y format and naming
Jay Pound wrote:
is the org.apache.nutch.crawl package a part of the nightly builds?
No. Nightly builds are from trunk. The mapred code is in a separate
branch in subversion. After the 0.7 release, when the mapred branch is
folded into trunk, then it will be in nightly builds. Until then
Fuad Efendi wrote:
Which parameter should I pass to Crawl? Should it be a directory
containing something, and in which format?
As before, inject takes a flat text file of urls, one per line. If you
wish to inject DMOZ urls, there is now a utility main() that will
convert the DMOZ file to such a file.
Piotr Kosiorowski wrote:
Is anyone working on preparing the release?
I am not.
If not, I can spend some time on it in an hour or so.
+1
Thanks,
Doug
What API are you using to get hits, NutchBean or OpenSearchServlet? If
you're using OpenSearchServlet, then, with 1000 hits, most of your time
is probably spent constructing summaries. Do you need the summaries?
If not, use NutchBean instead, or modify OpenSearchServlet to not
generate
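A rough sketch of the NutchBean route (constructor and method signatures
assumed from the 0.7-era API, so treat the details as approximate): hits
are retrieved without building any summaries.

    import org.apache.nutch.searcher.Hits;
    import org.apache.nutch.searcher.NutchBean;
    import org.apache.nutch.searcher.Query;

    public class BeanSearch {
      public static void main(String[] args) throws Exception {
        NutchBean bean = new NutchBean();     // assumed: default index dir
        Query query = Query.parse(args[0]);   // assumed 0.7-era signature
        Hits hits = bean.search(query, 1000); // no summaries constructed
        System.out.println("total: " + hits.getTotal());
      }
    }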
Piotr Kosiorowski wrote:
After making a tar I was trying to go through the crawl tutorial.
- tar xvfz nutch-0.7.tar.gz
bin/nutch - is not executable (and nutch-daemon.sh too).
It is strange nobody has reported it so far, so it may still be my fault.
No, it looks like a problem with ant's tar task,
Jérôme Charron wrote:
Is svn.apache.org down, or is the problem on my side?
A good way to answer this is to look at:
http://monitoring.apache.org/status/
It looks like SVN is currently up. And it works for me too.
Doug
Jeremy Bensley (sent by Nabble.com) wrote:
I have been experimenting with MapReduce to perform some distributed tasks
aside from the normal fetch/index routine of Nutch, and overall have had much
success.
I'm glad to hear this!
Today I have been experimenting with running extended duration
Apache Wiki wrote:
1. The SVN repository consists of the following areas:
a. '''trunk''' [ ... ]
a. '''Release-x.x''' branches [ ... ]
This should also mention tags, fixed versions of the code where no
development occurs.
I also would prefer that tag names and branch names are distinct,
I assume that in most NDFS-based configurations the production search
system will not run out of NDFS. Rather, indexes will be created
offline for a deployment (i.e., merging things to create an index per
search node), then copied out of NDFS to the local filesystem on a
production search
[EMAIL PROTECTED] wrote:
I, too, am looking forward to this, but I am wondering what that will
do to Kelvin Tan's recent contribution, especially since I saw that
both MapReduce and Kelvin's code change how FetchListEntry works. If
merging mapred to trunk means losing Kelvin's changes, then I
Kelvin Tan wrote:
Each of these stages will be handled in its own thread (except for HTML parsing
and scoring, which may actually benefit from having multiple threads). With the
introduction of non-blocking IO, I think threads should be used only where
parallel computation offers performance
I will postpone the merge of the mapred branch into trunk until I have a
chance to (a) add some MapReduce documentation; and (b) implement
MapReduce-based dedup.
Doug
Doug Cutting wrote:
Currently we have three versions of nutch: trunk, 0.7 and mapred. This
increases the chances
For now, look at the source for crawl/Crawl.java.
I'll try to add some documentation ASAP.
Doug
Steffen Viken Valvåg wrote:
Hi,
I'm playing around with the mapreduce branch, and got it working for a
simple intranet crawl by following the nutch tutorial on
NDFS is not recommended in 0.7. The version of NDFS in the mapred
branch is much improved. Note however that the mapred branch is
substantially different than 0.7 and is still incomplete.
Doug
Transbuerg Tian wrote:
Hi, all friends,
I downloaded nutch 0.7 and want to use NDFS independently.
Paul Baclace wrote:
Here is a patch for improving the error message that is displayed
when an intranet crawl commandline has a file instead of a directory
of files containing URLs.
I have committed this to the mapred branch.
Thanks, Paul!
Doug
Ordway, Ryan wrote:
As a quick workaround, I made a few quick adjustments to the
NDFSClient.java code to change the directory that temporary files
are created in. This is hard coded to /nutch/tmp, but if someone
could perhaps add a config option to make it configurable that would
be most
Stefan Groschupf wrote:
Besides that, a behavior like the datanode's, which iterates until it
finds a free port, would be better than just random.
That would be fine.
Would a patch have a chance of being applied? I can create one, but I
would hate to waste time in case people do not want to
Paul Baclace wrote:
Doug Cutting expressed a concern to me about using util.Random to generate
random 64 bit block numbers for NDFS. The following is my analysis.
Nice stuff, Paul. Thanks.
It just occurred to me that perhaps we could simply use sequential block
numbering. All block ids
Fuad Efendi wrote:
I found this in J2SE API for setReuseAddress(default: false):
When a TCP connection is closed the connection may remain in a timeout
state for a period of time after the connection is closed (typically
known as the TIME_WAIT state or 2MSL wait state). For applications
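The quoted behavior is easy to demonstrate with the standard API; a minimal
sketch (the port number is arbitrary): the flag must be set before bind()
to allow rebinding a port whose previous connection is still in TIME_WAIT.

    import java.net.InetSocketAddress;
    import java.net.ServerSocket;

    public class ReusePort {
      public static void main(String[] args) throws Exception {
        ServerSocket server = new ServerSocket(); // unbound socket
        server.setReuseAddress(true);             // set before bind()
        server.bind(new InetSocketAddress(9000));
        System.out.println("bound: " + server.getLocalSocketAddress());
        server.close();
      }
    }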
Stefan Groschupf wrote:
I notice that it can happen that a task is still running when the job
has already been killed.
The web gui says there is no running job, yet the process holds the nodes busy.
I haven't found the source of the problem yet.
I have seen this too. I think the solution is that, when
Piotr Kosiorowski wrote:
Should we have version independent site - always modified in trunk?
Or should we think about having a site (eg. JavaDocs, tutorial etc)
versioned and available for all versions at the same time?
The practice I've followed is to have the website reflect the latest
Stefan Groschupf wrote:
We may be misunderstanding each other: I do not mean tasks that crash, I mean
tasks that are 20 times slower on one machine than the tasks on the
other machines.
Ah, I call that speculative re-execution. Nutch does not yet
implement that.
I don't think speculative
Andrzej Bialecki wrote:
100k regexps is still a lot, so I'm not totally sure it would be much
faster, but perhaps worth checking.
I have worked with this type of technology before (minimized,
determinized FSAs, constructed from large sets of string expressions)
and it should be very fast to
Erik Hatcher wrote:
Please - someone reply back volunteering to correct this ASAP.
My bad. I'm fixing this right now. In 24 hours all Nutch downloads
should be through the mirrors.
Sorry!
Doug
Okay. All nutch downloads should now be through mirrors.
The web site now refers to downloads through the url:
http://www.apache.org/dyn/closer.cgi/lucene/nutch/
The former download urls now redirect to the appropriate places:
http://lucene.apache.org/lucene/nutch/release/
Chris Mattmann wrote:
So, one thing it seems is that fields to be indexed, and used in a field
query must be fully lowercase to work? Additionally, it seems that they
can't have symbols in them, such as _, is that correct? Would you guys
consider this to be a bug?
Yes, this sounds like a bug.
Chris Mattmann wrote:
So, my question to you then
is, what type of QueryFilter should I develop in order to get my query for
contactemail:email address to work as a standalone query? For instance,
right now I'm sub-classing the RawFieldQueryFilter, which doesn't seem to be
the right way to do it
Massimo Miccoli wrote:
Any news about integration of OPIC in mapred? I have time to develop
OPIC on Nutch Mapred. Can you help me to start?
By the email from Carlos Alberto-Alejandro CASTILLO-Ocaranza, it seems that
the best way to integrate OPIC is on the old webdb; is this way also valid for
CrawlDb
Here is a patch that implements this. I'm still testing it. If it
appears to work well, I will commit it.
Doug Cutting wrote:
Massimo Miccoli wrote:
Any news about integration of OPIC in mapred? I have time to develop
OPIC on Nutch Mapred. Can you help me to start?
By the email from
The attached patch adds support for rel=nofollow. Links which specify
this are ignored. Any objections to committing this?
http://googleblog.blogspot.com/2005/01/preventing-comment-spam.html
Doug
Index: src/plugin/parse-html/src/test/org/apache/nutch/parse/html/TestDOMContentUtils.java
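The gist of the change, as a hedged sketch rather than the committed patch:
while walking the parsed DOM, skip anchors whose rel attribute contains
"nofollow".

    import org.w3c.dom.Element;

    public class NoFollow {
      /** True if the anchor's links should be extracted. */
      static boolean shouldFollow(Element anchor) {
        // getAttribute returns "" when the attribute is absent
        String rel = anchor.getAttribute("rel").toLowerCase();
        return rel.indexOf("nofollow") < 0;
      }
    }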
Stefan Groschupf wrote:
I copied a working index and merged the original and the old one together.
Then I ran dedup over this index. Shouldn't the dedup tool remove
the duplicates in the merged index?
I usually dedup before index merge, so that the merged index contains no
duplicates. The
Paul Baclace wrote:
I hope someone can fold these into the wiki page since it appears as
Immutable Page to me.
You just need to create yourself an account by visiting:
http://wiki.apache.org/nutch/UserPreferences
Doug
Rod Taylor wrote:
Every segment that I fetch seems to be missing a part when stored on the
filesystem. The strange thing is that it is always the same part (very
reproducible).
This sounds strange. Are the datanode errors always on the same host?
How many hosts are you running this on?
Doug
Ken van Mulder wrote:
The first is that the fetcher slows down over time and continues to use more
and more memory as it goes (which I think is eventually hanging the
process).
What parser plugins do you have enabled? These are usually the culprit.
Try using 'kill -QUIT' to see what various
Rod Taylor wrote:
There is only a single datanode and there are 20 hosts.
That's a lot of load on one datanode. I typically run a datanode on
every host, accessing the local drives on that host.
Doug
Rod Taylor wrote:
I tried running one datanode per machine connecting back to the same SAN
but it seemed pretty clunky. A crash of any datanode would take down
the entire system (no data replication since it's a common data-store in
the end). Reducing it to a single datanode did not have this
Rod Taylor wrote:
Here you go: local filesystem and a single job tracker on another
machine. When the tasktracker and jobtracker are on the same box there
isn't a problem. When they are on different machines it runs into
issues.
This is using mapred.local.dir on the local machine (not shared
Rod Taylor wrote:
The attached patches for Generator.java and Injector.java allow a
specific temporary directory to be specified. This gives Nutch the full
path to these temporary directories and seems to fix the 'No input
directories' issue when using a local filesystem with multiple task
I was recently benchmarking fetching at a site with lots of bandwidth,
and it seemed to me that protocol-http is capable of faster crawling
than protocol-httpclient. So I don't think we should discard
protocol-http just yet. But there's a lot of duplicate code between
these, which is
Jérôme Charron wrote:
In fact, I think it could be a good idea to move the nutch language
identifier core code
to a standalone library or to lucene code.
Does it make sense? What do you think about it? What is the best solution
(standalone vs lucene)?
One could put it in the lucene contrib
Jack Tang wrote:
Below is the google architecture in my brain:

            DataNode A
    Master  DataNode B  GoogleCrawler
            DataNode C
            ..

GoogleCrawler is kept running all the time. One day, it gets a fetchlist
from DataNode A, crawls all pages and
Massimo Miccoli wrote:
There's a problem with that solution. The protocol-httpclient now, for
some sites, generates a SEVERE 'Narrowly avoided an infinite loop in execute'.
So the fetcher exits and only some pages are fetched before the SEVERE
message.
I don't know a solution; for now I switch back
Rod Taylor wrote:
It seems maxPerHost could cause us not to fill each segment to topN even
when there are more than enough URLs for this job.
We should only count URLs we keep instead of all URLs considered.
There were also two variables named count which is probably bad form
(not a Java
Andrzej Bialecki wrote:
I would be disappointed by this move - language identifier is an
important component in Nutch. Now the mere fact that it's bundled with
Nutch encourages its proper maintenance. If there is enough drive in
terms of willingness and long-term commitment it would make sense
Great stuff, Paul!
A few minor corrections.
Apache Wiki wrote:
1. The env var NUTCH_MASTER is set to the hostname of the master machine.
This is optional. The alternative is to mount a common home directory
with NFS, as many clusters do, and keep the Nutch software there.
Also,
Johannes Zillmann wrote:
Please correct me if I'm wrong, but if I understood
it all right there are 2 choices...
(1) message based communication
(2) stream based communication
In case of (2) you won't come along without one thread
per connection.
In general, you are correct. But Nutch's IPC is
[EMAIL PROTECTED] wrote:
Yes, the problem is in the negative progress percentages.
Is /usr/root/seeds/urls the same file on all hosts? How big is it?
Doug
Game Now wrote:
Hi All,
I want Nutch to help me do a range search, such as price:{1000 TO 2000}
or date:[20050101 TO 2005].
But the org.apache.nutch.searcher.Query#parse() method parses them to
'price 1000 2000' and 'date 20050101 2005' when I pass them to the
method.
Can anybody help me complete
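One possible direction - an assumption on my part, not an answer from this
thread - is to build the range clause directly with Lucene's RangeQuery
(Lucene 1.4-era API) and combine it with the rest of the query:

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.RangeQuery;

    public class PriceRange {
      /** Inclusive range over an indexed "price" field. */
      static RangeQuery priceBetween(String lo, String hi) {
        return new RangeQuery(new Term("price", lo),
                              new Term("price", hi),
                              true /* inclusive */);
      }
    }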
Andrzej Bialecki wrote:
I have a problem with the recently added CRC files, when putting
stuff to NDFS. NDFS complains that these files already exist - I suspect
that it creates them on the fly just before they are actually
transmitted from the NDFSClient - and aborts the transfer. I was able
[EMAIL PROTECTED] wrote:
Why do we need the parameter mapred.map.tasks to be greater than the number
of available hosts? If we set it equal to the number of hosts, we get the
negative progress percentages problem.
Can you please post a simple example that demonstrates the negative
progress problem? E.g., the minimal
This sounds like a bug in the URLFilter implementation. Is this
RegexURLFilter? Can you figure out what regex is causing this?
Probably the patch should be there, no?
Doug
Rod Taylor wrote:
I stuck a few log statements within ParseOutputFormat.java. One after
'String toUrl =' and another
Andrzej Bialecki wrote:
Further input into this: after replacing the ConjunctionScorer with the
fixed version from JIRA, now the bottleneck seems to be ... in
Summarizer, of all things. :-)
While making the summarizer faster would of course be good, keep in mind
that the cost of summarizing
It looks as though Nutch is inadvertently submitting forms.
At DOMContentUtils.java:58 we specify that the action parameter of an
HTML form should be extracted as a link. Yet we ignore the method
parameter of the form. I think we should only follow these when the
method is get, not when it
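The proposed check, sketched against a w3c DOM element (a hypothetical
helper, not the actual DOMContentUtils code): extract the form's action as
a link only when the method is GET, which is also the default when no
method is given.

    import org.w3c.dom.Element;

    public class FormLinks {
      static boolean followFormAction(Element form) {
        // getAttribute returns "" when the attribute is absent;
        // an absent method defaults to GET per the HTML spec.
        String method = form.getAttribute("method");
        return method.length() == 0 || method.equalsIgnoreCase("get");
      }
    }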
[EMAIL PROTECTED] wrote:
Implement a reader for CrawlDB, loosely inspired by NUTCH-114 (thanks Stefan!).
The reader offers similar functionality to the classic readdb command.
This looks great! Thanks, Andrzej.
I just ran it on a 50M page crawl. It took longer than I expected. The
reduce
Doug Cutting wrote:
I just ran it on a 50M page crawl.
FYI, here's the output:
051123 191703 TOTAL urls: 167780785
051123 191703 avg score:1.152
051123 191703 max score:47357.137
051123 191703 min score:1.0
051123 191703 retry 0: 167780785
051123 191703 status 1
Ken Krugler wrote:
For what it's worth, below is the filter list we're using for doing an
html-centric crawl (no word docs, for example). Using the (?i) means we
don't need to have upper- and lower-case versions of the suffixes.
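For illustration, a filter of the sort Ken describes (the suffix list here
is made up, not his actual list): one (?i)-prefixed pattern matches any
case variant of the suffixes.

    import java.util.regex.Pattern;

    public class SuffixFilter {
      static final Pattern SKIP =
          Pattern.compile("(?i)\\.(gif|jpg|png|css|zip|gz|exe|doc|pdf)$");

      /** True if the url survives the suffix filter. */
      static boolean accept(String url) {
        return !SKIP.matcher(url).find();
      }
    }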
Matt Kangas wrote:
#2 should be a pluggable/hookable parameter. high-scoring sounds like
a reasonable default basis for choosing recrawl intervals, but I'm sure
that nearly everyone will think of a way to improve upon that for their
particular system.
e.g. high-scoring ain't gonna cut it
Jérôme Charron wrote:
For consistency purposes, and ease of nutch management, why not filter the
extensions based on the activated plugins?
By looking at the mime-types defined in the parse-plugins.xml file and the
activated plugins, we know which content-types will be parsed.
So, by getting
Chris Mattmann wrote:
In principle, the mimeType system should give us some guidance on
determining the appropriate mimeType for the content, regardless of whether
it ends in .foo, .bar or the like.
Right, but the URL filters run long before we know the mime type, in
order to try to keep us
Matt Kangas wrote:
The latter is not strictly true. Nutch could issue an HTTP HEAD before
the HTTP GET, and determine the mime-type before actually grabbing the
content.
It's not how Nutch works now, but this might be more useful than a
super-detailed set of regexes...
This could be a
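Matt's idea in miniature, using only the standard HttpURLConnection API
(this is not how Nutch works today): a HEAD request returns the
Content-Type header without transferring the body.

    import java.net.HttpURLConnection;
    import java.net.URL;

    public class HeadCheck {
      static String contentType(String url) throws Exception {
        HttpURLConnection conn =
            (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("HEAD"); // headers only, no body
        try {
          return conn.getContentType();
        } finally {
          conn.disconnect();
        }
      }
    }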
Andrzej Bialecki wrote:
Doug Cutting wrote:
Modify CrawlDatum to store the MD5Hash of the content of fetched urls.
Yes, this is required to detect unmodified content. A small note: plain
MD5Hash(byte[] content) is quite ineffective for many pages, e.g. pages
with a counter, or with ads
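The plain checksum under discussion is a one-liner with the standard
MessageDigest API; as Andrzej notes, it does nothing about counters or ads,
which would need text extraction or shingling first.

    import java.security.MessageDigest;

    public class ContentDigest {
      /** MD5 over the raw fetched bytes. */
      static byte[] md5(byte[] content) throws Exception {
        return MessageDigest.getInstance("MD5").digest(content);
      }
    }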
This should work. TestRPC.java has a case which returns void (ping).
Can you send a simple test case that fails?
Doug
Stefan Groschupf wrote:
Hi,
I never used the RPC that intensively, so I was surprised to find this
limitation.
Is it known that the RPC.call method can only call methods that
Doug Cutting wrote:
Implementing something like this for Lucene would not be too difficult.
The index would need to be re-sorted by document boost: documents would
be re-numbered so that highly-boosted documents had low document
numbers.
In particular, one could:
1. Create an array of int
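A hedged sketch of the renumbering step only (the Lucene wiring is omitted,
and all names here are made up): sort doc ids by descending boost to obtain
the new-to-old mapping.

    import java.util.Arrays;
    import java.util.Comparator;

    public class BoostSort {
      /** Returns newToOld, where newToOld[newId] = oldId. */
      static Integer[] sortByBoost(final float[] boosts) {
        Integer[] newToOld = new Integer[boosts.length];
        for (int i = 0; i < boosts.length; i++) newToOld[i] = i;
        Arrays.sort(newToOld, new Comparator<Integer>() {
          public int compare(Integer a, Integer b) {
            return Float.compare(boosts[b], boosts[a]); // descending boost
          }
        });
        return newToOld;
      }
    }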
Andrzej Bialecki wrote:
By all means please start, this is still near the limits of my knowledge
of Lucene... ;-)
Okay, I'll try to get something working fairly soon.
Doug
Andrzej Bialecki wrote:
By all means please start, this is still near the limits of my knowledge
of Lucene... ;-)
Attached is a class which sorts a Nutch index by boost. I have only
tested it on a ~100 page index, where it appears to work correctly.
Please tell me how it works for you.
Andrzej Bialecki wrote:
Shouldn't this be combined with a HitCollector that collects only the
first-n matches? Otherwise we still need to scan the whole posting list...
Yes. I was just posting the work-in-progress.
We will also need to estimate the total number of matches by
extrapolating
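The idea being discussed, as a rough sketch with made-up names: once docs
are ordered by decreasing boost, stop after n hits and extrapolate the
total from how far down the doc-id space the scan got. (It stops via a
RuntimeException, the very mechanism questioned below.)

    public class EarlyTermination {
      static class StopException extends RuntimeException {}

      private int collected, lastDoc;

      void collect(int doc, float score, int n) {
        lastDoc = doc;
        if (++collected >= n) throw new StopException(); // stop the scan
      }

      /** Estimate: total ~ collected * maxDoc / (lastDoc + 1). */
      int estimateTotal(int maxDoc) {
        return (int) ((long) collected * maxDoc / (lastDoc + 1));
      }
    }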
FYI
This has been fixed in the mapred branch, but that patch is not in
0.7.1. This alone might be a reason to make a 0.7.2 release.
Doug
Original Message
Subject: Crawler submits forms?
Date: Tue, 13 Dec 2005 16:57:34 -
From: Andy Read [EMAIL PROTECTED]
Reply-To:
Andrzej Bialecki wrote:
I'll test it soon - one comment, though. Currently you use a subclass of
RuntimeException to stop the collecting. I think we should come up with
a better mechanism - throwing exceptions is too costly.
I thought about this, but I could not see a simple way to achieve
Stefan Groschupf wrote:
- job.setPartitionerClass(PartitionUrlByHost.class); in the generate
method
Yes, this line is the one you need to change. The other stuff can stay as
it is for now.
I don't recommend this change. It makes your crawler impolite, since
multiple tasks may reference
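Why the host partitioner matters, in a hedged sketch (not the actual
PartitionUrlByHost code): keying the partition on the URL's host routes all
URLs for one host to a single task, so per-host politeness limits are
enforced in one place.

    import java.net.URL;

    public class PartitionByHost {
      static int partitionFor(String url, int numTasks) throws Exception {
        String host = new URL(url).getHost();
        // mask the sign bit so the modulus is non-negative
        return (host.hashCode() & Integer.MAX_VALUE) % numTasks;
      }
    }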
Andrzej Bialecki wrote:
- How were the queries generated? From a log or randomly?
Queries have been picked up manually, to test the worst performing cases
from a real query log.
So, for example, the 50% error rate might not be typical, but could be
worst-case.
- When results differed
Stefan Groschupf wrote:
If you set up one thread per host, you have at most as many
connections to one host as you have boxes. In my case that is not
that many.
Anything more than one is not generally considered polite.
Also, it is a reproducible bug that the segment is every time
Mike Cannon-Brookes wrote:
Hey guys,
Hi, Mike! Welcome.
- Classloading - I have had many problems with NutchConf due to the
way it loads its resources. In a J2EE scenario, it's simply evil :)
Would there be any great problem with switching its classloader to
Andrzej Bialecki wrote:
Please also don't forget that the trunk/ will soon be invaded by the
code from mapred, I guess some time around the middle of January (Doug?)
Thinking about this more, perhaps we should do it sooner. There's
already a branch for 0.7.x releases, so what point is there
Andrzej Bialecki wrote:
I agree. I just thought that we would prepare the release based on the
code in trunk/, and in that case we would want to hold off the merge
until after we do the release.
My definition of trunk is that it should be where the majority of
development happens. It is what we
Andrzej Bialecki wrote:
Yes, we just need to make sure that all important bits from trunk are on
the 0.7 branch, before we start.
I will sync mapred with the trunk prior to the merge, so we should still
be able to get anything we need after mapred is merged back to trunk.
BTW, we're pretty
Mike Cannon-Brookes wrote:
0.7 vs 0.8 - apologies if I'm using an old version. I'm using the
latest binary release. I'll switch to latest SVN HEAD and see how that
works in my application.
The mapred branch will soon be moved to trunk, so you might be better
off starting there, since a lot
Doug Cutting wrote:
Once the mapred branch is folded in then there's a bunch of
stuff that's obsoleted that needs to be removed. I'd like to get
dynamic configuration in, if possible.
For reference, I found the message I posted about this a while back:
http://www.mail-archive.com/nutch-dev
Sami Siren wrote:
+1. I think this is a good time to merge now, as mapred is fully usable.
Barring objections, I will do this tomorrow morning, Pacific time.
Doug
[EMAIL PROTECTED] wrote:
+/*
+ * (non-Javadoc)
+ *
+ * @see org.apache.nutch.io.Writable#write(java.io.DataOutput)
+ */
+public final void write(DataOutput out) throws IOException {
We should either include javadoc or not. In general, all public methods
should have
I am leaving tomorrow for a one week vacation and will turn off my home
workstation, so there will be no nightly builds.
Long-term, I've submitted an infrastructure request to get a Solaris
zone created for Nutch where we can run nightly builds. That will
eventually remove the dependency on
Andrzej Bialecki wrote:
Gal Nitzan wrote:
this function throws IOException. Why?
public long getPos() throws IOException {
return (doc*INDEX_LENGTH)/maxDoc;
}
It should be throwing ArithmeticException
The IOException is required by the API of RecordReader.
Stefan Groschupf wrote:
I also note this line in client.java
public Writable[] call(Writable[] params, InetSocketAddress[] addresses)
throws IOException {
if (params.length == 0) return new Writable[0];
Do I understand correctly that in case the remote method does not need
any
Andrzej Bialecki wrote:
I'm happy to report that further tests performed on a larger index seem
to show that the overall impact of the IndexSorter is definitely
positive: performance improvements are significant, and the overall
quality of results seems at least comparable, if not actually
Byron Miller wrote:
On optimizing performance, does anyone know if google
is exporting its entire dataset as an index or only
somehow indexing the topN % (since they only show the
first 1000 or so results anyway)
Both. The highest-scoring pages are kept in separate indexes that are
searched
Andrzej Bialecki wrote:
Example: what happens now if you try to run more than one fetcher at the
same time, where the fetcher parameters differ (or a set of activated
plugins differs)? You can't - the local tasks on each tasktracker will
use whatever local config is there.
That's true when
Stefan Groschupf wrote:
Before we start adding metadata and more metadata, why not add metadata
to the CrawlDatum once and for all; then we can have any kind of plugin
that adds and processes metadata that belongs to a url.
+1
This feature strikes me as something that might prove very
[EMAIL PROTECTED] wrote:
I'm late, but better late than never: +1 (I thought Stefan was already a
committer, actually).
+1
Not as late as I am! I'm still catching up on December email...
The Lucene PMC has final say, and not all members of the PMC are on
nutch-dev, so I'll forward the