i get nightly to run, but it never completes anything.
always get stuck at 98% here and there.. i'll try
today's build and see what happens.
--- Stefan Groschupf [EMAIL PROTECTED] wrote:
Hi,
looks like the latest nightly build is broken.
Looks like the jar that comes with the nightly build
I would like to see something as active, in process
and inbound. Active data is live and on the query
servers (both indexes and correlating segments), in
process are tasks currently being mapped out, and
inbound is processes/data pending to be
processed.
Active nodes report as in the
I would love to see it continue as a plugin. I'm
moving to mapreduce myself so i would be interested in
utilizing it there.
thanks for the great work! look forward to trying out
your updates.
feel free to contact me directly if you wish.
-byron
--- Dawid Weiss [EMAIL PROTECTED] wrote:
Hi
Has the indexsorter code discussed a while back been
pushed to jira or put in SVN? I'd like to give it a
whirl on some of my indexes and the archive i can find
cut the post with the code attached..
[
http://issues.apache.org/jira/browse/NUTCH-16?page=comments#action_12364354 ]
byron miller commented on NUTCH-16:
---
Cool
an inverse of this plugin would be great, or enhancement of this for +/- values
based on patterns as i think lowering score
[
http://issues.apache.org/jira/browse/NUTCH-79?page=comments#action_12364357 ]
byron miller commented on NUTCH-79:
---
Piotr,
Any update on this? Have you been able to run with this or still working out
the kinks?
Fault tolerant searching
[
http://issues.apache.org/jira/browse/NUTCH-14?page=comments#action_12364358 ]
byron miller commented on NUTCH-14:
---
Are you still hitting this Stefan?
NullPointerException NutchBean.getSummary
-
Key
I'll be happy to do it.
--- Doug Cutting [EMAIL PROTECTED] wrote:
Would someone volunteer to develop Nutch-based
site-search engine for
all apache.org domains? We now have a Solaris zone
to host this.
Thanks,
Doug
[
http://issues.apache.org/jira/browse/NUTCH-134?page=comments#action_12363400 ]
byron miller commented on NUTCH-134:
Thanks Erik, I was able to pull down the highlighter and i'll be loading it up
on mozdex.com to test out over the weekend (1/21/2006
[
http://issues.apache.org/jira/browse/NUTCH-183?page=comments#action_12363477 ]
byron miller commented on NUTCH-183:
As Mr Burns would say eggcelent I'll give this a try. BTW, is it possible
to implement functionality that would start jobs
because one
document triggers such an
exception.
best regards,
Dominik
Byron Miller wrote:
60111 103432 reduce reduce
060111 103432 Optimizing index.
060111 103433 closing reduce
060111 103434 closing reduce
060111 103435
I was thinking that Nutch needs some sort of workflow
manager. This way you could build jobs off specific
workflows and hopefully recover jobs based upon the
portion of the workflow they are stuck in (or restart a
job if failed/processing time x hours and other such
workflow process rules)
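A minimal sketch of the recovery part of that idea, in Python rather than anything Nutch ships: the `Workflow` class, its `steps` list and the `completed` list are hypothetical names for illustration only. It only shows resuming at the step that failed; the "processing time x hours" rule is not modelled here.

```python
class Workflow:
    """Runs named steps in order and remembers which ones completed."""

    def __init__(self, steps):
        self.steps = steps      # list of (name, callable) pairs, hypothetical
        self.completed = []     # names of steps that finished successfully

    def run(self):
        """Run the remaining steps; return the failed step's name, or None."""
        # Skip everything that already completed in an earlier run().
        for name, action in self.steps[len(self.completed):]:
            try:
                action()
            except Exception:
                # The job is stuck here; a later run() resumes at this step.
                return name
            self.completed.append(name)
        return None
```

Calling `run()` again after a failure restarts only from the stuck step, so earlier portions of the workflow are not repeated.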
60111 103432 reduce reduce
060111 103432 Optimizing index.
060111 103433 closing reduce
060111 103434 closing reduce
060111 103435 closing reduce
java.lang.NullPointerException: value cannot be null
at
org.apache.lucene.document.Field.init(Field.java:469)
at
Excellent ideas, and that is what i'm hoping to use
some of the social bookmarking type ideas for: building
the starter sites and linkmaps.
I hope to work with Simpy or other bookmarking
projects to build somewhat of a popularity map (human
edited authority) to merge and calculate against a
Fixed in the copy i run as i've been able to get my
100k pages indexed without getting that error.
-byron
--- Andrzej Bialecki [EMAIL PROTECTED] wrote:
Lukas Vlcek wrote:
Hi,
I am trying to use the latest nutch-trunk version
but I am facing
unexpected Job failed! exception. It seems
On optimizing performance, does anyone know if google
is exporting its entire dataset as an index or only
somehow indexing the topN % (since they only show the
first 1000 or so results anyway)
With this patch and a top result set in the xml file
does that mean it will stop scanning the index at
I figured since i'm in research mode i would start
compiling available information resources and
putting them up on the wiki
http://wiki.apache.org/nutch/Search_Theory
sorry about all the cvs messages on edits.. i'm not
used to the touchpad on this darned laptop :)
Anyhow, if you have any
Reporter: byron miller
I ran a crawl of 100k web pages and got:
org.apache.nutch.fs.FSError: java.io.IOException: No space left on device
at
org.apache.nutch.fs.LocalFileSystem$LocalNFSFileOutputStream.write(LocalFileSystem.java:149)
at org.apache.nutch.fs.FileUtil.copyContents
[
http://issues.apache.org/jira/browse/NUTCH-123?page=comments#action_12361473 ]
byron miller commented on NUTCH-123:
Perhaps you should try the cache servlet as it dumps out the data as it sees it.
Cache.jsp some times generate NullPointerException
[
http://issues.apache.org/jira/browse/NUTCH-42?page=comments#action_12361474 ]
byron miller commented on NUTCH-42:
---
Safe to close. (done) We have XML/OpenSearch in latest trunk and other
branches.
enhance search.jsp such that it can also returns XML
Versions: 0.8-dev
Reporter: byron miller
Priority: Minor
Add support to the fetcher to look for sitemap files, download them and process
them into webdb.
Perhaps create a robots.txt directive that can be used to create a standard
format for sitemaps in RSS, XML or text format (one line
[
http://issues.apache.org/jira/browse/NUTCH-155?page=comments#action_12361398 ]
byron miller commented on NUTCH-155:
I don't know how i feel about removing the JSP stuff into a contrib and then
fluffing it up more with the potential to support other
I'll pull a build down tonight and let you know how it
goes!
-byron
--- Andrzej Bialecki [EMAIL PROTECTED] wrote:
Hi,
I just committed a large patch to clean up the trunk/
of obsolete and
broken classes remaining from the 0.7.x development
line. Please test
that things still work as
[
http://issues.apache.org/jira/browse/NUTCH-92?page=comments#action_12361348 ]
byron miller commented on NUTCH-92:
---
Has there been any advancement on this front?
DistributedSearch incorrectly scores results
[
http://issues.apache.org/jira/browse/NUTCH-134?page=comments#action_12361350 ]
byron miller commented on NUTCH-134:
Where is the lucene summarizer from the contrib? i'm not seeing anything
obvious (unless it's under a different name)
Summarizer
[
http://issues.apache.org/jira/browse/NUTCH-95?page=comments#action_12361300 ]
byron miller commented on NUTCH-95:
---
Number 2 sounds great, but wouldn't you always want the latest scoring document
since that should reflect the latest updatedb and rank
[
http://issues.apache.org/jira/browse/NUTCH-55?page=comments#action_12361301 ]
byron miller commented on NUTCH-55:
---
You can close this ticket, duplicate of ticket NUTCH-59
Create dmoz.org search plugin - incorporate the dmoz.org
title/category
Not sure if it's because i have some of the older 7.x
parameters for my plugins - did these change in trunk?
051223 194716
crawl-20051223193201/crawldb/current/part-0/data:0+809491
051223 194716 map 100%
051223 194717
crawl-20051223193201/linkdb/current/part-0/data:0+1270873
-adding
I've got 400mill db i can run this against over the
next few days.
-byron
--- Stefan Groschupf [EMAIL PROTECTED] wrote:
Hi Andrzej,
wow, this is really great news!
Using the optimized index, I reported previously
that some of the
top-scoring results were missing. As it happens,
the
+1
Thanks for all the hard work! Very much appreciated
--- Andrzej Bialecki [EMAIL PROTECTED] wrote:
Hi,
During the past year and more Stefan participated
actively in the
development, and contributed many high-quality
patches. He's been
spending considerable effort on addressing many
[
http://issues.apache.org/jira/browse/NUTCH-134?page=comments#action_12359649 ]
byron miller commented on NUTCH-134:
I would take more cpu for better summaries any day :) cpu power is cheaper than
manual intervention!
If any testing is needed, don't
Is there any way to make sure all plugins/modules
reference a standard version of log4j? seems to me
there are at least 3 different versions (although minor)
# find . | grep log4
./plugins/parse-pdf/log4j-1.2.9.jar
./plugins/parse-pdf/PDFBox-0.7.2-log4j.jar
./plugins/parse-rss/log4j-1.2.6.jar
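A rough sketch of how the same check could be scripted, assuming the jars follow the `log4j-<version>.jar` naming seen in the listing above. The function name `log4j_versions` and the regular expression are illustrative assumptions, not part of Nutch. Jars that only mention log4j in their name, such as `PDFBox-0.7.2-log4j.jar`, are skipped because their name carries no log4j version.

```python
import re
from collections import defaultdict
from pathlib import Path

# Matches names like "log4j-1.2.9.jar" (assumed naming convention).
LOG4J_JAR = re.compile(r"^log4j-(\d+(?:\.\d+)*)\.jar$")


def log4j_versions(root):
    """Map each log4j version found under root to the jars that provide it."""
    found = defaultdict(list)
    for jar in Path(root).rglob("*.jar"):
        match = LOG4J_JAR.match(jar.name)
        if match:
            found[match.group(1)].append(jar.relative_to(root).as_posix())
    return {version: sorted(paths) for version, paths in found.items()}


if __name__ == "__main__":
    # More than one line of output means the plugins disagree on the version.
    for version, jars in sorted(log4j_versions(".").items()):
        print(version, *jars)
```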
I wish it did have something to do with halloween :)
Google tells no lies! :P
--- Nick Lothian [EMAIL PROTECTED] wrote:
If you just do the search you'll see a link at the
side of the page:
Why these results?
These results may seem politically
slanted. Here's what happened.
Actually, to add fuel to the fire, using nutch out of
the box, searching for miserable failure yields the
same thing.
http://www.mozdex.com/search.jsp?query=miserablefailure
--- Fuad Efendi [EMAIL PROTECTED] wrote:
Thanks Nick,
So this is why some search engines are not honest. I
mean the
is still much smaller than
Google's, it is amazing how closely the results can
match!
Makes you wonder just how much of the net is useful
;)
-byron
--- Andrzej Bialecki [EMAIL PROTECTED] wrote:
Byron Miller wrote:
Actually, to add fuel to the fire, using nutch out
of
the box, searching
I'll give tagsoup a try, i saw that was in there.
thanks for the heads-up!
-byron
--- Andrzej Bialecki [EMAIL PROTECTED] wrote:
Byron Miller wrote:
http://people.apache.org/~andyc/neko/doc/html/changes.html
Any chance of getting that rolled in? Has a few
fixes
that look good
[
http://issues.apache.org/jira/browse/NUTCH-39?page=comments#action_12356374 ]
byron miller commented on NUTCH-39:
---
I'm using the above code snippet on mozdex and have run across some strange issues..
for example if you search for cnn.com it doesn't show up
[
http://issues.apache.org/jira/browse/NUTCH-49?page=comments#action_12355864 ]
byron miller commented on NUTCH-49:
---
Can something like this be adapted to use the regex filter as well? it would be
nice to say new only and match urls of x type or x link