If you could, please. I am, as you probably are, or have been in the
recent past, short on time for my project. I need something very simple.
An example that goes to a single URL, parses the pages under it, gathers
up all the words (terms) and returns me a Lucene index of them so that I
can then
. Thought I would post that
for other newbies.
ray
-Original Message-
From: Lukas, Ray [mailto:[EMAIL PROTECTED]
Sent: Monday, November 10, 2008 9:02 AM
To: nutch-user@lucene.apache.org
Subject: Example in Java Please
If you could, please. I am, as you probably are, or have been
@lucene.apache.org
Subject: Re: Example in Java Please
Ray,
I am feeling charitable this morning, so have posted code to do what
you desire at the end.
2008/11/10 Lukas, Ray [EMAIL PROTECTED]:
If you could, please. I am, as you probably are, or have been in the
recent past, short on time for my
/10 Lukas, Ray [EMAIL PROTECTED]
Thanks Hasan:
Forgive me.. First, your generosity is greatly appreciated. Please accept
my thanks.. I might be wrong, but... Hmm.. I think that we are missing
a few things here that I also need, and that is, in fact, why I selected
Nutch.
Nutch does some things
. It is a good crawl example, with some comments, and clear
enough (I think). It is the code used when running Nutch from the command
line. I hope this helps.
2008/11/10 Lukas, Ray [EMAIL PROTECTED]
Thanks Hasan:
Forgive me.. First, your generosity is greatly appreciated. Please accept
my thanks.. I might
Invalid indexes are generated {newbie question}
Please if you could help. I am trying to get Nutch to work from Java. I
wish to crawl a web page and generate Lucene indexes and then use the
NutchBean to query them. I located an example in the Nutch distribution
and have it working, or so I
dot com</description>
</property>
<property>
<name>plugin.folders</name>
<value>/plugins</value>
<description />
</property>
<property>
<name>searcher.dir</name>
<value>/crawl.test</value>
<description />
</property>
</configuration>
-Original Message-
From: Lukas, Ray [mailto:ray.lu...@idearc.com
You are correct. Hmm.. In there I have what I believe are the default
settings..
# skip file: ftp: and mailto: urls
-^(file|ftp|mailto):
# skip image and other suffixes we can't yet parse
-\.(gif|GIF|jpg|JPG|png|PNG|ico|ICO|css|sit|eps|wmf|zip|ppt|mpg|xls|gz|r
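For readers new to these filter files, here is a minimal sketch of how rules of this shape are usually read: a line starting with - rejects, a line starting with + accepts, and the first rule whose pattern is found in the URL decides. The class UrlFilterSketch and the shortened suffix list are hypothetical illustrations, not Nutch's own code, and the first-match behavior is an assumption about how the regex URL filter works rather than something stated in this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical stand-in for a regex URL filter; this is not a Nutch class. */
public class UrlFilterSketch {

    /** One rule: accept ('+') or reject ('-'), plus the pattern to look for. */
    private static final class Rule {
        final boolean accept;
        final Pattern pattern;

        Rule(boolean accept, Pattern pattern) {
            this.accept = accept;
            this.pattern = pattern;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    /** Parses "+regex" / "-regex" lines; blank lines and '#' comments are skipped. */
    public UrlFilterSketch(List<String> lines) {
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            char sign = trimmed.charAt(0);
            if (sign != '+' && sign != '-') {
                throw new IllegalArgumentException("Rule must start with + or -: " + trimmed);
            }
            rules.add(new Rule(sign == '+', Pattern.compile(trimmed.substring(1))));
        }
    }

    /** Assumed semantics: the first rule found in the URL decides; no match means reject. */
    public boolean accepts(String url) {
        for (Rule rule : rules) {
            if (rule.pattern.matcher(url).find()) {
                return rule.accept;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        UrlFilterSketch filter = new UrlFilterSketch(Arrays.asList(
                "# skip file: ftp: and mailto: urls",
                "-^(file|ftp|mailto):",
                "# skip a few image suffixes (shortened, illustrative list)",
                "-\\.(gif|jpg|png)$",
                "# accept anything else (illustrative catch-all)",
                "+."));

        System.out.println(filter.accepts("http://example.com/index.html"));
        System.out.println(filter.accepts("ftp://example.com/a.txt"));
        System.out.println(filter.accepts("http://example.com/logo.png"));
    }
}
```

Under these assumptions, a seed URL that matches an early - rule, or matches no + rule at all, is dropped without any error, which is one way a crawl can end up with nothing to fetch.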
-user@lucene.apache.org
Subject: Re: Does not locate my urls or filter problem.
Hello,
It might sound stupid, but try adding a few spaces and a few new lines to
your myURLS.txt (this has happened a few times on different computers, both
Linux and Windows).
Thanks,
Bartosz
Lukas, Ray wrote:
Thanks for your
us all down a rat hole ..
I will let you know what happens.. Thanks to all.. Bailing out of this burning
jet, trading in for a new one.. Learned a bunch, time to take that to a new
clean environment..
Thanks guys..
ray
-Original Message-
From: Lukas, Ray [mailto:ray.lu
Nutch query to work.. Can you help..
To begin, specify the full path to the Nutch index.
2009/3/6 Lukas, Ray ray.lu...@idearc.com
I am not able to make any nutch query work. I know it is something
simple. Could someone take a look at what I am doing..
Here is the code I am using, it is pretty
that search directory using a get method off of
the config..
-Original Message-
From: Andrzej Bialecki [mailto:a...@getopt.org]
Sent: Friday, March 06, 2009 9:26 AM
To: nutch-user@lucene.apache.org
Subject: Re: Can not get Nutch query to work.. Can you help..
Lukas, Ray wrote:
Okay.. I did
to work.. Can you help..
Lukas, Ray wrote:
Thanks man for helping out on this.. Thanks.. Okay
Okay.. so Windows is okay.. I do not have much say in what we use
here..
so. Which is fine.. I am happy.
I have the following directories, directly under my
C:\EclipseWorkspaces\nutchTest\outputDir
Has anyone seen this.. Do you know the solution.. I will start looking
through the hadoop code but if someone has fixed this already I would
appreciate knowing.. Thanks guys..
Fri Mar 6 14:48:40 2009 DEBUG main java.io.IOException: config()
at
nutch-site.xml properly (full path to
your crawl dir)
Thanks,
Bartosz
Lukas, Ray wrote:
Has anyone seen this.. Do you know the solution.. I will start looking
through the hadoop code but if someone has fixed this already I would
appreciate knowing.. Thanks guys..
Fri Mar 6 14:48:40 2009
or any way you want it. Crawl,
nutchBean also.
You should try nutch trunk or even rc
http://people.apache.org/~siren/nutch-1.0/rc1/nutch-1.0.tar.gz
There are too many differences to list here; it's just 10 times better than
0.9
Lukas, Ray wrote:
Oh rats.. Sorry.. Early morning here.. Forgot.. Yes, version
-Original Message-
From: Jim Van Sciver [mailto:jvansci...@gmail.com]
Sent: Monday, March 16, 2009 3:42 PM
To: nutch-user@lucene.apache.org
Subject: Nutch 1.0 Status?
I read in the developers email list that Nutch 1.0 has been packaged
for release to Apache. Congratulations!!
What
I have some basic questions about Nutch. Can someone point me in the
right direction, or if you have time, maybe just blast out an answer.
Question One:
I can see the terms that come from the web page. Can I set up a way to
also add these things to the index? In other words, if ice cream came
Erik is right!!
We should band together and bring legal action against these dirtballs..
Do you like it when someone steals your work, takes credit for it, and
turns a profit off of it?
More than giving their lives to write this content, they are also
contributors to the very software that we use,
I am hoping to write up an article on my project and all the cool things
that I figured out about nutch and java and eclipse, etc.. I will go
into a long and boring dissertation at that point.. For now I will keep
it short and sweet... As best I can..
I have eclipse, java 6, nutch, hadoop running
Question:
What is the proper accepted and safe way to shut down nutch (hadoop)
after I am done with it?
Hadoop.getFileSystem().closeAll() ??
I did try this and no luck. Anyone else having this problem?
Thanks guys.. Thanks, if/when I find it I will post it for everyone.
Ray
,
files remain locked.
I have gone the brutal way and use unlocker.exe but I mean to find out
what's going wrong so I will keep posted on this one.
-Ray-
2009/4/23 Lukas, Ray ray.lu...@idearc.com
Question:
What is the proper accepted and safe way to shut down nutch (hadoop)
after I am done
to this for us.
ray
-Original Message-
From: Lukas, Ray [mailto:ray.lu...@idearc.com]
Sent: Thursday, April 23, 2009 9:21 AM
To: nutch-user@lucene.apache.org
Subject: RE: Hadoop thread seems to remain alive
Hey Ray.. Great name you have there.. HA..
I don't actually care about deleting
:35 AM
To: nutch-user@lucene.apache.org
Subject: Re: Hadoop thread seems to remain alive
Lukas, Ray wrote:
Hey Ray.. Great name you have there.. HA..
I don't actually care about deleting these files.. That is not the
issue.. See I have embedded Nutch in my application. That application
calls
Is this correct..
NativeCrawler nativeCrawler = null;
NutchBean nutchBean = null;
Query nutchQuery = null;
Hits nutchHits = null;
for (int index = 0; index < 10; index++) {
nativeCrawler = new
));
}
}
this.segUpdater.start(); <-- this is the line I am talking about..
Any ideas, has anyone run into this ?
-Original Message-
From: Lukas, Ray [mailto:ray.lu...@idearc.com]
Sent: Thursday, April 23, 2009 4:36 PM
To: nutch-user@lucene.apache.org
Subject: Using nutchBean
hunt around for that, or.. Maybe someone
already knows where that lives.. Maybe??
-Original Message-
From: Andrzej Bialecki [mailto:a...@getopt.org]
Sent: Thursday, April 23, 2009 5:32 PM
To: nutch-user@lucene.apache.org
Subject: Re: Using nutchBean
Lukas, Ray wrote:
I started going
Oh works great now.. Hey thanks guys and Andrzej Bialecki.. I will look
into how this can be submitted for everyone to have..
-Original Message-
From: Lukas, Ray [mailto:ray.lu...@idearc.com]
Sent: Thursday, April 23, 2009 5:45 PM
To: nutch-user@lucene.apache.org
Subject: RE: Using
there is no nutchBean.close() being called. I will look for it when I
have more time for this.
-The other Ray-
2009/4/23 Lukas, Ray ray.lu...@idearc.com
I'm sorry guys.. I made a mistake.. This is not coming out of hadoop.. This
thread is coming out of nutch bean. Sorry.. I should have looked more
I
exits gracefully or not, probably due to some lost threads.
Since the servlet uses the same NutchBean, it looks like a similar issue to
yours.
Maybe there is no nutchBean.close() being called. I will look for it when I
have more time for this.
-The other Ray-
2009/4/23 Lukas, Ray ray.lu...@idearc.com
: Hadoop thread seems to remain alive
Hey ray,
I actually found my problem: I wasn't stopping Tomcat at the right moment
in the right way... so it kept some threads/locks.
If I stop it using the proper Windows service... it works fine.
-Ray-
2009/4/24 Lukas, Ray ray.lu...@idearc.com
What does that thread
Re-direct in Nutch 1.0 does not seem to work..
If I point to a url that is re-directed to (the result of a
re-direction), everything works great; if I point to the page that is
re-directing me to the working one, I get a corrupted index.
Can nutch handle re-direction and if so what magic is
</description>
</property>
I only want to scan within the domain I requested... Unless that url
instantly re-directs me to a different URL and then I want to only use
that one. Any thoughts..
Am I understanding this correctly?
Ray
-Original Message-
From: Lukas, Ray [mailto:ray.lu
Well three is a charm.. I need to move these to a different email as
well.. Please if you could.. Could we also remove this email address as
well..
Thanks
ray
-Original Message-
From: M S Ram [mailto:ms...@cse.iitk.ac.in]
Sent: Friday, December 04, 2009 10:01 AM
To:
34 matches