Example in Java Please

2008-11-10 Thread Lukas, Ray
If you could, please. I am, as you probably are, or have been in the recent past, short on time for my project. I need something very simple. An example that goes to a single URL, parses the pages under it, gathers up all the words (terms) and returns me a Lucene index of them so that I can then

RE: Example in Java Please

2008-11-10 Thread Lukas, Ray
. Thought I would post that for other newbies. ray -Original Message- From: Lukas, Ray [mailto:[EMAIL PROTECTED] Sent: Monday, November 10, 2008 9:02 AM To: nutch-user@lucene.apache.org Subject: Example in Java Please If you could, please. I am, as you probably are, or have been

RE: Example in Java Please

2008-11-10 Thread Lukas, Ray
@lucene.apache.org Subject: Re: Example in Java Please Ray, I am feeling charitable this morning, so have posted code to do what you desire at the end. 2008/11/10 Lukas, Ray [EMAIL PROTECTED]: If you could, please. I am, as you probably are, or have been in the recent past, short on time for my

RE: Example in Java Please

2008-11-11 Thread Lukas, Ray
/10 Lukas, Ray [EMAIL PROTECTED] Thanks Hasan: Forgive me.. First your generosity is greatly appreciated. Please accept my thanks.. I might be wrong, but... Humm.. I think that we are missing a few things here that I also need and, is, in fact, why I selected Nutch. Nutch does some things

RE: Example in Java Please

2008-11-12 Thread Lukas, Ray
. It is a good crawl example, with some comments, and clear enough (I think). It is the code used when running Nutch from the command line. I hope this helps. 2008/11/10 Lukas, Ray [EMAIL PROTECTED] Thanks Hasan: Forgive me.. First your generosity is greatly appreciated. Please accept my thanks.. I might

Does not locate my urls or filter problem.

2009-02-25 Thread Lukas, Ray
Invalid indexes are generated {newbie question} Please if you could help. I am trying to get Nutch to work from Java. I wish to crawl a web page and generate Lucene indexes and then use the NutchBean to query them. I located an example in the Nutch distribution and have it working, or so I

RE: Does not locate my urls or filter problem.

2009-02-26 Thread Lukas, Ray
dot com</description> </property> <property> <name>plugin.folders</name> <value>/plugins</value> <description /> </property> <property> <name>searcher.dir</name> <value>/crawl.test</value> <description /> </property> </configuration> -Original Message- From: Lukas, Ray [mailto:ray.lu...@idearc.com

RE: AW: Does not locate my urls or filter problem.

2009-02-26 Thread Lukas, Ray
You are correct Hum.. In there I have what I believe are the default settings.. # skip file: ftp: and mailto: urls -^(file|ftp|mailto): # skip image and other suffixes we can't yet parse -\.(gif|GIF|jpg|JPG|png|PNG|ico|ICO|css|sit|eps|wmf|zip|ppt|mpg|xls|gz|r
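For readers following the filter discussion, here is a minimal sketch of how rules of this shape are usually evaluated. It assumes that each rule is a `+` (accept) or `-` (reject) sign followed by a regular expression, that the first matching rule decides, and that a URL matching no rule is rejected. That is my understanding of the Nutch regex URL filter, not a copy of its code. The class name `UrlFilterSketch` is hypothetical and uses only `java.util.regex`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical illustration of +/- regex URL filter rules; not Nutch's implementation. */
public class UrlFilterSketch {

    /** One rule: accept or reject any URL in which the pattern is found. */
    private static final class Rule {
        final boolean accept;
        final Pattern pattern;

        Rule(boolean accept, Pattern pattern) {
            this.accept = accept;
            this.pattern = pattern;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    /** Parses rule lines; blank lines and lines starting with '#' are skipped. */
    public UrlFilterSketch(List<String> lines) {
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            char sign = trimmed.charAt(0);
            if (sign != '+' && sign != '-') {
                throw new IllegalArgumentException("Rule must start with + or -: " + trimmed);
            }
            rules.add(new Rule(sign == '+', Pattern.compile(trimmed.substring(1))));
        }
    }

    /** Assumption: the first matching rule decides; no match means the URL is rejected. */
    public boolean accepts(String url) {
        for (Rule rule : rules) {
            if (rule.pattern.matcher(url).find()) {
                return rule.accept;
            }
        }
        return false;
    }
}
```

Under these assumptions, a rule file that contains only `-` lines rejects every URL, so at least one `+` rule has to match the seed URLs before anything is fetched.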

RE: Does not locate my urls or filter problem.

2009-02-26 Thread Lukas, Ray
-user@lucene.apache.org Subject: Re: Does not locate my urls or filter problem. Hello, It might sound stupid but try to add a few spaces and a few new lines in your myURLS.txt (it happened a few times on different computers, both Linux and Windows) Thanks, Bartosz Lukas, Ray wrote: Thanks for your

RE: Does not locate my urls or filter problem.

2009-02-26 Thread Lukas, Ray
us all down a rat hole .. I will let you know what happens.. Thanks to all.. Bailing out of this burning jet, trading in for a new one.. Learned a bunch, time to take that to a new clean environment.. Thanks guys.. ray -Original Message- From: Lukas, Ray [mailto:ray.lu

RE: Can not get Nutch query to work.. Can you help..

2009-03-06 Thread Lukas, Ray
Nutch query to work.. Can you help.. To begin specify full path to the nutch index. 2009/3/6 Lukas, Ray ray.lu...@idearc.com I am not able to make any nutch query work. I know it is something simple. Could someone take a look at what I am doing.. Here is the code I am using, it is pretty

RE: Can not get Nutch query to work.. Can you help..

2009-03-06 Thread Lukas, Ray
that search directory using a get method off of the config.. -Original Message- From: Andrzej Bialecki [mailto:a...@getopt.org] Sent: Friday, March 06, 2009 9:26 AM To: nutch-user@lucene.apache.org Subject: Re: Can not get Nutch query to work.. Can you help.. Lukas, Ray wrote: Okay.. I did

RE: Can not get Nutch query to work.. Can you help..

2009-03-06 Thread Lukas, Ray
to work.. Can you help.. Lukas, Ray wrote: Thanks man for helping out on this.. Thanks.. Okay Okay.. so Windows is okay.. I do not have much say in what we use here.. so. Which is fine.. I am happy. I have the following directories, directly under my C:\EclipseWorkspaces\nutchTest\outputDir

Hadoop Config Exception in Nutch

2009-03-10 Thread Lukas, Ray
Has anyone seen this.. Do you know the solution.. I will start looking through the Hadoop code but if someone has fixed this already I would appreciate knowing.. Thanks guys.. Fri Mar 6 14:48:40 2009 DEBUG main java.io.IOException: config() at

RE: Hadoop Config Exception in Nutch

2009-03-10 Thread Lukas, Ray
nutch-site.xml properly (full path to your crawl dir) Thanks, Bartosz Lukas, Ray wrote: Has anyone seen this.. Do you know the solution.. I will start looking through the Hadoop code but if someone has fixed this already I would appreciate knowing.. Thanks guys.. Fri Mar 6 14:48:40 2009

RE: Hadoop Config Exception in Nutch

2009-03-10 Thread Lukas, Ray
or any way you want it. Crawl, nutchBean also. You should try nutch trunk or even rc http://people.apache.org/~siren/nutch-1.0/rc1/nutch-1.0.tar.gz There are too many differences to write here, it's just 10 times better than 0.9 Lukas, Ray wrote: Oh rats.. Sorry.. Early morning here.. Forgot.. Yes, version

RE: Nutch 1.0 Status?

2009-03-16 Thread Lukas, Ray
-Original Message- From: Jim Van Sciver [mailto:jvansci...@gmail.com] Sent: Monday, March 16, 2009 3:42 PM To: nutch-user@lucene.apache.org Subject: Nutch 1.0 Status? I read in the developers email list that Nutch 1.0 has been packaged for release to Apache. Congratulations!! What

Original tags, attribute defs, multiword tokens, how is this done.

2009-03-17 Thread Lukas, Ray
I have some basic questions about Nutch. Can someone point me in the right direction, or if you have time, maybe just blast out an answer. Question One: I can see the terms that come from the web page. Can I set up a way to also add these things to the index. In other words, if ice cream came

RE: ebook resources - including lucene in action

2009-04-21 Thread Lukas, Ray
Erik is right!! We should band together and bring legal action against these dirtballs.. Do you like it when someone steals your work, takes credit for it, and turns a profit off of it? More than giving their lives to write this content, they are also contributors to the very software that we use,

Hadoop thread seems to remain alive

2009-04-22 Thread Lukas, Ray
I am hoping to write up an article on my project and all the cool things that I figured out about nutch and java and eclipse, etc.. I will go into a long and boring dissertation at that point.. For now I will keep it short and sweet... As best I can.. I have eclipse, java 6, nutch, hadoop running

RE: Hadoop thread seems to remain alive

2009-04-23 Thread Lukas, Ray
Question: What is the proper accepted and safe way to shut down nutch (hadoop) after I am done with it? Hadoop.getFileSystem().closeAll() ?? I did try this and no luck. Anyone else having this problem? Thanks guys.. Thanks, if/when I find it I will post it for everyone. Ray

RE: Hadoop thread seems to remain alive

2009-04-23 Thread Lukas, Ray
, files remain locked. I have gone the brutal way and use unlocker.exe but I mean to find out what's going wrong so I will keep posted on this one. -Ray- 2009/4/23 Lukas, Ray ray.lu...@idearc.com Question: What is the proper accepted and safe way to shut down nutch (hadoop) after I am done

RE: Hadoop thread seems to remain alive

2009-04-23 Thread Lukas, Ray
to this for us. ray -Original Message- From: Lukas, Ray [mailto:ray.lu...@idearc.com] Sent: Thursday, April 23, 2009 9:21 AM To: nutch-user@lucene.apache.org Subject: RE: Hadoop thread seems to remain alive Hey Ray.. Great name you have there.. HA.. I don't actually care about deleting

RE: Hadoop thread seems to remain alive

2009-04-23 Thread Lukas, Ray
:35 AM To: nutch-user@lucene.apache.org Subject: Re: Hadoop thread seems to remain alive Lukas, Ray wrote: Hey Ray.. Great name you have there.. HA.. I don't actually care about deleting these files.. That is not the issue.. See I have embedded Nutch in my application. That application calls

Using nutchBean

2009-04-23 Thread Lukas, Ray
Is this correct.. NativeCrawler nativeCrawler = null; NutchBean nutchBean = null; Query nutchQuery = null; Hits nutchHits = null; for (int index=0; index<10; index++) { nativeCrawler = new

RE: Using nutchBean

2009-04-23 Thread Lukas, Ray
)); } } this.segUpdater.start(); -- this is the line I am talking about.. Any ideas, has anyone run into this ? -Original Message- From: Lukas, Ray [mailto:ray.lu...@idearc.com] Sent: Thursday, April 23, 2009 4:36 PM To: nutch-user@lucene.apache.org Subject: Using nutchBean

RE: Using nutchBean

2009-04-23 Thread Lukas, Ray
hunt around for that , or.. Maybe someone already knows where that lives.. Maybe?? -Original Message- From: Andrzej Bialecki [mailto:a...@getopt.org] Sent: Thursday, April 23, 2009 5:32 PM To: nutch-user@lucene.apache.org Subject: Re: Using nutchBean Lukas, Ray wrote: I started going

RE: Using nutchBean

2009-04-23 Thread Lukas, Ray
Oh works great now.. Hey thanks guys and Andrzej Bialecki.. I will look into how this can be submitted for everyone to have.. -Original Message- From: Lukas, Ray [mailto:ray.lu...@idearc.com] Sent: Thursday, April 23, 2009 5:45 PM To: nutch-user@lucene.apache.org Subject: RE: Using

RE: Hadoop thread seems to remain alive

2009-04-24 Thread Lukas, Ray
there is no nutchBean.close() being called I will look for it when I have more time for this. -The other Ray- 2009/4/23 Lukas, Ray ray.lu...@idearc.com I'm sorry guys.. I made a mistake.. This is not coming out of hadoop.. This thread is coming out of nutch bean. Sorry.. I should have looked more
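The symptom described in this thread (a process that will not exit because a background thread is still running) can be reproduced without Nutch. The sketch below is a generic illustration only: `BackgroundUpdaterSketch` and its polling loop are hypothetical stand-ins for an updater thread and are not taken from the NutchBean source. It shows why an owner of such a thread needs an explicit `close()` that interrupts the thread and waits for it to finish.

```java
/** Hypothetical stand-in for a component that owns a polling thread; not Nutch code. */
public class BackgroundUpdaterSketch implements AutoCloseable {

    private final Thread updater;

    public BackgroundUpdaterSketch() {
        // A non-daemon thread: the JVM cannot exit while it is still running.
        updater = new Thread(() -> {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    Thread.sleep(50); // pretend to poll for new data
                }
            } catch (InterruptedException e) {
                // Interrupted by close(): fall through so the thread ends.
            }
        }, "updater-sketch");
    }

    /** Starts the background thread. */
    public void start() {
        updater.start();
    }

    /** Reports whether the background thread is still alive. */
    public boolean isRunning() {
        return updater.isAlive();
    }

    /** Asks the thread to stop and waits up to one second for it to finish. */
    @Override
    public void close() {
        updater.interrupt();
        try {
            updater.join(1000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
        }
    }
}
```

If the owning object is dropped without calling `close()`, the thread keeps running and the application appears to hang on shutdown, which matches the behaviour reported above.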

RE: Hadoop thread seems to remain alive

2009-04-24 Thread Lukas, Ray
I exit gracefully or not, probably due to some lost threads. Since the servlet uses the same NutchBean looks like a similar issue as yours. Maybe there is no nutchBean.close() being called I will look for it when I have more time for this. -The other Ray- 2009/4/23 Lukas, Ray ray.lu...@idearc.com

RE: Hadoop thread seems to remain alive

2009-04-25 Thread Lukas, Ray
: Hadoop thread seems to remain alive Hey ray, Actually found my problem, I wasn't stopping Tomcat at the right moment the right way... so it kept some threads/locks. If I do it using the Windows proper service... works fine. -Ray- 2009/4/24 Lukas, Ray ray.lu...@idearc.com What does that thread

Re-direct in Nutch does not seem to work

2009-05-04 Thread Lukas, Ray
Re-direct in Nutch 1.0 does not seem to work.. If I point to a URL that is re-directed to (the result of a re-direction), everything works great; if I point to the page that is re-directing me to the working one, I get a corrupted index. Can Nutch handle re-direction and if so what magic is

RE: Re-direct in Nutch does not seem to work

2009-05-04 Thread Lukas, Ray
/description /property I only want to scan within the domain I requested... Unless that url instantly re-directs me to a different URL and then I want to only use that one. Any thoughts.. Am I understanding this correctly? Ray -Original Message- From: Lukas, Ray [mailto:ray.lu

unsubscribe from nutch-user

2009-12-04 Thread Lukas, Ray
Well three is a charm.. I need to move these to a different email as well.. Please if you could.. Could we also remove this email address as well.. Thanks ray -Original Message- From: M S Ram [mailto:ms...@cse.iitk.ac.in] Sent: Friday, December 04, 2009 10:01 AM To: