Re: Re:Re: New script bin/crawl - skipping urls different batch id (XXXXXXXX-YYYYYYYYY)

2013-07-05 Thread glumet
Thanks for your reply. Unfortunately, it did not help :(. -- View this message in context: http://lucene.472066.n3.nabble.com/New-script-bin-crawl-skipping-urls-different-batch-id--Y-tp4075441p4075665.html Sent from the Nutch - User mailing list archive

Re:Re: Re:Re: New script bin/crawl - skipping urls different batch id (XXXXXXXX-YYYYYYYYY)

2013-07-05 Thread RS

Re: Re:Re: Re:Re: New script bin/crawl - skipping urls different batch id (XXXXXXXX-YYYYYYYYY)

2013-07-05 Thread glumet

New script bin/crawl - skipping urls different batch id (XXXXXXXX-YYYYYYYYY)

2013-07-04 Thread glumet
Hello everybody, I am trying to crawl a few websites from my seed.txt with the new Nutch 2.1 crawl script, bin/crawl. The problem is that every time I run the script, it does not fetch or parse anything (no URLs), and I get the message Skipping [/here is concrete url/] different batch id ([/here is some batch id

Re: New script bin/crawl - skipping urls different batch id (XXXXXXXX-YYYYYYYYY)

2013-07-04 Thread glumet
I forgot to say that I am using Nutch version 2.1 ...

Re: New script bin/crawl - skipping urls different batch id (XXXXXXXX-YYYYYYYYY)

2013-07-04 Thread glumet
OK, as I wrote, the problem was the old version of Nutch (2.1). After updating to 2.2.1, the different batch id message disappeared, but now I have a new problem. Every time I start the bin/crawl script, it fetches only the URLs from the seed list (no other pages): fetching http://www.museumhetvalkhof.nl

Re:Re: New script bin/crawl - skipping urls different batch id (XXXXXXXX-YYYYYYYYY)

2013-07-04 Thread RS
glumet jan.bouch...@gmail.com wrote: OK, as I wrote, the problem was the old version of Nutch (2.1). After updating to 2.2.1, the different batch id message disappeared, but now I have a new problem. Every time I start the bin/crawl script, it fetches only the URLs from the seed list (no other pages

Re: Nutch 2.1 different batch id (null)

2013-04-30 Thread Lewis John Mcgibbney
On Sun, Apr 28, 2013 at 8:33 AM, cervenkovab cervenko...@gmail.com wrote: Hello, I have the same problem with *Skipping some.relevant.page.com; different batch id (null)* for a lot of pages. My configuration is almost the same as below (only the OS differs, and the storage is HBase). I do the steps

Re: Nutch 2.1 different batch id (null)

2013-04-30 Thread Lewis John Mcgibbney
the same problem with *Skipping some.relevant.page.com; different batch id (null)* for a lot of pages. My configuration is almost the same as below (only the OS differs, and the storage is HBase). I run the steps (inject), generate, fetch, and the skipping appears in the parse phase. But I want those pages

Re: [nutch 2.1 with mysql] different batch id (null)

2013-04-26 Thread Lewis John Mcgibbney
- inject - fetch. The second inject will leave entries in the DB without the marks that the fetcher later looks for. --Roland On Fri, Apr 26, 2013 at 12:30 AM, Lewis John Mcgibbney lewis.mcgibb...@gmail.com wrote: Additionally, why do we log.DEBUG that there is a different batch id ( + mark

Re: [nutch 2.1 with mysql] different batch id (null)

2013-04-26 Thread Lewis John Mcgibbney
- generate - inject - fetch. The second inject will leave entries in the DB without the marks that the fetcher later looks for. --Roland On Fri, Apr 26, 2013 at 12:30 AM, Lewis John Mcgibbney lewis.mcgibb...@gmail.com wrote: Additionally, why do we log.DEBUG that there is a different batch id
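Roland's explanation of the generate - inject - fetch sequence can be illustrated with a toy model. This is a minimal sketch, not Nutch code. The class `BatchMarkWorkflow`, its methods, and the simplified rules are all hypothetical assumptions made for illustration: generate stamps unmarked rows, fetch only touches rows that carry a generate mark, and parse skips rows without a fetch mark. Under those rules, a URL injected after generate never gets a fetch mark, so parse reports it with "different batch id (null)".

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: class, method names and rules are hypothetical, not Nutch APIs.
public class BatchMarkWorkflow {
    // One web-table row, reduced to the two marks that matter here.
    static final class Row {
        String generateMark; // batch id written by generate, or null
        String fetchMark;    // batch id written by fetch, or null
    }

    private final Map<String, Row> table = new LinkedHashMap<>();

    // inject: adds the URL with no marks at all.
    public void inject(String url) {
        table.putIfAbsent(url, new Row());
    }

    // generate: stamps every not-yet-generated row with the batch id.
    public void generate(String batchId) {
        for (Row row : table.values()) {
            if (row.generateMark == null) {
                row.generateMark = batchId;
            }
        }
    }

    // fetch -all (assumed rule): only rows carrying a generate mark are fetched.
    public void fetchAll() {
        for (Row row : table.values()) {
            if (row.generateMark != null) {
                row.fetchMark = row.generateMark;
            }
        }
    }

    // parse -all (assumed rule): rows without a fetch mark are skipped.
    // Returns the skip messages, formatted like the log lines in this thread.
    public List<String> parseAll() {
        List<String> skipped = new ArrayList<>();
        for (Map.Entry<String, Row> e : table.entrySet()) {
            String mark = e.getValue().fetchMark;
            if (mark == null) {
                skipped.add("Skipping " + e.getKey() + "; different batch id (" + mark + ")");
            }
        }
        return skipped;
    }

    public static void main(String[] args) {
        BatchMarkWorkflow db = new BatchMarkWorkflow();
        db.inject("http://a.example/");
        db.generate("batch-1");
        db.inject("http://b.example/"); // the second inject, after generate
        db.fetchAll();
        db.parseAll().forEach(System.out::println);
    }
}
```

In this model, only http://b.example/ is skipped. Running generate after the second inject, and before fetch, would stamp it, and nothing would be skipped.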

Re: [nutch 2.1 with mysql] different batch id (null)

2013-04-25 Thread Lewis John Mcgibbney
("Reparsing " + unreverseKey); } else { if (!NutchJob.shouldProcess(mark, batchId)) { if (LOG.isDebugEnabled()) { LOG.debug("Skipping " + TableUtil.unreverseUrl(key) + "; different batch id (" + mark + ")"); } return; Any ideas? Is this a bug? On Thu

Re: [nutch 2.1 with mysql] different batch id (null)

2013-04-24 Thread Lewis John Mcgibbney
with baseUrl=null, content=null. Nutch is not parsing many URLs. I receive this message in the Nutch console: Skipping http://myurlForParsing.it; different batch id (null). How can I fix this? This is actually something I've wondered about for a while, and it was on my TODO list of things to address!!! I want
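When many URLs are affected, it can help to tally the skip lines by batch id. This is a hypothetical helper that assumes only the log format quoted in this thread (Skipping <url>; different batch id (<mark>)). A large count under null points to rows with no mark at all, as opposed to rows that belong to some other batch.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Nutch: counts skip messages per batch id.
public class SkipLogTally {
    // Matches the message format quoted in this thread; group 2 is the mark.
    private static final Pattern SKIP =
        Pattern.compile("Skipping (\\S+); different batch id \\(([^)]*)\\)");

    public static Map<String, Integer> tally(Iterable<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            Matcher m = SKIP.matcher(line);
            if (m.find()) {
                // A null mark is printed literally as "null" in the log line.
                counts.merge(m.group(2), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> log = List.of(
            "Skipping http://myurlForParsing.it; different batch id (null)",
            "Skipping http://www.example.com/a; different batch id (null)",
            "Skipping http://www.example.com/b; different batch id (batch-B)");
        System.out.println(tally(log));
    }
}
```

In practice, the lines would come from the parse log with debug logging enabled. The helper only counts. It does not say why a row lacks a mark.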

Re: Nutch 2.1 different batch id (null)

2013-02-15 Thread Lewis John Mcgibbney
://nlp.solutions.asia/?p=180. I made the same changes in conf/nutch-site.xml (set threads to 50). When I start the crawl (path: ~/Desktop/apache-nutch-2.1/runtime/local, command: bin/nutch crawl urls -depth 5 -topN 1), I see the message: Skipping http://www.domainname.com/category/viewvideo/111; different batch

Re: Different batch id

2012-08-03 Thread Ferdy Galema
(mark, batchId)) { if (LOG.isDebugEnabled()) { LOG.debug("Skipping " + TableUtil.unreverseUrl(key) + "; different batch id (" + mark + ")"); } return; } since shouldProcess(mark, batchId) returns false if mark is null. Then bin/nutch parse -all skips all
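The decision described above can be sketched as follows. The names are hypothetical, and this is not the real NutchJob.shouldProcess. The handling of the "-all" value is an assumption based on the parse -all command used in this thread. The sketch illustrates the point made above: a null mark is rejected before the batch id is compared, so passing -all does not rescue rows that were never marked.

```java
// Sketch only: mirrors the behaviour described in the thread, not Nutch's source.
public class ShouldProcessSketch {
    // Assumption: "-all" means "any batch", as in "bin/nutch parse -all".
    public static final String ALL_BATCHES = "-all";

    public static boolean shouldProcess(String mark, String batchId) {
        if (mark == null) {
            return false; // unmarked rows are skipped even for "-all"
        }
        return ALL_BATCHES.equals(batchId) || mark.equals(batchId);
    }

    // Builds a message in the same format as the debug line quoted above.
    public static String skipMessage(String url, String mark) {
        return "Skipping " + url + "; different batch id (" + mark + ")";
    }

    public static void main(String[] args) {
        // Hypothetical rows: {url, mark}; a null mark means the row was never marked.
        String[][] rows = {
            {"http://a.example/", "batch-A"},
            {"http://b.example/", null},
            {"http://c.example/", "batch-B"},
        };
        String batchId = "batch-A";
        for (String[] row : rows) {
            if (!shouldProcess(row[1], batchId)) {
                System.out.println(skipMessage(row[0], row[1]));
            }
        }
    }
}
```

With batchId set to "batch-A", rows b and c are skipped. With ALL_BATCHES, only row b is skipped, which matches the (null) case reported throughout this thread.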

Re: Different batch id

2012-08-02 Thread Bai Shen
/nutch generate -topN 1000; bin/nutch fetch -all; bin/nutch parse -all. When looking at the parse log, I'm seeing a bunch of "different batch id" messages. These are all on URLs that I did not inject into the database. Any ideas what's causing this? Thanks.

Re: Different batch id

2012-08-02 Thread alxsss
); if (!NutchJob.shouldProcess(mark, batchId)) { if (LOG.isDebugEnabled()) { LOG.debug("Skipping " + TableUtil.unreverseUrl(key) + "; different batch id (" + mark + ")"); } return; } since shouldProcess(mark, batchId) returns false if mark is null. Then bin/nutch parse -all skips all

Re: Different batch id

2012-07-31 Thread Bai Shen
Shen baishen.li...@gmail.com wrote: I set up Nutch 2.x with a new instance of HBase and ran the following commands: bin/nutch inject urls; bin/nutch generate -topN 1000; bin/nutch fetch -all; bin/nutch parse -all. When looking at the parse log, I'm seeing a bunch of different batch id

Re: Different batch id

2012-07-31 Thread alxsss
Is there a specific place it's located? I turned on debugging, but I'm not seeing a batch id. On Mon, Jul 30, 2012 at 1:14 PM, Lewis John Mcgibbney lewis.mcgibb...@gmail.com wrote: Can you turn on debug logging and see what the batch IDs actually

Re: Different batch id

2012-07-31 Thread Bai Shen
Nope. I ran exactly the listed commands. And as I said, the ones that show a different batch id were URLs that I didn't inject, so I have no idea how they got in there. On Tue, Jul 31, 2012 at 1:44 PM, alx...@aim.com wrote: Hi, most likely you ran the generate command a few times and did not run