Re: [VOTE] Release Apache Nutch 1.19 RC#1

2022-09-02 Thread Sebastian Nagel
Hi Markus,

thanks!

Could you share the files in

  .ivy2/cache/org.apache.httpcomponents/httpasyncclient/

and maybe also the logs of a Nutch build starting with an empty ~/.ivy2/cache?
I'll have a look and compare them with what I find on my system. Maybe use a new
thread on user@ or a Jira issue: I plan to close the vote over the weekend,
so let's keep this thread for the release vote alone.

Best,
Sebastian

On 8/29/22 14:17, Markus Jelsma wrote:
> Hello Sebastian,
> 
> No, the JAR isn't present. Multiple JARs are missing, probably because they
> are loaded after httpasyncclient. I checked the previously emptied Ivy
> cache. The Ivy files are there, but the JAR is missing there too.
> 
> markus@midas:~$ ls .ivy2/cache/org.apache.httpcomponents/httpasyncclient/
> ivy-4.1.4.xml  ivy-4.1.4.xml.original  ivydata-4.1.4.properties
> 
> I manually downloaded the JAR from [1] and added it to the jars/ directory
> in the Ivy cache. It still cannot find the JAR, perhaps the Ivy cache needs
> some more things than just adding the JAR manually.
> 
> The odd thing is, that i got the URL below FROM the ivydata-4.1.4.properties
> file in the cache.
> 
> Since Ralf can compile it without problems, it seems to be an issue on my
> machine only. So Nutch seems fine, therefore +1.
> 
> Regards,
> Markus
> 
> [1]
> https://repo1.maven.org/maven2/org/apache/httpcomponents/httpasyncclient/4.1.4/
> 
> 
> On Sun, 28 Aug 2022 at 12:05, Sebastian Nagel wrote:
> 
>> Hi Ralf,
>>
>>> It fetches it parses
>>
>> So a +1 ?
>>
>> Best,
>> Sebastian
>>
>> On 8/25/22 05:22, BlackIce wrote:
>>> nevermind I made a typo...
>>>
>>> It fetches it parses
>>>
>>> On Thu, Aug 25, 2022 at 3:42 AM BlackIce  wrote:

 so far... it doesn't select anything when creating segments:
 0 records selected for fetching, exiting

 On Wed, Aug 24, 2022 at 3:02 PM BlackIce  wrote:
>
> I have been able to compile under OpenJDK 11
> Have not done anything further so far
> I'm gonna try to get to it this evening
>
> Greetz
> Ralf
>
> On Wed, Aug 24, 2022 at 1:29 PM Markus Jelsma
>  wrote:
>>
>> Hi,
>>
>> Everything seems fine, the crawler seems fine when trying the binary
>> distribution. The source won't work because this computer still cannot
>> compile it. Clearing the local Ivy cache did not do much. This is the
>> known
>> compiler error with the elastic-indexer plugin:
>> compile:
>> [echo] Compiling plugin: indexer-elastic
>>[javac] Compiling 3 source files to
>> /home/markus/temp/apache-nutch-1.19/build/indexer-elastic/classes
>>[javac]
>>
>> /home/markus/temp/apache-nutch-1.19/src/plugin/indexer-elastic/src/java/org/apache/nutch/indexwriter/elastic/ElasticIndexWriter.java:39:
>> error: package org.apache.http.impl.nio.client does not exist
>>[javac] import
>> org.apache.http.impl.nio.client.HttpAsyncClientBuilder;
>>[javac]   ^
>>[javac] 1 error
>>
>>
>> The binary distribution works fine though. I do see a lot of new
>> messages
>> when fetching:
>> 2022-08-24 13:21:15,867 INFO o.a.n.n.URLExemptionFilters
>> [LocalJobRunner
>> Map Task Executor #0] Found 0 extensions at
>> point:'org.apache.nutch.net.URLExemptionFilter'
>>
>> This is also new at start of each task:
>> SLF4J: Class path contains multiple SLF4J bindings.
>> SLF4J: Found binding in
>>
>> [jar:file:/home/markus/temp/apache-nutch-1.19/lib/log4j-slf4j-impl-2.18.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>>
>> SLF4J: Found binding in
>>
>> [jar:file:/home/markus/temp/apache-nutch-1.19/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>>
>> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
>> explanation.
>> SLF4J: Actual binding is of type
>> [org.apache.logging.slf4j.Log4jLoggerFactory]
>>
>> And this one at the end of fetcher:
>> log4j:WARN No appenders could be found for logger
>> (org.apache.commons.httpclient.params.DefaultHttpParams).
>> log4j:WARN Please initialize the log4j system properly.
>> log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig
>> for
>> more info.
>>
>> I am worried about the indexer-elastic plugin, maybe others have that
>> problem too? Otherwise everything seems fine.
>>
>> Markus
>>
>> On Mon, 22 Aug 2022 at 17:30, Sebastian Nagel <sna...@apache.org> wrote:
>>
>>> Hi Folks,
>>>
>>> A first candidate for the Nutch 1.19 release is available at:
>>>
>>>https://dist.apache.org/repos/dist/dev/nutch/1.19/
>>>
>>> The release candidate is a zip and tar.gz archive of the binary and
>>> sources in:
>>>https://github.com/apache/nutch/tree/release-1.19
>>>
>>> In addition, a staged maven repository is available here:
>>>
>> 

Re: [VOTE] Release Apache Nutch 1.19 RC#1

2022-08-31 Thread Jorge Betancourt
Hi all,

Compiled from the sources (JDK 11) and ran a small crawl and indexing (to
Solr); both passed with flying colors.

That's a +1 from me. Great work Sebastian!

On Mon, Aug 22, 2022 at 5:30 PM Sebastian Nagel  wrote:

> Hi Folks,
>
> A first candidate for the Nutch 1.19 release is available at:
>
>   https://dist.apache.org/repos/dist/dev/nutch/1.19/
>
> The release candidate is a zip and tar.gz archive of the binary and
> sources in:
>   https://github.com/apache/nutch/tree/release-1.19
>
> In addition, a staged Maven repository is available here:
>   https://repository.apache.org/content/repositories/orgapachenutch-1020
>
> We addressed 87 issues:
>   https://s.apache.org/lf6li
>
>
> Please vote on releasing this package as Apache Nutch 1.19.
> The vote is open for the next 72 hours and passes if a majority
> of at least three +1 Nutch PMC votes are cast.
>
> [ ] +1 Release this package as Apache Nutch 1.19.
> [ ] -1 Do not release this package because…
>
> Cheers,
> Sebastian
> (On behalf of the Nutch PMC)
>
> P.S.
> Here is my +1.
> - tested most of the Nutch tools and ran a test crawl on a single-node cluster
>   running Hadoop 3.3.4, see
>   https://github.com/sebastian-nagel/nutch-test-single-node-cluster/
>


Re: [VOTE] Release Apache Nutch 1.19 RC#1

2022-08-30 Thread BlackIce
OK,
I compiled Nutch under JDK 11.
Did some basic fetching, parsing, link inversion and subsequent indexing to Solr 9.
[+1]

Great work!
RRK

On Tue, Aug 30, 2022 at 12:22 PM BlackIce  wrote:
>
> Tried some indexing, but when running "invertlinks" manually it says
> something about an input path that does not exist.
> Has invertlinks changed since 1.18?
>
> Greetz
> RRK
>
> On Mon, Aug 29, 2022 at 3:38 PM BlackIce  wrote:
> >
> > Haven't indexed anything to solr.. gonna give it a shot in a few hours
> >
> > On Mon, Aug 29, 2022 at 2:17 PM Markus Jelsma
> >  wrote:
> > >
> > > Hello Sebastian,
> > >
> > > No, the JAR isn't present. Multiple JARs are missing, probably because 
> > > they
> > > are loaded after httpasyncclient. I checked the previously emptied Ivy
> > > cache. The Ivy files are there, but the JAR is missing there too.
> > >
> > > markus@midas:~$ ls .ivy2/cache/org.apache.httpcomponents/httpasyncclient/
> > > ivy-4.1.4.xml  ivy-4.1.4.xml.original  ivydata-4.1.4.properties
> > >
> > > I manually downloaded the JAR from [1] and added it to the jars/ directory
> > > in the Ivy cache. It still cannot find the JAR, perhaps the Ivy cache 
> > > needs
> > > some more things than just adding the JAR manually.
> > >
> > > The odd thing is, that i got the URL below FROM the 
> > > ivydata-4.1.4.properties
> > > file in the cache.
> > >
> > > Since Ralf can compile it without problems, it seems to be an issue on my
> > > machine only. So Nutch seems fine, therefore +1.
> > >
> > > Regards,
> > > Markus
> > >
> > > [1]
> > > https://repo1.maven.org/maven2/org/apache/httpcomponents/httpasyncclient/4.1.4/
> > >
> > >
> > > On Sun, 28 Aug 2022 at 12:05, Sebastian Nagel wrote:
> > >
> > > > Hi Ralf,
> > > >
> > > > > It fetches it parses
> > > >
> > > > So a +1 ?
> > > >
> > > > Best,
> > > > Sebastian
> > > >
> > > > On 8/25/22 05:22, BlackIce wrote:
> > > > > nevermind I made a typo...
> > > > >
> > > > > It fetches it parses
> > > > >
> > > > > On Thu, Aug 25, 2022 at 3:42 AM BlackIce  
> > > > > wrote:
> > > > >>
> > > > >> so far... it doesn't select anything when creating segments:
> > > > >> 0 records selected for fetching, exiting
> > > > >>
> > > > >> On Wed, Aug 24, 2022 at 3:02 PM BlackIce  
> > > > >> wrote:
> > > > >>>
> > > > >>> I have been able to compile under OpenJDK 11
> > > > >>> Have not done anything further so far
> > > > >>> I'm gonna try to get to it this evening
> > > > >>>
> > > > >>> Greetz
> > > > >>> Ralf
> > > > >>>
> > > > >>> On Wed, Aug 24, 2022 at 1:29 PM Markus Jelsma
> > > > >>>  wrote:
> > > > 
> > > >  Hi,
> > > > 
> > > >  Everything seems fine, the crawler seems fine when trying the 
> > > >  binary
> > > >  distribution. The source won't work because this computer still 
> > > >  cannot
> > > >  compile it. Clearing the local Ivy cache did not do much. This is 
> > > >  the
> > > > known
> > > >  compiler error with the elastic-indexer plugin:
> > > >  compile:
> > > >  [echo] Compiling plugin: indexer-elastic
> > > > [javac] Compiling 3 source files to
> > > >  /home/markus/temp/apache-nutch-1.19/build/indexer-elastic/classes
> > > > [javac]
> > > > 
> > > > /home/markus/temp/apache-nutch-1.19/src/plugin/indexer-elastic/src/java/org/apache/nutch/indexwriter/elastic/ElasticIndexWriter.java:39:
> > > >  error: package org.apache.http.impl.nio.client does not exist
> > > > [javac] import
> > > > org.apache.http.impl.nio.client.HttpAsyncClientBuilder;
> > > > [javac]   ^
> > > > [javac] 1 error
> > > > 
> > > > 
> > > >  The binary distribution works fine though. I do see a lot of new
> > > > messages
> > > >  when fetching:
> > > >  2022-08-24 13:21:15,867 INFO o.a.n.n.URLExemptionFilters
> > > > [LocalJobRunner
> > > >  Map Task Executor #0] Found 0 extensions at
> > > >  point:'org.apache.nutch.net.URLExemptionFilter'
> > > > 
> > > >  This is also new at start of each task:
> > > >  SLF4J: Class path contains multiple SLF4J bindings.
> > > >  SLF4J: Found binding in
> > > > 
> > > > [jar:file:/home/markus/temp/apache-nutch-1.19/lib/log4j-slf4j-impl-2.18.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> > > > 
> > > >  SLF4J: Found binding in
> > > > 
> > > > [jar:file:/home/markus/temp/apache-nutch-1.19/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> > > > 
> > > >  SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
> > > >  explanation.
> > > >  SLF4J: Actual binding is of type
> > > >  [org.apache.logging.slf4j.Log4jLoggerFactory]
> > > > 
> > > >  And this one at the end of fetcher:
> > > >  log4j:WARN No appenders could be found for logger
> > > >  (org.apache.commons.httpclient.params.DefaultHttpParams).
> > > >  log4j:WARN Please initialize the log4j system properly.
> > > >  

Re: [VOTE] Release Apache Nutch 1.19 RC#1

2022-08-30 Thread BlackIce
Tried some indexing, but when running "invertlinks" manually it says
something about an input path that does not exist.
Has invertlinks changed since 1.18?

Greetz
RRK

On Mon, Aug 29, 2022 at 3:38 PM BlackIce  wrote:
>
> Haven't indexed anything to solr.. gonna give it a shot in a few hours
>
> On Mon, Aug 29, 2022 at 2:17 PM Markus Jelsma
>  wrote:
> >
> > Hello Sebastian,
> >
> > No, the JAR isn't present. Multiple JARs are missing, probably because they
> > are loaded after httpasyncclient. I checked the previously emptied Ivy
> > cache. The Ivy files are there, but the JAR is missing there too.
> >
> > markus@midas:~$ ls .ivy2/cache/org.apache.httpcomponents/httpasyncclient/
> > ivy-4.1.4.xml  ivy-4.1.4.xml.original  ivydata-4.1.4.properties
> >
> > I manually downloaded the JAR from [1] and added it to the jars/ directory
> > in the Ivy cache. It still cannot find the JAR, perhaps the Ivy cache needs
> > some more things than just adding the JAR manually.
> >
> > The odd thing is, that i got the URL below FROM the ivydata-4.1.4.properties
> > file in the cache.
> >
> > Since Ralf can compile it without problems, it seems to be an issue on my
> > machine only. So Nutch seems fine, therefore +1.
> >
> > Regards,
> > Markus
> >
> > [1]
> > https://repo1.maven.org/maven2/org/apache/httpcomponents/httpasyncclient/4.1.4/
> >
> >
> > On Sun, 28 Aug 2022 at 12:05, Sebastian Nagel wrote:
> >
> > > Hi Ralf,
> > >
> > > > It fetches it parses
> > >
> > > So a +1 ?
> > >
> > > Best,
> > > Sebastian
> > >
> > > On 8/25/22 05:22, BlackIce wrote:
> > > > nevermind I made a typo...
> > > >
> > > > It fetches it parses
> > > >
> > > > On Thu, Aug 25, 2022 at 3:42 AM BlackIce  wrote:
> > > >>
> > > >> so far... it doesn't select anything when creating segments:
> > > >> 0 records selected for fetching, exiting
> > > >>
> > > >> On Wed, Aug 24, 2022 at 3:02 PM BlackIce  wrote:
> > > >>>
> > > >>> I have been able to compile under OpenJDK 11
> > > >>> Have not done anything further so far
> > > >>> I'm gonna try to get to it this evening
> > > >>>
> > > >>> Greetz
> > > >>> Ralf
> > > >>>
> > > >>> On Wed, Aug 24, 2022 at 1:29 PM Markus Jelsma
> > > >>>  wrote:
> > > 
> > >  Hi,
> > > 
> > >  Everything seems fine, the crawler seems fine when trying the binary
> > >  distribution. The source won't work because this computer still 
> > >  cannot
> > >  compile it. Clearing the local Ivy cache did not do much. This is the
> > > known
> > >  compiler error with the elastic-indexer plugin:
> > >  compile:
> > >  [echo] Compiling plugin: indexer-elastic
> > > [javac] Compiling 3 source files to
> > >  /home/markus/temp/apache-nutch-1.19/build/indexer-elastic/classes
> > > [javac]
> > > 
> > > /home/markus/temp/apache-nutch-1.19/src/plugin/indexer-elastic/src/java/org/apache/nutch/indexwriter/elastic/ElasticIndexWriter.java:39:
> > >  error: package org.apache.http.impl.nio.client does not exist
> > > [javac] import
> > > org.apache.http.impl.nio.client.HttpAsyncClientBuilder;
> > > [javac]   ^
> > > [javac] 1 error
> > > 
> > > 
> > >  The binary distribution works fine though. I do see a lot of new
> > > messages
> > >  when fetching:
> > >  2022-08-24 13:21:15,867 INFO o.a.n.n.URLExemptionFilters
> > > [LocalJobRunner
> > >  Map Task Executor #0] Found 0 extensions at
> > >  point:'org.apache.nutch.net.URLExemptionFilter'
> > > 
> > >  This is also new at start of each task:
> > >  SLF4J: Class path contains multiple SLF4J bindings.
> > >  SLF4J: Found binding in
> > > 
> > > [jar:file:/home/markus/temp/apache-nutch-1.19/lib/log4j-slf4j-impl-2.18.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> > > 
> > >  SLF4J: Found binding in
> > > 
> > > [jar:file:/home/markus/temp/apache-nutch-1.19/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> > > 
> > >  SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
> > >  explanation.
> > >  SLF4J: Actual binding is of type
> > >  [org.apache.logging.slf4j.Log4jLoggerFactory]
> > > 
> > >  And this one at the end of fetcher:
> > >  log4j:WARN No appenders could be found for logger
> > >  (org.apache.commons.httpclient.params.DefaultHttpParams).
> > >  log4j:WARN Please initialize the log4j system properly.
> > >  log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig
> > > for
> > >  more info.
> > > 
> > >  I am worried about the indexer-elastic plugin, maybe others have that
> > >  problem too? Otherwise everything seems fine.
> > > 
> > >  Markus
> > > 
> > > >  On Mon, 22 Aug 2022 at 17:30, Sebastian Nagel <sna...@apache.org> wrote:
> > > 
> > > > Hi Folks,
> > > >
> > > > A first candidate for the Nutch 1.19 release is available at:
> > > >
> 

Re: [VOTE] Release Apache Nutch 1.19 RC#1

2022-08-29 Thread BlackIce
Haven't indexed anything to Solr yet... gonna give it a shot in a few hours

On Mon, Aug 29, 2022 at 2:17 PM Markus Jelsma
 wrote:
>
> Hello Sebastian,
>
> No, the JAR isn't present. Multiple JARs are missing, probably because they
> are loaded after httpasyncclient. I checked the previously emptied Ivy
> cache. The Ivy files are there, but the JAR is missing there too.
>
> markus@midas:~$ ls .ivy2/cache/org.apache.httpcomponents/httpasyncclient/
> ivy-4.1.4.xml  ivy-4.1.4.xml.original  ivydata-4.1.4.properties
>
> I manually downloaded the JAR from [1] and added it to the jars/ directory
> in the Ivy cache. It still cannot find the JAR, perhaps the Ivy cache needs
> some more things than just adding the JAR manually.
>
> The odd thing is, that i got the URL below FROM the ivydata-4.1.4.properties
> file in the cache.
>
> Since Ralf can compile it without problems, it seems to be an issue on my
> machine only. So Nutch seems fine, therefore +1.
>
> Regards,
> Markus
>
> [1]
> https://repo1.maven.org/maven2/org/apache/httpcomponents/httpasyncclient/4.1.4/
>
>
> On Sun, 28 Aug 2022 at 12:05, Sebastian Nagel wrote:
>
> > Hi Ralf,
> >
> > > It fetches it parses
> >
> > So a +1 ?
> >
> > Best,
> > Sebastian
> >
> > On 8/25/22 05:22, BlackIce wrote:
> > > nevermind I made a typo...
> > >
> > > It fetches it parses
> > >
> > > On Thu, Aug 25, 2022 at 3:42 AM BlackIce  wrote:
> > >>
> > >> so far... it doesn't select anything when creating segments:
> > >> 0 records selected for fetching, exiting
> > >>
> > >> On Wed, Aug 24, 2022 at 3:02 PM BlackIce  wrote:
> > >>>
> > >>> I have been able to compile under OpenJDK 11
> > >>> Have not done anything further so far
> > >>> I'm gonna try to get to it this evening
> > >>>
> > >>> Greetz
> > >>> Ralf
> > >>>
> > >>> On Wed, Aug 24, 2022 at 1:29 PM Markus Jelsma
> > >>>  wrote:
> > 
> >  Hi,
> > 
> >  Everything seems fine, the crawler seems fine when trying the binary
> >  distribution. The source won't work because this computer still cannot
> >  compile it. Clearing the local Ivy cache did not do much. This is the
> > known
> >  compiler error with the elastic-indexer plugin:
> >  compile:
> >  [echo] Compiling plugin: indexer-elastic
> > [javac] Compiling 3 source files to
> >  /home/markus/temp/apache-nutch-1.19/build/indexer-elastic/classes
> > [javac]
> > 
> > /home/markus/temp/apache-nutch-1.19/src/plugin/indexer-elastic/src/java/org/apache/nutch/indexwriter/elastic/ElasticIndexWriter.java:39:
> >  error: package org.apache.http.impl.nio.client does not exist
> > [javac] import
> > org.apache.http.impl.nio.client.HttpAsyncClientBuilder;
> > [javac]   ^
> > [javac] 1 error
> > 
> > 
> >  The binary distribution works fine though. I do see a lot of new
> > messages
> >  when fetching:
> >  2022-08-24 13:21:15,867 INFO o.a.n.n.URLExemptionFilters
> > [LocalJobRunner
> >  Map Task Executor #0] Found 0 extensions at
> >  point:'org.apache.nutch.net.URLExemptionFilter'
> > 
> >  This is also new at start of each task:
> >  SLF4J: Class path contains multiple SLF4J bindings.
> >  SLF4J: Found binding in
> > 
> > [jar:file:/home/markus/temp/apache-nutch-1.19/lib/log4j-slf4j-impl-2.18.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> > 
> >  SLF4J: Found binding in
> > 
> > [jar:file:/home/markus/temp/apache-nutch-1.19/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> > 
> >  SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
> >  explanation.
> >  SLF4J: Actual binding is of type
> >  [org.apache.logging.slf4j.Log4jLoggerFactory]
> > 
> >  And this one at the end of fetcher:
> >  log4j:WARN No appenders could be found for logger
> >  (org.apache.commons.httpclient.params.DefaultHttpParams).
> >  log4j:WARN Please initialize the log4j system properly.
> >  log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig
> > for
> >  more info.
> > 
> >  I am worried about the indexer-elastic plugin, maybe others have that
> >  problem too? Otherwise everything seems fine.
> > 
> >  Markus
> > 
> >  On Mon, 22 Aug 2022 at 17:30, Sebastian Nagel <sna...@apache.org> wrote:
> > 
> > > Hi Folks,
> > >
> > > A first candidate for the Nutch 1.19 release is available at:
> > >
> > >https://dist.apache.org/repos/dist/dev/nutch/1.19/
> > >
> > > The release candidate is a zip and tar.gz archive of the binary and
> > > sources in:
> > >https://github.com/apache/nutch/tree/release-1.19
> > >
> > > In addition, a staged maven repository is available here:
> > >
> > https://repository.apache.org/content/repositories/orgapachenutch-1020
> > >
> > > We addressed 87 issues:
> > >https://s.apache.org/lf6li

Re: [VOTE] Release Apache Nutch 1.19 RC#1

2022-08-29 Thread Markus Jelsma
Hello Sebastian,

No, the JAR isn't present. Multiple JARs are missing, probably because they
are loaded after httpasyncclient. I checked the previously emptied Ivy
cache. The Ivy files are there, but the JAR is missing there too.

markus@midas:~$ ls .ivy2/cache/org.apache.httpcomponents/httpasyncclient/
ivy-4.1.4.xml  ivy-4.1.4.xml.original  ivydata-4.1.4.properties

I manually downloaded the JAR from [1] and added it to the jars/ directory
in the Ivy cache. It still cannot find the JAR; perhaps the Ivy cache needs
more than just a manually added JAR.

The odd thing is that I got the URL below from the ivydata-4.1.4.properties
file in the cache.

Since Ralf can compile it without problems, it seems to be an issue on my
machine only. So Nutch seems fine, therefore +1.

Regards,
Markus

[1]
https://repo1.maven.org/maven2/org/apache/httpcomponents/httpasyncclient/4.1.4/
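
To check whether other modules are affected in the same way, the cache can be scanned for directories that hold Ivy metadata but no JAR. The following is only a rough sketch, not part of Nutch or Ivy: the function name is made up, and it assumes the default cache layout in which each `org/module` directory contains `ivy-*.xml` files and keeps artifacts in subdirectories such as `jars/`. Modules that only publish a POM would also show up, so the output is a hint, not a diagnosis.

```python
from pathlib import Path


def modules_without_jars(cache_dir):
    """List org/module directories that hold Ivy metadata but no JAR file.

    Assumes the default Ivy cache layout: <cache>/<org>/<module>/ivy-<rev>.xml
    with artifacts stored in subdirectories (for example jars/).
    """
    cache = Path(cache_dir)
    missing = []
    for module_dir in sorted(cache.glob("*/*")):
        if not module_dir.is_dir():
            continue
        has_metadata = any(module_dir.glob("ivy-*.xml"))
        has_jar = any(module_dir.rglob("*.jar"))
        if has_metadata and not has_jar:
            missing.append(f"{module_dir.parent.name}/{module_dir.name}")
    return missing


if __name__ == "__main__":
    # Hypothetical usage against the default cache location.
    for name in modules_without_jars(Path.home() / ".ivy2" / "cache"):
        print(name)
```

On the machine described above, `org.apache.httpcomponents/httpasyncclient` would be expected in the output, since its directory lists only the `ivy-4.1.4.*` files.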


On Sun, 28 Aug 2022 at 12:05, Sebastian Nagel wrote:

> Hi Ralf,
>
> > It fetches it parses
>
> So a +1 ?
>
> Best,
> Sebastian
>
> On 8/25/22 05:22, BlackIce wrote:
> > nevermind I made a typo...
> >
> > It fetches it parses
> >
> > On Thu, Aug 25, 2022 at 3:42 AM BlackIce  wrote:
> >>
> >> so far... it doesn't select anything when creating segments:
> >> 0 records selected for fetching, exiting
> >>
> >> On Wed, Aug 24, 2022 at 3:02 PM BlackIce  wrote:
> >>>
> >>> I have been able to compile under OpenJDK 11
> >>> Have not done anything further so far
> >>> I'm gonna try to get to it this evening
> >>>
> >>> Greetz
> >>> Ralf
> >>>
> >>> On Wed, Aug 24, 2022 at 1:29 PM Markus Jelsma
> >>>  wrote:
> 
>  Hi,
> 
>  Everything seems fine, the crawler seems fine when trying the binary
>  distribution. The source won't work because this computer still cannot
>  compile it. Clearing the local Ivy cache did not do much. This is the
> known
>  compiler error with the elastic-indexer plugin:
>  compile:
>  [echo] Compiling plugin: indexer-elastic
> [javac] Compiling 3 source files to
>  /home/markus/temp/apache-nutch-1.19/build/indexer-elastic/classes
> [javac]
> 
> /home/markus/temp/apache-nutch-1.19/src/plugin/indexer-elastic/src/java/org/apache/nutch/indexwriter/elastic/ElasticIndexWriter.java:39:
>  error: package org.apache.http.impl.nio.client does not exist
> [javac] import
> org.apache.http.impl.nio.client.HttpAsyncClientBuilder;
> [javac]   ^
> [javac] 1 error
> 
> 
>  The binary distribution works fine though. I do see a lot of new
> messages
>  when fetching:
>  2022-08-24 13:21:15,867 INFO o.a.n.n.URLExemptionFilters
> [LocalJobRunner
>  Map Task Executor #0] Found 0 extensions at
>  point:'org.apache.nutch.net.URLExemptionFilter'
> 
>  This is also new at start of each task:
>  SLF4J: Class path contains multiple SLF4J bindings.
>  SLF4J: Found binding in
> 
> [jar:file:/home/markus/temp/apache-nutch-1.19/lib/log4j-slf4j-impl-2.18.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> 
>  SLF4J: Found binding in
> 
> [jar:file:/home/markus/temp/apache-nutch-1.19/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> 
>  SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
>  explanation.
>  SLF4J: Actual binding is of type
>  [org.apache.logging.slf4j.Log4jLoggerFactory]
> 
>  And this one at the end of fetcher:
>  log4j:WARN No appenders could be found for logger
>  (org.apache.commons.httpclient.params.DefaultHttpParams).
>  log4j:WARN Please initialize the log4j system properly.
>  log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig
> for
>  more info.
> 
>  I am worried about the indexer-elastic plugin, maybe others have that
>  problem too? Otherwise everything seems fine.
> 
>  Markus
> 
>  On Mon, 22 Aug 2022 at 17:30, Sebastian Nagel <sna...@apache.org> wrote:
> 
> > Hi Folks,
> >
> > A first candidate for the Nutch 1.19 release is available at:
> >
> >https://dist.apache.org/repos/dist/dev/nutch/1.19/
> >
> > The release candidate is a zip and tar.gz archive of the binary and
> > sources in:
> >https://github.com/apache/nutch/tree/release-1.19
> >
> > In addition, a staged maven repository is available here:
> >
> https://repository.apache.org/content/repositories/orgapachenutch-1020
> >
> > We addressed 87 issues:
> >https://s.apache.org/lf6li
> >
> >
> > Please vote on releasing this package as Apache Nutch 1.19.
> > The vote is open for the next 72 hours and passes if a majority
> > of at least three +1 Nutch PMC votes are cast.
> >
> > [ ] +1 Release this package as Apache Nutch 1.19.
> > [ ] -1 Do not release this package because…
> >
> > Cheers,
> > Sebastian
> > (On behalf 

Re: [VOTE] Release Apache Nutch 1.19 RC#1

2022-08-28 Thread Sebastian Nagel
Hi Ralf,

> It fetches it parses

So a +1 ?

Best,
Sebastian

On 8/25/22 05:22, BlackIce wrote:
> nevermind I made a typo...
> 
> It fetches it parses
> 
> On Thu, Aug 25, 2022 at 3:42 AM BlackIce  wrote:
>>
>> so far... it doesn't select anything when creating segments:
>> 0 records selected for fetching, exiting
>>
>> On Wed, Aug 24, 2022 at 3:02 PM BlackIce  wrote:
>>>
>>> I have been able to compile under OpenJDK 11
>>> Have not done anything further so far
>>> I'm gonna try to get to it this evening
>>>
>>> Greetz
>>> Ralf
>>>
>>> On Wed, Aug 24, 2022 at 1:29 PM Markus Jelsma
>>>  wrote:

 Hi,

 Everything seems fine, the crawler seems fine when trying the binary
 distribution. The source won't work because this computer still cannot
 compile it. Clearing the local Ivy cache did not do much. This is the known
 compiler error with the elastic-indexer plugin:
 compile:
 [echo] Compiling plugin: indexer-elastic
[javac] Compiling 3 source files to
 /home/markus/temp/apache-nutch-1.19/build/indexer-elastic/classes
[javac]
 /home/markus/temp/apache-nutch-1.19/src/plugin/indexer-elastic/src/java/org/apache/nutch/indexwriter/elastic/ElasticIndexWriter.java:39:
 error: package org.apache.http.impl.nio.client does not exist
[javac] import org.apache.http.impl.nio.client.HttpAsyncClientBuilder;
[javac]   ^
[javac] 1 error


 The binary distribution works fine though. I do see a lot of new messages
 when fetching:
 2022-08-24 13:21:15,867 INFO o.a.n.n.URLExemptionFilters [LocalJobRunner
 Map Task Executor #0] Found 0 extensions at
 point:'org.apache.nutch.net.URLExemptionFilter'

 This is also new at start of each task:
 SLF4J: Class path contains multiple SLF4J bindings.
 SLF4J: Found binding in
 [jar:file:/home/markus/temp/apache-nutch-1.19/lib/log4j-slf4j-impl-2.18.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]

 SLF4J: Found binding in
 [jar:file:/home/markus/temp/apache-nutch-1.19/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]

 SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
 explanation.
 SLF4J: Actual binding is of type
 [org.apache.logging.slf4j.Log4jLoggerFactory]

 And this one at the end of fetcher:
 log4j:WARN No appenders could be found for logger
 (org.apache.commons.httpclient.params.DefaultHttpParams).
 log4j:WARN Please initialize the log4j system properly.
 log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for
 more info.

 I am worried about the indexer-elastic plugin, maybe others have that
 problem too? Otherwise everything seems fine.

 Markus

 On Mon, 22 Aug 2022 at 17:30, Sebastian Nagel wrote:

> Hi Folks,
>
> A first candidate for the Nutch 1.19 release is available at:
>
>https://dist.apache.org/repos/dist/dev/nutch/1.19/
>
> The release candidate is a zip and tar.gz archive of the binary and
> sources in:
>https://github.com/apache/nutch/tree/release-1.19
>
> In addition, a staged maven repository is available here:
>https://repository.apache.org/content/repositories/orgapachenutch-1020
>
> We addressed 87 issues:
>https://s.apache.org/lf6li
>
>
> Please vote on releasing this package as Apache Nutch 1.19.
> The vote is open for the next 72 hours and passes if a majority
> of at least three +1 Nutch PMC votes are cast.
>
> [ ] +1 Release this package as Apache Nutch 1.19.
> [ ] -1 Do not release this package because…
>
> Cheers,
> Sebastian
> (On behalf of the Nutch PMC)
>
> P.S.
> Here is my +1.
> - tested most of the Nutch tools and ran a test crawl on a single-node cluster
>   running Hadoop 3.3.4, see
>   https://github.com/sebastian-nagel/nutch-test-single-node-cluster/
>


Re: [VOTE] Release Apache Nutch 1.19 RC#1

2022-08-28 Thread Sebastian Nagel
Hi Markus,

thanks!  What's your (final) decision?


>[javac] import org.apache.http.impl.nio.client.HttpAsyncClientBuilder;

During the build, the class should be provided in
  build/plugins/indexer-elastic/httpasyncclient-4.1.4.jar
Could you verify whether this jar is there and whether it contains the class
file? See also:
  https://repo1.maven.org/maven2/org/apache/httpcomponents/httpasyncclient/4.1.4/
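
One way to run that check, as a minimal sketch: a JAR is a zip archive, so Python's standard `zipfile` module can look for the class entry. The helper name `jar_contains_class` is made up for illustration; the JAR path and class name are the ones mentioned above.

```python
import zipfile
from pathlib import Path


def jar_contains_class(jar_path, class_name):
    """Return True if the JAR exists and has an entry for the given class.

    class_name is a fully qualified Java class name; it is mapped to the
    corresponding entry path inside the archive.
    """
    jar = Path(jar_path)
    if not jar.is_file():
        return False
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar) as zf:
        return entry in zf.namelist()


if __name__ == "__main__":
    # Run from the root of the Nutch source tree after a build attempt.
    jar = "build/plugins/indexer-elastic/httpasyncclient-4.1.4.jar"
    cls = "org.apache.http.impl.nio.client.HttpAsyncClientBuilder"
    print(jar_contains_class(jar, cls))
```

A result of `False` does not distinguish between a missing JAR and a JAR without the class, so it is worth checking the file listing as well.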

> I am worried about the indexer-elastic plugin, maybe others have that
> problem too? Otherwise everything seems fine.

In order to fix it, we need to make the error reproducible or at least figure
out what the reason is.


Regarding the logging: we switched to log4j 2.x (NUTCH-2915) while Hadoop now
uses reload4j (HADOOP-18088 [1]). The logging configuration should be improved
to avoid the warnings in local mode. In distributed mode, the logging
configuration of the provided Hadoop takes over.
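
To see which JARs cause the "multiple SLF4J bindings" message, the lib/ directory can be scanned for the binder class named in the warning. This is a hedged sketch rather than a Nutch tool: the function name is made up, and it only covers SLF4J 1.x style bindings, which ship `org/slf4j/impl/StaticLoggerBinder.class`.

```python
import zipfile
from pathlib import Path

# Entry reported in the SLF4J warning for each binding found.
BINDER = "org/slf4j/impl/StaticLoggerBinder.class"


def find_slf4j_bindings(lib_dir):
    """Return the names of JARs in lib_dir that contain an SLF4J 1.x binder."""
    found = []
    for jar in sorted(Path(lib_dir).glob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as zf:
                if BINDER in zf.namelist():
                    found.append(jar.name)
        except zipfile.BadZipFile:
            continue  # skip files that are not valid archives
    return found


if __name__ == "__main__":
    # Hypothetical usage from the root of the unpacked binary distribution.
    for name in find_slf4j_bindings("lib"):
        print(name)
```

For the binary distribution discussed here, the warning suggests the result would include log4j-slf4j-impl-2.18.0.jar and slf4j-reload4j-1.7.36.jar.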


Best,
Sebastian

[1] https://issues.apache.org/jira/browse/HADOOP-18088


On 8/24/22 13:28, Markus Jelsma wrote:
> Hi,
> 
> Everything seems fine, the crawler seems fine when trying the binary
> distribution. The source won't work because this computer still cannot
> compile it. Clearing the local Ivy cache did not do much. This is the known
> compiler error with the elastic-indexer plugin:
> compile:
> [echo] Compiling plugin: indexer-elastic
>[javac] Compiling 3 source files to
> /home/markus/temp/apache-nutch-1.19/build/indexer-elastic/classes
>[javac]
> /home/markus/temp/apache-nutch-1.19/src/plugin/indexer-elastic/src/java/org/apache/nutch/indexwriter/elastic/ElasticIndexWriter.java:39:
> error: package org.apache.http.impl.nio.client does not exist
>[javac] import org.apache.http.impl.nio.client.HttpAsyncClientBuilder;
>[javac]   ^
>[javac] 1 error
> 
> 
> The binary distribution works fine though. I do see a lot of new messages
> when fetching:
> 2022-08-24 13:21:15,867 INFO o.a.n.n.URLExemptionFilters [LocalJobRunner
> Map Task Executor #0] Found 0 extensions at
> point:'org.apache.nutch.net.URLExemptionFilter'
> 
> This is also new at start of each task:
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in
> [jar:file:/home/markus/temp/apache-nutch-1.19/lib/log4j-slf4j-impl-2.18.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> 
> SLF4J: Found binding in
> [jar:file:/home/markus/temp/apache-nutch-1.19/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> 
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
> explanation.
> SLF4J: Actual binding is of type
> [org.apache.logging.slf4j.Log4jLoggerFactory]
> 
> And this one at the end of fetcher:
> log4j:WARN No appenders could be found for logger
> (org.apache.commons.httpclient.params.DefaultHttpParams).
> log4j:WARN Please initialize the log4j system properly.
> log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for
> more info.
> 
> I am worried about the indexer-elastic plugin, maybe others have that
> problem too? Otherwise everything seems fine.
> 
> Markus
> 
> On Mon, 22 Aug 2022 at 17:30, Sebastian Nagel wrote:
> 
>> Hi Folks,
>>
>> A first candidate for the Nutch 1.19 release is available at:
>>
>>   https://dist.apache.org/repos/dist/dev/nutch/1.19/
>>
>> The release candidate is a zip and tar.gz archive of the binary and
>> sources in:
>>   https://github.com/apache/nutch/tree/release-1.19
>>
>> In addition, a staged Maven repository is available here:
>>   https://repository.apache.org/content/repositories/orgapachenutch-1020
>>
>> We addressed 87 issues:
>>   https://s.apache.org/lf6li
>>
>>
>> Please vote on releasing this package as Apache Nutch 1.19.
>> The vote is open for the next 72 hours and passes if a majority
>> of at least three +1 Nutch PMC votes are cast.
>>
>> [ ] +1 Release this package as Apache Nutch 1.19.
>> [ ] -1 Do not release this package because…
>>
>> Cheers,
>> Sebastian
>> (On behalf of the Nutch PMC)
>>
>> P.S.
>> Here is my +1.
>> - tested most of the Nutch tools and ran a test crawl on a single-node cluster
>>   running Hadoop 3.3.4 (see
>>   https://github.com/sebastian-nagel/nutch-test-single-node-cluster/)
>>
> 


Re: [VOTE] Release Apache Nutch 1.19 RC#1

2022-08-24 Thread BlackIce
Never mind, I made a typo...

It fetches, it parses.

On Thu, Aug 25, 2022 at 3:42 AM BlackIce wrote:
>
> so far... it doesn't select anything when creating segments:
> 0 records selected for fetching, exiting


Re: [VOTE] Release Apache Nutch 1.19 RC#1

2022-08-24 Thread BlackIce
so far... it doesn't select anything when creating segments:
0 records selected for fetching, exiting

On Wed, Aug 24, 2022 at 3:02 PM BlackIce wrote:
>
> I have been able to compile under OpenJDK 11
> Have not done anything further so far
> I'm gonna try to get to it this evening
>
> Greetz
> Ralf


Re: [VOTE] Release Apache Nutch 1.19 RC#1

2022-08-24 Thread BlackIce
I have been able to compile under OpenJDK 11
Have not done anything further so far
I'm gonna try to get to it this evening

Greetz
Ralf



Re: [VOTE] Release Apache Nutch 1.19 RC#1

2022-08-24 Thread Markus Jelsma
Hi,

Everything seems fine when trying the binary distribution, including the
crawler. The source distribution does not work here because this computer
still cannot compile it. Clearing the local Ivy cache did not help. This is
the known compiler error with the indexer-elastic plugin:
compile:
    [echo] Compiling plugin: indexer-elastic
   [javac] Compiling 3 source files to /home/markus/temp/apache-nutch-1.19/build/indexer-elastic/classes
   [javac] /home/markus/temp/apache-nutch-1.19/src/plugin/indexer-elastic/src/java/org/apache/nutch/indexwriter/elastic/ElasticIndexWriter.java:39: error: package org.apache.http.impl.nio.client does not exist
   [javac] import org.apache.http.impl.nio.client.HttpAsyncClientBuilder;
   [javac]   ^
   [javac] 1 error
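
The missing package org.apache.http.impl.nio.client is provided by the
httpasyncclient JAR, so one thing worth checking is whether the Ivy cache
holds module descriptors without any downloaded JAR. The following is a
minimal sketch of such a check, not part of the Nutch build. The function
name modules_without_jars is hypothetical, and the cache layout it assumes
(<cache>/<organisation>/<module>/ivy-<revision>.xml, with JARs somewhere
below the module directory) is an assumption about Ivy's default settings.
Modules that legitimately publish no JAR, such as parent POMs, would be
listed as well.

```python
from pathlib import Path


def modules_without_jars(cache_dir):
    """Return 'organisation/module' names that have an Ivy descriptor
    (ivy-<revision>.xml) but no *.jar file anywhere below the module
    directory. The directory layout is an assumed default, see above."""
    missing = []
    cache = Path(cache_dir)
    # One descriptor per resolved revision: <org>/<module>/ivy-<rev>.xml
    for descriptor in sorted(cache.glob("*/*/ivy-*.xml")):
        module_dir = descriptor.parent
        # Search recursively so that jars/ and bundles/ are both covered.
        if not any(module_dir.rglob("*.jar")):
            name = f"{module_dir.parent.name}/{module_dir.name}"
            if name not in missing:
                missing.append(name)
    return missing
```

Calling modules_without_jars(Path.home() / ".ivy2" / "cache") would then
list the candidates to compare against a machine where the build succeeds.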


The binary distribution works fine though. I do see a lot of new messages
when fetching:
2022-08-24 13:21:15,867 INFO o.a.n.n.URLExemptionFilters [LocalJobRunner Map Task Executor #0] Found 0 extensions at point:'org.apache.nutch.net.URLExemptionFilter'

This is also new at start of each task:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/markus/temp/apache-nutch-1.19/lib/log4j-slf4j-impl-2.18.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/markus/temp/apache-nutch-1.19/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
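
The warning appears because more than one JAR on the class path ships
org/slf4j/impl/StaticLoggerBinder.class, as the two "Found binding" lines
show. Below is a minimal sketch that lists which JARs in a directory contain
that class, assuming all JARs sit directly in one folder such as lib/ of the
binary distribution. find_slf4j_bindings is a hypothetical helper name, not
part of Nutch or SLF4J.

```python
import zipfile
from pathlib import Path

# Class file that SLF4J 1.x bindings provide; taken from the warning above.
BINDER_ENTRY = "org/slf4j/impl/StaticLoggerBinder.class"


def find_slf4j_bindings(lib_dir):
    """Return the names of all JARs directly inside lib_dir that contain
    the SLF4J static binder class, sorted by file name."""
    found = []
    for jar in sorted(Path(lib_dir).glob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as archive:
                if BINDER_ENTRY in archive.namelist():
                    found.append(jar.name)
        except zipfile.BadZipFile:
            # Skip files that are not valid ZIP archives.
            continue
    return found
```

Run against the lib/ directory of the unpacked distribution, this should
name the same JARs as the warning, provided no other bindings are present.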

And this one at the end of fetcher:
log4j:WARN No appenders could be found for logger (org.apache.commons.httpclient.params.DefaultHttpParams).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.

I am worried about the indexer-elastic plugin; maybe others have that
problem too? Otherwise everything seems fine.

Markus

On Mon, 22 Aug 2022 at 17:30, Sebastian Nagel wrote:

> Hi Folks,
>
> A first candidate for the Nutch 1.19 release is available at:
>
>   https://dist.apache.org/repos/dist/dev/nutch/1.19/
>
> The release candidate is a zip and tar.gz archive of the binary and
> sources in:
>   https://github.com/apache/nutch/tree/release-1.19
>
> In addition, a staged Maven repository is available here:
>   https://repository.apache.org/content/repositories/orgapachenutch-1020
>
> We addressed 87 issues:
>   https://s.apache.org/lf6li
>
>
> Please vote on releasing this package as Apache Nutch 1.19.
> The vote is open for the next 72 hours and passes if a majority
> of at least three +1 Nutch PMC votes are cast.
>
> [ ] +1 Release this package as Apache Nutch 1.19.
> [ ] -1 Do not release this package because…
>
> Cheers,
> Sebastian
> (On behalf of the Nutch PMC)
>
> P.S.
> Here is my +1.
> - tested most of the Nutch tools and ran a test crawl on a single-node cluster
>   running Hadoop 3.3.4 (see
>   https://github.com/sebastian-nagel/nutch-test-single-node-cluster/)
>