tballison commented on PR #772:
URL: https://github.com/apache/nutch/pull/772#issuecomment-1722508915

   Fantastic! Thank you so much Sebastian!
   
   On Sun, Sep 17, 2023 at 9:02 AM Sebastian Nagel ***@***.***>
   wrote:
   
   > +1
   >
   > A test with the pseudo-distributed Hadoop setup
   > <https://github.com/sebastian-nagel/nutch-test-single-node-cluster/> was
   > successful:
   >
   >    - Nutch tools work properly, no issues
   >    - as expected, Hadoop puts slf4j-api-1.7.36.jar and
   >    slf4j-reload4j-1.7.36.jar in the classpath in front of the Nutch job jars
   >    - consequently, task logs are formatted using the format defined in
   >    $HADOOP_HOME/etc/hadoop/log4j.properties
   >    - (the good thing) log messages from Nutch classes appear in the task
   >    logs, e.g.
   >
   >     2023-09-17 07:29:21,726 INFO [FetcherThread] org.apache.nutch.fetcher.FetcherThread: FetcherThread 33 fetching https://nutch.apache.org/ (queue crawl delay=5000ms)
   >
   >    - the log format defined in $NUTCH_HOME/conf/log4j2.xml is only
   >    applied to the logs of the YARN job client, e.g.
   >
   >    2023-09-17 07:29:32,432 INFO fetcher.Fetcher: Fetcher: finished at 2023-09-17 07:29:32, elapsed: 00:00:25
   >
   >    - in addition, I've included two PDFs, an XLSX and an EPUB document to
   >    test the Tika parser: the docs were successfully parsed using Tika 2.3.0;
   >    if necessary, I can repeat the test for NUTCH-2959
   >    <https://issues.apache.org/jira/browse/NUTCH-2959>
   >
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@nutch.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
