[
https://issues.apache.org/jira/browse/NUTCH-2978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17766078#comment-17766078
]
ASF GitHub Bot commented on NUTCH-2978:
---------------------------------------
sebastian-nagel commented on PR #772:
URL: https://github.com/apache/nutch/pull/772#issuecomment-1722472438
+1
A test with the [pseudo-distributed Hadoop
setup](https://github.com/sebastian-nagel/nutch-test-single-node-cluster/) was
successful:
- Nutch tools work properly, no issues
- as expected, Hadoop puts slf4j-api-1.7.36.jar and
slf4j-reload4j-1.7.36.jar in the classpath in front of the Nutch job jars
- consequently, task logs are formatted using the format defined in
`$HADOOP_HOME/etc/hadoop/log4j.properties`
- (the good thing) log messages from Nutch classes appear in the task logs,
e.g.
```
2023-09-17 07:29:21,726 INFO [FetcherThread]
org.apache.nutch.fetcher.FetcherThread: FetcherThread 33 fetching
https://nutch.apache.org/ (queue crawl delay=5000ms)
```
- the log format defined in `$NUTCH_HOME/conf/log4j2.xml` is only applied to
the logs of the Yarn job client, e.g.
```
2023-09-17 07:29:32,432 INFO fetcher.Fetcher: Fetcher: finished at
2023-09-17 07:29:32, elapsed: 00:00:25
```
- in addition, I've included two PDFs, an XLSX and an EPUB document to test
the Tika parser: the docs were successfully parsed using Tika 2.3.0. If
necessary, I can repeat the test for NUTCH-2959
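
To see which jar actually supplies the SLF4J classes on a given classpath (for example, Hadoop's slf4j-api-1.7.36.jar ahead of the Nutch job jar), a small diagnostic along the following lines can help. This is only a sketch and not part of the PR: the class `ClassOrigin` and its method `originOf` are hypothetical names, and the default class names are assumptions about what is worth checking (`org.slf4j.impl.StaticLoggerBinder` is the binding class shipped by SLF4J 1.7.x bindings and is absent from SLF4J 2.x providers).

```java
import java.security.CodeSource;

// Hypothetical helper: reports where a class would be loaded from.
final class ClassOrigin {

    /** Returns the code source location of the named class, or a short note. */
    static String originOf(String className) {
        try {
            // initialize=false: locate the class without running static initializers
            Class<?> cls = Class.forName(className, false,
                    ClassOrigin.class.getClassLoader());
            CodeSource source = cls.getProtectionDomain().getCodeSource();
            if (source == null || source.getLocation() == null) {
                return "(bootstrap or unknown location)";
            }
            return source.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return "(not on classpath: " + e.getClass().getSimpleName() + ")";
        }
    }

    public static void main(String[] args) {
        // Assumed defaults: the SLF4J API entry point and the 1.7.x binding class.
        String[] names = args.length > 0 ? args : new String[] {
            "org.slf4j.LoggerFactory",
            "org.slf4j.impl.StaticLoggerBinder"
        };
        for (String name : names) {
            System.out.println(name + " -> " + originOf(name));
        }
    }
}
```

Run with the same classpath as the task or job client under inspection; the printed locations show which jar takes precedence for each class.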
> Move to slf4j2 and remove log4j1 and reload4j
> ---------------------------------------------
>
> Key: NUTCH-2978
> URL: https://issues.apache.org/jira/browse/NUTCH-2978
> Project: Nutch
> Issue Type: Task
> Reporter: Markus Jelsma
> Priority: Major
> Attachments: NUTCH-2978-1.patch, NUTCH-2978-2.patch,
> NUTCH-2978-3.patch, NUTCH-2978-any23.patch, NUTCH-2978.patch
>
>
> I ran into trouble upgrading some dependencies: a lot of LinkageErrors
> today and, with a Tika upgrade, disappearing logs. This patch fixes that by
> moving to slf4j2, using the correct log4j-slf4j2-impl and getting rid of old
> log4j -> reload4j.