> Migrating javax->jakarta has been quite a chore on Tika because of
> dependencies. Given back-compat issues with hadoop, is this even on the
> horizon for Nutch?
Good point. I think we are pretty free to replace javax packages in Nutch core
and plugins - they're used in multiple classes.
If it's about transitive dependencies of mandatory dependencies such as Hadoop:
well, that's strictly speaking not our job. But there should be no or very few
Nutch classes which rely on javax classes shared with dependencies.
> Y, I'd like to get a working Tika version in a release fairly soon.
Definitely.
> Not sure how much effort a release is?
See https://cwiki.apache.org/confluence/display/NUTCH/Release_HOWTO
Frankly, it's too much effort. And if you take testing seriously,
it's even more, because there are no automated tests to verify that everything
runs well on a Hadoop cluster or to test indexing into Solr, Elasticsearch,
or OpenSearch.
On 9/28/23 15:37, Tim Allison wrote:
Sorry for two emails...
Migrating javax->jakarta has been quite a chore on Tika because of dependencies.
Given back-compat issues with hadoop, is this even on the horizon for Nutch?
On Thu, Sep 28, 2023 at 9:29 AM Tim Allison <talli...@apache.org> wrote:
Y, I'd like to get a working Tika version in a release fairly soon. Not sure
how much effort a release is?
On Thu, Sep 28, 2023 at 8:29 AM Sebastian Nagel <sna...@apache.org> wrote:
Hi Lewis,
thanks!
I'd put on top of the list
* release 1.20
Since the release of 1.19 more than one year has elapsed.
Otherwise I agree with all points on the road map, even
in this order / priority.
Best,
Sebastian
On 9/26/23 18:37, lewis john mcgibbney wrote:
> Hi dev@,
>
> I've been at arm's length for a while as $dayjob changed and then
> changed again over the last number of years.
>
> With that being said, I wanted to start a thread on $title with the
> goal of establishing some "big items" we could put on the roadmap and
> maybe even publish...
>
> Here are some of the things I've been thinking about (unordered):
>
> * NUTCH-2940 Develop Gradle Core Build for Apache Nutch
> * Metrics system integration, cf. https://github.com/apache/nutch/pull/712
> * Upgrading Javac version > 11
> * Trade study to consider integrating (something like) Plugin
> Framework for Java (PF4J) into Nutch
> * Porting Nutch to run on Apache Beam https://beam.apache.org/
>
> Does anyone else have candidates they wish to add?
>
> Thanks for your consideration.
>
> lewismc
>
>