I have to correct myself: lucene.apache.org seems to be a bad example, because https://lucene.apache.org/core/ works; other projects work as well, e.g. https://tika.apache.org/
So it seems that pdfbox really does have a problem.

Best regards,
Timo


On 10.07.24 at 09:54, Constantine Dokolas wrote:
Yikes!

C.D.

On Wed, Jul 10, 2024 at 10:52 AM Timo Boehme
<t.boe...@digital-science.com.invalid> wrote:

No, it's all of X.apache.org that is currently unavailable (e.g.
lucene.apache.org).

Best regards,
Timo


On 10.07.24 at 09:46, Constantine Dokolas wrote:
It seems like you managed to take down pdfbox.apache.org... 😱

C.D.

--
There is a computer disease that anybody who works with computers knows
about. It's a very serious disease and it interferes completely with the
work. The trouble with computers is that you 'play' with them!
- Richard P. Feynman


On Wed, Jul 10, 2024 at 9:00 AM Andreas Lehmkühler
<andr...@lehmi.de.invalid>
wrote:

There is an issue with tagging the release when executing the
release:prepare goal.

I'm still searching :-(

Andreas

On 09.07.24 at 07:46, Andreas Lehmkühler wrote:
It is still tricky but I'm on it.

Sorry for the noise

Andreas

On 08.07.24 at 20:06, Andreas Lehmkühler wrote:
There is an issue with the changes from
https://issues.apache.org/jira/browse/PDFBOX-5789

I have to postpone the release until the issue is solved.

Sorry for the inconvenience

Andreas

On 08.07.24 at 19:02, Andreas Lehmkühler wrote:
Looks good to me, I'm starting the release process ...

On 08.07.24 at 08:43, Tilman Hausherr wrote:
Last one:

https://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.31_vs_2.0.32_4.tar.xz
This one is due to the last change I made yesterday.

Tilman

On 06.07.2024 19:17, Tilman Hausherr wrote:
Result:

https://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.31_vs_2.0.32_3.tar.xz
to be compared against
https://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.31_vs_2.0.32.tar.xz
I couldn't find any visual difference except for the file sizes. This
might be due to the path names or some metadata.

Tilman

On 06.07.2024 14:19, Tilman Hausherr wrote:
Hi,

I've just started a new "B" test.

Tilman

On 06.07.2024 13:29, Andreas Lehmkühler wrote:
Hi,

after closing https://issues.apache.org/jira/browse/PDFBOX-5838
I'd like to finally cut the 2.0.32 release.

Do we need a new regression test due to the latest changes?

There are some related changes, such as
https://issues.apache.org/jira/browse/PDFBOX-5843 and the recent
refactoring in fontbox.

Andreas


On 14.06.24 at 13:03, Tilman Hausherr wrote:
Result:

https://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.31_vs_2.0.32_2.tar.xz
From what I see, there is nothing to do.
And I know how long it takes: 3 hours for the A (or B) test and 1
hour to create the A vs. B report (tika-eval).

Tilman

On 14.06.2024 08:47, Tilman Hausherr wrote:
I'll repeat the regression tests with the change from PDFBOX-5790
reverted locally and my proposed xmpbox change from PDFBOX-5835
applied locally. This way we'll know whether there are other
problems.

Tilman

On 13.06.2024 19:23, Tilman Hausherr wrote:
See https://issues.apache.org/jira/browse/PDFBOX-5838

I hope that it's all the same problem.

Tilman

On 13.06.2024 18:30, Andreas Lehmkühler wrote:
Thanks for running the tests.

The exceptions part looks good, but I'm afraid we have a text
extraction issue.

commoncrawl3_refetched/JA/JA77WEHMKS2T5LCXM42OXFJ3OSBNRDTI

Some of the special characters changed. In 2.0.31 they were
"omitted", and in 2.0.32 there is some special character instead.
But the remaining part looks good to me.

cc-main-2021-31-pdf-untruncated/0085/0085885.pdf

It seems to contain some special characters as well, but
2.0.31 is able to extract them. 2.0.32 seems to mix up some of
the content.

I guess it is somehow font-related. I need to investigate further.

Andreas


On 12.06.24 at 20:23, Tilman Hausherr wrote:
https://home.snafu.de/tilman/tmp/reports_pdfbox_2.0.31_vs_2.0.32.tar.xz
No new exceptions, but many content differences. I haven't
investigated them yet.

Tilman

On 12.06.2024 11:31, Tilman Hausherr wrote:
I've started the tests. If there aren't any troubles I'll
have the results tomorrow.

Tilman

On 05.06.2024 08:07, Andreas Lehmkühler wrote:
Thanks for the update.

I'm going to postpone the release, as I'll need every helping
hand I can get.

Andreas

On 02.06.24 at 14:22, Tilman Hausherr wrote:
+1, but I won't be able to help with tests this time.

Tilman

On 01.06.2024 12:15, Andreas Lehmkühler wrote:
Hi,

IMHO it is time to cut another 2.0.x release.

I'm planning to do so in a week or so.

Any objections, or is there something we should add/fix
first?

Andreas

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org

--
OntoChem GmbH
Blücherstraße 24
06120 Halle (Saale)
Germany

email: t.boe...@digital-science.com | web: www.ontochem.com
HRB 215461 Amtsgericht Stendal      | USt-IdNr.: DE246232735
managing directors: Dr. Felix Berthelmann - Mario Diwersy
