On 10.08.20 at 19:27, Tim Allison wrote:
I've updated the process here:

https://cwiki.apache.org/confluence/display/TIKA/TikaEvalOnVM

One of the key missing pieces was the batch-scripts.tgz file.  Apparently,
that attached file never made it through the Confluence migration.  I was
able to reconstruct it from my email's sent folder. :(  Tilman noted the
missing attachment a long, long time ago, and I finally got around to
fixing it.  Sorry it took me so long.
Sounds good!

I've kicked off the process with the most recent version of the PDFBox 2.x
branch and with a bug fix for the problem uncovered in Tika in the last run.
Thanks Tim!

For anyone with access to the VM who wants to give this process a try,
please do try it out after the current run finishes.  If you only want to
test a few hundred files, just shorten the fileList (see the instructions). :D
I'll have a look once your run has finished...

Again, I'm sorry for not taking care of this before I went on leave.  Let
me know how I can improve the documentation or anything else.
Will do, thanks a lot so far.


Cheers,

            Tim

On Mon, Aug 10, 2020 at 9:47 AM Tim Allison <talli...@apache.org> wrote:

Working on this now.  Will post update when documentation is ready.

On Wed, Aug 5, 2020 at 3:30 PM Andreas Lehmkuehler <andr...@lehmi.de>
wrote:

On 05.08.20 at 17:19, Tim Allison wrote:
Yes, that's pretty close.

Unfortunately, I'm away from my dev environment and can't access the VM to
confirm.  I don't think I got the list of PDF files into a location where
you can see it.  With enough permissions, you should be able to see it in
/data/work (???)...argh.
I have read access to the whole corpus of files. I've already compiled two
Tika versions to be used for the comparison. Unfortunately I wasn't able to
run it as described at [1]. An exception occurred after some time and I gave up.

I'm sorry for not getting things in order before I left.  I'll be back on
Monday. :(
No need to worry, we didn't agree on any deadline for anything, so
everything is fine.

It would be cool if you could rerun the tests (2.0.20 vs 2.0.21-SNAPSHOT);
maybe we can use your setup as a template or so.

Thanks in advance

Andreas

[1] https://cwiki.apache.org/confluence/display/TIKA/TikaEval





On Sun, Aug 2, 2020 at 1:20 PM Andreas Lehmkuehler <andr...@lehmi.de>
wrote:

On 02.08.20 at 15:26, Maruan Sahyoun wrote:

Hi Andreas,

I'll add you as a user. Details via PM.
Access works, thanks Maruan!

@Tim Is [1] still valid documentation for the regression test run?

[1] https://cwiki.apache.org/confluence/display/tika/TikaEvalOnVM


BR
Maruan


Hi,

I'd like to get access to the corpora server to run the regression tests
for PDFBox on my own, so that we don't have to bother Tim every time we
want to cut a new release. Furthermore, I'd like to run some 2.0.x vs trunk
tests in the future, and it'd be handy to do that myself.

What do I have to do to get access?

Is there any documentation on how to configure the regression test runner,
or is it possible to simply copy and modify an existing installation?


Cheers
Andreas






