Still haven’t had time to put the server in a DMZ. Ugh.

Yes, more than happy to share.

If anyone has recommendations for file hosting for a couple of TB, let me
know.

One option would be to work with CommonCrawl to bump the max file size for
one crawl a year...

On Tue, Jun 2, 2020 at 1:48 AM Tilman Hausherr <[email protected]>
wrote:

> Can we / I access these files? Most differences are improvements or not
> meaningful, but there are a few I'd like to have a look at, e.g.
>
> commoncrawl3/commoncrawl3/XO/XOAAGISRMRPZQRZF4LSMJERGEYK5QI2T
>
> the word "antrag" loses its first "a", although maybe the "a" was a large
> one that gets assigned to another line.
>
> Tilman
>
> On 02.06.2020 at 02:58, Tim Allison wrote:
> >>
> >>> Reports are available here:
> >
> https://github.com/tballison/share/blob/master/tika_comparisons/reports-pdfbox-2.0.20.tgz
> >
> > Looks like there are trivial differences in content with a slight
> > improvement over 2.0.19.  I don't see any differences in exceptions or
> > attachments.
> >
> > Cheers,
> >
> >          Tim
> >
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
>
