Hi Stefan,

  I'm sorry for this sudden change. I'm hoping that we can find a way to
make this all work again, but there are complexities. Part of the challenge
is that the liability is spread across several organizations and
individuals; part of the challenge is everything to do with the varying
global legal/privacy requirements around crawled data. And there are other
challenges.

  These corpora have been critical to numerous parsing projects at the ASF
and to devs and projects outside of the ASF. I've heard from a few others
offline who are also affected by this.


All,
  What are our priorities? How can we move forward? Some options that I see:

0) nuclear option: shut down the server entirely
1) continue as we have it now -- no http/s access
2) host reports/metadata only via https
3) host "packaged" corpora in zips (password protected?) via https
4) password protect https access to the corpora
5) not a viable option: turn everything back on
6) not a viable option: turn everything back on with a strict robots.txt
policy

  Any other options? What are our preferences?

          Best,

                Tim

On Sat, Jan 11, 2025 at 9:01 AM stefan6419846 <stefan6419...@gmail.com>
wrote:

> We at pypdf (https://github.com/py-pdf/pypdf) have been hit by the
> unexpected shutdown of the service and were glad to at least find this
> indirect announcement. Nevertheless, it seems like we have to find a
> suitable alternative for the previously used govdocs1 PDF files from
> your server, as the official govdocs1 sources do not expose the single
> PDF files directly.
>
> Thanks for hosting these files in the past.
>
> Best regards,
> Stefan
>
> On 2025/01/09 01:36:59 Tim Allison wrote:
> > All,
> >  We've gotten a handful of takedown requests recently. I had initially
> > envisioned public sharing of files as a key component of our server. We
> > can still use the files and offer read access to fellow file researchers.
> > I'm not sure I want to deal with further takedown requests.
> >  As an intermediate step, we could ask robots not to crawl the data, but
> > that's not reliable.
> >  So, in lieu of that, with heavy heart, I ask if it is time to close off
> > public access?
> >   WDYT?
> >
> >           Best,
> >
> >                     Tim
> >
>
