All,

Tilman Hausherr mentioned that we might want to update the
Common Crawl PDFs in our regression corpus.  This proposal leaves the
bugtracker PDFs as they are.

For the CC-based PDFs, we could:

1) remove the existing truncated PDFs

2) fold in newer untruncated PDFs from:
https://digitalcorpora.org/corpora/file-corpora/cc-main-2021-31-pdf-untruncated/

What do you think?

Best,

      Tim
