I'd be more than happy to help with maintenance. This would be AMAZING!

On Tue, Jun 2, 2020 at 9:22 AM Maruan Sahyoun <sahy...@fileaffairs.de> wrote:
> Could fund either:
>
> AMD Ryzen 5 3600
> 64 GB RAM
> 2x2TB
>
> or
>
> AMD Ryzen 7 3700X based Server
> 64 GB RAM
> 2x8TB
>
> or
>
> Intel® Core™ i9-9900K
> 64 GB RAM
> 2x8TB
>
> All are root servers so one has to vote for taking care of them (I can do
> the initial setup).
>
> BR
> Maruan
>
> > There are two use cases.
> >
> > 1) Host shared data so that we can all point to and work from the same
> > data, ideally both literal docs and also extracts (text/metadata .json
> > files representing extracted information).
> >
> > 2) A modest VM to allow all of us to run the regression tests.
> >
> > We could use help with either or both.
> >
> > What we had before:
> > 8 GB RAM
> > 8 cores
> > 4 TB -- 2TB for docs, 1TB for extracts, 1TB for staging
> >
> > We can always use more RAM and more cores up to the point of I/O
> > bottlenecks.
> >
> > On Tue, Jun 2, 2020 at 6:37 AM Maruan Sahyoun <sahy...@fileaffairs.de>
> > wrote:
> >
> > > Is that a storage box only or does it need to do some computing too?
> > >
> > > Maybe you could write a small spec for the server requirement?
> > >
> > > BR
> > > Maruan
> > >
> > > > Still haven’t had time to put the server in a dmz. Ugh.
> > > >
> > > > Yes, more than happy to share.
> > > >
> > > > If anyone has recommendations for file hosting for a couple of TB,
> > > > let me know.
> > > >
> > > > One option would be to work with CommonCrawl to bump the max file
> > > > size one crawl a year...
> > > >
> > > > On Tue, Jun 2, 2020 at 1:48 AM Tilman Hausherr <thaush...@t-online.de>
> > > > wrote:
> > > >
> > > > > Can we / I access these files? Most differences are improvements or
> > > > > not meaningful, but there are a few I'd like to have a look at, e.g.
> > > > >
> > > > > commoncrawl3/commoncrawl3/XO/XOAAGISRMRPZQRZF4LSMJERGEYK5QI2T
> > > > >
> > > > > the word "antrag" loses the first "a". Although maybe the "a" was a
> > > > > big one and gets assigned to another line.
> > > > >
> > > > > Tilman
> > > > >
> > > > > On 02.06.2020 at 02:58, Tim Allison wrote:
> > > > >
> > > > > > Reports are available here:
> > > > > > https://github.com/tballison/share/blob/master/tika_comparisons/reports-pdfbox-2.0.20.tgz
> > > > > >
> > > > > > Looks like there are trivial differences in content with a slight
> > > > > > improvement over 2.0.19. I don't see any differences in exceptions
> > > > > > or attachments.
> > > > > >
> > > > > > Cheers,
> > > > > >
> > > > > > Tim
> > > > >
> > > > > ---------------------------------------------------------------------
> > > > > To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
> > > > > For additional commands, e-mail: dev-h...@pdfbox.apache.org
>
> --
> Maruan Sahyoun
>
> FileAffairs GmbH
> Josef-Schappe-Straße 21
> 40882 Ratingen
>
> Tel: +49 (2102) 89497 88
> Fax: +49 (2102) 89497 91
> sahy...@fileaffairs.de
> www.fileaffairs.de
>
> Managing Director: Maruan Sahyoun
> Commercial Register: AG Düsseldorf, HRB 53837
> VAT ID: DE248275827