> AMD Ryzen looks fantastic. Others would be great as well.
> 
> If Ubuntu is possible at all, that's what I've been working with most
> recently.

OK, I will set it up with that distro.

> 
> Other than that, ssh access and sudo privileges would be all I'd need.
> 
> Are you OK if we set up Apache httpd to host files for the public, or will
> this be a community-only resource?

It can be used for whatever we want, so if you consider public file sharing
useful, of course we can do that. It would be good if we got a proper domain
for HTTPS access. Would that be something infra can do?

> 
> If this is corporate sponsored, please let me know how/if we should mention
> the sponsorship.

No need to mention it. Happy to help.

> 
> Again...wow.  Thank you!
> 
> Best,
> 
>       Tim
> 
> On Tue, Jun 2, 2020 at 9:22 AM Maruan Sahyoun <sahy...@fileaffairs.de>
> wrote:
> 
> > Could fund either:
> > 
> > AMD Ryzen 5 3600
> > 64 GB RAM
> > 2x2TB
> > 
> > or
> > 
> > AMD Ryzen 7 3700X based Server
> > 64 GB RAM
> > 2x8TB
> > 
> > or
> > Intel® Core™ i9-9900K
> > 64 GB RAM
> > 2x8TB
> > 
> > All are root servers, so someone has to volunteer to take care of them (I
> > can do the initial setup).
> > 
> > 
> > 
> > BR
> > Maruan
> > 
> > > There are two use cases.
> > > 
> > > 1) host shared data so that we can all point to and work from the same
> > > data, ideally both literal docs and also extracts (text/metadata .json
> > > files representing extracted information).
> > > 
> > > 2) a modest vm to allow all of us to run the regression tests
> > > 
> > > We could use help with either or both.
> > > 
> > > What we had before:
> > > 8 GB RAM
> > > 8 cores
> > > 4 TB -- 2TB for docs, 1TB for extracts, 1TB for staging
> > > 
> > > We can always use more RAM and more cores up to the point of I/O
> > > bottlenecks.
> > > 
> > > On Tue, Jun 2, 2020 at 6:37 AM Maruan Sahyoun <sahy...@fileaffairs.de>
> > > wrote:
> > > 
> > > > Is that a storage box only, or does it need to do some computing too?
> > > > 
> > > > Maybe you could write a small spec for the server requirements?
> > > > 
> > > > BR
> > > > Maruan
> > > > 
> > > > 
> > > > > Still haven’t had time to put the server in a DMZ. Ugh.
> > > > > 
> > > > >  Yes, more than happy to share.
> > > > > 
> > > > > If anyone has recommendations for file hosting for a couple of TB,
> > > > > let me know.
> > > > > 
> > > > > One option would be to work with CommonCrawl to bump the max file
> > > > > size one crawl a year...
> > > > > 
> > > > > On Tue, Jun 2, 2020 at 1:48 AM Tilman Hausherr <thaush...@t-online.de>
> > > > > wrote:
> > > > > 
> > > > > > Can we / I access these files? Most differences are improvements
> > > > > > or not meaningful, but there are a few I'd like to have a look at,
> > > > > > e.g.
> > > > > > 
> > > > > > commoncrawl3/commoncrawl3/XO/XOAAGISRMRPZQRZF4LSMJERGEYK5QI2T
> > > > > > 
> > > > > > the word "antrag" loses the first "a". Although maybe the "a" was
> > > > > > a big one and gets assigned to another line.
> > > > > > 
> > > > > > Tilman
> > > > > > 
> > > > > > On 02.06.2020 at 02:58, Tim Allison wrote:
> > > > > > > > > Reports are available here:
> > > > > > > > > https://github.com/tballison/share/blob/master/tika_comparisons/reports-pdfbox-2.0.20.tgz
> > > > > > > Looks like there are trivial differences in content with a slight
> > > > > > > improvement over 2.0.19.  I don't see any differences in
> > > > > > > exceptions or attachments.
> > > > > > > 
> > > > > > > Cheers,
> > > > > > > 
> > > > > > >          Tim
> > > > > > > 
> > > > > > ---------------------------------------------------------------------
> > > > > > To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
> > > > > > For additional commands, e-mail: dev-h...@pdfbox.apache.org
> > > > > > 
> > > > > > 
> > --
> > Maruan Sahyoun
> > 
> > FileAffairs GmbH
> > Josef-Schappe-Straße 21
> > 40882 Ratingen
> > 
> > Tel: +49 (2102) 89497 88
> > Fax: +49 (2102) 89497 91
> > sahy...@fileaffairs.de
> > www.fileaffairs.de
> > 
> > Geschäftsführer: Maruan Sahyoun
> > Handelsregister: AG Düsseldorf, HRB 53837
> > UST.-ID: DE248275827
> > 
> > 
-- 
Maruan Sahyoun

