>  
> > AMD Ryzen looks fantastic.  Others would be great as well.
> > 
> > If Ubuntu is possible at all, that's what I've been working with most
> > recently.
> 
> OK, I will set it up with that distro.
> 
> > Other than that, ssh access and sudo privileges would be all I'd need.
> > 
> > Are you ok if we set up Apache httpd to host files for the public or will
> > this be a community-only resource?
> 
> It can be used for whatever we want, so if you consider public file
> sharing useful, of course we can do that. It would be good to get a
> proper domain for HTTPS access. Would that be something infra can do?
> 
> > If this is corporate sponsored, please let me know how/if we should mention
> > the sponsorship.
> 
> No need to mention it. Happy to help.
> 
> > Again...wow.  Thank you!

Order placed. Once the server is available and the initial setup is done, I'll
post here. It should be done by the end of the week, depending on my other
workload.

BR
Maruan


> > 
> > Best,
> > 
> >       Tim
> > 
> > On Tue, Jun 2, 2020 at 9:22 AM Maruan Sahyoun <sahy...@fileaffairs.de>
> > wrote:
> > 
> > > Could fund either:
> > > 
> > > AMD Ryzen 5 3600
> > > 64 GB RAM
> > > 2x2TB
> > > 
> > > or
> > > 
> > > AMD Ryzen 7 3700X based Server
> > > 64 GB RAM
> > > 2x8TB
> > > 
> > > or
> > > 
> > > Intel® Core™ i9-9900K
> > > 64 GB RAM
> > > 2x8TB
> > > 
> > > All are root servers, so someone has to volunteer to take care of them
> > > (I can do the initial setup).
> > > 
> > > 
> > > 
> > > BR
> > > Maruan
> > > 
> > > > There are two use cases.
> > > > 
> > > > 1) host shared data so that we can all point to and work from the same
> > > > data, ideally both literal docs and also extracts (text/metadata .json
> > > > files representing extracted information).
> > > > 
> > > > 2) a modest vm to allow all of us to run the regression tests
> > > > 
> > > > We could use help with either or both.
> > > > 
> > > > What we had before:
> > > > 8 GB RAM
> > > > 8 cores
> > > > 4 TB -- 2TB for docs, 1TB for extracts, 1TB for staging
> > > > 
> > > > We can always use more RAM and more cores up to the point of I/O
> > > > bottlenecks.
> > > > 
> > > > On Tue, Jun 2, 2020 at 6:37 AM Maruan Sahyoun <sahy...@fileaffairs.de>
> > > > wrote:
> > > > 
> > > > > > Is that a storage box only, or does it need to do some computing too?
> > > > > > 
> > > > > > Maybe you could write a small spec for the server requirements?
> > > > > 
> > > > > BR
> > > > > Maruan
> > > > > 
> > > > > 
> > > > > > > Still haven’t had time to put the server in a DMZ. Ugh.
> > > > > > 
> > > > > >  Yes, more than happy to share.
> > > > > > 
> > > > > > > If anyone has recommendations for file hosting for a couple of TB,
> > > > > > > let me know.
> > > > > > > 
> > > > > > > One option would be to work with CommonCrawl to bump the max file
> > > > > > > size one crawl a year...
> > > > > > 
> > > > > > > On Tue, Jun 2, 2020 at 1:48 AM Tilman Hausherr <thaush...@t-online.de>
> > > > > > > wrote:
> > > > > > 
> > > > > > > > Can we / I access these files? Most differences are improvements
> > > > > > > > or not meaningful, but there are a few I'd like to have a look at, e.g.
> > > > > > > 
> > > > > > > commoncrawl3/commoncrawl3/XO/XOAAGISRMRPZQRZF4LSMJERGEYK5QI2T
> > > > > > > 
> > > > > > > > the word "antrag" loses the first "a". Although maybe the "a" was
> > > > > > > > a big one and gets assigned to another line.
> > > > > > > 
> > > > > > > Tilman
> > > > > > > 
> > > > > > > > On 02.06.2020 at 02:58, Tim Allison wrote:
> > > > > > > > > > Reports are available here:
> > > > > > > > > > https://github.com/tballison/share/blob/master/tika_comparisons/reports-pdfbox-2.0.20.tgz
> > > > > > > > > Looks like there are trivial differences in content with a slight
> > > > > > > > > improvement over 2.0.19.  I don't see any differences in
> > > > > > > > > exceptions or attachments.
> > > > > > > > 
> > > > > > > > Cheers,
> > > > > > > > 
> > > > > > > >          Tim
> > > > > > > > 
> > > ---------------------------------------------------------------------
> > > > > > > To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
> > > > > > > For additional commands, e-mail: dev-h...@pdfbox.apache.org
> > > > > > > 
> > > > > > > 
> > > --
> > > Maruan Sahyoun
> > > 
> > > FileAffairs GmbH
> > > Josef-Schappe-Straße 21
> > > 40882 Ratingen
> > > 
> > > Tel: +49 (2102) 89497 88
> > > Fax: +49 (2102) 89497 91
> > > sahy...@fileaffairs.de
> > > www.fileaffairs.de
> > > 
> > > Geschäftsführer: Maruan Sahyoun
> > > Handelsregister: AG Düsseldorf, HRB 53837
> > > UST.-ID: DE248275827
> > > 
> > > 
