>proper domain for https access

I just pinged infra on slack.

If they're able to do it, what would we want?

file-corpora.apache.org
corpora.apache.org
corpora-pdfbox.apache.org
corpora-tika.apache.org

Something else?  I'm also happy to buy a domain if that won't work.  There
are a couple available that are close enough.

On Tue, Jun 2, 2020 at 1:08 PM Maruan Sahyoun <[email protected]>
wrote:

>
> > AMD Ryzen looks fantastic.  Others would be great as well.
> >
> > If Ubuntu is possible at all, that's what I've been working with most
> > recently.
>
> OK, will set up with that distro.
>
> >
> > Other than that, ssh access and sudo privileges would be all I'd need.
> >
> > Are you OK with us setting up Apache httpd to host files for the public,
> > or will this be a community-only resource?
>
> It can be used for whatever we want, so if you consider public file
> sharing useful, of course we can do that. It would be good if we get a
> proper domain for https access. Would that be something infra can do?
>
> >
> > If this is corporate sponsored, please let me know how/if we should
> > mention the sponsorship.
>
> No need to mention it; happy to help.
>
> >
> > Again...wow.  Thank you!
> >
> > Best,
> >
> >       Tim
> >
> > On Tue, Jun 2, 2020 at 9:22 AM Maruan Sahyoun <[email protected]>
> > wrote:
> >
> > > Could fund either:
> > >
> > > AMD Ryzen 5 3600
> > > 64 GB RAM
> > > 2x2TB
> > >
> > > or
> > >
> > > AMD Ryzen 7 3700X based Server
> > > 64 GB RAM
> > > 2x8TB
> > >
> > > or
> > >
> > > Intel® Core™ i9-9900K
> > > 64 GB RAM
> > > 2x8TB
> > >
> > > All are root servers, so someone has to volunteer to take care of them
> > > (I can do the initial setup).
> > >
> > >
> > >
> > > BR
> > > Maruan
> > >
> > > > There are two use cases.
> > > >
> > > > 1) host shared data so that we can all point to and work from the same
> > > > data, ideally both literal docs and also extracts (text/metadata .json
> > > > files representing extracted information).
> > > >
> > > > 2) a modest VM to allow all of us to run the regression tests
> > > >
> > > > We could use help with either or both.
> > > >
> > > > What we had before:
> > > > 8 GB RAM
> > > > 8 cores
> > > > 4 TB -- 2TB for docs, 1TB for extracts, 1TB for staging
> > > >
> > > > We can always use more RAM and more cores up to the point of I/O
> > > > bottlenecks.
> > > >
> > > > On Tue, Jun 2, 2020 at 6:37 AM Maruan Sahyoun <[email protected]>
> > > > wrote:
> > > >
> > > > > Is that a storage box only, or does it need to do some computing
> > > > > too?
> > > > >
> > > > > Maybe you could write a small spec for the server requirements?
> > > > >
> > > > > BR
> > > > > Maruan
> > > > >
> > > > >
> > > > > > Still haven’t had time to put the server in a DMZ. Ugh.
> > > > > >
> > > > > > Yes, more than happy to share.
> > > > > >
> > > > > > If anyone has recommendations for file hosting for a couple of TB,
> > > > > > let me know.
> > > > > >
> > > > > > One option would be to work with CommonCrawl to bump the max file
> > > > > > size one crawl a year...
> > > > > >
> > > > > > On Tue, Jun 2, 2020 at 1:48 AM Tilman Hausherr <[email protected]>
> > > > > > wrote:
> > > > > >
> > > > > > > Can we / I access these files? Most differences are improvements or
> > > > > > > not meaningful, but there are a few I'd like to have a look at, e.g.
> > > > > > >
> > > > > > > commoncrawl3/commoncrawl3/XO/XOAAGISRMRPZQRZF4LSMJERGEYK5QI2T
> > > > > > >
> > > > > > > the word "antrag" loses the first "a", although maybe the "a" was a
> > > > > > > big one and gets assigned to another line.
> > > > > > >
> > > > > > > Tilman
> > > > > > >
> > > > > > > On 02.06.2020 at 02:58, Tim Allison wrote:
> > > > > > > > > > Reports are available here:
> > > > > > > > > > https://github.com/tballison/share/blob/master/tika_comparisons/reports-pdfbox-2.0.20.tgz
> > > > > > > > Looks like there are trivial differences in content with a slight
> > > > > > > > improvement over 2.0.19.  I don't see any differences in exceptions
> > > > > > > > or attachments.
> > > > > > > >
> > > > > > > > Cheers,
> > > > > > > >
> > > > > > > >          Tim
> > > > > > > >
> > > > > > > ---------------------------------------------------------------------
> > > > > > > To unsubscribe, e-mail: [email protected]
> > > > > > > For additional commands, e-mail: [email protected]
> > > > > > >
> > > > > > >
> > > --
> > > Maruan Sahyoun
> > >
> > > FileAffairs GmbH
> > > Josef-Schappe-Straße 21
> > > 40882 Ratingen
> > >
> > > Tel: +49 (2102) 89497 88
> > > Fax: +49 (2102) 89497 91
> > > [email protected]
> > > www.fileaffairs.de
> > >
> > > Managing Director: Maruan Sahyoun
> > > Commercial register: AG Düsseldorf, HRB 53837
> > > VAT ID: DE248275827
> > >
> > >
