"Poison BL." <poiso...@gmail.com> writes:

> On Sat, Apr 29, 2017 at 9:11 PM, lee <l...@yagibdah.de> wrote:
>>
>> "Poison BL." <poiso...@gmail.com> writes:
>> > Half petabyte datasets aren't really something I'd personally *ever*
>> > trust ftp with in the first place.
>>
>> Why not?  (12GB are nowhere close to half a petabyte ...)
>
> Ah... I completely misread that "or over 50k files in 12GB" as 50k files
> *at* 12GB each... which works out to 0.6 PB, incidentally.
>
>> The data would come in from suppliers.  There isn't really anything
>> going on atm but fetching data once a month which can be like 100MB or
>> 12GB or more.  That's because ppl don't use ftp ...
>
> Really, if you're pulling it in from third party suppliers, you tend to be
> tied to what they offer as a method of pulling it from them (or them
> pushing it out to you), unless you're in the unique position to dictate the
> decision for them.

They need to use ftp to deliver the data, and we need to use ftp to get
it.  I wouldn't want it any other way.

The problem is that the people who are supposed to deliver the data are
incompetent and don't want to use ftp because it's too complicated.  So
what would be the better solution?
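For what it's worth, the monthly pull mentioned above can be scripted so
nobody has to drive an ftp client by hand.  A minimal sketch using
Python's standard ftplib -- the host, credentials and directory names
here are placeholders, not anything from the actual setup:

```python
import os
from ftplib import FTP

def new_files(remote_names, local_dir):
    """Return the remote file names not yet present in local_dir."""
    have = set(os.listdir(local_dir)) if os.path.isdir(local_dir) else set()
    return [name for name in remote_names if name not in have]

def fetch_monthly(host, user, password, remote_dir, local_dir):
    """Download only the files we don't already have; safe to re-run."""
    os.makedirs(local_dir, exist_ok=True)
    with FTP(host) as ftp:              # placeholder host/credentials
        ftp.login(user, password)
        ftp.cwd(remote_dir)
        for name in new_files(ftp.nlst(), local_dir):
            with open(os.path.join(local_dir, name), "wb") as out:
                ftp.retrbinary("RETR " + name, out.write)

# e.g. fetch_monthly("ftp.example.net", "user", "secret",
#                    "/outgoing", "./incoming")
```

Run from cron once a month, that is the whole "complicated" part taken
care of on the receiving side.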


> [...]
>
>> > How often does it need moved in/out of your facility, and is there
>> > no way to break up the processing into smaller chunks than a 0.6PB
>> > mass of files?  Distribute out the smaller pieces with rsync, scp,
>> > or the like, operate on them, and pull back in the results, rather
>> > than trying to shift around the entire set.  There's a reason Amazon
>> > will send a physical truck to a site to import large datasets into
>> > glacier... ;)
>>
>> Amazon has trucks?  Perhaps they do in other countries.  Here, amazon is
>> just another web shop.  They might have some delivery vans, but I've
>> never seen one, so I doubt it.  And why would anyone give them their
>> data?  There's no telling what they would do with it.
>
> Amazon's also one of the best known cloud computing suppliers on the planet
> (AWS = Amazon Web Services). They have everything from pure compute
> offerings to cloud storage geared towards *large* data archival. The latter
> offering is named "glacier", and they offer a service for the import of
> data into it (usually the "first pass", incremental changes are generally
> done over the wire) that consists of a shipping truck with a rather nifty
> storage system in the back of it that they hook right into your network.
> You fill it with data, and then they drive it back to one of their data
> centers to load it into place.

They might not have that here.  And who would want to hand their data
over to a third party anyway?
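Setting the truck aside, the chunking idea quoted above is easy enough
to script locally before anything leaves the building.  A sketch of a
greedy packer that splits a file list into batches of a given maximum
size for separate transfer -- purely illustrative, the names are made
up:

```python
def batches(files_with_sizes, max_bytes):
    """Greedily pack (name, size) pairs into batches of at most max_bytes.

    An oversized single file gets a batch of its own; largest files are
    placed first so batches come out roughly balanced.
    """
    out, cur, cur_size = [], [], 0
    for name, size in sorted(files_with_sizes, key=lambda p: -p[1]):
        if cur and cur_size + size > max_bytes:
            out.append(cur)
            cur, cur_size = [], 0
        cur.append(name)
        cur_size += size
    if cur:
        out.append(cur)
    return out
```

Each batch can then be shipped with rsync or scp and processed on its
own, instead of moving the whole set at once.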


-- 
"Didn't work" is an error.
