Well, putting them into 'static' files in an app won't work! It's a bit
hidden, but there is a 10,000 file limit:
https://cloud.google.com/appengine/docs/standard/python/how-requests-are-handled

... Plus you don't really 'upload' apps incrementally, so you would need
'somewhere' to first download the entire dataset, package it, and then
upload it. I doubt that would be an easy process with 1 TB (even if you can
work around the 10,000 file limit!)


You *could* upload the data to https://cloud.google.com/storage/ - which is
roughly comparable to an S3 bucket. But again, you will be downloading all
the data, uploading it to Cloud Storage, and then just downloading it
*again* for use in the process. (Downloading from Cloud Storage is going to
be roughly comparable to S3 - maybe a bit quicker, but not massively.)
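
For illustration, that double hop would look roughly like this - a minimal
sketch assuming boto3 and google-cloud-storage are installed; both bucket
names are placeholders, not real buckets:

    import boto3
    from google.cloud import storage

    s3 = boto3.client("s3")
    gcs = storage.Client().bucket("my-gcs-mirror")  # placeholder bucket

    # Walk the source bucket page by page and re-upload each object.
    for page in s3.get_paginator("list_objects_v2").paginate(
            Bucket="source-dataset"):  # placeholder bucket
        for obj in page.get("Contents", []):
            body = s3.get_object(Bucket="source-dataset",
                                 Key=obj["Key"])["Body"].read()
            # Every byte is downloaded here, then uploaded again below.
            gcs.blob(obj["Key"]).upload_from_string(body)

In practice gsutil or the Storage Transfer Service would probably do that
copy with less code, but either way every byte makes the same round trip.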


... which seems wasteful. You're going to have to download the data anyway,
so just download from AWS and use it *directly*. It might be painful, but it
should work. If you find AWS slow, then download images in parallel: while
individual requests might be relatively slow, S3 can sustain high (even
massive) concurrency - i.e. downloading lots of images at once!
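
For example, something like this - a rough sketch, assuming boto3 is
installed and the dataset bucket is public (hence the unsigned client); the
bucket name and prefix are placeholders, so substitute the real ones:

    from concurrent.futures import ThreadPoolExecutor
    import os

    import boto3
    from botocore import UNSIGNED
    from botocore.config import Config

    BUCKET = "open-dataset"  # placeholder: the dataset's actual bucket
    LOCAL_DIR = "images"

    # Unsigned client: public datasets don't need AWS credentials.
    s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))

    def download(key):
        path = os.path.join(LOCAL_DIR, os.path.basename(key))
        s3.download_file(BUCKET, key, path)
        return path

    os.makedirs(LOCAL_DIR, exist_ok=True)

    # First 1,000 keys only; use a paginator for the full listing.
    keys = [o["Key"] for o in
            s3.list_objects_v2(Bucket=BUCKET,
                               Prefix="train/").get("Contents", [])]

    # Each request is slow on its own, but S3 serves many at once.
    with ThreadPoolExecutor(max_workers=64) as pool:
        for path in pool.map(download, keys):
            print("fetched", path)

Tune max_workers up until your network link (not S3) is the bottleneck.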


This is an exercise in concurrent processing and throughput. Don't get
*distracted* trying to build another storage platform; it's unlikely you
will do *better* than S3.

On Thu, Aug 22, 2019 at 5:18 PM ALT-EMAIL Virilo Tejedor <
[email protected]> wrote:

> Hi all,
>
> I'd like to create a static web server to store almost 1 TB of images.
>
> It is an open-source dataset that I'd like to use to train a deep learning
> model.
>
> I have free usage of GPUs and an Internet connection on another platform,
> but they don't provide me with 1 TB of storage.
>
> I also have $600 of credits in Google Cloud, and I was wondering if there
> was an easy way to create something to feed images to the server on the
> other platform.
>
> The data source is available as an AWS bucket. I tried to connect the GPU
> machine directly to the AWS bucket via awscli, but it is too slow. It's as
> if the bucket were designed for a complete sync but not for continuous
> random access to the files.
>
> I've thought of two possible approaches:
>
>         - Execute a Python script in GAE to download the dataset and
> create a GAE static web server:
> https://cloud.google.com/appengine/docs/standard/python/getting-started/hosting-a-static-website
>
>         - Execute a Python script in GAE to download the dataset and
> put it behind Google Cloud CDN.
>
> Do you think either of these approaches is valid for feeding the model
> during training?
>
> I'm a newbie with GAE, and any help, starting point, or idea will be very
> welcome.
>
> Thanks in advance
>
