Well, putting them into 'static' files in an app won't work! It's a bit hidden, but there is a 10,000 file limit: https://cloud.google.com/appengine/docs/standard/python/how-requests-are-handled
... Plus you don't really 'upload' apps incrementally, so you would need
'somewhere' to first download the entire dataset, package it, then upload
it. I doubt that would be an easy process with 1 TB (even if you can work
around the 10,000 file limit!).

You *could* upload the data to https://cloud.google.com/storage/ - which
is roughly comparable to an S3 bucket (if you go that route, gsutil can
copy straight from S3 - rough command at the bottom of this mail). But
again, you will be downloading all the data, uploading it to Cloud
Storage, then just downloading it *again* for use in the process.
(Downloading from Cloud Storage is going to be roughly comparable to S3,
maybe a bit quicker, but not massively.) ... Seems wasteful.

You're going to have to download the data anyway, so just download from
AWS and use it *directly*. It might be painful, but it should work. If
you find AWS slow, then download images in parallel - while individual
downloads might be relatively slow, S3 can sustain high (even massive)
concurrency, i.e. downloading lots of images at once (rough Python
sketch at the bottom of this mail).

This is an exercise in concurrent processing and throughput. Don't get
*distracted* trying to build another storage platform; it's unlikely you
will do *better* than S3.

On Thu, Aug 22, 2019 at 5:18 PM ALT-EMAIL Virilo Tejedor <
[email protected]> wrote:

> Hi all,
>
> I'd like to create a static web server to store almost 1 TB of images.
>
> It is an open-source dataset that I'd like to use to train a Deep
> Learning model.
>
> I have free usage of GPUs and an Internet connection on another
> platform, but they don't provide me 1 TB of storage.
>
> I also have $600 of credits in Google Cloud; I was wondering if there
> was an easy way to create something to feed the server on the other
> platform with images.
>
> The data source is available as an AWS bucket. I tried to connect the
> GPU machine directly to the AWS bucket via awscli, but it is far too
> slow. As if the bucket were designed for a complete sync, but not for
> continuous random access to the files.
>
> I've thought of two possible approaches:
>
> - Execute a python script in GAE to download the dataset and to
> create a GAE web server:
> https://cloud.google.com/appengine/docs/standard/python/getting-started/hosting-a-static-website
>
> - Execute a python script in GAE to download the dataset and to
> create a Google Cloud CDN.
>
> Do you think either of these approaches is valid for feeding the model
> during training?
>
> I'm a newbie in GAE and any help, starting point or idea will be very
> welcome.
>
> Thanks in advance
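P.S. If you do decide to stage the data in Cloud Storage anyway, you
don't need GAE in the middle at all - gsutil can read S3 directly once
your AWS keys are in your ~/.boto config (the bucket names here are
placeholders, not the real dataset's):

    gsutil -m rsync -r s3://the-dataset-bucket gs://your-gcs-bucket

The -m flag parallelises the copy.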

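And here's the parallel-download idea as a rough, untested sketch - it
assumes boto3 is installed (pip install boto3), uses placeholder
bucket/prefix names, and assumes the dataset bucket allows anonymous
reads (drop the UNSIGNED config and use your credentials otherwise):

    import os
    from concurrent.futures import ThreadPoolExecutor

    import boto3
    from botocore import UNSIGNED
    from botocore.config import Config

    BUCKET = "the-dataset-bucket"  # placeholder
    PREFIX = "images/"             # placeholder
    OUT_DIR = "data"

    # UNSIGNED lets you read a public bucket without AWS credentials.
    s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))

    def download(key):
        # Flattens each key to its basename - fine for a quick test, but
        # watch out for name collisions across prefixes.
        if key.endswith("/"):
            return
        s3.download_file(BUCKET, key,
                         os.path.join(OUT_DIR, os.path.basename(key)))

    os.makedirs(OUT_DIR, exist_ok=True)

    # Each individual GET is slow-ish, but S3 happily serves lots of them
    # at once, so throw a big pool of workers at it.
    paginator = s3.get_paginator("list_objects_v2")
    with ThreadPoolExecutor(max_workers=64) as pool:
        for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
            for obj in page.get("Contents", []):
                pool.submit(download, obj["Key"])

Keep pushing max_workers up until your network link, not S3, is the
bottleneck.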