Two other options that you may want to look at:

- cephfs

This has nothing to do with Postgres, but it is a distributed filesystem
that handles very large numbers of files (think next-generation NFS).
I haven't tried it myself yet, but they have reached a "stable" milestone
for the distributed fs.
cf. https://en.wikipedia.org/wiki/Ceph_(software)

- fuse based postgres storage

this can
 - relieve the issue of pg_largeobject being a system table (tablespace, ...)
 - give you fs-like access to your data

The closest I have seen to this so far is
https://github.com/andreasbaumann/pgfuse, which would probably need some
tinkering.
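
For reference, pgfuse-style access ultimately sits on top of the
server-side large object API. Here is a minimal, untested sketch of that
layer with psycopg2 (the DSN and the file name are made up):

import psycopg2

conn = psycopg2.connect("dbname=mydb")    # assumed DSN

# Store a file as a large object; the server assigns the OID.
with conn:                                # large objects need a transaction
    lob = conn.lobject(0, "wb")           # oid=0 -> create a new object
    with open("report.pdf", "rb") as f:   # hypothetical local file
        lob.write(f.read())
    oid = lob.oid                         # keep this OID in your own table
    lob.close()

# Read it back later by OID.
with conn:
    lob = conn.lobject(oid, "rb")
    data = lob.read()
    lob.close()

A FUSE layer like pgfuse essentially maps open/read/write/close onto
calls like these.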

On Tue, Nov 29, 2016 at 10:50 AM, Thomas Güttler <
guettl...@thomas-guettler.de> wrote:

>
>
> On 29.11.2016 at 01:52, Mike Sofen wrote:
>
>> From: Thomas Güttler   Sent: Monday, November 28, 2016 6:28 AM
>>
>> ...I have 2.3TBytes of files. File count is 17M
>>
>> Since we already store our structured data in postgres, I think about
>> storing the files in PostgreSQL, too.
>>
>> Is it feasible to store file in PostgreSQL?
>>
>> -------
>>
>> I am doing something similar, but in reverse.  The legacy mysql databases
>> I’m converting into a modern Postgres data
>> model have very large genomic strings stored in 3 separate columns.  Out
>> of the 25 TB of legacy data storage (in 800
>> dbs across 4 servers, about 22b rows), those 3 columns consume 90% of the
>> total space, and they are just used for
>> reference, never used in searches or calculations.  They range from 1k to
>> several MB.
>>
>>
>>
>> Since I am collapsing all 800 dbs into a single PG db, being very smart
>> about storage was critical.  Since we’re also
>> migrating everything to AWS, we’re placing those 3 strings (per row) into
>> a single json document and storing the
>> document in S3 bins, with the pointer to the file being the globally
>> unique PK for the row…super simple.  The app tier
>> knows to fetch the data from the db and large string json from the S3
>> bins.  The retrieval time is surprisingly fast;
>> this is all real time web app stuff.
>>
>>
>>
>> This is a model that could work for anyone dealing with large objects
>> (text or binary).  The nice part is that the original
>> 25TB of data storage drops to 5TB, a much more manageable number,
>> allowing for significant growth, which is on the horizon.
>>
>
> Thank you Mike for your feedback.
>
> Yes, I think I will drop my idea. Encoding binary (the file content) to
> text and decoding it back to binary makes no sense. I was not aware that
> this would be needed.
>
> I guess I will use some key-to-blob store like S3. AFAIK there are
> open-source S3-compatible implementations available.
>
> Thank you all for your feedback!
>
>  Regards, Thomas
>
>
> --
> Thomas Guettler http://www.thomas-guettler.de/
>
>
>
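
PS: the pointer-in-the-DB / blob-in-S3 pattern Mike describes maps to
just a few lines of Python with boto3. A rough, untested sketch (the
bucket name, key scheme, and table layout are invented; an open-source
S3-compatible store such as MinIO works the same way via endpoint_url):

import boto3
import psycopg2

s3 = boto3.client("s3")                 # or endpoint_url="http://minio:9000"
conn = psycopg2.connect("dbname=mydb")  # assumed DSN
BUCKET = "file-blobs"                   # hypothetical bucket

def store(pk, payload):
    # The PK doubles as the S3 key; the row itself keeps only metadata.
    s3.put_object(Bucket=BUCKET, Key=str(pk), Body=payload)
    with conn, conn.cursor() as cur:
        cur.execute("INSERT INTO blobs (id, size) VALUES (%s, %s)",
                    (pk, len(payload)))

def fetch(pk):
    # App tier: read the row for metadata, then pull the blob from S3.
    return s3.get_object(Bucket=BUCKET, Key=str(pk))["Body"].read()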
