Hi,

Most common filesystems support many millions of files per directory 
(https://en.wikipedia.org/wiki/Ext4). You can do folder sharding as you 
mention by subclassing the built-in filesystem storage backend and storing a 
document's file in a folder based on, for example, the first character of 
its name. To avoid any possible problems in the future, though, I would 
recommend starting with object storage (S3, for example) from the 
beginning. The other advantage of this is that it abstracts storage 
further: the Mayan installation has no knowledge of how files are actually 
stored (filesystem, ext4, XFS, RAID, local, remote, etc.).
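The sharding idea above can be sketched as a small path-mapping helper; in a Django-based install this logic would live in a `FileSystemStorage` subclass. The function name here is illustrative, not Mayan's actual API:

```python
import os

def sharded_path(name):
    """Map an upload name into a one-character shard directory.

    e.g. "report.pdf" -> "r/report.pdf", so no single folder
    accumulates millions of entries. In Django, this mapping would
    typically go in a FileSystemStorage subclass (hypothetical here).
    """
    base = os.path.basename(name)
    # Fall back to "_" for empty names so the shard is always valid.
    shard = base[:1].lower() or '_'
    return os.path.join(os.path.dirname(name), shard, base)
```

With 36 shard directories (a-z, 0-9) the per-folder count drops by roughly that factor; deeper sharding (first two characters) scales the same way.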

The other recommendation would be to use RabbitMQ as the broker in a 
cluster setup with a few nodes, and likewise Redis for the results backend.
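In Celery terms that combination is just two settings; the hostnames and credentials below are placeholders, not a recommended production layout:

```python
# Sketch of Celery broker/results configuration (standard Celery setting
# names; hosts and credentials are placeholders).
CELERY_BROKER_URL = 'amqp://user:password@rabbitmq-host:5672//'
CELERY_RESULT_BACKEND = 'redis://redis-host:6379/0'
```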

Lastly, spread the queues over several workers to avoid tasks in one queue 
piling up and blocking other tasks.
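One way to do that with Celery is to route task types to named queues and start a dedicated worker per queue. The task and queue names below are illustrative, not Mayan's actual ones:

```python
# Sketch: route slow task types (e.g. OCR) to their own queue so they
# cannot starve fast tasks. Task and queue names are hypothetical.
CELERY_TASK_ROUTES = {
    'documents.tasks.task_upload': {'queue': 'uploads'},
    'ocr.tasks.task_ocr': {'queue': 'ocr'},
}
# Then start one worker per queue, for example (shell):
#   celery -A mayan worker --queues=uploads --concurrency=2
#   celery -A mayan worker --queues=ocr --concurrency=2
```

Each extra worker process costs memory, which is where the memory/scalability trade-off mentioned below comes from.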

This setup will be costly in terms of memory usage, but the trade-off here 
is memory in favor of scalability.

I look forward to any details you can share from this setup; it will be a 
good case study for further improvements.

On Friday, May 19, 2017 at 4:33:54 AM UTC-4, Gerrit Van Dyk wrote:
>
> Hi
>
> Our implementation will have at least 5 000 documents added to it per day. 
> This will grow the repository by almost 2 million documents per annum.
>
> As Mayan EDMS are storing all documents physically in one folder, should 
> we be concerned about this, or how should we split the uploaded files over 
> a directory structure. 
>
> Is there any precautions that we should be aware of, before we setup such 
> a large repository?
>
> Gerrit
>

-- 

--- 
You received this message because you are subscribed to the Google Groups 
"Mayan EDMS" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
For more options, visit https://groups.google.com/d/optout.