The instances we use via Direct Connect (through a third-party company) have upwards of 20 disks and a total of 80 TB. That part is covered.

If we were to experiment with EBS, that would be a different case, as we'd need to stripe the volumes.
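
For what it's worth, my understanding is that striping EBS volumes is usually done with mdadm (or LVM) on the instance itself; a minimal sketch, assuming four hypothetical volumes attached as /dev/xvdf through /dev/xvdi (device names and mount point are made up for illustration):

    mdadm --create /dev/md0 --level=0 --raid-devices=4 \
        /dev/xvdf /dev/xvdg /dev/xvdh /dev/xvdi
    mkfs.xfs /dev/md0
    mount /dev/md0 /export/brick1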

Our present model requires a single namespace via NFS. The Instances run CentOS 6.x and mount the Direct Connect disk space via NFS; the only other alternative would be iSCSI, which wouldn't work for the level of sharing we need.
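
For context, the mounts on each Instance are plain NFS, roughly like the following (the hostname and paths are placeholders, not our real ones):

    mount -t nfs -o vers=3,rsize=1048576,wsize=1048576,hard,intr \
        storage-host:/export/video /mnt/video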




On 7/14/15 4:18 PM, Mathieu Chateau wrote:
By NFS, I think you just mean "all servers seeing and changing the same files"? That can be done with FUSE, without NFS. NFS is harder to fail over, while failover is automatic with FUSE (no need for dynamic DNS or a virtual IP).
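
For example, a client can mount the volume directly with the native FUSE client instead of NFS (the volume and server names here are just placeholders):

    mount -t glusterfs server1:/myvol /mnt/video
    mount -t glusterfs -o backupvolfile-server=server2 server1:/myvol /mnt/video

The second form only tells the client where to fetch the volume file if server1 is down at mount time; once mounted, the client talks to all bricks itself, so there is no single NFS head to fail over.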

for redundancy I mean : What failure do you want to survive ?

  * Losing a disk
  * Filesystem corruption
  * Server lost or in maintenance
  * Whole region down

Depending on your needs, you may have to replicate data across Gluster bricks or even use geo-dispersed bricks.
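
As a rough illustration (server and volume names are placeholders), a two-way replicated volume and a geo-replication session look like:

    gluster volume create myvol replica 2 server1:/export/brick1 server2:/export/brick1
    gluster volume start myvol

    gluster volume geo-replication myvol remotehost::remotevol create push-pem
    gluster volume geo-replication myvol remotehost::remotevol start

Replication costs capacity and write bandwidth, and geo-replication is asynchronous, so pick based on which of the failures above you need to survive.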

Will the network between your servers and nodes be able to handle that traffic (380 MB/s = 3,040 Mb/s)?
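
A quick iperf run between a client and a brick server (hostnames are placeholders) gives a rough idea of what the network can actually deliver:

    iperf -s                      # on the server
    iperf -c server1 -P 4 -t 30   # on the client: 4 parallel streams, 30 seconds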

I guess Gluster can handle that load; you are using big files, and that is where Gluster delivers its highest throughput. Nevertheless, you will need many disks to provide that I/O, even more if using replicated bricks.


Cordialement,
Mathieu CHATEAU
http://www.lotp.fr

2015-07-14 21:15 GMT+02:00 Forrest Aldrich <[email protected]>:

    Sorry, I should have noted that.  380 MB/s is both read and write (I
    confirmed this with a developer).

    We do need the NFS stack, as that's how all the code and the various
    Instances work -- we have several "workers" that chop up video on
    the same namespace.  It's not efficient, but that's how it has to be
    for now.

    Redundancy, in terms of the server?   We have RAID volumes, if
    that's what you're referring to.

    Here's a basic outline of the flow (as I understand it):

      1. A Video Capture Agent sends in a large video file (30 GB +/-).

      2. An administrative host receives it and writes it to NFS.

      3. A process copies it over to another point in the namespace.

      4. Another Instance picks up the file, reads it, and starts
         processing and writing (FFMPEG is involved).


    Something like that -- I may not have all the steps, but
    essentially there's a ton of I/O going on.   I know our code
    model is not efficient, but it's complicated and can't just be
    changed (it's based on an open source product and there's some
    code baggage).
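
    Very roughly, as shell steps, it looks something like the following
    (paths, hostnames and the ffmpeg flags are made up for illustration;
    the real code is far more involved):

        # capture agent uploads the ~30 GB file to the admin host
        scp lecture.mp4 admin-host:/mnt/video/incoming/

        # admin host stages it at another point in the same namespace
        cp /mnt/video/incoming/lecture.mp4 /mnt/video/work/

        # a worker instance picks it up and transcodes over NFS
        ffmpeg -i /mnt/video/work/lecture.mp4 -c:v libx264 -preset fast \
            /mnt/video/out/lecture.mp4

    Every one of those steps is reading and writing the shared mount,
    which is where the I/O load comes from.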

    We looked into another product that allegedly scaled out using
    multiple NFS heads with massive local cache (AWS instances) and
    sharing the same space, but it was horrible and just didn't work
    for us.



    Thank you.




    On 7/14/15 3:06 PM, Mathieu Chateau wrote:
    Hello,

    Is it 380 MB/s in read or write? What level of redundancy do you need?
    Do you really need the NFS stack, or just a mount point (and so be
    able to use the native Gluster protocol)?

    Gluster load is mostly put on the clients, not the servers (clients
    do the synchronous writes to all replicas and do the memory caching).


    Cordialement,
    Mathieu CHATEAU
    http://www.lotp.fr

    2015-07-14 20:49 GMT+02:00 Forrest Aldrich <[email protected]>:

        I'm exploring solutions to help us achieve high throughput
        and scalability within the AWS environment.  Specifically,
        I work in a department where we handle and produce video
        content that results in very large files (30GB etc) that must
        be written to NFS, chopped up and copied over on the same
        mount (there are some odd limits to the code we use, but
        that's outside the scope of this question).

        Currently, we're using a commercial vendor with AWS, with
        dedicated Direct Connect instances as the back end to our
        production.  We're maxing out at 350 to 380 MB/s, which is
        not enough.  We expect our capacity will double or even
        triple when we bring on more classes or even other entities,
        and we need to find a way to squeeze out as much I/O as we can.

        Our software model depends on NFS, there's no way around that
        presently.

        Since GlusterFS uses FUSE, I'm concerned about performance,
        which is a key issue.   It sounds like a striped volume would be
        appropriate.
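
        For instance, the kind of volume I have in mind would be built
        from several bricks, something like this (the server and brick
        names are placeholders, and I may well have the volume type
        wrong):

            gluster volume create videovol stripe 4 \
                server1:/export/brick1 server2:/export/brick1 \
                server3:/export/brick1 server4:/export/brick1
            gluster volume start videovol

        As I understand it, a plain distribute volume (no stripe
        keyword) spreads whole files across bricks, while stripe chunks
        each file across them; with 30 GB files I assume stripe is what
        would spread a single file's I/O.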

        My basic understanding of Gluster is that it can combine
        several "bricks" -- which could be multiple dedicated EBS
        volumes, or even multiple instances of the above commercial
        vendor's storage served up via NFS -- into what appears as a
        single transparent namespace to client connections.  The I/O
        could be distributed in this manner.

        I wonder if someone here with more experience with the above
        might elaborate on whether GlusterFS could be used in this
        scenario -- specifically, I/O performance.  We'd really like
        to gain as much as possible, upwards of 700 MB/s to 1 GB/s
        and more if possible.



        Thanks in advance.












_______________________________________________
Gluster-users mailing list
[email protected]
http://www.gluster.org/mailman/listinfo/gluster-users
