Hi Yucong,

On 5/30/2012 3:00 PM, Yucong Ye wrote:

OK, the total data size is constant, and I always divide it into 4096 parts no matter how many processes I use, so the dataset is only read in full when I run with 4096 processes. If I use only 16 processes, just 16 of the 4096 parts are read.

Does that clarify what I am doing here?


OK, I understand now, thanks for clarifying.
But again, since you are reading more data as you scale, you will probably get slower performance, especially if the selections of the processes are non-contiguous in the file. The stripe size and count are also major issues you need to address, as I mentioned in my previous email.

Mohamad


On May 30, 2012 12:49 PM, "Mohamad Chaarawi" <[email protected]> wrote:



    The selection of each process actually stays the same size since
    the region_count is not changing.

    OK, let me make sure I understand this:
    Your dataset size is constant (no matter what process count you
    execute with), and the processes read parts of the dataset.
    When you execute your program with, say, 16 processes, is the
    dataset divided (more or less) equally among the 16 processes?
    When you increase the process count to 36, is the dataset divided
    equally among the 36 processes, meaning that the amount of data
    each process reads decreases as you scale, since the file size
    stays the same?
    If not, then you are reading parts of the dataset multiple times
    as you scale, which makes the performance degradation expected.
    This is like comparing the performance, in the serial case, of 1
    read operation to n read operations.
    If yes, then move on to the second part below.
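
    For example, here is a minimal sketch of a per-rank selection
    that does shrink as the process count grows (this is not your
    code; it assumes a split along the first dimension only, with
    dims[] and filespace obtained from H5Dget_space() /
    H5Sget_simple_extent_dims() as in your reader):

        int rank, nprocs;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        /* each rank selects a disjoint slab; the slab shrinks as
           nprocs grows (any remainder rows are ignored here) */
        hsize_t count[3] = { dims[0] / nprocs, dims[1], dims[2] };
        hsize_t start[3] = { (hsize_t)rank * count[0], 0, 0 };
        H5Sselect_hyperslab(filespace, H5S_SELECT_SET, start, NULL,
                            count, NULL);

    If the per-process selections do not shrink like this, the total
    amount of data read grows with the process count, which by itself
    would explain slower runs.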


    the result of running "lfs getstripe filename | grep stripe" is:

        lmm_stripe_count:   4
        lmm_stripe_size:    1048576
        lmm_stripe_offset:  286


    A stripe count of 4 is far too small for a ~721 GB file. Your
    system administrator should have guidelines on what the stripe
    count and size should be for files of a given size. I would check
    that and readjust those parameters accordingly.
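
    For example, something along these lines would give new files a
    much wider layout (the values are only placeholders, and on older
    Lustre releases the stripe-size flag is -s rather than -S; check
    with your admin for the recommended settings):

        # new files created in this directory get 64 OSTs, 4 MB stripes
        lfs setstripe -c 64 -S 4m /path/to/wide_dir
        # striping is fixed at creation time, so restripe by copying
        cp your_file.h5 /path/to/wide_dir/

    Here /path/to/wide_dir and your_file.h5 are just placeholders for
    your own paths.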

    Thanks,
    Mohamad



    Let me check on the second question.

    On Wed, May 30, 2012 at 11:01 AM, Mohamad Chaarawi [via
    hdf-forum] wrote:

        Hi Yucong,

        On 5/30/2012 12:33 PM, Yucong Ye wrote:

        The region_index changes according to the MPI rank, while the
        region_count stays the same: 16, 16, 16.


        OK, I just needed to make sure that the selection of each
        process is set up in a way that is compatible with the
        scaling being done (as the number of processes increases, the
        selection of each process decreases accordingly). The
        performance numbers you provided are indeed troubling, but
        that could be for several reasons, among them:

          * The stripe size & count of your file on Lustre could be
            too small. Although this is a read operation (no file
            locking is done by the OSTs), increasing the number of
            I/O processes still puts a heavy burden on the OSTs.
            Could you check those two parameters of your file? You
            can do that by running this on the command line:
              o lfs getstripe filename | grep stripe
          * The MPI-IO implementation is not doing aggregation. If
            you are using ROMIO, two-phase I/O should do this for
            you, and it sets the default number of aggregators to the
            number of nodes (not processes). I would also try
            increasing cb_buffer_size (the default is 4 MB); see the
            sketch after this list.
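
        Something along these lines could pass those hints through
        the file access property list (the values are placeholders;
        good settings depend on your system):

            MPI_Info info;
            MPI_Info_create(&info);
            /* 16 MB collective buffer (example value) */
            MPI_Info_set(info, "cb_buffer_size", "16777216");
            /* number of aggregator nodes (example value) */
            MPI_Info_set(info, "cb_nodes", "64");

            hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
            H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, info);
            hid_t file_id = H5Fopen(filename, H5F_ACC_RDONLY, fapl);
            H5Pclose(fapl);
            MPI_Info_free(&info);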

        Thanks,
        Mohamad

        On May 30, 2012 8:19 AM, "Mohamad Chaarawi" wrote:

            Hi Chrisyeshi,

            Are the region_index & region_count the same on all
            processes, i.e. are you just reading the same data on
            all processes?

            Mohamad

            On 5/29/2012 3:02 PM, chrisyeshi wrote:

                Hi,

                I am having trouble reading from a 721 GB file using
                4096 nodes. When I test with a few nodes it works,
                but when I test with more nodes it takes
                significantly more time. All the test program does is
                read in the data and then delete it.
                Here's the timing information:

                Nodes | Time for entire program (min:sec)
                   16 |  4:28
                   32 |  6:55
                   64 |  8:56
                  128 | 11:22
                  256 | 13:25
                  512 | 15:34

                  768 | 28:34
                  800 | 29:04

                I am running the program on a Cray XK6 system, and
                the file system is Lustre.

                *There is a big gap after 512 nodes, and with 4096
                nodes it couldn't finish in 6 hours. Is this normal?
                Shouldn't it be a lot faster?*

                Here is my reading function; it's similar to the
                sample HDF5 parallel program:

                #include <hdf5.h>
                #include <mpi.h>
                #include <stdio.h>
                #include <stdlib.h>
                #include <assert.h>

                void readData(const char* filename, int region_index[3],
                              int region_count[3], float* flow_field[6])
                {
                  char attributes[6][50];
                  sprintf(attributes[0], "/uvel");
                  sprintf(attributes[1], "/vvel");
                  sprintf(attributes[2], "/wvel");
                  sprintf(attributes[3], "/pressure");
                  sprintf(attributes[4], "/temp");
                  sprintf(attributes[5], "/OH");

                  herr_t status;
                  hid_t file_id;
                  hid_t dset_id;

                  // open the file collectively with the MPI-IO driver
                  hid_t acc_tpl = H5Pcreate(H5P_FILE_ACCESS);
                  status = H5Pset_fapl_mpio(acc_tpl, MPI_COMM_WORLD,
                                            MPI_INFO_NULL);
                  file_id = H5Fopen(filename, H5F_ACC_RDONLY, acc_tpl);
                  status = H5Pclose(acc_tpl);

                  for (int i = 0; i < 6; ++i)
                  {
                    // open dataset
                    dset_id = H5Dopen(file_id, attributes[i], H5P_DEFAULT);

                    // get the dataset space and the size of one region
                    hid_t spac_id = H5Dget_space(dset_id);
                    hsize_t htotal_size3[3];
                    status = H5Sget_simple_extent_dims(spac_id,
                                                       htotal_size3, NULL);
                    hsize_t region_size3[3] =
                        {htotal_size3[0] / region_count[0],
                         htotal_size3[1] / region_count[1],
                         htotal_size3[2] / region_count[2]};

                    // hyperslab: this process's region of the dataset
                    hsize_t start[3] = {region_index[0] * region_size3[0],
                                        region_index[1] * region_size3[1],
                                        region_index[2] * region_size3[2]};
                    hsize_t count[3] = {region_size3[0], region_size3[1],
                                        region_size3[2]};
                    status = H5Sselect_hyperslab(spac_id, H5S_SELECT_SET,
                                                 start, NULL, count, NULL);
                    hid_t memspace = H5Screate_simple(3, count, NULL);

                    // collective read
                    hid_t xfer_plist = H5Pcreate(H5P_DATASET_XFER);
                    status = H5Pset_dxpl_mpio(xfer_plist,
                                              H5FD_MPIO_COLLECTIVE);

                    flow_field[i] = (float *) malloc(count[0] * count[1] *
                                                     count[2] * sizeof(float));
                    status = H5Dread(dset_id, H5T_NATIVE_FLOAT, memspace,
                                     spac_id, xfer_plist, flow_field[i]);

                    // clean up
                    H5Dclose(dset_id);
                    H5Sclose(spac_id);
                    H5Sclose(memspace);  // was missing: close memory dataspace
                    H5Pclose(xfer_plist);
                  }
                  H5Fclose(file_id);
                }

                *Do you see any problem with this function? I am new
                to parallel HDF5.*

                Thanks in advance!
