Hi Yucong,

On 5/30/2012 12:33 PM, Yucong Ye wrote:

The region_index changes according to the MPI rank, while the region_count stays the same, which is (16, 16, 16).


OK, I just needed to make sure that the selection each process makes is compatible with the scaling being done (as the number of processes increases, the selection of each process shrinks accordingly). The performance numbers you provided are indeed troubling, but there could be several reasons for this, some being:

 * The stripe size & count of your file on Lustre could be too small.
   Although this is a read operation (no file locking is done by the
   OSTs), increasing the number of I/O processes can put too much
   burden on the OSTs. Could you check those 2 parameters of your
   file? You can do that by running this on the command line:
     o lfs getstripe filename | grep stripe
 * The MPI-I/O implementation may not be doing aggregation. If you are
   using ROMIO, two-phase I/O should do this for you; by default the
   number of aggregators is set to the number of nodes (not
   processes). I would also try increasing cb_buffer_size (the default
   is 4 MB). A sketch of how to pass such hints through HDF5 follows
   this list.
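
Here is a minimal sketch (not your code) of how such hints can be passed down
to MPI-I/O through the HDF5 file-access property list. The hint names
(cb_buffer_size, cb_nodes) are ROMIO hints; whether they are honored, and what
good values are, depends on the MPI-I/O stack on your Cray XK6, so treat the
numbers as placeholders. The helper name open_with_hints is just illustrative:

#include <mpi.h>
#include <hdf5.h>

/* Illustrative helper: open a file read-only with MPI-I/O hints attached. */
hid_t open_with_hints(const char *filename)
{
    MPI_Info info;
    MPI_Info_create(&info);

    /* ROMIO hints (assumed names; unsupported hints are silently ignored):
       - cb_buffer_size: collective buffer size per aggregator (default 4 MB)
       - cb_nodes: number of aggregator nodes that touch the file */
    MPI_Info_set(info, "cb_buffer_size", "16777216");  /* 16 MB, example value */
    MPI_Info_set(info, "cb_nodes", "64");               /* example value */

    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, info);

    hid_t file_id = H5Fopen(filename, H5F_ACC_RDONLY, fapl);

    H5Pclose(fapl);
    MPI_Info_free(&info);
    return file_id;
}

Since unsupported hints are ignored without any error, it is worth checking
the documentation of the MPI library on your system to see which hint names
its MPI-I/O layer actually honors.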

Thanks,
Mohamad

On May 30, 2012 8:19 AM, "Mohamad Chaarawi" <[email protected]> wrote:

    Hi Chrisyeshi,

    Is the region_index & region_count the same on all processes, i.e.,
    are you just reading the same data on all processes?

    Mohamad

    On 5/29/2012 3:02 PM, chrisyeshi wrote:

        Hi,

        I am having trouble reading from a 721 GB file using 4096 nodes.
        When I test with a few nodes it works, but with more nodes it
        takes significantly more time.
        All the test program does is read in the data and then delete it.
        Here's the timing information:

        Nodes   |   Time For Running Entire Program
        16      |    4:28
        32      |    6:55
        64      |    8:56
        128     |   11:22
        256     |   13:25
        512     |   15:34

        768     |   28:34
        800     |   29:04

        I am running the program on a Cray XK6 system, and the file
        system is Lustre.

        *There is a big gap after 512 nodes, and with 4096 nodes, it
        couldn't finish
        in 6 hours.
        Is this normal? Shouldn't it be a lot faster?*

        Here is my reading function; it's similar to the sample HDF5
        parallel program:

        #include <hdf5.h>
        #include <mpi.h>
        #include <stdio.h>
        #include <stdlib.h>
        #include <assert.h>

        void readData(const char* filename, int region_index[3],
                      int region_count[3], float* flow_field[6])
        {
          char attributes[6][50];
          sprintf(attributes[0], "/uvel");
          sprintf(attributes[1], "/vvel");
          sprintf(attributes[2], "/wvel");
          sprintf(attributes[3], "/pressure");
          sprintf(attributes[4], "/temp");
          sprintf(attributes[5], "/OH");

          herr_t status;
          hid_t file_id;
          hid_t dset_id;

          // open the file collectively with the MPI-I/O file driver
          hid_t acc_tpl = H5Pcreate(H5P_FILE_ACCESS);
          status = H5Pset_fapl_mpio(acc_tpl, MPI_COMM_WORLD, MPI_INFO_NULL);
          file_id = H5Fopen(filename, H5F_ACC_RDONLY, acc_tpl);
          status = H5Pclose(acc_tpl);

          for (int i = 0; i < 6; ++i)
          {
            // open dataset
            dset_id = H5Dopen(file_id, attributes[i], H5P_DEFAULT);

            // get the dataset space and compute the size of one region
            hid_t spac_id = H5Dget_space(dset_id);
            hsize_t htotal_size3[3];
            status = H5Sget_simple_extent_dims(spac_id, htotal_size3, NULL);
            hsize_t region_size3[3] = {htotal_size3[0] / region_count[0],
                                       htotal_size3[1] / region_count[1],
                                       htotal_size3[2] / region_count[2]};

            // select this process's hyperslab in the file space
            hsize_t start[3] = {region_index[0] * region_size3[0],
                                region_index[1] * region_size3[1],
                                region_index[2] * region_size3[2]};
            hsize_t count[3] = {region_size3[0], region_size3[1],
                                region_size3[2]};
            status = H5Sselect_hyperslab(spac_id, H5S_SELECT_SET, start, NULL,
                                         count, NULL);
            hid_t memspace = H5Screate_simple(3, count, NULL);

            // collective read into a freshly allocated buffer
            hid_t xfer_plist = H5Pcreate(H5P_DATASET_XFER);
            status = H5Pset_dxpl_mpio(xfer_plist, H5FD_MPIO_COLLECTIVE);

            flow_field[i] = (float *) malloc(count[0] * count[1] * count[2] *
                                             sizeof(float));
            status = H5Dread(dset_id, H5T_NATIVE_FLOAT, memspace, spac_id,
                             xfer_plist, flow_field[i]);

            // clean up
            H5Dclose(dset_id);
            H5Sclose(spac_id);
            H5Sclose(memspace);
            H5Pclose(xfer_plist);
          }
          H5Fclose(file_id);
        }

        *Do you see any problem with this function? I am new to parallel
        HDF5.*

        Thanks in advance!


_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org
