Re: [Lustre-discuss] Bad read performance

Alvaro Aguilera Tue, 01 Sep 2009 18:01:22 -0700

hi,

here is the requested information:


before test:

llite.fastfs-ffff810102a6a400.read_ahead_stats=
snapshot_time:         1251851453.382275 (secs.usecs)
pending issued pages:           0
hits                      7301235
misses                    10546
readpage not consecutive  14369
miss inside window        1
failed grab_cache_page    6285314
failed lock match         0
read but discarded        98955
zero length file          0
zero size window          3495
read-ahead to EOF         172
hit max r-a issue         783042
wrong page from grab_cache_page 0


after:

llite.fastfs-ffff810102a6a400.read_ahead_stats=
snapshot_time:         1251851620.183964 (secs.usecs)
pending issued pages:           0
hits                      7506005
misses                    330064
readpage not consecutive  14432
miss inside window        319450
failed grab_cache_page    6322954
failed lock match         17294
read but discarded        98955
zero length file          0
zero size window          3495
read-ahead to EOF         192
hit max r-a issue         837908
wrong page from grab_cache_page 0


there seem to be a lot of misses, as well as a locking problem, don't you think?
Btw., in the test, 4 processes read 512 MB each from a 2 GB file.

Regards,
Alvaro.

On Fri, Aug 21, 2009 at 3:38 PM, di wang <[email protected]> wrote:

> hello,
> Alvaro Aguilera wrote:
>
>> they run on different physical nodes and access the OST via 4x InfiniBand.
>>
> I have never heard of such problems when they run on different nodes. Client memory?
> Can you post the read-ahead stats (before and after the test) here using
>
> lctl get_param llite.*.read_ahead_stats
>
>
> But there are indeed a lot of fixes for strided reads since 1.6.5, which are
> included in the tarball I posted below.
> They can probably fix your problem.
>
> Thanks
> WangDi
>
>> On Fri, Aug 21, 2009 at 3:15 PM, di wang <[email protected]> wrote:
>>
>>    Alvaro Aguilera wrote:
>>
>>        thanks for the hint, but unfortunately I can't make any
>>        updates to the cluster...
>>
>>        Do you think both of the problems I experienced are bugs in
>>        Lustre and are resolved in current versions?
>>
>>    These should be Lustre bugs. Do the 2 processes run on different nodes
>>    or on the same node?
>>
>>    Thanks
>>    WangDi
>>
>>
>>        Thanks.
>>        Alvaro.
>>
>>
>>        On Fri, Aug 21, 2009 at 6:32 AM, di wang <[email protected]> wrote:
>>
>>           Hello,
>>
>>           You may want to look at bug 17197 and try applying this patch
>>           https://bugzilla.lustre.org/attachment.cgi?id=25062  to your
>>           Lustre source. Or you can wait for 1.8.2.
>>
>>           Thanks
>>           Wangdi
>>
>>           Alvaro Aguilera wrote:
>>
>>               Hello,
>>
>>               as a project for college I'm doing a behavioral comparison
>>               between Lustre and CXFS when dealing with simple strided
>>               files using POSIX semantics. On one of the tests, each
>>               participating process reads 16 chunks of data with a size
>>               of 32MB each, from a common, strided file using the
>>               following code:
>>
>>
>> ------------------------------------------------------------------------------------------
>>               int myfile = open("thefile", O_RDONLY);
>>
>>               MPI_Barrier(MPI_COMM_WORLD); // the barriers are only to help measuring time
>>
>>               off_t distance = (numtasks-1)*p.buffersize;
>>               off_t offset = rank*p.buffersize;
>>
>>               int j;
>>               lseek(myfile, offset, SEEK_SET);
>>               for (j = 0; j < p.buffercount; j++) {
>>                     read(myfile, buffers[j], p.buffersize); // buffers are aligned to the page size
>>                     lseek(myfile, distance, SEEK_CUR);
>>               }
>>
>>               MPI_Barrier(MPI_COMM_WORLD);
>>
>>               close(myfile);
>>
>> ------------------------------------------------------------------------------------------
>>
>>               I'm facing the following problem: when this code is run in
>>               parallel the read operations on certain processes start to
>>               need more and more time to complete. I attached a graphical
>>               trace of this, when using only 2 processes.
>>               As you see, the read operations on process 0 stay more or
>>               less constant, taking about 0.12 seconds to complete, while
>>               on process 1 they increase up to 39 seconds!
>>
>>               If I run the program with only one process, then the time
>>               stays at ~0.12 seconds per read operation. The problem
>>               doesn't appear if the O_DIRECT flag is used.
>>
>>               Can somebody explain to me why this is happening? Since I'm
>>               very new to Lustre, I may be making some silly mistakes, so
>>               be nice to me ;)
>>
>>               I'm using Lustre SLES 10 Patchlevel 1, Kernel
>>               2.6.16.54-0.2.5_lustre.1.6.5.1.
>>
>>
>>               Thanks!
>>
>>               Alvaro Aguilera.
>>
>>
>>
>> ------------------------------------------------------------------------
>>
>>               _______________________________________________
>>               Lustre-discuss mailing list
>>               [email protected]
>>               http://lists.lustre.org/mailman/listinfo/lustre-discuss
>>
>

_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss
