On 06.09.2018 14:04, Jiri Olsa wrote:
> On Wed, Sep 05, 2018 at 10:19:56AM +0300, Alexey Budankov wrote:
>>
>> The map->data buffers are used to preserve map->base profiling data 
>> for writing to disk. AIO map->cblocks are used to queue corresponding 
>> map->data buffers for asynchronous writing. map->cblocks objects are 
>> located in the last page of every map->data buffer.
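
To picture the layout described above (rough sketch only, not the exact
code from the patch; the helper and variable names are made up):

#include <aio.h>
#include <stdlib.h>
#include <unistd.h>

/* Allocate one map->data buffer: the copied-out data pages plus one
 * trailing page that holds the aiocb ("cblock") used to queue the
 * buffer for asynchronous writing. */
static int alloc_data_buf(size_t mmap_len, char **datap,
			  struct aiocb **cblockp)
{
	size_t page_size = sysconf(_SC_PAGESIZE);
	char *data = malloc(mmap_len + page_size);

	if (!data)
		return -1;

	*datap   = data;                              /* map->data */
	*cblockp = (struct aiocb *)(data + mmap_len); /* last page */
	return 0;
}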
>>
>> Signed-off-by: Alexey Budankov <[email protected]>
>> ---
>>  Changes in v7:
>>   - implemented handling record.aio setting from perfconfig file
>>  Changes in v6:
>>   - adjusted setting of priorities for cblocks;
>>  Changes in v5:
>>   - reshaped layout of data structures;
>>   - implemented --aio option;
>>  Changes in v4:
>>   - converted mmap()/munmap() to malloc()/free() for mmap->data buffer management
>>  Changes in v2:
>>   - converted zalloc() to calloc() for allocation of mmap_aio array,
>>   - cleared typo and adjusted fallback branch code;
>> ---
>>  tools/perf/builtin-record.c | 15 ++++++++++++-
>>  tools/perf/perf.h           |  1 +
>>  tools/perf/util/evlist.c    |  7 +++---
>>  tools/perf/util/evlist.h    |  3 ++-
>>  tools/perf/util/mmap.c      | 53 +++++++++++++++++++++++++++++++++++++++++++++
>>  tools/perf/util/mmap.h      |  6 ++++-
>>  6 files changed, 79 insertions(+), 6 deletions(-)
>>
>> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
>> index 22ebeb92ac51..f17a6f9cb1ba 100644
>> --- a/tools/perf/builtin-record.c
>> +++ b/tools/perf/builtin-record.c
>> @@ -326,7 +326,8 @@ static int record__mmap_evlist(struct record *rec,
>>  
>>      if (perf_evlist__mmap_ex(evlist, opts->mmap_pages,
>>                               opts->auxtrace_mmap_pages,
>> -                             opts->auxtrace_snapshot_mode) < 0) {
>> +                             opts->auxtrace_snapshot_mode,
>> +                             opts->nr_cblocks) < 0) {
>>              if (errno == EPERM) {
>>                      pr_err("Permission error mapping pages.\n"
>>                             "Consider increasing "
>> @@ -1287,6 +1288,8 @@ static int perf_record_config(const char *var, const char *value, void *cb)
>>              var = "call-graph.record-mode";
>>              return perf_default_config(var, value, cb);
>>      }
>> +    if (!strcmp(var, "record.aio"))
>> +            rec->opts.nr_cblocks = strtol(value, NULL, 0);
>>  
>>      return 0;
>>  }
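
For reference, the config-file equivalent of the command line option
would look roughly like this (assuming the "record.aio" key handled
above; the value is parsed with strtol(), so a plain decimal number
works):

# ~/.perfconfig
[record]
	aio = 4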
>> @@ -1519,6 +1522,7 @@ static struct record record = {
>>                      .default_per_cpu = true,
>>              },
>>              .proc_map_timeout     = 500,
>> +            .nr_cblocks           = 2
>>      },
>>      .tool = {
>>              .sample         = process_sample_event,
>> @@ -1678,6 +1682,8 @@ static struct option __record_options[] = {
>>                        "signal"),
>>      OPT_BOOLEAN(0, "dry-run", &dry_run,
>>                  "Parse options then exit"),
>> +    OPT_INTEGER(0, "aio", &record.opts.nr_cblocks,
>> +                "asynchronous trace write operations (min: 1, max: 32, 
>> default: 2)"),
> 
> ok, so this got silently added in recent versions and I couldn't
> find any justification for it.. why do we use more aio blocks for
> single map now? also why the default is 2?

Having more blocks may improve throughput from kernel to userspace in
the cases where new data arrive at map->base while the previously
started AIO write has not finished yet. That can easily happen between
calls of record__mmap_read_evlist().
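
Roughly, the idea is the following (simplified sketch, not the actual
patch code; names and sizes are made up for illustration). With several
cblock slots a new chunk from map->base can be copied out and queued
even while an earlier write is still in flight:

#include <aio.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>

#define NR_CBLOCKS 2
#define BUF_SIZE   (512 * 1024)

static struct aiocb cblocks[NR_CBLOCKS];
static char data[NR_CBLOCKS][BUF_SIZE];
static int used[NR_CBLOCKS];

/* Pick a slot whose previous write has completed, -1 if all are busy. */
static int find_free_cblock(void)
{
	int i;

	for (i = 0; i < NR_CBLOCKS; i++) {
		if (!used[i])
			return i;
		if (aio_error(&cblocks[i]) != EINPROGRESS) {
			aio_return(&cblocks[i]);  /* reap the finished write */
			return i;
		}
	}
	return -1;
}

/* Copy one chunk of ring-buffer data and queue it for asynchronous write. */
static int queue_write(int fd, const void *src, size_t size, off_t off)
{
	int i = find_free_cblock();

	if (i < 0 || size > BUF_SIZE)
		return -1;  /* caller has to wait for a completion */

	memcpy(data[i], src, size);

	memset(&cblocks[i], 0, sizeof(cblocks[i]));
	cblocks[i].aio_fildes = fd;
	cblocks[i].aio_buf    = data[i];
	cblocks[i].aio_nbytes = size;
	cblocks[i].aio_offset = off;
	used[i] = 1;

	return aio_write(&cblocks[i]);
}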

> 
> the option should be more specific like 'aio-blocks'

ok.

> 
> the change is difficult enough.. we should start simple and add
> these additions with proper justification in separate patches

Setting the default to 1 gives the simplest solution. I could provide
justification for the cases where spinning at record__aio_sync()
becomes the hotspot.
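
To give an idea of what I mean (simplified sketch, not the actual
record__aio_sync() code): with a single cblock every flush has to block
here until the previous write has completed, and that wait can show up
as a hotspot under load:

#include <aio.h>
#include <errno.h>

/* Block until at least one of the queued writes completes and return
 * its index. All entries are assumed to have been submitted with
 * aio_write() and not yet reaped. */
static int wait_for_free_cblock(struct aiocb *cblocks, int nr_cblocks)
{
	const struct aiocb *list[32];  /* nr_cblocks is capped at 32 */
	int i;

	for (i = 0; i < nr_cblocks; i++)
		list[i] = &cblocks[i];

	for (;;) {
		for (i = 0; i < nr_cblocks; i++) {
			if (aio_error(&cblocks[i]) != EINPROGRESS) {
				aio_return(&cblocks[i]);  /* reap the result */
				return i;
			}
		}
		/* All writes still in flight: sleep until one finishes. */
		if (aio_suspend(list, nr_cblocks, NULL) && errno != EINTR)
			return -1;
	}
}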

> 
> thanks,
> jirka
> 
