"Traynor, Kevin" <kevin.tray...@intel.com> writes:
>> -----Original Message-----
>> From: Aaron Conole [mailto:acon...@redhat.com]
>> Sent: Monday, December 21, 2015 7:27 PM
>> To: Traynor, Kevin
>> Cc: dev@openvswitch.org; Flavio Leitner
>> Subject: Re: [PATCH 3/5] netdev-dpdk: Autofill lcore coremask if absent
>> 
>> "Traynor, Kevin" <kevin.tray...@intel.com> writes:
>> >> -----Original Message-----
>> >> From: Aaron Conole [mailto:acon...@redhat.com]
>> >> Sent: Friday, December 18, 2015 6:28 PM
>> >> To: dev@openvswitch.org
>> >> Cc: Flavio Leitner; Traynor, Kevin
>> >> Subject: [PATCH 3/5] netdev-dpdk: Autofill lcore coremask if absent
>> >>
>> >> The user has control over the DPDK internal lcore coremask, but this
>> >> parameter can be autofilled with a bit more intelligence. If the user
>> >> does not fill this parameter in, we use the lowest set bit in the
>> >> current task CPU affinity.
>> >>
>> >> Signed-off-by: Aaron Conole <acon...@redhat.com>
>> >> Cc: Kevin Traynor <kevin.tray...@intel.com>
>> >> ---
>> >>  lib/netdev-dpdk.c | 47 ++++++++++++++++++++++++++++++++++++++++-------
>> >>  1 file changed, 40 insertions(+), 7 deletions(-)
>> >>
>> >> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
>> >> index 2a81058..696430f 100644
>> >> --- a/lib/netdev-dpdk.c
>> >> +++ b/lib/netdev-dpdk.c
>> >> @@ -65,6 +65,8 @@ static struct vlog_rate_limit rl =
>> VLOG_RATE_LIMIT_INIT(5,
>> >> 20);
>> >>  #define OVS_CACHE_LINE_SIZE CACHE_LINE_SIZE
>> >>  #define OVS_VPORT_DPDK "ovs_dpdk"
>> >>
>> >> +#define MAX_BUFSIZ 256
>> >> +
>> >>  /*
>> >>   * need to reserve tons of extra space in the mbufs so we can align the
>> >>   * DMA addresses to 4KB.
>> >> @@ -2147,7 +2149,8 @@ grow_argv(char ***argv, size_t cur_siz, size_t
>> grow_by)
>> >>  }
>> >>
>> >>  static int
>> >> -get_dpdk_args(const struct ovsrec_open_vswitch *ovs_cfg, char ***argv)
>> >> +get_dpdk_args(const struct ovsrec_open_vswitch *ovs_cfg, char ***argv,
>> >> +              int argc)
>> >>  {
>> >>      struct dpdk_options_map {
>> >>          const char *ovs_configuration;
>> >> @@ -2155,13 +2158,13 @@ get_dpdk_args(const struct ovsrec_open_vswitch
>> >> *ovs_cfg, char ***argv)
>> >>          bool default_enabled;
>> >>          const char *default_value;
>> >>      } opts[] = {
>> >> -        {"dpdk_lcore_mask", "-c", true, "0x1"},
>> >> +        {"dpdk_lcore_mask", "-c", false, NULL},
>> >>          {"dpdk_mem_channels", "-n", true, "4"},
>> >>          {"dpdk_alloc_mem", "-m", false, NULL},
>> >>          {"dpdk_socket_mem", "--socket-mem", true, "1024,0"},
>> >>          {"dpdk_hugepage_dir", "--huge-dir", false, NULL},
>> >>      };
>> >> -    int i, ret = 1;
>> >> +    int i, ret = argc;
>> >>
>> >>      for(i = 0; i < (sizeof(opts) / sizeof(opts[0])); ++i) {
>> >>          const char *lookup = smap_get(&ovs_cfg->other_config,
>> >> @@ -2203,7 +2206,8 @@ __dpdk_init(const struct ovsrec_open_vswitch
>> *ovs_cfg)
>> >>  {
>> >>      char **argv = NULL;
>> >>      int result;
>> >> -    int argc;
>> >> +    int argc = 0, argc_tmp;
>> >> +    bool auto_determine = true;
>> >>      int err;
>> >>      cpu_set_t cpuset;
>> >>
>> >> @@ -2236,12 +2240,41 @@ __dpdk_init(const struct ovsrec_open_vswitch
>> >> *ovs_cfg)
>> >>          ovs_abort(0, "Thread getaffinity error %d.", err);
>> >>      }
>> >>
>> >> -    argv = grow_argv(&argv, 0, 1);
>> >> +    argv = grow_argv(&argv, argc, argc+1);
>> >>      if (!argv) {
>> >>          ovs_abort(0, "Unable to allocate an initial argv.");
>> >>      }
>> >> -    argv[0] = strdup("ovs"); /* TODO use prctl to get process name */
>> >> -    argc = get_dpdk_args(ovs_cfg, &argv);
>> >> +    argv[argc++] = strdup("ovs"); /* TODO use prctl to get process name
>> */
>> >> +
>> >> +    argc_tmp = get_dpdk_args(ovs_cfg, &argv, argc);
>> >> +
>> >> +    while(argc_tmp != argc) {
>> >> +        if (!strcmp("-c", argv[argc++])) {
>> >> +            auto_determine = false;
>
> we can break; here.

Yep, I'll fix it.

> If we are not going to autodetermine, later we still set the affinities
> back to the non-isolcpu'd cores. This means that the -c is used for the
> rte_eal_init() only and the non-pmd threads will not use it. I haven't
> seen any issues with this in my testing but the question is - do we want
> to allow a user to set a specific core that the non-pmd threads will run
> on? If we don't allow and the non-isolcpu cores are heavily loaded it may
> mean a slowdown for non-pmd threads. Based on that possibility, I'm inclined
> to think that if the user gives us a -c option, we should not reset the
> affinities back to the non-isolcpu'd cores.
>
> What do you think?

I guess it's a semantic issue, and I agree with the sentiment that if
the user has explicitly set the affinities via a configuration, we should
honor that as the set affinity. I'll make that change; I think it brings
some real meaning to the mask option as well.

However, I'm thinking we should bring the config up one level - set the
thread default affinity for all of OVS, and have the DPDK just use that
for lcores. So basically, rename this option and move it up one level (I
don't think one currently exists) either in the DB or on the command
line and make the "-c" code just pull the affinity.

I think that simplifies a lot of this, and eliminates trying to add a
dpdk-only core mask with lots of caveats and "well, no.." for
explanations.

Does that make sense?

>> >> +        }
>> >> +    }
>> >> +
>> >> +    /**
>> >> +     * NOTE: This is an unsophisticated mechanism for determining the
>> DPDK
>> >> +     * lcore for the DPDK Master.
>> >> +     */
>> >> +    if (auto_determine)
>> >> +    {
>> >
>> > coding std if {
>> 
>> D'oh! Thanks.
>> 
>> >> +        int i;
>> >> +        for (i = 0; i < CPU_SETSIZE; i++) {
>> >> +            if (CPU_ISSET(i, &cpuset)) {
>> >
>> > I had been thinking we could put in a check here to ensure that the
>> > core we have selected is suitable for the socket-mem but I'm not sure
>> > if it's really needed. We could always add it later, it won't change
>> > the user interface.
>> 
>> I think such a change is an optimization, at best, and is really
>> difficult to get right. I'd rather do the naive but very predictable
>> thing, and give the option to override, than guess at a more complicated
>> solution on a first cut.
>> 
>> Plus, the socket-mem is only relevent for the PMD threads,
>> iirc. Even if I'm wrong though, if someone is playing with coremasks,
>> I think they should probably just be setting the affinity manually,
>> anyway.
>
> Sure - I agree.
>
>> 
>> >> +                char buf[MAX_BUFSIZ];
>> >> +                snprintf(buf, MAX_BUFSIZ, "0x%08llX", (1ULL<<i));
>> >> +                argv = grow_argv(&argv, argc, argc+2);
>> >> +                if (!argv) {
>> >> +                    ovs_abort(0, "Unable to grow argv for coremask");
>> >> +                }
>> >> +                argv[argc++] = strdup("-c");
>> >> +                argv[argc++] = strdup(buf);
>> >> +                i = CPU_SETSIZE;
>> >> +            }
>> >> +        }
>> >> +    }
>> >>
>> >>      argv = grow_argv(&argv, argc, argc+1);
>> >>      if (!argv) {
>> >> --
>> >> 2.6.1.133.gf5b6079
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

Reply via email to