[ https://issues.apache.org/jira/browse/KUDU-2721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16783752#comment-16783752 ]

Tim Armstrong commented on KUDU-2721:
-------------------------------------

The options that I see for addressing this are:
* Generalise the parsing logic to handle every string that can be emitted
from /sys/devices/system/cpu/present
* Stop parsing the file manually and use something like get_nprocs_conf()
instead.

It sounds like get_nprocs_conf() may have some issues too:
https://lore.kernel.org/patchwork/patch/467415/. I think we also saw issues
with it in Impala, e.g. https://issues.apache.org/jira/browse/IMPALA-6500, so
that is probably not a good idea.

I'm actually not convinced that parsing /present is the right thing either:
it seems like we should be parsing /sys/devices/system/cpu/possible instead,
to account for CPUs that could be hotplugged. The format is documented here:
https://www.kernel.org/doc/Documentation/cputopology.txt. All the examples in
that file look like comma-separated lists of ranges, e.g. "2,4-31,32-63".
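
A parser for that format is small. The sketch below (C++17) expands a list
such as "2,4-31,32-63" into individual CPU ids and tolerates the trailing
newline that sysfs appends. ParseCpuList and the kMaxCpuId sanity bound are
hypothetical names for illustration, not existing Kudu code, and the sketch
deliberately rejects the bitmap_parselist() postfix notation.

```cpp
#include <cassert>
#include <cctype>
#include <optional>
#include <string>
#include <vector>

// Arbitrary sanity bound on CPU ids (an assumption, not a kernel limit).
constexpr long kMaxCpuId = 1L << 20;

// Hypothetical helper, not an existing Kudu API: parses a sysfs CPU list such
// as "2,4-31,32-63\n" into individual CPU ids. Accepts "N" and "N-M" terms
// separated by commas plus one optional trailing newline; anything else
// (including the bitmap_parselist() ":used/group" postfix) yields nullopt.
std::optional<std::vector<int>> ParseCpuList(const std::string& s) {
  std::vector<int> cpus;
  size_t i = 0;
  const size_t n = s.size();

  // Reads a non-negative decimal integer at position i, advancing i.
  auto read_int = [&](int* out) -> bool {
    if (i >= n || !std::isdigit(static_cast<unsigned char>(s[i]))) return false;
    long v = 0;
    while (i < n && std::isdigit(static_cast<unsigned char>(s[i]))) {
      v = v * 10 + (s[i] - '0');
      if (v > kMaxCpuId) return false;
      ++i;
    }
    *out = static_cast<int>(v);
    return true;
  };

  while (true) {
    int lo = 0;
    if (!read_int(&lo)) return std::nullopt;
    int hi = lo;
    if (i < n && s[i] == '-') {
      ++i;
      if (!read_int(&hi) || hi < lo) return std::nullopt;
    }
    for (int cpu = lo; cpu <= hi; ++cpu) cpus.push_back(cpu);
    if (i < n && s[i] == ',') {
      ++i;  // another term follows
      continue;
    }
    break;
  }

  // sysfs terminates the list with a newline.
  if (i < n && s[i] == '\n') ++i;
  if (i != n) return std::nullopt;
  return cpus;
}
```

For the string from this report, "0-15,32-47\n", it returns 32 ids whose
largest is 47, so "how many CPUs are there" (32) and "how many slots does an
array indexed by CPU id need" (48) have different answers once there are gaps.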

It's documented as being compatible with bitmap_parselist(), which actually 
supports an additional postfix notation. I *believe* this feature is not used 
in the output of the CPU lists.
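
If the postfix ever did show up, the extra work is in expanding each grouped
range. As I read the comment on bitmap_parselist() in the kernel's
lib/bitmap.c, the syntax is "lo-hi:used/group": the range is split into groups
of `group` ids starting at lo, and only the first `used` ids of each group are
set, so "0-1023:2/256" means 0,1,256,257,512,513,768,769. A minimal sketch of
that expansion, assuming the term has already been split into its four numbers
and validated; ExpandGroupedRange is a hypothetical helper:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: expands a bitmap_parselist()-style grouped range
// "lo-hi:used/group". The range [lo, hi] is split into groups of `group` ids
// starting at lo, and only the first `used` ids of each group are kept.
// Preconditions (assumed to be checked by the caller): 0 <= lo <= hi,
// group > 0 and 0 <= used <= group.
std::vector<int> ExpandGroupedRange(int lo, int hi, int used, int group) {
  std::vector<int> ids;
  for (long start = lo; start <= hi; start += group) {
    // The last group may be cut short by hi.
    const long end = std::min<long>(start + used - 1, hi);
    for (long id = start; id <= end; ++id) ids.push_back(static_cast<int>(id));
  }
  return ids;
}
```

With used == group this degenerates to the plain "lo-hi" range, so a general
parser could route every term through it.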

> CHECK can be hit when there are gaps in present CPU numbers
> -----------------------------------------------------------
>
>                 Key: KUDU-2721
>                 URL: https://issues.apache.org/jira/browse/KUDU-2721
>             Project: Kudu
>          Issue Type: Bug
>    Affects Versions: 1.8.0
>         Environment: SLES12-SP3
>            Reporter: Tim Armstrong
>            Assignee: Tim Armstrong
>            Priority: Critical
>              Labels: crash
>         Attachments: cpus.cc
>
>
> We saw a case where Impala crashes in the Kudu client, apparently
> because the "present" string can contain multiple ranges ("0-15,32-47\n"
> in this case). See
> https://github.com/apache/kudu/blob/148a0c7bec6554724339a2235cbd723fb74be339/src/kudu/gutil/sysinfo.cc#L177
> I've attached a small test program based on the code illustrating that the 
> assert gets hit.
> I think we should figure out all of the possible formats for the present 
> string and make sure we handle them. Or figure out a different way to get 
> equivalent info.
> A workaround for the case that we saw was to disable hyperthreading, which
> changed the string to a single range.
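
To make the failure mode concrete, here is a hypothetical reconstruction of
the single-range assumption (not the code at the linked sysinfo.cc line): read
the string as "0-N" and return N + 1, reporting -1 where code built on that
assumption would fail a CHECK. NaiveCpuCountFromPresent is an illustrative
name only.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical sketch of the faulty assumption, not Kudu's actual code: treat
// the "present" string as exactly one "0-N" range (plus a trailing newline) and
// return N + 1 as the CPU count. Returns -1 for any other shape, which is where
// code built on this assumption would hit its CHECK instead.
int NaiveCpuCountFromPresent(const std::string& present) {
  int max_id = -1;
  int consumed = 0;
  if (std::sscanf(present.c_str(), "0-%d%n", &max_id, &consumed) != 1 ||
      max_id < 0) {
    return -1;
  }
  // Anything after the range other than the trailing newline is unexpected,
  // e.g. the ",32-47\n" left over from "0-15,32-47\n".
  const std::string rest = present.substr(consumed);
  if (!rest.empty() && rest != "\n") return -1;
  return max_id + 1;
}
```

It accepts "0-15\n" (16 CPUs) but rejects the "0-15,32-47\n" string from this
report, and it would also reject a bare "0", which is what I would expect a
single-CPU machine to report.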



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)