Attention is currently required from: laforge, pespin.

neels has posted comments on this change. ( 
https://gerrit.osmocom.org/c/osmo-hnbgw/+/36539?usp=email )

Change subject: per-HNB GTP-U traffic counters via nft
......................................................................


Patch Set 6:

(26 comments)

Patchset:

PS6:
> Hi @nhofmeyr@sysmocom. […]
Actually it is the same thread running twice.
There is one "instruction set" and the caller decides how to split the thread 
responsibilities.

It would not be a problem to run only one thread, or three, which is the point 
of this implementation.

For review of the blocking/nonblocking nature, you may find interesting to know 
that all requests are handled by the maintenance thread, except that the second 
thread handles ONLY and ALL the GET_COUNTERS requests. This is also described 
in a comment in-code IIRC.

For review of functionality it should not make any difference how often the 
thread runs.


File include/osmocom/hnbgw/hnbgw.h:

https://gerrit.osmocom.org/c/osmo-hnbgw/+/36539/comment/c52e8836_5a81c567
PS6, Line 384:  /* When rules to count traffic to and from this hNodeB are 
present, this reflects the state in nftables
> I'm not understanding this comment. You first mention "rules", and after 
> "nftables rules". […]
"rules" and "nftables rules" is the same thing.

is this a better comment? :

    State that the main thread needs in order to know what to request from the 
worker threads.


https://gerrit.osmocom.org/c/osmo-hnbgw/+/36539/comment/0d7e16e5_ff928953
PS6, Line 402:          struct {
> maybe you want to declare the struct type once and use it twice here. […]
(in an earlier patch set i actually had the two combined in one struct.)

The two nft_kpi_handle and nft_kpi_val are also used in function signatures and 
static vars, but this combined one (ul, dl) exists only here. I can make an 
extra struct for that, but that makes more lines of code, not less.

should i still do that? i wouldn't do it, but my opinion is not strong


https://gerrit.osmocom.org/c/osmo-hnbgw/+/36539/comment/63da49ff_ea065842
PS6, Line 446:                  bool enable;
> I actually think they are self-explanatory.
nft_kpi.enable means nft_kpi is enabled,
the nft_kpi.table_name is used as name for the nft table... =)


https://gerrit.osmocom.org/c/osmo-hnbgw/+/36539/comment/d018d3ed_0a5347a2
PS6, Line 483:  struct {
> I actually think they are self-explanatory.
I'm not a friend of comments like

    /* timer to get nftables counters */
    nft_kpi.get_counters_timer;


https://gerrit.osmocom.org/c/osmo-hnbgw/+/36539/comment/c94caac6_48d70682
PS6, Line 514: void hnb_persistent_connected(struct hnb_persistent *hnbp);
> difficult to understand what are these exactly only by looking at the name. 
> […]
(not sure i understand what you mean by stack level)

i'll add comments to describe. they are a late addition, it seems i forgot.


File include/osmocom/hnbgw/nft_kpi.h:

https://gerrit.osmocom.org/c/osmo-hnbgw/+/36539/comment/88db9e0a_5132b9cb
PS6, Line 7: struct nft_kpi_handle {
> I actually think they are simple, clarly named and self-explanatory.
i can explain in a comment that nftables has "handles" required to remove 
unnamed rules...?


https://gerrit.osmocom.org/c/osmo-hnbgw/+/36539/comment/b46445d9_0114dca7
PS6, Line 19: void nft_kpi_hnb_persistent_init(struct hnb_persistent *hnbp);
> These APIs seems to be counterparts, but sounds weird having init vs remove.
ah yes, i had names based on "init the ruleset" at some point, this is a 
leftover. i'll make it "add" instead.


https://gerrit.osmocom.org/c/osmo-hnbgw/+/36539/comment/2c1be44b_7aa4dd63
PS6, Line 21: int nft_kpi_hnb_start(struct hnb_persistent *hnbp, const struct 
osmo_sockaddr_str *gtpu_remote);
> This addr is probably coming from RANAP in binary form, so feels weird having 
> it converted to a osmo […]
nftables uses strings (also all logging), so it makes sense to use strings here.
(This is the same as in MGCP, also a string based protocol, where it is also 
often more useful to store the strings than the sockaddr)


File src/osmo-hnbgw/hnbgw.c:

https://gerrit.osmocom.org/c/osmo-hnbgw/+/36539/comment/992ef815_ac9bbe62
PS6, Line 346:          hnb_persistent_disconnected(ctx->persistent);
> so API is not really protocol related. […]
(or "active" and "inactive", s.a.)


https://gerrit.osmocom.org/c/osmo-hnbgw/+/36539/comment/ae15427e_d9a3ee85
PS6, Line 590:  if (osmo_sockaddr_str_from_osa(&remote_str, &osa)) {
> there we go, a sockaddr transformed into a sockaddr_str, not sure if there's 
> a real reason to do so, […]
yes


https://gerrit.osmocom.org/c/osmo-hnbgw/+/36539/comment/a32a6f58_262c4afe
PS6, Line 607: {
> Sounds like a self-made FSM right? :P […]
exactly what i mean above. But this patch is not about fixing the HNBAP / HNB 
activity tracking. It only shows up because I add the two functions that should 
already have existed before this patch.

So my idea here is to add these "stubs" for a proper FSM to be done in a future 
patch.
That future FSM should then invoke these properly, instead of how osmo-hnbgw 
does it now.


File src/osmo-hnbgw/hnbgw_hnbap.c:

https://gerrit.osmocom.org/c/osmo-hnbgw/+/36539/comment/4a88b01d_5dbf52f7
PS6, Line 571:  hnb_persistent_connected(ctx->persistent);
> Since this happens at registration time, maybe rename it to 
> "hnb_persistent_registered"? I still fin […]
yes, but how osmo-hnbgw handles HNBAP is weird. We start up a HNB when HNBAP 
HNB Register is received, but we practically never see a HNBAP HNB De-Register 
happen. We mostly just see the HNB's Iuh link drop without prior notice; often 
even after the same cell has registered again from a new connection.

My opinion here is not very strong. I can make it hnb_persistent_registered() 
and hnb_persistent_deregistered() -- but then again we have the ambiguity that 
the "deregistered()" is in practice just a disconnected link...

I think we need to actually further clarify the connected/disconnected and the 
HNBAP Register and Deregister behavior of osmo-hnbgw, in general, in separate 
patches. (e.g. what if HNBAP Deregisters but the Iuh link is never dropped... I 
think osmo-hnbgw currently does not catch that)

Maybe "hnb_active" and "hnb_inactive" are more general terms, where "active" 
means *both* HNBAP Registered *and* Iuh still connected? (and we can later make 
osmo-hnbgw actually do this properly in all cases)

what do you think


File src/osmo-hnbgw/hnbgw_vty.c:

https://gerrit.osmocom.org/c/osmo-hnbgw/+/36539/comment/3fa54b1c_c4bb85aa
PS6, Line 884: DEFUN(cfg_hnbgw_nft_kpi, cfg_hnbgw_nft_kpi_cmd,
> I think we have VTY CMD flags to state that changes apply after a process 
> restart, you could use the […]
I wanted to, but it does not exist AFAICT:


```
/*! Attributes (flags) for \ref cmd_element */
enum {  
        CMD_ATTR_DEPRECATED     = (1 << 0),
        CMD_ATTR_HIDDEN         = (1 << 1),
        CMD_ATTR_IMMEDIATE      = (1 << 2),
        CMD_ATTR_NODE_EXIT      = (1 << 3),
        CMD_ATTR_LIB_COMMAND    = (1 << 4),
};
```


https://gerrit.osmocom.org/c/osmo-hnbgw/+/36539/comment/6cae760a_478f759a
PS6, Line 899:  g_hnbgw->config.nft_kpi.table_name = talloc_strdup(g_hnbgw, 
set_table_name);
> These 3 lines is basically a osmo_talloc_replace_string right?
yup, it incrementally became this so i didn't notice, thx


File src/osmo-hnbgw/nft_kpi.c:

https://gerrit.osmocom.org/c/osmo-hnbgw/+/36539/comment/0490a540_ff8cff4a
PS6, Line 18: #include <inttypes.h>
> Some overall architecture comment description here would be great, to quickly 
> figure out the interes […]
hm i thought i had that somewhere...


https://gerrit.osmocom.org/c/osmo-hnbgw/+/36539/comment/3e9f7fdd_5972acb9
PS6, Line 86: static __thread struct nft_thread *g_nft_thread = NULL;
> Not exactly imho. […]
the nft_thread could be launched any number of times.
IMHO this is what the __thread is for, exactly.
This is a per-thread global state.

For any thread not running an nft command queue, g_nft_thread == NULL (here 
main() has g_nft_thread == NULL, and the two nft threads have this pointing at 
the static global state for that thread.)

I would like this to stay this way.


https://gerrit.osmocom.org/c/osmo-hnbgw/+/36539/comment/0e65f920_bd46f367
PS6, Line 120:          LOGP(DNFT, LOGL_ERROR, "error running nft cmd %s#%u: 
rc=%d cmd=%s\n",
> if nft fails, we have no way to say so to the callback?
yes we have the rc that is carried back to the main thread in the queued 
response.

we could argue whether all those failures are handled properly, in 
nft_thread_t2m_cb().
But the hard part is: what do you want to do when nftables didn't work.
All we can do really is output an error for the user to know and otherwise 
carry on as we did, to not completely disrupt service...? Or should we crash 
the program? Should we reject the hNodeB because our counters don't work? none 
of it makes much sense really... "nft just has to work."


https://gerrit.osmocom.org/c/osmo-hnbgw/+/36539/comment/795a7c3f_17857e37
PS6, Line 182: enum nft_thread_req_type {
> These can probably be moved up before the static functions.
can, but this is the part of the file that defines the thread request queue, so 
i'd like this to be here.


https://gerrit.osmocom.org/c/osmo-hnbgw/+/36539/comment/f8daf865_8ced844c
PS6, Line 193: static const char * const nft_thread_req_type_name[] = {
> osmo_value_string?
no. i can give reasons, but i hope "no" is sufficient?


https://gerrit.osmocom.org/c/osmo-hnbgw/+/36539/comment/d21ae530_c830874b
PS6, Line 207:          struct {
> IIUC these are actually the params for the different primitives right?
hence the naming yes


https://gerrit.osmocom.org/c/osmo-hnbgw/+/36539/comment/320add70_b368e60e
PS6, Line 251:  OSMO_ASSERT(g_nft_thread);
> why do you need the TLS g_nft_hread allocated statically? Why not allocate it 
> using talloc here? […]
Why allocate it using talloc?

If you tried to implement that you'd see that it makes no sense to request that.

If at all the main thread would talloc them, but why.


https://gerrit.osmocom.org/c/osmo-hnbgw/+/36539/comment/a565f984_6fbc3077
PS6, Line 297: static void nft_t2m_enqueue(struct nft_thread *t, struct 
nft_thread_req *req)
> you may want to use the concept of "worker thread" vs "main thread", because 
> well, main is also a th […]
The terms I use so far are "main()" and "(nft-)thread", because main() isn't 
started by pthread_create(). Do you want this changed to nft_w2m_ and _m2w_?

This is either just a fringe naming opinion bikeshed (as in, why bother me with 
a single letter in a static function's name?) -- or it is an important naming 
convention i am not aware of. which is it?

"ctx": i don't like meaningless words like "foo_context" or "foo_data" or 
"foo_storage", because every struct foo is all of these, implicitly. It is not 
the thread's "context", it is the entire thread and all of its state. It simply 
is the thread, period.
I really dislike the name "hnb_context" in osmo-hnbgw. It is just the hnb 
state, not some context around the hnb state.
For talloc it is really an actual ctx, and the name fits well there, but 
anywhere else we use the "ctx" name wrongly IMHO.


https://gerrit.osmocom.org/c/osmo-hnbgw/+/36539/comment/2ba58d8a_a76cb26d
PS6, Line 415: static int update_ctr(struct rate_ctr_group *cg, int cgidx, 
uint64_t *last_val, uint64_t new_val)
> I first though about cgi when reading this. […]
but the second thought made you realize that there is a cg counter group 
involved, right =)


https://gerrit.osmocom.org/c/osmo-hnbgw/+/36539/comment/22ecfa02_078ef35e
PS6, Line 419:          rate_ctr_add2(cg, cgidx, new_val - *last_val);
> Oh I'm surprised to find this function here. […]
update_ctr does what it does: updating counters. The important part is where is 
it called from:

update_ctr() is a code-dedup helper for hnb_update_counters(), that in turn is 
a make-my-loop-more-readable helper for main_thread_handle_get_counters_resp(). 
is that not clear enough? =)


https://gerrit.osmocom.org/c/osmo-hnbgw/+/36539/comment/64391f7e_3ff09bf0
PS6, Line 452:          size_t want = g_nft_thread->counters_len + 64;
> I'd say usually one does *2 here. […]
I used \*2 in an array list for a diff algorithm implementation in the 
https://gameoftrees.org/ project, and learnt that *2 is bad because it runs 
away too fast. My code blew up memory usage, quite unreasonably. 64 is a 
reasonable amount of hNodeBs that we can tolerate a reallocation happening for.

since a new hnb showing up is such a rare event, it would even be ok to 
reallocate for each single new hnb. So now we reduce by factor 64, way way more 
than good enough.

(be aware that once we have enough room for the nr of hnb that are in use, 
there are no reallocations anymore at all.)



--
To view, visit https://gerrit.osmocom.org/c/osmo-hnbgw/+/36539?usp=email
To unsubscribe, or for help writing mail filters, visit 
https://gerrit.osmocom.org/settings

Gerrit-Project: osmo-hnbgw
Gerrit-Branch: master
Gerrit-Change-Id: Ib2f0a9252715ea4b2fe9c367aa65f771357768ca
Gerrit-Change-Number: 36539
Gerrit-PatchSet: 6
Gerrit-Owner: neels <nhofm...@sysmocom.de>
Gerrit-Reviewer: Jenkins Builder
Gerrit-Reviewer: laforge <lafo...@osmocom.org>
Gerrit-Reviewer: pespin <pes...@sysmocom.de>
Gerrit-Attention: laforge <lafo...@osmocom.org>
Gerrit-Attention: pespin <pes...@sysmocom.de>
Gerrit-Comment-Date: Wed, 22 May 2024 00:16:09 +0000
Gerrit-HasComments: Yes
Gerrit-Has-Labels: No
Comment-In-Reply-To: laforge <lafo...@osmocom.org>
Comment-In-Reply-To: pespin <pes...@sysmocom.de>
Gerrit-MessageType: comment

Reply via email to