Re: Table C API

Philipp Mon, 15 Jul 2024 03:49:23 -0700

[2024-07-11 12:00] Omar Polo <[email protected]>
> Sorry for the long delay, I just missed the previous mail in the
> backlog, I'm afraid.
>
> Please don't feel bad to send a ping message if you don't hear a reply
> in about a week, sometimes life just get in the way and the mails lost
> in the backlog.
>
> Philipp <[email protected]> wrote:
> > Hi
> > 
> > [2024-05-30 21:54] Philipp <[email protected]>
> > > I'm currently working on the C API for the tables implementations. The
> > > current API has the problem that the callbacks mixes request and
> > > response. So I have created a new C api with callbacks for requests
> > > and extra functios for response. The callbacks looks the following:
> > >
> > > [...]
> > >
> > > What do you think about the new API?
>
> Mostly looking good to me.  I have a few nits and ideas on how we can
> further simplify all of this so the table_stdio.c layer can remain very
> simple.
>
> > > I have also implement some fallback handler for the old api. They are
> > > primary for simpler testing and development. Patcher are tested with
> > > table-ldap and attached. The patches are currenty seperated by query
> > > type. For the push I would put them togetter in one commit.
>
> Actually the backward compat is not important.  This is an internal
> library used by a few tables, it's not published standalone and we don't
> make any compatibility promise of any sort (ABI or API.)
>
> It's fine to break the ABI/API if it makes sense, and I can help
> migrating the existing code to the new one.


The backwards compatibility helps me to develop and test my changes.
Also they are quite simple to implement. Also the old methods are marked
as depricated. I would remove them if/when they are not needed anymore.
But yes backward compatibility is not importend, but when it's easy to
implement I take it.

> > > My next step is to change the getline loop to a pool based loop. Then
> > > add some registration for other fd. This way the backend can be full
> > > asyncron.
>
> I think this is fine but table_stdio.c is not the right place to do so.
> event loop are delicate, the application might need to schedule its own
> events (at least postgres could query asynchronously the db IIRC) so I
> prefer not to force the event loop in table_stdio.c.  Not all tables
> need to be asynchronous.

The loop is to suport asynchronous implementation. For example with
postgres[0] you get the underlining fd. Then you can call a search and
wait (i.e. with poll) till the fd has some data. As far as I see
something like this is also possible with all other tables.

> What table_stdio.c should be in my opinion is just a tiny layer to parse
> a line into a struct or something equivalent and write the replies.
> Once it does this, then writing both blocking and async tables on top
> becomes way more easy.

Writing a blocking table is as easy as before. You register your callbacks
and in your callbacks you block till the table has finished.

For an async/nonblocking table you register also your fd and get called
when the fd is ready. After that you can check what data is returned in
your fd and return the new data.

> > I now also have implemented a simple poll based event loop. So a table
> > implementation can register fds with the following api:
> > 
> > void table_api_register_fd(int fd, short events, fd_callback cb);
> > void table_api_replace_fd(int old, int new);
> > void table_api_remove_fd(int fd);
>
> So this is what I'd avoid.  Let's try to have each layer doing one thing
> only (more or les), and leave the fd tracking and correlate events to
> other layers.

I have implementented this because I want to avoid duplicated code. When
you look at the current tables they all look quite similar[1]. So when
all table already does the same then the API could take this job.

> I'd refactor the changes so that the APIs exported are something like
> this, but mind you that this is just a suggestion, I haven't implemented
> it
>
>       struct table_request {
>               enum table_operation     op;
>               /* maybe even time_t     timestamp; */
>               char                    *table_name;
>               enum table_service       service;
>               char                    *id;
>               char                    *key;
>       };
>
>       /* parses the line into the provided storage for the struct */
>       int table_api_parse_request(struct table_request *, char *);

I more or less have this function already. In handle_request a protocol
line is parsed in a request struct. This function then calles the
registerd callbacks. I have also planed to add some api to query open
requests. Make the handle_request api pulbic (with some changes) is also
posible.

>       /* routines for the reply */
>       int table_api_update(struct table_request *, int);
>       int table_api_error(struct table_request *, const char *);
>       int table_api_found(struct table_request *, const char *);
>       int table_api_not_found(struct table_request *);
>
> This could be the base layer, it just deal with parsing into caller
> provided storage and fills the struct, then provides APIs to deal with
> the reply.  No dependencies on anything else other than the stdlib.
>
> (after this, we can also add stuff to deal with multiple lookup result, I
> really like your proposals wrt the protocol change, but we should try to
> do only one thing at once ;-)

The seperation between lookup_result and lookup_finish has two goals:
First of all to avoid duplicated code in the tables. With this new
function the indipendent tables don't need to concat the result on there
own. Secoundly, of course, to suport my proposed protocol change.

Seperation between the single request types is because they are used all
in different ways. Ihmo overloading the result functions to mutch makes
only the code more complicated. Simple responses function for each response
type don't have this problem.

> Making an async table over this is straightforward whichever event loop
> you like the most.
>
> Then I believe it's fine if we keep the register API implemented on top
> of these for the common blocking case.  The implementation can also be
> straightforward.
>
> I'd leave the choice of the event loop to use to the actual table
> implementations.  So that table-passwd can not care, and table-ldap can
> use a poll-based loop or libevent or whatever it wants.  Same for the
> databases.
>
> Running a fully asynchronous table on top of this with, say, libevent is
> just a matter of a few lines of glue code, which is acceptable IMHO.

Suggestion: I make the handle_request() public and add a result struct.
So a table can choose to use the loop or implement it's own.

I plan to use this loop for ldap and psql. As far as I see this loop
would also work for mysql. So it would be nice having it suported in
one version.

> (i'm mentioning libevent because I've seen you're including event.h,
> even if the event loop is actually based on poll.)

I only use libevent for the event buffer and evbuffer_readline(). When
you have a simpler dynamic string implementation I have no problem to
remove libevent.

> So, in the end, what I care the most is that we keep every layer as
> simple and dependency-free as we can.  I don't think the table_stdio
> layer should keep track of inflight request, nor have its own event
> loop, etc.  It's the application problem, not matter of transport layer.

As suggested I would split the loop and the parsing so a table can choose
to use it or use it's own.

Im unsure about the inflight requests. I had the idea that psql/mysql
could profit from them because they can't send multible requests. Also
it's currently needed for the lookup result/finished functionality. At
least for lookups I would keep it till the protocoll suports seperation
between lookup result and finish.

Philipp

[1] https://www.postgresql.org/docs/current/libpq-async.html
[1] probably because they are mostly written by Gilles

Re: Table C API

Reply via email to