Sasha,
I have some in-line comments, but the main "threading issue" discussion is
near the end. Please advise.
Sasha Khapyorsky wrote:
On 10:48 Mon 29 Oct , Timothy A. Meier wrote:
I apologize for the style and submission issues - still adjusting...
No need to apologize :)
I was troubled with breaking this into pieces. The patch is really about
providing an abstract OSM Server that supports local/remote connections.
I can break them up, but in my mind, they were tightly coupled.
I think it could be broken at least to multiconnection support and the
rest abstractions. No need to split it now only for "split", just try
to make it in smaller patches in the next version of this.
+/* TODO move along with other IO abstraction code */
+int cio_printf( CIO_t *cio, const char *format, ...);
+int cio_flush( CIO_t *cio);
+int cio_getline( char **lineptr, size_t *n, CIO_t *cio);
+int cio_open( CIO_t *cio);
+int cio_close( CIO_t *cio);
+int cio_poll(CIO_t *cio, int timeout);
Later I see that all cio_* and CIO_* stuff is used only in
osm_console.c, then I think this all should be moved to this file,
local function should be static, etc..
The intent of the CIO abstraction is to support connections to the OSM
server. Currently, the only thing "planned" to use this connection is
the interactive Console. That might not always be the case.
Now it is the case. And if there are no concrete plans to use this APIs
externally I prefer to keep it local.
Okay, well I quoted the "planned" because I/we (LLNL) have some ideas (not
really plans) we would like to try that will use this abstraction.
Keeping it local, until needed elsewhere is fine.
+typedef struct _osm_console_thread_t
+{
+ int used;
+ unsigned short int port;
+ int authorized;
+ int state;
+ char name[CIO_INFO_SIZE];
+ char in_buff[CIO_BUFSIZE];
+ char out_buff[CIO_BUFSIZE];
+ char client_type[CIO_NOTE_SIZE]; // maps to option->console
(off|local|socket)
+ char client_ip[CIO_NOTE_SIZE];
+ char client_hn[CIO_INFO_SIZE];
+ unsigned int thread_num; // a unique ever increasing number +
osm_opensm_t *p_osm; // the global opensm singleton (protect with
lock)
+ CIO_t io; // the io streams for the connection
+ LoopCmd loop_command;
+ cl_thread_t consoleThread; // a specific thread each console
connection
+ struct timeval connect_time;
+} osm_console_thread_t;
I think this introduces CIO_MAX_CONNECTS new threads + for loop commands.
What about to do all in one thread - to use select() or poll() with
timeout on multiple file descriptors? This will "reserve" another CPUs
for running another OpenSM things. Another potential problem is multi
thread synchronizations - we had (and still have) a lot of issues in this
area.
I wasn't aware of thread synchronization issues....
You are correct, this potentially introduces 2*CIO_MAX_CONNECTS new threads.
(Worst case, all connections are used, all running a loop command.)
Currently, the only loop command is for printing status, but the software
was designed to support any command you may want to put in a
loop. If no additional commands will be "looped", then I agree its overkill
to put this in its own thread.
I think each connection/session should be in its own thread.
Wouldn't poll() on multiple file descriptors (connected and listened
sockets) be simpler and more robust approach here? Why?
See the thread/poll discussion below..
Currently those wrapper functions only provide a single implementation, but
I intend to extend them with additional functionality when I add SSL/TSL.
This is why I thought it would be clearer to see in a patch series..
Understood. Abstractions are kind of.... abstract. Its hard to see the
justification for an abstraction layer without having at least two different
implementations. I provide one. The second one will be SSL/TSL. I'd like
to provide that after the new framework/abstractions are in place and
working
just as before.
The new protocol will depend on new libraries/headers. We (LLNL)
discussed this, and thought conditionally compiling this feature in would
satisfy those folks who did not want to add this dependency if they did
not want the feature.
That should be fine.
Thanks for reviewing all of this. How would you like me to move forward?
Would you rather me (re)submit this Patch as a series of 2?
I think we need to close threading issue first. Then patch series of 2
looks fine for me.
I really think the "thread-per-session" would be a more flexible and
powerful
design. Setting up and maintaining threads might seem more complex at first,
but it makes servicing requests/commands much more simple because
everything is
in its own context.
The previous Console used a polling mechanism, and I found an edge case
condition
which allowed one connection to block the other. Thread-per-session (or
thread
per connection) makes it difficult for one session to influence another.
The number of threads/connections would be limited. Other than the normal
multi-threading issues, are there other thread hazards in OFED/OpenSM that I
need to be aware of?
Your thoughts?
I want to
establish this as a working baseline (no new functionality, just more
extensible) before adding the SSL/TSL code.
Understood. Thanks for doing this!
Sasha
--
Timothy A. Meier
Computer Scientist
ICCD/High Performance Computing
925.422.3341
[EMAIL PROTECTED]
_______________________________________________
general mailing list
[email protected]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general