Hi Dennis, Thank you for taking the time.
On 2021/02/18 16:48, Dennis Buteyn wrote: > On 2/18/21 12:53 PM, Jaco Kroon wrote: >> Hi All, >> >> So I've been updating some code to use func_odbc vs previous solutions >> and have bumped into a few concerns (1 I'd appreciate feedback on the >> review, 2 is prelude to 3, which is the main issue currently): > <snip> >> 2. Too many queries resulting in the database server (which is set for >> maximum persistence) being unable to keep up, so enters >> ODBC(transaction,mydb)=meh + ODBC(forcecommit)=1. This sorts out the >> COMMIT rate since now things are getting grouped into transactions. > > <snip> > > Nothing is stopping you from using more than one database server. With > ODBC its pretty easy to spread the load over several machines. But > aside from throwing more metal at the problem, you may also want to > review how you are using your database. Perhaps some queries are extra > slow, maybe some queries can be combined, or perhaps a different > storage engine is more suitable, etc. Transactions help prevent > concurrent queries from stepping on each other's toes as well as > allowing the database to delay disk flushes. Transactions however may > also cause delays while waiting on row and table locks, potentially > deadlocking (especially long-running ones). The specific ODBC connection here is merely used for inserting "traces" into the database, so exactly this: [trace] dsn=ulsdb synopsis=Add a trace onto an existing call. syntax=[<level>]=<message> writesql=INSERT INTO calltrace(uniqueid, message, level) VALUES('${SQL_ESC(${UNIQUEID})}', '${SQL_ESC(${VALUE})}', '${IF($[${ARGC}>0]?${SQL_ESC(${ARG1})}:user)}') (Relies on patch mentioned in my point 1 for ARG1 to work correctly when not supplied) The table itself: CREATE TABLE `calltrace` ( `uniqueid` varchar(64) NOT NULL, `level` enum('user','admin','system') NOT NULL DEFAULT 'user', `tracetime` datetime(3) NOT NULL DEFAULT current_timestamp(3), `message` varchar(1024) NOT NULL, KEY `uniqueid` (`uniqueid`), KEY `tracetime` (`tracetime`) ) ENGINE=InnoDB DEFAULT CHARSET=latin1 So we'll do stuff like Set(ODBC_trace(user)=Received call from ${CALLERID(all)} to ${EXTEN}) in the dialplan for critical decision points. The other option is to log this to a file and to post-load it into the database. Really doesn't matter that much, and if it goes lost, it's not the end of the world, but the above just makes it simpler and more real time, allows us to use concurrent inserts. For any given uniqueid it's seldom that there are more than 30 entries generated, around 20 or so on average, we currently dealt with this by reducing the amount of logging, so whilst not all of it is strictly required, it's useful stuff to be able to enable support personnel to not have to bug the system administrators, and even to users to not need to log stuff to support to begin with. The other func odbc queries are almost all select queries, and are very well cached, in critical cases into astdb too. Those never cause issues, and even when the INSERTs are problematic continue to operate correctly. The above basically just starts slowing down because of the implicit COMMIT, with the above as a transaction it works very well overall, the transaction connection just gets held up for the duration of the call, but it does lower the commit rate from 50-100 (and higher in extreme cases) to 5-10 average. We can start seeing the effects of the forced disk IO around 75 odd COMMIT/s, and annoying slowdown beyond that. One do need to be aware that READ queries are issued on handles different than the WRITE queries, since the the _read function doesn't take res_odbc_transaction into consideration (which arguably it should but that has zero effect for my use case, in fact, as you point out it may actually be bad for my use case). > > Without diving deep into transactions, it sounds like your dialplan is > doing lots of small modifications during the call. Databases like > doing things in bulk, so perhaps collecting changes in channel > variables before submitting them with a hangup handler will be more > effective? That's possible, or just group them into a transaction, it's not going to reduce the overall number of queries much, but yes, since this is MariaDB one could use the "bulk insert syntax" if we really wanted to (but that's not standard SQL and may well fail if for some reason someone some years from now decide we need to swap out MariaDB for postgres), but that also complicates the overall process significantly. Our dialplan processing is usually well under a second in total, with average ring time on the Dial() around 12 seconds *usually* in ANSWER cases, and significantly shorter in NO ANSWER case. This means that at a call setup rate of 10 calls/second we need a concurrency of 150 to have a slight safety factor, but if the COMMIT only happens when the channel gets destroyed (and average call duration of 73 seconds, or overall 85 seconds from channel creation to channel destruction, that goes up to 850 with little to no safety, 900-1000 with, and that's in both cases cutting it finer than we like). Don't think the discussion here is how to make the database better/faster, but rather focus on how asterisk can make more convenient mechanisms available in order to enable more effective and efficient use of the database engine, and by implication the hardware. You raise a valid point though: what if the commit could take up significant (>100ms) amount of time? Should we then have an option in/for res_odbc_transaction to fork a separate thread for COMMIT before returning the ODBC handle to the pool? That actually makes some sense, so of the various 3.X options that involve res_odbc_transaction if there is then some way to "parallel commit", which would be fine for my use case, but may be terrible for others, eg, a sequence like: START TRANSACTION; INSERT INTO foo(a,b,c,d) VALUES(1,2,3,4); COMMIT; -- implicitly releases handle back to pool. SELECT * FROM foo WHERE a=1; If the COMMIT is parallel this opens a race towards the SELECT (which may end up using a different handle, and with default and sane isolation levels would result in a race condition even if both "transactions" are handled from the same dialplan "thread"). See above about func_odbc using different handles for transactional queries and read queries anyway. Kind Regards, Jaco -- _____________________________________________________________________ -- Bandwidth and Colocation Provided by http://www.api-digital.com -- asterisk-dev mailing list To UNSUBSCRIBE or update options visit: http://lists.digium.com/mailman/listinfo/asterisk-dev