Hi, from the top of my head...

1. Queries are evaluated in a lazy fashion. Which means, that before client
asks to fetch the next row, nothing is evaluated.

So select() is a socket read, which is waiting for client to make a
decision on whether to commit (or end RO transaction) or ask for next part
of the result. All locks are only released when end transaction happens.
That's why others are waiting for one single client.

To my mind, it's a wrong approach, and that's not entirely how I would do
it right now, But that days we had ACID as a primary goal in minds.

2. Session is ready, means that there is no ongoing transaction, but the
connection is established.

3. No idea in this particular case, but there can be a situation when
session is started without client connection. Recovery/checkpoint, maybe
something else, I'm not sure.

4. That dfntly looks strange, but should not cause a lock. Once transaction
is committed, it should release all the locks. So even should there be
orphan sessions, they would not interfere the normal operation.

5. Yes that pretty much looks like a problem. Based on logs, it looks like
17338 is waiting for client to fetch a result.

That may be a protocol issue, by the way, when as a result of malformed
message both client and sedna are waiting for each other.

I would guess, that's always happens when "categorytree:getCategoryTree()"
is called, could you please try that? If it confirms, could you please make
a wireshark/tcpdump dump of this communication?


On Thu, Oct 9, 2014 at 6:42 AM, Ivan Lagunov <lagi...@gmail.com> wrote:

> Hi Ilya,
>
> Please check my investigations below.
>
> I've collected all the debug information for another hanging today, it's
> available here:
> https://www.dropbox.com/s/g14aer9821fzx05/sedna_debug_info6.zip?dl=0
> Here are the statuses of all Sedna processes:
>
> se_trn 13436 - DROP DOCUMENT operation, COMMITED
> se_trn 16245 - LOAD STDIN operation, COMMITED, no ESTABLISHED connection
> on port 5050 found (only 5151)
> se_trn 17335 - doc('$version') XQuery, COMMITED, gdb thread 1 in select ()
> from /lib64/libc.so.6
> se_trn 17338 - XQuery running about 10sec, not COMMITED, gdb thread 1 in
> select () from /lib64/libc.so.6
> se_trn 17410 - XQuery running <1sec, COMMITED
> se_trn 17462 - Session is ready
> se_trn 17554 - Session is ready
> se_trn 18892 - Session is ready
> se_trn 19124 - Session is ready
> se_gov 20441 - gdb threads 1 and 2 in select () from /lib64/libc.so.6
> se_sm 20449
>
> Comments:
> 1. Where gdb is not stated, all threads are in semop () from
> /lib64/libc.so.6 except the last thread that is in 
> pthread_cond_timedwait@@GLIBC_2.3.2
> (). I believe semop is a good state, correct me if I'm wrong.
> 2. All se_trn with "Session is ready" have no other entries in the latest
> event.log. As I collect only the latest event.log, I cannot check what was
> the last operation.
> 3. All se_trn processes have two ESTABLISHED connections in netstat for
> ports 5050 and 5151 if not stated otherwise.
> 4. se_trn 17338, 17410, 17462 were created by the same application that is
> under suspicion. The database unfreezed after the restart of that
> application.
>
> Issues/questions:
> 1. What exactly does it mean when a thread is in select()? It seems the
> database is waiting but for the client or for some internal operation, e.g.
> fetching the data. Please clarify.
> 2. What exactly does "Session is ready" mean? When does it appear in the
> event.log? I believe those connections are okay but I'm curious to know.
> 3. se_trn 16245 - how is it possible that there is no ESTABLISHED
> connection on port 5050 while the process is still alive? An issue?
> 4. se_trn 17335 - how is it possible that thread is in select() while the
> transaction has been COMMITED in event.log? An issue?
> 5. se_trn 17338 - looks like a root cause of Sedna hanging. Currently I
> run it via XML:DB API implementation mentioned earlier, more specifically I
> run the XQuery via XQueryService.query("xquery string") call. Besides, I
> always try to close the connection in the Java finally block, so it looks
> roughly like this:
>
> try {
>     rootCollection = DatabaseManager.getCollection(...);
>     ... do something ...
> } finally {
>     rootCollection.close();
> }
>
> According to event.log the XQuery was started at 09/10/2014 12:00:01.
> In the application log somewhere between 12:01:53 and 12:05:00 appeared a
> message:
>
>> Warning: an open Sedna XML:DB Collection/Connection was just
>> automatically closed due to inactivity, you SHOULD close these manually.
>
> It seems to mean that the connection was considered as abandoned and
> removed by XML:DB API implementation (see above conn-log-abandoned=true
> and conn-remove-abandoned-timeout=120000).
> For me it looks like the client waited for 2 minutes but did not get any
> response from the database. So there was an attempt to close the
> connection. It's easy to understand as the database hanged at about
> 12:00:02 when the last transaction was COMMITED. So it may be even that the
> root cause is hidden somewhere deeper. For example, this long XQuery could
> cause a deadlock.
>
> I understand it may be an issue in XML:DB API implementation but its
> support was stopped by Charles Foster. Unfortunately, we cannot afford to
> migrate to XQJ API implementation yet. The most tricky part to understand
> is why we had not faced similar issues before recent occurrence when it has
> become a regular major issue immediately. There were no server updates and
> no changes that could affect the way transactions/connections are handled.
>
> Best regards,
> Ivan Lagunov
>
> On Thu, Oct 9, 2014 at 1:33 PM, Ivan Lagunov <lagi...@gmail.com> wrote:
>
>> Sorry, copied the wrong link for the connection pool documentation. Here
>> is the correct one:
>> http://www.cfoster.net/pdf/sedna/xmldb/documentation.pdf#page=10
>>
>> Besides, I use the following values for the connection pool parameters in
>> the suspected application:
>> conn-max-active=50
>> conn-max-wait=30000
>> conn-log-abandoned=true
>> conn-remove-abandoned-timeout=120000
>> conn-abandoned-check-interval=5000
>>
>> Best regards,
>> Ivan Lagunov
>>
>> On Thu, Oct 9, 2014 at 1:28 PM, Ivan Lagunov <lagi...@gmail.com> wrote:
>>
>>> Hi Ilya,
>>>
>>> I use Charles Foster implementation of XML:DB API, version 1.2.6 without
>>> any customizations: http://www.cfoster.net/sedna/xmldb/download/
>>>
>>> Here is the link again to the connection pool documentation (it's a part
>>> of the above implementation):
>>> http://www.cfoster.net/sedna/xmldb/download/
>>>
>>> Meanwhile, I've identified the application that makes Sedna wait. At
>>> least restarting the application has unfreezed the database twice already.
>>> I'll analyze the application log along with Sedna event.log shortly.
>>> Hopefully, I'll find the root cause.
>>>
>>> Best regards,
>>> Ivan Lagunov
>>>
>>> On Thu, Oct 9, 2014 at 3:50 AM, Ilya Taranov <epsi...@socio.msu.ru>
>>> wrote:
>>>
>>>> I can easily see the scenario, where one of the clients doesn't
>>>> terminate the transaction properly. Unfortunately, current application
>>>> timeout doesn't apply for client hangs. The issue can be easily fixed,
>>>> however.
>>>>
>>>> Unfortunately, even fix still would not solve your problem, you would
>>>> still have application timeout wait on locks. Probably the most
>>>> straightforward way to find the cause for you would be to log connection
>>>> checkins/checkouts at the client driver level.
>>>>
>>>> Could you please give me a link to the connection pool + driver you
>>>> currently use (including all your changes if possible) so I could help you
>>>> with that?
>>>>
>>>> On Tue, Oct 7, 2014 at 4:37 AM, Ivan Lagunov <lagi...@gmail.com> wrote:
>>>>
>>>>> Hi Ilya,
>>>>>
>>>>> Your suggestion seems to be correct. After I've restarted three Java
>>>>> applications that had connections to Sedna, the database has resumed
>>>>> processing operations. Here is the event.log where Sedna has hanged 
>>>>> between
>>>>> 12:35:03 and 13:06:05:
>>>>>
>>>>>
>>>>> https://www.dropbox.com/s/pzafuvzcvfchxwa/event-2014-10-07-13-15-04-018.zip?dl=0
>>>>>
>>>>> I'm not sure which of the applications caused it though. It still
>>>>> worries me as the queries before hanging look quite simple and I cannot
>>>>> understand which client operation could have hanged the whole database.
>>>>>
>>>>> Best regards,
>>>>> Ivan Lagunov
>>>>>
>>>>> On Mon, Oct 6, 2014 at 12:29 PM, Ivan Lagunov <lagi...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Ilya,
>>>>>>
>>>>>> Yes, I use a connection pool via Sedna XML:DB API implementation.
>>>>>> It's documented here:
>>>>>>
>>>>>> http://www.cfoster.net/pdf/sedna/xmldb/documentation.pdf#page=10
>>>>>>
>>>>>> Unfortunately, there is no support for active connection timeout.
>>>>>> This could be the cause if there was a query hanging for some reason. 
>>>>>> Still
>>>>>> if some query hangs forever (that should not as it's in my case) it looks
>>>>>> more like a database bug/deadlock situation.
>>>>>>
>>>>>> I'll try as you suggest next time.
>>>>>>
>>>>>> P.S. Sorry for the second email as I've forgotten to add CC first.
>>>>>>
>>>>>> Best regards,
>>>>>> Ivan Lagunov
>>>>>>
>>>>>> On Fri, Oct 3, 2014 at 10:23 PM, Ilya Taranov <epsi...@socio.msu.ru>
>>>>>> wrote:
>>>>>>
>>>>>>> Ivan, do you have a connection pool? I see one of the se_trn
>>>>>>> processes waiting for socket read.
>>>>>>>
>>>>>>> If you don't have a connection pool, it just might be a client
>>>>>>> problem. I'll take a deeper look anyway, just next time it happens, 
>>>>>>> could
>>>>>>> you please try to shutdown the client thread, if it's even possible.
>>>>>>>
>>>>>>> On Thu, Oct 2, 2014 at 5:30 AM, Ivan Lagunov <lagi...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> I've been facing major issues with Sedna hanging regularly for the
>>>>>>>> last two weeks (after several months of stable work). I tried to do the
>>>>>>>> complete export-import of data using se_exp but Sedna still hangs
>>>>>>>> regularly. Looking into event.log does not help as queries differ every
>>>>>>>> time. Sometimes it looks like Sedna hangs after some heavy query (with 
>>>>>>>> huge
>>>>>>>> response) while in other cases there are no heavy queries to suspect. 
>>>>>>>> I've
>>>>>>>> written a script to collect the following logs/dumps you will find in 
>>>>>>>> the
>>>>>>>> archives:
>>>>>>>>
>>>>>>>> event.log - the last Sedna event log till the moment DB hangs and
>>>>>>>> is stopped;
>>>>>>>> sedna_procs.txt - list of all Sedna processes (ps -ef);
>>>>>>>> netstat.txt - list of all ESTABLISHED connections;
>>>>>>>> ipcs.txt - results of ipcs -s;
>>>>>>>> <pid>.txt - results of gdb with bt for all threads of a Sedna
>>>>>>>> process (available for all processes at the moment of hanging). This is
>>>>>>>> collected automatically for threads 1-4 of each process.
>>>>>>>>
>>>>>>>> I'm providing two sets of logs/dumps for two separate hanging
>>>>>>>> events:
>>>>>>>>
>>>>>>>> 25th September - seemed to hang without any heavy queries:
>>>>>>>> https://www.dropbox.com/s/2luq53cstc4t3tl/sedna_debug_info1.zip?dl=0
>>>>>>>>
>>>>>>>> 2nd October - hanged after a heavy query:
>>>>>>>> https://www.dropbox.com/s/scmszq9yphyv6xf/sedna_debug_info2.zip?dl=0
>>>>>>>> The heavy query occurs at LOG   02/10/2014 11:10:20
>>>>>>>>
>>>>>>>> If you can find out the cause of the hangings, it'd be the best.
>>>>>>>> Otherwise it'd be nice to hear some recommendations at least on how to
>>>>>>>> avoid these issues/minimize an impact. May be I need to upgrade to some
>>>>>>>> later version but I'm currently using 3.5.615 that is already a night 
>>>>>>>> build
>>>>>>>> that we had to upgrade to after some older issue (reported back in
>>>>>>>> March-April of 2013). I'd try some 3.5 or 3.6 version but without 
>>>>>>>> knowing
>>>>>>>> the reason there is no guarantee the potential bug is fixed in the 
>>>>>>>> later
>>>>>>>> night build.
>>>>>>>>
>>>>>>>> Also it'd be really nice if there was an official release for 3.5
>>>>>>>> with all the latest bug fixes (I see the last night build is 3.5.627 
>>>>>>>> of 8th
>>>>>>>> May 2013).
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>> Ivan Lagunov
>>>>>>>>
>>>>>>>>
>>>>>>>> ------------------------------------------------------------------------------
>>>>>>>> Meet PCI DSS 3.0 Compliance Requirements with EventLog Analyzer
>>>>>>>> Achieve PCI DSS 3.0 Compliant Status with Out-of-the-box PCI DSS
>>>>>>>> Reports
>>>>>>>> Are you Audit-Ready for PCI DSS 3.0 Compliance? Download White paper
>>>>>>>> Comply to PCI DSS 3.0 Requirement 10 and 11.5 with EventLog Analyzer
>>>>>>>>
>>>>>>>> http://pubads.g.doubleclick.net/gampad/clk?id=154622311&iu=/4140/ostg.clktrk
>>>>>>>> _______________________________________________
>>>>>>>> Sedna-discussion mailing list
>>>>>>>> Sedna-discussion@lists.sourceforge.net
>>>>>>>> https://lists.sourceforge.net/lists/listinfo/sedna-discussion
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
------------------------------------------------------------------------------
Meet PCI DSS 3.0 Compliance Requirements with EventLog Analyzer
Achieve PCI DSS 3.0 Compliant Status with Out-of-the-box PCI DSS Reports
Are you Audit-Ready for PCI DSS 3.0 Compliance? Download White paper
Comply to PCI DSS 3.0 Requirement 10 and 11.5 with EventLog Analyzer
http://pubads.g.doubleclick.net/gampad/clk?id=154622311&iu=/4140/ostg.clktrk
_______________________________________________
Sedna-discussion mailing list
Sedna-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/sedna-discussion

Reply via email to