Hi Ilya,

The patch has been applied to the nightly build 3.6.519, and the data has
been migrated with se_exp using a complete export-import. The issue has not
appeared again for a week. I'll keep monitoring Sedna.

However, it seems there is another issue with that nightly build:

LOG   27/10/2014 01:21:18 (TRN nxp pid=20091 sid=49 trid=-1)
[sp.c:sp_send_msg:64]: Message send : type 250, len 0
LOG   27/10/2014 01:21:18 (TRN nxp pid=20091 sid=49 trid=-1)
[sp.c:sp_recv_msg:34]: Message recv : type 210, len 0
LOG   27/10/2014 01:21:18 (TRN nxp pid=20091 sid=49 trid=49)
[sp.c:sp_send_msg:64]: Message send : type 230, len 0
LOG   27/10/2014 01:21:18 (TRN nxp pid=20091 sid=49 trid=49)
[sp.c:sp_recv_msg:34]: Message recv : type 300, len 76
LOG   27/10/2014 01:21:18 (TRN nxp pid=20091 sid=49 trid=49)
[tr_functions.cpp:on_user_statement_begin:102]: User query:
---   DROP DOCUMENT "TreeExport.xml" IN COLLECTION "treeStructures/released"
ERROR 27/10/2014 01:21:20 (TRN nxp pid=20091 sid=49 trid=49)
[btree.cpp:bt_delete_tmpl:273]: (SE1008) Sedna index subsystem error.
Details: Cannot delete object which is not in the btree
LOG   27/10/2014 01:21:20 (TRN nxp pid=20091 sid=49 trid=-1)
[tr_common_funcs.cpp:on_transaction_end:416]: Transaction has been ROLLED
BACK
LOG   27/10/2014 01:21:20 (TRN nxp pid=20091 sid=49 trid=-1)
[sp.c:sp_send_msg:64]: Message send : type 100, len 122
LOG   27/10/2014 01:21:25 (TRN nxp pid=20091 sid=49 trid=-1)
[sp.c:sp_recv_msg:34]: Message recv : type 500, len 0
LOG   27/10/2014 01:21:25 (TRN nxp pid=20091 sid=49 trid=-1)
[sp.c:sp_send_msg:64]: Message send : type 510, len 0
INFO  27/10/2014 01:21:25 (SM nxp pid=19377)
[bm_core.cpp:bm_log_out_io_statistics:537]: IO block reads:0, writes:0
LOG   27/10/2014 01:21:25 (TRN nxp pid=20091 sid=49 trid=-1)
[tr.cpp:TRmain:456]: Session is closed
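
To pick entries such as the ERROR above out of a long event.log, the entry headers can be parsed mechanically. The following is only a minimal sketch, assuming every entry header has the shape shown in the excerpt (level, timestamp, then a parenthesised group with pid and optionally sid/trid). The class `EventLogHeader` and its field names are hypothetical helpers, not part of Sedna, and the meaning of the "nxp" token is assumed rather than confirmed.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Parsed header of one event.log entry (illustrative helper, not part of Sedna). */
final class EventLogHeader {
    // LEVEL, timestamp, then "(COMPONENT TAG pid=N[ sid=N trid=N])".
    private static final Pattern HEADER = Pattern.compile(
        "^(\\w+)\\s+(\\d{2}/\\d{2}/\\d{4} \\d{2}:\\d{2}:\\d{2}) "
        + "\\((\\w+) (\\w+) pid=(\\d+)(?: sid=(\\d+) trid=(-?\\d+))?\\)");

    final String level;      // LOG, ERROR, INFO in the excerpt
    final String timestamp;  // kept as text, e.g. 27/10/2014 01:21:20
    final String component;  // TRN or SM in the excerpt
    final String tag;        // "nxp" in the excerpt; its meaning is an assumption
    final int pid;
    final Integer sid;       // null when the header carries no sid (e.g. SM entries)
    final Integer trid;      // null when the header carries no trid

    private EventLogHeader(Matcher m) {
        level = m.group(1);
        timestamp = m.group(2);
        component = m.group(3);
        tag = m.group(4);
        pid = Integer.parseInt(m.group(5));
        sid = m.group(6) == null ? null : Integer.valueOf(m.group(6));
        trid = m.group(7) == null ? null : Integer.valueOf(m.group(7));
    }

    /** Returns the parsed header, or null if the line is not an entry header. */
    static EventLogHeader parse(String line) {
        Matcher m = HEADER.matcher(line);
        return m.find() ? new EventLogHeader(m) : null;
    }

    public static void main(String[] args) {
        EventLogHeader h =
            parse("ERROR 27/10/2014 01:21:20 (TRN nxp pid=20091 sid=49 trid=49)");
        System.out.println(h.level + " pid=" + h.pid + " trid=" + h.trid);
    }
}
```

Continuation lines (the bracketed source location, the query text, "Details: ...") do not match the header pattern and yield null, so a caller can attach them to the most recent header.
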

It's persistent: I could not delete the document at all until I deleted the
indexes on that collection:
  <index name="category_href" backend="btree" as_type="xs:string"
on_path="descendant::element(Category)"
by_path="attribute::attribute(href)" object_name="treeStructures/released"
object_type="collection"/>
  <index name="category_id" backend="btree" as_type="xs:string"
on_path="descendant::element(Category)"
by_path="attribute::attribute(Identifier)"
object_name="treeStructures/released" object_type="collection"/>

Any ideas?

Best regards,
Ivan Lagunov


On Fri, Oct 17, 2014 at 10:05 AM, Ilya Taranov <epsi...@socio.msu.ru> wrote:

> Hi, here is the patch on top of the latest label that should add minimal
> protocol debugging.
>
> On Wed, Oct 15, 2014 at 9:53 AM, Ivan Lagunov <lagi...@gmail.com> wrote:
>
>> Hi Ilya,
>>
>> As stated in the email subject, I use Sedna 3.5.615. It was built from
>> sources with the RelWithDebInfo build type on a RHEL Linux server. If you
>> want to add some additional debug info, I'd suggest adding it on top of
>> the latest nightly build though. I'm planning to upgrade anyway, as quite
>> a few nice features have been implemented.
>>
>> @Charles, okay, nice to know this. If only I could provide a consistently
>> failing test case. Currently the issue is quite mysterious.
>>
>> Best regards,
>> Ivan Lagunov
>>
>> On Wed, Oct 15, 2014 at 11:02 AM, Ivan Shcheklein <shchekl...@gmail.com>
>> wrote:
>>
>>> Ilya,
>>>
>>> As far as I remember, it is almost the latest dev sources built with
>>> RelWithDebInfo. Internal data structures should be compatible.
>>>
>>> Ivan
>>>
>>> On Wed, Oct 15, 2014 at 2:08 AM, Ilya Taranov <epsi...@socio.msu.ru>
>>> wrote:
>>>
>>>> I'm not sure the latest build will fix the problem.
>>>>
>>>> It's definitely some client-to-app communication problem.
>>>> If I make a git patch for you with additional debug info, will you be
>>>> able to apply it (assuming the data file version will be the same)? Which
>>>> build do you currently use?
>>>>
>>>> On Sat, Oct 11, 2014 at 12:48 AM, Ivan Lagunov <lagi...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Ilya,
>>>>>
>>>>> I've analyzed the logs for other occurrences of the issue, and it
>>>>> appears to be more complicated.
>>>>>
>>>>> First, could you please clarify when exactly the "Session is ready"
>>>>> statement is written to the log file? I'm asking because while the
>>>>> database is hanging, those log entries still appear at seemingly random
>>>>> moments.
>>>>>
>>>>> Second, I'm going to provide the investigation details for several
>>>>> more issue occurrences here. Below, "fast XQuery" means that the query
>>>>> uses an index and is expected to be very fast.
>>>>>
>>>>> https://www.dropbox.com/s/2luq53cstc4t3tl/sedna_debug_info1.zip?dl=0
>>>>>
>>>>> se_trn 27173 - DROP DOCUMENT operation, COMMITED
>>>>> se_gov 29197
>>>>> se_sm 29224
>>>>> se_trn 29617 - fast XQuery basictypes:getBasicType, not COMMITED, gdb
>>>>> thread 1 in select
>>>>> se_trn 29820 - Session is ready
>>>>> se_trn 29976 - Session is ready
>>>>>
>>>>> https://www.dropbox.com/s/scmszq9yphyv6xf/sedna_debug_info2.zip?dl=0
>>>>>
>>>>> se_gov 15890
>>>>> se_sm 15899
>>>>> se_trn 21611 - doc-available XQuery, not COMMITED, gdb thread 1 in
>>>>> nanosleep
>>>>> se_trn 21709 - doc('$version') XQuery, COMMITED, gdb thread 1 in select
>>>>> se_trn 21713 - doc('$version') XQuery, COMMITED, gdb thread 1 in select
>>>>> se_trn 21716 - doc('$version') XQuery, COMMITED, gdb thread 1 in select
>>>>> se_trn 21719 - doc('$version') XQuery, COMMITED, gdb thread 1 in select
>>>>> se_trn 21727 - LOAD STDIN operation, not COMMITED, gdb thread 1 in
>>>>> nanosleep
>>>>> se_trn 21996 - fast XQuery basictypes:getBasicType, not COMMITED, gdb
>>>>> thread 1 in select
>>>>> se_trn 22002 - document("$collections") XQuery, not COMMITED, gdb
>>>>> thread 1 in nanosleep
>>>>> se_trn 22010 - Session is ready
>>>>> se_trn 22013 - Session is ready
>>>>> se_trn 22152 - Session is ready
>>>>> se_trn 22295 - Session is ready
>>>>> se_trn 23615 - Session is ready
>>>>> se_trn 24315 - Session is ready
>>>>>
>>>>> Questions:
>>>>> 1. Does the nanosleep state mean that the operation is waiting for
>>>>> another one (here it's probably the 21996 XQuery in select)?
>>>>> 2. Why does doc-available wait for any other operation? I'd expect it
>>>>> to be non-blocking and to run immediately on the latest data snapshot.
>>>>> 3. Why does doc('$version') wait for any other operation? It only
>>>>> fetches database metadata, so I'd also expect it to run immediately.
>>>>>
>>>>> https://www.dropbox.com/s/3ibo4dvzapfdirw/sedna_debug_info3.zip?dl=0
>>>>>
>>>>> se_trn 641 - fast XQuery packages:getPackage, not COMMITED, gdb thread
>>>>> 1 in select
>>>>> se_trn 645 - Session is ready & COMMITED without operation
>>>>> se_trn 648 - Session is ready
>>>>> se_trn 823 - Session is ready
>>>>> se_trn 898 - Session is ready
>>>>> se_gov 8444
>>>>> se_sm 8453
>>>>>
>>>>> Questions:
>>>>> What is going on with 645? There is a "Session is ready" statement
>>>>> immediately followed by a "COMMITED" statement in the event.log. What
>>>>> operation was committed? Is it possible that the order of statements in
>>>>> the event.log is inconsistent? Even so, there are no more statements
>>>>> with pid=645, so there is no operation that could have been committed.
>>>>> Very suspicious.
>>>>>
>>>>> https://www.dropbox.com/s/38idxmpzjv6rvxf/sedna_debug_info4.zip?dl=0
>>>>>
>>>>> se_gov 1532
>>>>> se_sm 1711
>>>>> se_trn 6837 - DROP DOCUMENT operation, not COMMITED, gdb thread 1 in
>>>>> nanosleep
>>>>> se_trn 11703 - LOAD STDIN operation, not COMMITED, gdb thread 1 in
>>>>> nanosleep
>>>>> se_trn 11944 - document("$collections") XQuery, not COMMITED, gdb
>>>>> thread 1 in nanosleep
>>>>> se_trn 11972 - document("$collections") XQuery, not COMMITED, gdb
>>>>> thread 1 in nanosleep
>>>>> se_trn 12139 - XQuery collection("groupingDefinitions/wip"), COMMITED,
>>>>> gdb thread 1 in select
>>>>> se_trn 12183 - fast XQuery basictypes:getBasicType, not COMMITED, gdb
>>>>> thread 1 in select
>>>>> se_trn 12186 - XQuery collection("groupingDefinitions/wip"), COMMITED,
>>>>> gdb thread 1 in select
>>>>> se_trn 12195 - XQuery collection("groupingDefinitions/wip"), COMMITED,
>>>>> gdb thread 1 in select
>>>>> se_trn 12222 - document("$collections") XQuery, not COMMITED, gdb
>>>>> thread 1 in nanosleep
>>>>> se_trn 12302 - fast XQuery documents:getFilesByBasicTypeId, not
>>>>> COMMITED, gdb thread 1 in nanosleep
>>>>> se_trn 12307 - document("$documents") XQuery, not COMMITED, gdb thread
>>>>> 1 in nanosleep
>>>>> se_trn 12310 - document("$collections") XQuery, not COMMITED, gdb
>>>>> thread 1 in nanosleep
>>>>> se_trn 12377 - Session is ready
>>>>> se_trn 12382 - Session is ready
>>>>> se_trn 12423 - Session is ready
>>>>> se_trn 12848 - Session is ready
>>>>>
>>>>> Comment:
>>>>> I understand many of those XQueries are not optimal; they are part of
>>>>> an application developed by other devs. Now I'll take care of improving
>>>>> those operations to replace document("$documents") and
>>>>> document("$collections") with doc-available and col-available
>>>>> respectively, and to use an index for collection("groupingDefinitions/wip").
>>>>>
>>>>> Overall conclusion:
>>>>> There is no single operation/XQuery that causes the database to hang.
>>>>> At least three different XQueries were observed. Taking into account
>>>>> the seemingly random occurrences, it does look like a protocol issue.
>>>>> I don't see how to apply tcpdump in this case, as it's a production
>>>>> environment. Any ideas?
>>>>>
>>>>> I'd try to rebuild the database from sources and, perhaps, take the
>>>>> latest nightly 3.5 (or 3.6?) build for this. Do you think it makes
>>>>> sense to try?
>>>>>
>>>>> Best regards,
>>>>> Ivan Lagunov
>>>>>
>>>>> On Fri, Oct 10, 2014 at 1:23 AM, Ilya Taranov <epsi...@socio.msu.ru>
>>>>> wrote:
>>>>>
>>>>>> Hi, off the top of my head...
>>>>>>
>>>>>> 1. Queries are evaluated lazily, which means that nothing is evaluated
>>>>>> until the client asks to fetch the next row.
>>>>>>
>>>>>> So select() is a socket read that is waiting for the client to decide
>>>>>> whether to commit (or end the RO transaction) or to ask for the next
>>>>>> part of the result. All locks are released only when the transaction
>>>>>> ends. That's why others are waiting for one single client.
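
The behaviour described above can be illustrated with a toy model. This is only a sketch and not Sedna code: `DocLock`, `Session`, and the single-permit lock are assumptions made for illustration. It shows a lock taken when a query starts and released only when the client ends the transaction, regardless of how many rows the client has fetched.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.Semaphore;

/** Toy model (not Sedna code): a lock taken at query start is held until commit. */
final class LazySessionDemo {

    /** Stands in for a document/collection lock; one permit means one holder. */
    static final class DocLock {
        private final Semaphore permit = new Semaphore(1);
        boolean tryAcquire() { return permit.tryAcquire(); }
        void release() { permit.release(); }
    }

    /** Hypothetical session: rows are handed out only when the client asks. */
    static final class Session {
        private final DocLock lock;
        private boolean inTransaction = false;

        Session(DocLock lock) { this.lock = lock; }

        /** Starts a transaction; returns null if another session holds the lock. */
        Iterator<String> query(List<String> rows) {
            if (!lock.tryAcquire()) {
                return null;
            }
            inTransaction = true;
            return rows.iterator(); // nothing is consumed until next() is called
        }

        /** Ends the transaction; only now is the lock released. */
        void commit() {
            if (inTransaction) {
                inTransaction = false;
                lock.release();
            }
        }
    }

    public static void main(String[] args) {
        DocLock lock = new DocLock();
        Session slowClient = new Session(lock);
        Session other = new Session(lock);

        Iterator<String> it = slowClient.query(Arrays.asList("row1", "row2"));
        it.next(); // the client fetched one row and then stalls
        System.out.println("other blocked: " + (other.query(Arrays.asList("x")) == null));
        slowClient.commit(); // the client finally ends the transaction
        System.out.println("other proceeds: " + (other.query(Arrays.asList("x")) != null));
    }
}
```

In this model a client that stops fetching without committing keeps every other session out, which matches the symptom of several se_trn processes waiting on one process that sits in select().
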
>>>>>>
>>>>>> To my mind, it's the wrong approach, and that's not entirely how I
>>>>>> would do it right now, but in those days we had ACID as a primary goal
>>>>>> in mind.
>>>>>>
>>>>>> 2. "Session is ready" means that there is no ongoing transaction, but
>>>>>> the connection is established.
>>>>>>
>>>>>> 3. No idea in this particular case, but there can be a situation where
>>>>>> a session is started without a client connection. Recovery/checkpoint,
>>>>>> maybe something else, I'm not sure.
>>>>>>
>>>>>> 4. That definitely looks strange, but should not cause a lock. Once a
>>>>>> transaction is committed, it should release all its locks. So even if
>>>>>> there were orphan sessions, they would not interfere with normal
>>>>>> operation.
>>>>>>
>>>>>> 5. Yes, that pretty much looks like a problem. Based on the logs, it
>>>>>> looks like 17338 is waiting for the client to fetch a result.
>>>>>>
>>>>>> That may be a protocol issue, by the way, where as a result of a
>>>>>> malformed message both the client and Sedna are waiting for each other.
>>>>>>
>>>>>> I would guess that it always happens when
>>>>>> "categorytree:getCategoryTree()" is called; could you please try that?
>>>>>> If that is confirmed, could you please make a wireshark/tcpdump dump of
>>>>>> this communication?
>>>>>>
>>>>>>
>>>>>> On Thu, Oct 9, 2014 at 6:42 AM, Ivan Lagunov <lagi...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Ilya,
>>>>>>>
>>>>>>> Please check my investigations below.
>>>>>>>
>>>>>>> I've collected all the debug information for another hang today;
>>>>>>> it's available here:
>>>>>>> https://www.dropbox.com/s/g14aer9821fzx05/sedna_debug_info6.zip?dl=0
>>>>>>> Here are the statuses of all Sedna processes:
>>>>>>>
>>>>>>> se_trn 13436 - DROP DOCUMENT operation, COMMITED
>>>>>>> se_trn 16245 - LOAD STDIN operation, COMMITED, no ESTABLISHED
>>>>>>> connection on port 5050 found (only 5151)
>>>>>>> se_trn 17335 - doc('$version') XQuery, COMMITED, gdb thread 1 in
>>>>>>> select () from /lib64/libc.so.6
>>>>>>> se_trn 17338 - XQuery running about 10sec, not COMMITED, gdb thread
>>>>>>> 1 in select () from /lib64/libc.so.6
>>>>>>> se_trn 17410 - XQuery running <1sec, COMMITED
>>>>>>> se_trn 17462 - Session is ready
>>>>>>> se_trn 17554 - Session is ready
>>>>>>> se_trn 18892 - Session is ready
>>>>>>> se_trn 19124 - Session is ready
>>>>>>> se_gov 20441 - gdb threads 1 and 2 in select () from /lib64/libc.so.6
>>>>>>> se_sm 20449
>>>>>>>
>>>>>>> Comments:
>>>>>>> 1. Where gdb is not stated, all threads are in semop () from
>>>>>>> /lib64/libc.so.6 except the last thread, which is in
>>>>>>> pthread_cond_timedwait@@GLIBC_2.3.2 (). I believe semop is a good
>>>>>>> state; correct me if I'm wrong.
>>>>>>> 2. All se_trn with "Session is ready" have no other entries in the
>>>>>>> latest event.log. As I collect only the latest event.log, I cannot check
>>>>>>> what the last operation was.
>>>>>>> 3. All se_trn processes have two ESTABLISHED connections in netstat
>>>>>>> for ports 5050 and 5151 if not stated otherwise.
>>>>>>> 4. se_trn 17338, 17410, 17462 were created by the same application
>>>>>>> that is under suspicion. The database unfroze after that application
>>>>>>> was restarted.
>>>>>>>
>>>>>>> Issues/questions:
>>>>>>> 1. What exactly does it mean when a thread is in select()? It seems
>>>>>>> the database is waiting, but is it waiting for the client or for some
>>>>>>> internal operation, e.g. fetching the data? Please clarify.
>>>>>>> 2. What exactly does "Session is ready" mean? When does it appear in
>>>>>>> the event.log? I believe those connections are okay, but I'm curious
>>>>>>> to know.
>>>>>>> 3. se_trn 16245 - how is it possible that there is no ESTABLISHED
>>>>>>> connection on port 5050 while the process is still alive? An issue?
>>>>>>> 4. se_trn 17335 - how is it possible that the thread is in select()
>>>>>>> while the transaction has been COMMITED in event.log? An issue?
>>>>>>> 5. se_trn 17338 - looks like the root cause of Sedna hanging.
>>>>>>> Currently I run it via the XML:DB API implementation mentioned earlier;
>>>>>>> more specifically, I run the XQuery via an
>>>>>>> XQueryService.query("xquery string") call. Besides, I always try to
>>>>>>> close the connection in the Java finally block, so it looks roughly
>>>>>>> like this:
>>>>>>>
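>>>>>>> try {
>>>>>>>     rootCollection = DatabaseManager.getCollection(...);
>>>>>>>     ... do something ...
>>>>>>> } finally {
>>>>>>>     // getCollection() may have thrown before the assignment
>>>>>>>     if (rootCollection != null) {
>>>>>>>         rootCollection.close();
>>>>>>>     }
>>>>>>> }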
>>>>>>>
>>>>>>> According to event.log, the XQuery was started at 09/10/2014 12:00:01.
>>>>>>> Somewhere between 12:01:53 and 12:05:00, the following message
>>>>>>> appeared in the application log:
>>>>>>>
>>>>>>>> Warning: an open Sedna XML:DB Collection/Connection was just
>>>>>>>> automatically closed due to inactivity, you SHOULD close these
>>>>>>>> manually.
>>>>>>>
>>>>>>> It seems to mean that the connection was considered abandoned and was
>>>>>>> removed by the XML:DB API implementation (see above
>>>>>>> conn-log-abandoned=true and conn-remove-abandoned-timeout=120000).
>>>>>>> To me it looks like the client waited for 2 minutes but did not get
>>>>>>> any response from the database, so there was an attempt to close the
>>>>>>> connection. That is easy to understand, as the database hung at about
>>>>>>> 12:00:02 when the last transaction was COMMITED. So the root cause may
>>>>>>> even be hidden somewhere deeper. For example, this long XQuery could
>>>>>>> have caused a deadlock.
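
The timing can be checked against the pool settings quoted in this thread (conn-remove-abandoned-timeout=120000, conn-abandoned-check-interval=5000). The following is a minimal sketch, assuming the abandonment timer starts when the query starts at 12:00:01 and that the pool checks once per interval. How the driver actually measures inactivity is not confirmed here, and `AbandonedTimeoutCheck` is a hypothetical helper.

```java
import java.time.Duration;
import java.time.LocalTime;

/** Worked arithmetic for the abandoned-connection window (illustrative only). */
final class AbandonedTimeoutCheck {

    /** Earliest moment a connection inactive since lastActivity may be reclaimed. */
    static LocalTime earliestRemoval(LocalTime lastActivity, long timeoutMs) {
        return lastActivity.plus(Duration.ofMillis(timeoutMs));
    }

    /** Latest moment, assuming the checker runs once every checkIntervalMs. */
    static LocalTime latestRemoval(LocalTime lastActivity, long timeoutMs,
                                   long checkIntervalMs) {
        return earliestRemoval(lastActivity, timeoutMs)
            .plus(Duration.ofMillis(checkIntervalMs));
    }

    /** True if t lies in the closed interval [from, to]. */
    static boolean within(LocalTime t, LocalTime from, LocalTime to) {
        return !t.isBefore(from) && !t.isAfter(to);
    }

    public static void main(String[] args) {
        LocalTime queryStart = LocalTime.of(12, 0, 1);    // from event.log
        LocalTime windowFrom = LocalTime.of(12, 1, 53);   // from the application log
        LocalTime windowTo = LocalTime.of(12, 5, 0);

        LocalTime earliest = earliestRemoval(queryStart, 120_000);
        LocalTime latest = latestRemoval(queryStart, 120_000, 5_000);

        System.out.println("expected removal: " + earliest + " .. " + latest);
        System.out.println("inside observed window: "
            + (within(earliest, windowFrom, windowTo)
               && within(latest, windowFrom, windowTo)));
    }
}
```

Under these assumptions the connection would be reclaimed between 12:02:01 and 12:02:06, which falls inside the observed 12:01:53 to 12:05:00 window and is consistent with the warning having been triggered by this query.
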
>>>>>>>
>>>>>>> I understand it may be an issue in the XML:DB API implementation, but
>>>>>>> Charles Foster has stopped supporting it. Unfortunately, we cannot
>>>>>>> afford to migrate to the XQJ API implementation yet. The trickiest
>>>>>>> part to understand is why we had not faced similar issues before the
>>>>>>> recent occurrence, after which it immediately became a regular major
>>>>>>> issue. There were no server updates and no changes that could affect
>>>>>>> the way transactions/connections are handled.
>>>>>>>
>>>>>>> Best regards,
>>>>>>> Ivan Lagunov
>>>>>>>
>>>>>>> On Thu, Oct 9, 2014 at 1:33 PM, Ivan Lagunov <lagi...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Sorry, copied the wrong link for the connection pool documentation.
>>>>>>>> Here is the correct one:
>>>>>>>> http://www.cfoster.net/pdf/sedna/xmldb/documentation.pdf#page=10
>>>>>>>>
>>>>>>>> Besides, I use the following values for the connection pool
>>>>>>>> parameters in the suspected application:
>>>>>>>> conn-max-active=50
>>>>>>>> conn-max-wait=30000
>>>>>>>> conn-log-abandoned=true
>>>>>>>> conn-remove-abandoned-timeout=120000
>>>>>>>> conn-abandoned-check-interval=5000
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>> Ivan Lagunov
>>>>>>>>
>>>>>>>> On Thu, Oct 9, 2014 at 1:28 PM, Ivan Lagunov <lagi...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi Ilya,
>>>>>>>>>
>>>>>>>>> I use Charles Foster implementation of XML:DB API, version 1.2.6
>>>>>>>>> without any customizations:
>>>>>>>>> http://www.cfoster.net/sedna/xmldb/download/
>>>>>>>>>
>>>>>>>>> Here is the link again to the connection pool documentation (it's
>>>>>>>>> a part of the above implementation):
>>>>>>>>> http://www.cfoster.net/sedna/xmldb/download/
>>>>>>>>>
>>>>>>>>> Meanwhile, I've identified the application that makes Sedna wait.
>>>>>>>>> At least, restarting the application has unfrozen the database twice
>>>>>>>>> already. I'll analyze the application log along with Sedna event.log
>>>>>>>>> shortly. Hopefully, I'll find the root cause.
>>>>>>>>>
>>>>>>>>> Best regards,
>>>>>>>>> Ivan Lagunov
>>>>>>>>>
>>>>>>>>> On Thu, Oct 9, 2014 at 3:50 AM, Ilya Taranov <epsi...@socio.msu.ru
>>>>>>>>> > wrote:
>>>>>>>>>
>>>>>>>>>> I can easily see a scenario where one of the clients doesn't
>>>>>>>>>> terminate the transaction properly. Unfortunately, the current
>>>>>>>>>> application timeout doesn't apply to client hangs. The issue can
>>>>>>>>>> be easily fixed, however.
>>>>>>>>>>
>>>>>>>>>> Unfortunately, even that fix would not solve your problem; you
>>>>>>>>>> would still have the application timeout wait on locks. Probably
>>>>>>>>>> the most straightforward way for you to find the cause would be to
>>>>>>>>>> log connection checkins/checkouts at the client driver level.
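
Logging checkouts and checkins could look roughly like the sketch below. It is a minimal illustration under stated assumptions: `CheckoutLog`, its method names, and the string connection ids are hypothetical and are not part of the XML:DB driver or its connection pool. The caller would invoke `checkout` when a connection is handed out and `checkin` when it is closed.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical bookkeeping for pool checkouts/checkins (not a driver API). */
final class CheckoutLog {
    // Connection id -> time of checkout in milliseconds, in checkout order.
    private final Map<String, Long> open = new LinkedHashMap<>();

    /** Records that a connection was handed out to the application. */
    synchronized void checkout(String connId, long nowMs) {
        open.put(connId, nowMs);
        System.out.println("checkout " + connId + " at " + nowMs);
    }

    /** Records that a connection was returned, with how long it was held. */
    synchronized void checkin(String connId, long nowMs) {
        Long since = open.remove(connId);
        String held = (since == null) ? "unknown" : (nowMs - since) + " ms";
        System.out.println("checkin  " + connId + " held " + held);
    }

    /** Lists connections that have been out for longer than limitMs. */
    synchronized List<String> heldLongerThan(long nowMs, long limitMs) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Long> e : open.entrySet()) {
            if (nowMs - e.getValue() > limitMs) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        CheckoutLog log = new CheckoutLog();
        log.checkout("conn-1", 0L);
        log.checkout("conn-2", 1_000L);
        log.checkin("conn-2", 1_500L);
        // conn-1 was never returned; report it once it exceeds two minutes.
        System.out.println("suspects: " + log.heldLongerThan(130_000L, 120_000L));
    }
}
```

Correlating the ids reported by `heldLongerThan` with the se_trn process that sits in select() would point to the code path that never ends its transaction.
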
>>>>>>>>>>
>>>>>>>>>> Could you please give me a link to the connection pool + driver
>>>>>>>>>> you currently use (including all your changes if possible) so I
>>>>>>>>>> could help you with that?
>>>>>>>>>>
>>>>>>>>>> On Tue, Oct 7, 2014 at 4:37 AM, Ivan Lagunov <lagi...@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Ilya,
>>>>>>>>>>>
>>>>>>>>>>> Your suggestion seems to be correct. After I restarted three
>>>>>>>>>>> Java applications that had connections to Sedna, the database
>>>>>>>>>>> resumed processing operations. Here is the event.log covering the
>>>>>>>>>>> period when Sedna hung, between 12:35:03 and 13:06:05:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> https://www.dropbox.com/s/pzafuvzcvfchxwa/event-2014-10-07-13-15-04-018.zip?dl=0
>>>>>>>>>>>
>>>>>>>>>>> I'm not sure which of the applications caused it though. It
>>>>>>>>>>> still worries me, as the queries before the hang look quite simple
>>>>>>>>>>> and I cannot understand which client operation could have hung the
>>>>>>>>>>> whole database.
>>>>>>>>>>>
>>>>>>>>>>> Best regards,
>>>>>>>>>>> Ivan Lagunov
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Oct 6, 2014 at 12:29 PM, Ivan Lagunov <lagi...@gmail.com
>>>>>>>>>>> > wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Ilya,
>>>>>>>>>>>>
>>>>>>>>>>>> Yes, I use a connection pool via Sedna XML:DB API
>>>>>>>>>>>> implementation. It's documented here:
>>>>>>>>>>>>
>>>>>>>>>>>> http://www.cfoster.net/pdf/sedna/xmldb/documentation.pdf#page=10
>>>>>>>>>>>>
>>>>>>>>>>>> Unfortunately, there is no support for an active connection
>>>>>>>>>>>> timeout. This could be the cause if a query was hanging for some
>>>>>>>>>>>> reason. Still, if some query hangs forever (which should not
>>>>>>>>>>>> happen in my case), it looks more like a database bug/deadlock
>>>>>>>>>>>> situation.
>>>>>>>>>>>>
>>>>>>>>>>>> I'll try as you suggest next time.
>>>>>>>>>>>>
>>>>>>>>>>>> P.S. Sorry for the second email as I've forgotten to add CC
>>>>>>>>>>>> first.
>>>>>>>>>>>>
>>>>>>>>>>>> Best regards,
>>>>>>>>>>>> Ivan Lagunov
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Oct 3, 2014 at 10:23 PM, Ilya Taranov <
>>>>>>>>>>>> epsi...@socio.msu.ru> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Ivan, do you have a connection pool? I see one of the se_trn
>>>>>>>>>>>>> processes waiting for socket read.
>>>>>>>>>>>>>
>>>>>>>>>>>>> If you don't have a connection pool, it just might be a client
>>>>>>>>>>>>> problem. I'll take a deeper look anyway; just next time it
>>>>>>>>>>>>> happens, could you please try to shut down the client thread,
>>>>>>>>>>>>> if that's even possible?
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Oct 2, 2014 at 5:30 AM, Ivan Lagunov <
>>>>>>>>>>>>> lagi...@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I've been facing major issues with Sedna hanging regularly
>>>>>>>>>>>>>> for the last two weeks (after several months of stable work). I
>>>>>>>>>>>>>> tried a complete export-import of the data using se_exp, but
>>>>>>>>>>>>>> Sedna still hangs regularly. Looking into event.log does not
>>>>>>>>>>>>>> help, as the queries differ every time. Sometimes it looks like
>>>>>>>>>>>>>> Sedna hangs after some heavy query (with a huge response), while
>>>>>>>>>>>>>> in other cases there are no heavy queries to suspect. I've
>>>>>>>>>>>>>> written a script to collect the following logs/dumps, which you
>>>>>>>>>>>>>> will find in the archives:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> event.log - the last Sedna event log till the moment DB hangs
>>>>>>>>>>>>>> and is stopped;
>>>>>>>>>>>>>> sedna_procs.txt - list of all Sedna processes (ps -ef);
>>>>>>>>>>>>>> netstat.txt - list of all ESTABLISHED connections;
>>>>>>>>>>>>>> ipcs.txt - results of ipcs -s;
>>>>>>>>>>>>>> <pid>.txt - results of gdb with bt for all threads of a Sedna
>>>>>>>>>>>>>> process (available for all processes at the moment of hanging).
>>>>>>>>>>>>>> This is collected automatically for threads 1-4 of each process.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I'm providing two sets of logs/dumps for two separate hanging
>>>>>>>>>>>>>> events:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 25th September - seemed to hang without any heavy queries:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> https://www.dropbox.com/s/2luq53cstc4t3tl/sedna_debug_info1.zip?dl=0
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2nd October - hung after a heavy query:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> https://www.dropbox.com/s/scmszq9yphyv6xf/sedna_debug_info2.zip?dl=0
>>>>>>>>>>>>>> The heavy query occurs at LOG   02/10/2014 11:10:20
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> If you can find out the cause of the hangs, that would be
>>>>>>>>>>>>>> best. Otherwise it'd be nice to hear some recommendations, at
>>>>>>>>>>>>>> least on how to avoid these issues or minimize their impact.
>>>>>>>>>>>>>> Maybe I need to upgrade to some later version, but I'm currently
>>>>>>>>>>>>>> using 3.5.615, which is already a nightly build that we had to
>>>>>>>>>>>>>> upgrade to after some older issue (reported back in March-April
>>>>>>>>>>>>>> of 2013). I'd try some 3.5 or 3.6 version, but without knowing
>>>>>>>>>>>>>> the reason there is no guarantee the potential bug is fixed in
>>>>>>>>>>>>>> the later nightly build.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Also it'd be really nice if there was an official release for
>>>>>>>>>>>>>> 3.5 with all the latest bug fixes (I see the last nightly build
>>>>>>>>>>>>>> is 3.5.627 of 8th May 2013).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>> Ivan Lagunov
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>> Sedna-discussion mailing list
>>>>>>>>>>>>>> Sedna-discussion@lists.sourceforge.net
>>>>>>>>>>>>>> https://lists.sourceforge.net/lists/listinfo/sedna-discussion
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>