[
https://issues.apache.org/jira/browse/IOTDB-398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17096073#comment-17096073
]
EJTTianyu commented on IOTDB-398:
---------------------------------
Hi,
I am working on this issue. I found that IoTDBMergeTest gets stuck because, in
extreme cases, a concurrent query and merge can deadlock. The deadlock occurs
under the following conditions:
# For an incoming query, the readLock of the StorageGroupProcessor's mergeLock
is first obtained for each queried measurement. The query then acquires the
readLock of each TsFile it needs and releases the mergeLock's readLock (note
that the TsFile's readLock is not released at this point)
# While cleaning up merge resources, the merge process acquires the
StorageGroupProcessor's mergeLock writeLock and then tries to obtain the file's
writeLock (which it cannot get, because the file's readLock from the previous
step is still held)
# When the query from the first step moves on to the next measurement, it
requests the StorageGroupProcessor's mergeLock readLock (which it cannot get,
because the merge process holds the mergeLock's writeLock from the previous
step, so the two processes now wait on each other in a loop).
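The three conditions above can be modeled outside IoTDB with plain `java.util.concurrent` locks. The sketch below is a hypothetical model, not IoTDB code: `mergeLock` stands in for the StorageGroupProcessor's mergeLock, `fileLock` for one TsFile's lock, and all class and method names are made up. Real queries and merges would wait forever; here timed `tryLock` calls plus a latch make the circular wait observable and let the program exit.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical model of the query/merge lock cycle (not IoTDB code).
final class MergeQueryDeadlockDemo {

  static final ReentrantReadWriteLock mergeLock = new ReentrantReadWriteLock(); // storage group's mergeLock
  static final ReentrantReadWriteLock fileLock = new ReentrantReadWriteLock();  // one TsFile's lock

  /** Returns true when each thread times out waiting for a lock the other one holds. */
  static boolean reproduce() {
    CountDownLatch fileReadHeld = new CountDownLatch(1);
    CountDownLatch mergeWriteHeld = new CountDownLatch(1);
    CountDownLatch bothTried = new CountDownLatch(2);
    AtomicBoolean queryBlocked = new AtomicBoolean();
    AtomicBoolean mergeBlocked = new AtomicBoolean();

    Thread query = new Thread(() -> {
      try {
        // Condition 1: lock the file under mergeLock.readLock, then drop mergeLock but keep the file lock.
        mergeLock.readLock().lock();
        fileLock.readLock().lock();
        mergeLock.readLock().unlock();
        fileReadHeld.countDown();

        mergeWriteHeld.await();
        // Condition 3: the next measurement asks for mergeLock.readLock again.
        boolean got = mergeLock.readLock().tryLock(200, TimeUnit.MILLISECONDS);
        if (got) {
          mergeLock.readLock().unlock();
        }
        queryBlocked.set(!got);
        bothTried.countDown();
        bothTried.await(); // keep the file's readLock until the merge has also given up
        fileLock.readLock().unlock();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });

    Thread merge = new Thread(() -> {
      try {
        fileReadHeld.await();
        // Condition 2: cleanup takes mergeLock.writeLock, then wants the file's writeLock.
        mergeLock.writeLock().lock();
        mergeWriteHeld.countDown();
        boolean got = fileLock.writeLock().tryLock(200, TimeUnit.MILLISECONDS);
        if (got) {
          fileLock.writeLock().unlock();
        }
        mergeBlocked.set(!got);
        bothTried.countDown();
        bothTried.await(); // keep mergeLock.writeLock until the query has also given up
        mergeLock.writeLock().unlock();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });

    query.start();
    merge.start();
    try {
      query.join();
      merge.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
    return queryBlocked.get() && mergeBlocked.get();
  }

  public static void main(String[] args) {
    System.out.println("circular wait detected: " + reproduce());
  }
}
```

`reproduce()` reports `true` only if both timed waits fail while the other side still holds its lock, which is exactly the loop wait described in the third condition.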
For example, suppose a storage group named root.SG1 contains measurements s0
and s1. Earlier writes have generated a seq file xxx-1-0.tsfile and an unseq
file xxx-2-0.tsfile, both containing s0 and s1 data. The query process (the
query: select * from root.SG1) and the merge process run simultaneously. The
merge process merges the seq and unseq files into a new seq file named
xxx-1-1.tsfile.
# The query on s0 adds a readLock to xxx-1-1.tsfile and holds it until the
queries on all measurements end.
# The merge process starts cleaning up the merge resources: it holds the
mergeLock's writeLock and requests the file's writeLock.
# The query on s1 requests the mergeLock's readLock of the StorageGroupProcessor.
However, the mergeLock's writeLock has been held by the merge process since the
previous step, so the two processes deadlock.
There are two solutions to this deadlock problem:
# Change the query's locking mechanism
# Break the loop-waiting (circular wait) condition
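One way to break the loop-waiting condition is sketched below, assuming a retry-with-back-off strategy (the submitted PR may break the cycle differently): the merge cleanup never blocks on a TsFile's writeLock while holding mergeLock's writeLock. If a timed `tryLock` on the file fails, it releases mergeLock so the waiting query can take its readLock, finish, and release the file, then retries. `cleanUpWithBackoff`, `runBoth`, and the lock fields are hypothetical names.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch of breaking the circular wait with back-off (not the PR's code).
final class BackoffMergeCleanupDemo {

  static final ReentrantReadWriteLock mergeLock = new ReentrantReadWriteLock();
  static final ReentrantReadWriteLock fileLock = new ReentrantReadWriteLock();

  /** Retries until both locks are held at once; returns how many attempts had to back off. */
  static int cleanUpWithBackoff() throws InterruptedException {
    int retries = 0;
    while (true) {
      mergeLock.writeLock().lock();
      try {
        if (fileLock.writeLock().tryLock(50, TimeUnit.MILLISECONDS)) {
          try {
            // Safe point: replace or delete the merged files here (omitted).
            return retries;
          } finally {
            fileLock.writeLock().unlock();
          }
        }
      } finally {
        // Giving up mergeLock lets a waiting query take its readLock, finish,
        // and release the file's readLock that blocked this attempt.
        mergeLock.writeLock().unlock();
      }
      retries++;
      Thread.sleep(10); // brief back-off before retrying
    }
  }

  /** Runs the query/merge interleaving from the report; returns true if both threads finish. */
  static boolean runBoth() {
    CountDownLatch fileReadHeld = new CountDownLatch(1);
    AtomicInteger retries = new AtomicInteger(-1);

    Thread query = new Thread(() -> {
      mergeLock.readLock().lock();   // first measurement
      fileLock.readLock().lock();    // held until the whole query ends
      mergeLock.readLock().unlock();
      fileReadHeld.countDown();
      try {
        Thread.sleep(100);           // give the merge time to take mergeLock.writeLock
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
      mergeLock.readLock().lock();   // next measurement: plain blocking call
      mergeLock.readLock().unlock();
      fileLock.readLock().unlock();  // query ends
    });

    Thread merge = new Thread(() -> {
      try {
        fileReadHeld.await();
        retries.set(cleanUpWithBackoff());
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });

    query.setDaemon(true);
    merge.setDaemon(true);
    query.start();
    merge.start();
    try {
      query.join(5_000);
      merge.join(5_000);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
    return !query.isAlive() && !merge.isAlive() && retries.get() >= 0;
  }

  public static void main(String[] args) {
    System.out.println("both finished: " + runBoth());
  }
}
```

The trade-off is that cleanup may retry several times under heavy query load; the query side keeps its existing locking, which is what separates this approach from the first option.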
I used the second solution, and the PR has been submitted.
> IoTDBMergeTest Problems in CI
> -----------------------------
>
> Key: IOTDB-398
> URL: https://issues.apache.org/jira/browse/IOTDB-398
> Project: Apache IoTDB
> Issue Type: Bug
> Reporter: Xiangdong Huang
> Assignee: EJTTianyu
> Priority: Major
> Attachments: IoTDBLoadExternalTsfileTest_fillCache.log
>
>
> Hi,
> If you check Travis's status, you will find that many tests may be
> blocked on the master branch, which needs more attention.
> I have reproduced one on my Mac (with OpenJDK 11) and attached the jstack
> logs.
>
> Note that the attachment covers only one case. I suspect IoTDBMergeTest also
> has some problems.
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)