Jianfeng Jia created ASTERIXDB-1264:
---------------------------------------
Summary: Feed didn't release locks if the ingestion hit exceptions
Key: ASTERIXDB-1264
URL: https://issues.apache.org/jira/browse/ASTERIXDB-1264
Project: Apache AsterixDB
Issue Type: Bug
Components: Feeds
Reporter: Jianfeng Jia
Assignee: Abdullah Alamoudi
This issue was discussed on the mailing list. I am copying it here to make it
easier to track and share.
I hit a weird issue that is reproducible, but only if the data contains
duplicates and is large enough. Let me explain it step by step:
1. The dataset is very simple and only has two fields.
DDL AQL:
{code}
drop dataverse test if exists;
create dataverse test;
use dataverse test;
create type t_test as closed {
  fa: int64,
  fb: int64
};

create dataset ds_test(t_test) primary key fa;

create feed fd_test using socket_adapter
(
    ("sockets"="nc1:10001"),
    ("address-type"="nc"),
    ("type-name"="t_test"),
    ("format"="adm"),
    ("duration"="1200")
);
set wait-for-completion-feed "false";
connect feed fd_test to dataset ds_test using policy AdvancedFT_Discard;
{code}
The AdvancedFT_Discard policy ignores exceptions from the insertion and keeps
ingesting.
2. Ingest the data with a very simple socket adapter client which reads the
records one by one from an .adm file. The source is here:
https://github.com/JavierJia/twitter-tracker/blob/master/src/main/java/edu/uci/ics/twitter/asterix/feed/FileFeedSocketAdapterClient.java
The data and the app package are provided here:
https://drive.google.com/folderview?id=0B423M7wGZj9dYVQ1TkpBNzcwSlE&usp=sharing
To feed the data you can run:
./bin/feedFile -u 172.17.0.2 -p 10001 -c 5000000 ~/data/twitter/test.adm
-u is the server address
-p is the server port
-c is the number of lines (records) to ingest
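For reference, here is a minimal sketch of what such a client does, assuming it
simply streams the .adm file line by line over a TCP socket to the feed's
socket_adapter endpoint; the class name and argument handling are simplified and
this is not the actual FileFeedSocketAdapterClient:
{code}
// Minimal sketch of a file-to-socket feed client (simplified; not the
// actual FileFeedSocketAdapterClient). It streams ADM records line by
// line over a plain TCP socket to the feed's socket_adapter endpoint.
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class SimpleFeedClient {
    public static void main(String[] args) throws Exception {
        String host = args[0];                      // e.g. 172.17.0.2  (-u)
        int port = Integer.parseInt(args[1]);       // e.g. 10001       (-p)
        long maxRecords = Long.parseLong(args[2]);  // e.g. 5000000     (-c)
        String admFile = args[3];                   // e.g. test.adm

        try (Socket socket = new Socket(host, port);
             OutputStream out = socket.getOutputStream();
             BufferedReader reader = new BufferedReader(new FileReader(admFile))) {
            String line;
            long count = 0;
            // Each line of the .adm file is one record; send it newline-terminated
            // until the requested number of records has been written.
            while (count < maxRecords && (line = reader.readLine()) != null) {
                out.write((line + "\n").getBytes(StandardCharsets.UTF_8));
                count++;
            }
            out.flush();
        }
    }
}
{code}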
3. After ingestion, all requests against ds_test hang. There is no exception
and no response for hours. However, the system still responds to queries on
other datasets, such as Metadata.
The data contains some duplicate records which should trigger insert
exceptions. If I lower the count from 5000000 to, say, 3000000, there is no
problem, although that prefix of the data contains duplicates as well.
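For clarity, "duplicate records" here means records in the .adm file that share
the same primary key fa, for example (the values below are made up for
illustration; the exact int64 literal syntax may vary by version):
{code}
{ "fa": 42, "fb": 100 }
{ "fa": 42, "fb": 101 }
{code}
The second record should raise a duplicate-key exception on insert, which the
AdvancedFT_Discard policy is expected to discard while ingestion continues.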
Answer from [~amoudi]:
I know exactly what is going on here. The problem you pointed out is caused by
the duplicate keys. If I remember correctly, the main issue is that the locks
that are placed on the primary keys are not released.