I know exactly what is going on here. The problem you pointed out is caused by the duplicate keys. If I remember correctly, the main issue is that the locks placed on the primary keys are not released.
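
Roughly, the failure mode I have in mind looks like the sketch below. The class and method names are hypothetical and this is not the actual AsterixDB lock manager code, just an illustration of a per-primary-key lock that has to be released even when the insert throws a duplicate-key exception:

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.locks.ReentrantLock;

    // Hypothetical sketch, not real AsterixDB code.
    public class PrimaryKeyLockSketch {
        private final ConcurrentHashMap<Long, ReentrantLock> locks = new ConcurrentHashMap<>();

        public void insert(long primaryKey, Runnable doInsert) {
            ReentrantLock lock = locks.computeIfAbsent(primaryKey, k -> new ReentrantLock());
            lock.lock();
            try {
                doInsert.run(); // may throw on a duplicate primary key
            } finally {
                // If the discard policy swallows the duplicate-key exception and this
                // unlock never runs, the key stays locked and every later operation on
                // the dataset blocks, which matches the hang described below.
                lock.unlock();
            }
        }
    }
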
I will start fixing this issue tonight.

Cheers,
Abdullah.

Amoudi, Abdullah.

On Mon, Nov 30, 2015 at 4:52 PM, Jianfeng Jia <[email protected]> wrote:

> Dear devs,
>
> I hit a weird issue that is reproducible, but only if the data has
> duplications and is also large enough. Let me explain it step by step:
>
> 1. The dataset is very simple and only has two fields.
> DDL AQL:
> —————————————
> drop dataverse test if exists;
> create dataverse test;
> use dataverse test;
>
> create type t_test as closed{
>   fa: int64,
>   fb: int64
> }
>
> create dataset ds_test(t_test) primary key fa;
>
> create feed fd_test using socket_adapter
> (
>   ("sockets"="nc1:10001"),
>   ("address-type"="nc"),
>   ("type-name"="t_test"),
>   ("format"="adm"),
>   ("duration"="1200")
> );
>
> set wait-for-completion-feed "false";
> connect feed fd_test to dataset ds_test using policy AdvancedFT_Discard;
> ——————————————————————————————
>
> The AdvancedFT_Discard policy ignores exceptions from the insertion and
> keeps ingesting.
>
> 2. The data is ingested by a very simple socket adapter client that reads
> the records one by one from an ADM file. The source is here:
> https://github.com/JavierJia/twitter-tracker/blob/master/src/main/java/edu/uci/ics/twitter/asterix/feed/FileFeedSocketAdapterClient.java
> The data and the app package are provided here:
> https://drive.google.com/folderview?id=0B423M7wGZj9dYVQ1TkpBNzcwSlE&usp=sharing
> To feed the data you can run:
>
> ./bin/feedFile -u 172.17.0.2 -p 10001 -c 5000000 ~/data/twitter/test.adm
>
> -u for the server URL
> -p for the server port
> -c for the number of lines you want to ingest
>
> 3. After ingestion, all requests on ds_test hang. There is no exception and
> no response for hours. However, queries on other datasets, such as the
> Metadata ones, are still answered.
>
> The data contains some duplicated records, which should trigger the insert
> exception. If I lower the count from 5000000 to, say, 3000000, there is no
> problem, although that subset contains duplications as well.
>
> Do any feed experts have a hint on which part could be wrong? The cc and nc
> logs are attached. Thank you!
>
>
> Best,
>
> Jianfeng Jia
> PhD Candidate of Computer Science
> University of California, Irvine
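
To make step 2 of the quoted message concrete, here is a minimal sketch of such a socket feed client, assuming newline-delimited ADM records; the host, port, and file path are placeholders, and this is not the linked FileFeedSocketAdapterClient itself:

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.OutputStream;
    import java.net.Socket;
    import java.nio.charset.StandardCharsets;

    // Hypothetical sketch of a line-by-line ADM feed client.
    public class SimpleFeedClient {
        public static void main(String[] args) throws Exception {
            String host = "172.17.0.2";   // -u: server URL
            int port = 10001;             // -p: server port
            int maxRecords = 5000000;     // -c: number of lines to ingest
            String admFile = "test.adm";  // path to the ADM data file

            try (Socket socket = new Socket(host, port);
                 OutputStream out = socket.getOutputStream();
                 BufferedReader reader = new BufferedReader(new FileReader(admFile))) {
                String line;
                int sent = 0;
                while ((line = reader.readLine()) != null && sent < maxRecords) {
                    out.write((line + "\n").getBytes(StandardCharsets.UTF_8));
                    sent++;
                }
                out.flush();
            }
        }
    }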
