Cool. The prototype Jianfeng is building revealed quite a few issues in the system :-)
On Mon, Nov 30, 2015 at 5:24 PM, abdullah alamoudi <[email protected]> wrote:

> I know exactly what is going on here. The problem you pointed out is
> caused by the duplicate keys. If I remember correctly, the main issue is
> that the locks that are placed on the primary keys are not released.
>
> I will start fixing this issue tonight.
> Cheers,
> Abdullah.
>
> Amoudi, Abdullah.
>
> On Mon, Nov 30, 2015 at 4:52 PM, Jianfeng Jia <[email protected]> wrote:
>
> > Dear devs,
> >
> > I hit a weird issue that is reproducible, but only if the data has
> > duplicates and is also large enough. Let me explain it step by step:
> >
> > 1. The dataset is very simple and has only two fields.
> > DDL AQL:
> > —————————————
> > drop dataverse test if exists;
> > create dataverse test;
> > use dataverse test;
> >
> > create type t_test as closed {
> >   fa: int64,
> >   fb: int64
> > }
> >
> > create dataset ds_test(t_test) primary key fa;
> >
> > create feed fd_test using socket_adapter
> > (
> >   ("sockets"="nc1:10001"),
> >   ("address-type"="nc"),
> >   ("type-name"="t_test"),
> >   ("format"="adm"),
> >   ("duration"="1200")
> > );
> >
> > set wait-for-completion-feed "false";
> > connect feed fd_test to dataset ds_test using policy AdvancedFT_Discard;
> > ——————————————————————————————
> >
> > The AdvancedFT_Discard policy ignores exceptions from the insertion and
> > keeps ingesting.
> >
> > 2. Ingest the data with a very simple socket adapter client that reads
> > the records one by one from an adm file. The source is here:
> > https://github.com/JavierJia/twitter-tracker/blob/master/src/main/java/edu/uci/ics/twitter/asterix/feed/FileFeedSocketAdapterClient.java
> > The data and the app package are provided here:
> > https://drive.google.com/folderview?id=0B423M7wGZj9dYVQ1TkpBNzcwSlE&usp=sharing
> > To feed the data you can run:
> >
> > ./bin/feedFile -u 172.17.0.2 -p 10001 -c 5000000 ~/data/twitter/test.adm
> >
> > -u for the server URL
> > -p for the server port
> > -c for the number of lines you want to ingest
> >
> > 3. After the ingestion, all requests on ds_test hang. There is no
> > exception and no response for hours. However, the system still answers
> > queries on other datasets, such as Metadata.
> >
> > The data contains some duplicate records, which should trigger the
> > insert exception. If I lower the count from 5000000 to, say, 3000000,
> > there is no problem, although that subset contains duplicates as well.
> >
> > Do any feed experts have a hint on which part could be wrong? The cc
> > and nc logs are attached. Thank you!
> >
> > Best,
> >
> > Jianfeng Jia
> > PhD Candidate of Computer Science
> > University of California, Irvine
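
For anyone reproducing this, here is a minimal sketch of what such a file-to-socket feed client can look like. It is an assumption-based simplification, not the actual FileFeedSocketAdapterClient linked above: the class name, defaults, and argument handling are hypothetical. It just streams up to a given number of ADM records (one per line) into the feed socket, mirroring the -u/-p/-c options of the feedFile script.

—————————————
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the linked client: streams up to `count`
// ADM records (one per line) from a file into the feed socket.
public class MinimalFeedClient {
    public static void main(String[] args) throws IOException {
        String host = "172.17.0.2";   // -u server URL
        int port = 10001;             // -p server port
        int count = 5000000;          // -c number of lines to ingest
        String file = args.length > 0 ? args[0] : "test.adm";

        try (Socket socket = new Socket(host, port);
             BufferedReader in = new BufferedReader(new FileReader(file))) {
            OutputStream out = socket.getOutputStream();
            String line;
            int sent = 0;
            while (sent < count && (line = in.readLine()) != null) {
                out.write(line.getBytes(StandardCharsets.UTF_8));
                out.write('\n');
                sent++;
            }
            out.flush();
        }
    }
}
—————————————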

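To make Abdullah's diagnosis concrete: the reported hang matches the classic pattern where a lock on the primary key is acquired before an insert, the duplicate-key exception takes an exit path that skips the release, and later requests touching the dataset then block. The sketch below is hypothetical and is not AsterixDB's actual lock manager or API; it only illustrates the leaky path and the usual try/finally fix.

—————————————
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Not AsterixDB code: a stand-in per-key lock manager showing how an
// insert that throws on a duplicate key can leave its lock held forever.
public class LockLeakSketch {
    private final ConcurrentHashMap<Long, ReentrantLock> locks = new ConcurrentHashMap<>();
    private final Set<Long> keys = ConcurrentHashMap.newKeySet();

    private ReentrantLock lockFor(long pk) {
        return locks.computeIfAbsent(pk, k -> new ReentrantLock());
    }

    // Buggy pattern: insert(pk) throws on a duplicate, so unlock() below
    // never runs and the next operation on this key blocks indefinitely.
    public void insertLeaky(long pk) {
        ReentrantLock lock = lockFor(pk);
        lock.lock();
        insert(pk);
        lock.unlock();
    }

    // Fixed pattern: the lock is released on every exit path.
    public void insertSafe(long pk) {
        ReentrantLock lock = lockFor(pk);
        lock.lock();
        try {
            insert(pk);
        } finally {
            lock.unlock();
        }
    }

    private void insert(long pk) {
        if (!keys.add(pk)) {
            throw new IllegalStateException("duplicate key: " + pk);
        }
    }
}
—————————————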