The dev list generally strips attachments. Maybe you can just put the config inline? Or link to a pastebin/gist?

On Tue, Dec 20, 2016 at 4:45 PM, mingda li <[email protected]> wrote:

> Oh, I think you can find the figure in the first email.
> I can attach the three files again here. (BaselineDifferentDB.eps;
> cc_conf.pdf; CompleteQuery.pdf)
> Sorry for making any confusion.
>
> On Tue, Dec 20, 2016 at 4:30 PM, Yingyi Bu <[email protected]> wrote:
>
>> Hi Mingda,
>>
>> It looks that you didn't attach the pdf?
>> Thanks!
>>
>> Best,
>> Yingyi
>>
>> On Tue, Dec 20, 2016 at 4:15 PM, mingda li <[email protected]> wrote:
>>
>> > Sorry for the wrong version of cc.conf. I convert it to pdf version as
>> > attachment.
>> >
>> > On Tue, Dec 20, 2016 at 4:06 PM, mingda li <[email protected]> wrote:
>> >
>> >> Dear all,
>> >>
>> >> I am testing different systems' (AsterixDB, Spark, Hive, Pig) multiple
>> >> joins to see if there is a big difference with different join order.
>> >> This is the reason for our research on multiple join and the result
>> >> will appear in our paper which is to be submitted to VLDB soon. Could
>> >> you help us to make sure that the test results make sense for AsterixDB?
>> >>
>> >> We configure the AsterixDB 0.8.9 (use
>> >> asterix-server-0.8.9-SNAPSHOT-binary-assembly) in our cluster of 16
>> >> machines, each with a 3.40GHz i7 processor (4 cores and 2 hyper-threads
>> >> per core), 32GB of RAM and 1TB of disk capacity. The operating system
>> >> is 64-bit Ubuntu 12.04. JDK version 1.8.0. During configuration, I
>> >> follow the NCService instruction here
>> >> https://ci.apache.org/projects/asterixdb/ncservice.html. And I set the
>> >> cc.conf as in attachment. (Each node work as nc and the first node also
>> >> work as cc).
>> >>
>> >> For experiment, we use 3 fact tables from TPC-DS: inventory;
>> >> catalog_sales; catalog_returns with TPC-DS scale factor 1g and 10g. The
>> >> multiple join query we use in AsterixDB are as following:
>> >>
>> >> Good Join Order:
>> >> SELECT COUNT(*) FROM (SELECT * FROM catalog_sales cs1
>> >> JOIN catalog_returns cr1
>> >> ON (cs1.cs_order_number = cr1.cr_order_number AND cs1.cs_item_sk =
>> >> cr1.cr_item_sk)) m1 JOIN inventory i1 ON i1.inv_item_sk = cs1.cs_item_sk;
>> >>
>> >> Bad Join Order:
>> >> SELECT COUNT(*) FROM (SELECT * FROM catalog_sales cs1
>> >> JOIN inventory i1 ON cs1.cs_item_sk = i1.inv_item_sk) m1 JOIN
>> >> catalog_returns cr1 ON (cs1.cs_order_number = cr1.cr_order_number AND
>> >> cs1.cs_item_sk = cr1.cr_item_sk);
>> >>
>> >> We load the data to AsterixDB firstly and run the two different queries.
>> >> (The complete version of all queries for AsterixDB is in attachment) We
>> >> assume the data has already been stored in AsterixDB and only count the
>> >> time for multiple join.
>> >>
>> >> Meanwhile, we use the same dataset and query to test Spark, Pig and Hive.
>> >> The result is shown in the attachment's figure. And you can find
>> >> AsterixDB's time is always better than others no matter good or bad
>> >> order:-) (BTW, the y scale of figure is time in log scale. You can see
>> >> the time by the label of each bar.)
>> >>
>> >> Thanks for your help.
>> >>
>> >> Bests,
>> >> Mingda
