Oh, sure. When we test the 100G multiple join, we find AsterixDB is slower than Spark (but still faster than Pig and Hive). I can share with you the both plots: 1-10G.eps and 1-100G.eps. (We will only use 1-10G.eps in our paper). And thanks for Ian's advice:* The dev list generally strips attachments. Maybe you can just put the config inline? Or link to a pastebin/gist?* I know why you can't see the attachments. So I move the plots with two documents to my Dropbox. You can find the 1-10G.eps here: https://www.dropbox.com/s/rk3xg6gigsfcuyq/1-10G.eps?dl=0 1-100G.eps here:https://www.dropbox.com/s/tyxnmt6ehau2ski/1-100G.eps?dl=0 cc_conf.pdf here: https://www.dropbox.com/s/y3of1s17qdstv5f/cc_conf.pdf?dl=0 CompleteQuery.pdf here: https://www.dropbox.com/s/lml3fzxfjcmf2c1/CompleteQuery.pdf?dl=0
On Tue, Dec 20, 2016 at 4:40 PM, Tyson Condie <[email protected]> wrote: > Mingda: Please also share the numbers for 100GB, which show AsterixDB not > quite doing as well as Spark. These 100GB results will not be in our > submission version, since they’re not needed for the desired message: > picking the right join order matters. Nevertheless, I’d like to get a > better understanding of what’s going on in the larger dataset regime. > > > > -Tyson > > > > From: Yingyi Bu [mailto:[email protected]] > Sent: Tuesday, December 20, 2016 4:30 PM > To: [email protected] > Cc: Michael Carey <[email protected]>; Tyson Condie < > [email protected]> > Subject: Re: Time of Multiple Joins in AsterixDB > > > > Hi Mingda, > > > > It looks that you didn't attach the pdf? > > Thanks! > > > > Best, > > Yingyi > > > > On Tue, Dec 20, 2016 at 4:15 PM, mingda li <[email protected] > <mailto:[email protected]> > wrote: > > Sorry for the wrong version of cc.conf. I convert it to pdf version as > attachment. > > > > On Tue, Dec 20, 2016 at 4:06 PM, mingda li <[email protected] > <mailto:[email protected]> > wrote: > > Dear all, > > > > I am testing different systems' (AsterixDB, Spark, Hive, Pig) multiple > joins to see if there is a big difference with different join order. This > is the reason for our research on multiple join and the result will apppear > in our paper which is to be submitted to VLDB soon. Could you help us to > make sure that the test results make sense for AsterixDB? > > > > We configure the AsterixDB 0.8.9 ( use > asterix-server-0.8.9-SNAPSHOT-binary-assembly) > in our cluster of 16 machines, each with a 3.40GHz i7 processor (4 cores > and 2 hyper-threads per core), 32GB of RAM and 1TB of disk capacity. The > operating system is 64-bit Ubuntu 12.04. JDK version 1.8.0. During > configuration, I follow the NCService instruction here > https://ci.apache.org/projects/asterixdb/ncservice.html. And I set the > cc.conf as in attachment. (Each node work as nc and the first node also > work as cc). > > > > For experiment, we use 3 fact tables from TPC-DS: inventory; > catalog_sales; catalog_returns with TPC-DS scale factor 1g and 10g. The > multiple join query we use in AsterixDB are as following: > > > > Good Join Order: SELECT COUNT(*) FROM (SELECT * FROM catalog_sales cs1 > JOIN catalog_returns cr1 > > ON (cs1.cs_order_number = cr1.cr_order_number AND cs1.cs_item_sk = > cr1.cr_item_sk)) m1 JOIN inventory i1 ON i1.inv_item_sk = cs1.cs_item_sk; > > > > Bad Join Order: SELECT COUNT(*) FROM (SELECT * FROM catalog_sales cs1 JOIN > inventory i1 ON cs1.cs_item_sk = i1.inv_item_sk) m1 JOIN catalog_returns > cr1 ON (cs1.cs_order_number = cr1.cr_order_number AND cs1.cs_item_sk = > cr1.cr_item_sk); > > > > We load the data to AsterixDB firstly and run the two different queries. > (The complete version of all queries for AsterixDB is in attachment) We > assume the data has already been stored in AsterixDB and only count the > time for multiple join. > > > > Meanwhile, we use the same dataset and query to test Spark, Pig and Hive. > The result is shown in the attachment's figure. And you can find > AsterixDB's time is always better than others no matter good or bad > order:-) (BTW, the y scale of figure is time in log scale. You can see the > time by the label of each bar.) > > > > Thanks for your help. > > > > Bests, > > Mingda > > > > > > > > > >
