Hi Mingda,
I think that in your setting, a better configuration for AsterixDB
might be to use 64 partitions, i.e., 4 cores x 16 machines.
1. To achieve that, you need 4 iodevices on each NC, i.e., change:
iodevices=/home/clash/asterixStorage/asterixdb5/red16
-->
iodevices=/home/clash/asterixStorage/asterixdb5/red16-1,/home/clash/asterixStorage/asterixdb5/red16-2,/home/clash/asterixStorage/asterixdb5/red16-3,/home/clash/asterixStorage/asterixdb5/red16-4
(i.e., a single comma-separated list of four directories)
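For example, the per-NC section for red16 might then look roughly like this (just a sketch -- the red16-1 ... red16-4 directories are made-up names, and the other properties should stay whatever you already have in your cc.conf):

[nc/red16]
address=<red16's IP or hostname>
iodevices=/home/clash/asterixStorage/asterixdb5/red16-1,/home/clash/asterixStorage/asterixdb5/red16-2,/home/clash/asterixStorage/asterixdb5/red16-3,/home/clash/asterixStorage/asterixdb5/red16-4
txn.log.dir=/home/clash/asterixStorage/asterixdb5/red16/txnlog
core.dump.dir=/home/clash/asterixStorage/asterixdb5/red16/coredump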
2. Assuming 64 partitions, 31.01GB / 64 ~= 0.5GB per partition. That means
you should give each joiner a memory budget of roughly 512MB so that the
join can stay memory-resident.
To achieve that, you could add the following to the cc section of cc.conf:
compiler.joinmemory=536870912
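(536870912 bytes = 512MB.) In context, the cc section might then look something like the following sketch; keep whatever other cc settings you already have and just add the joinmemory line:

[cc]
cluster.address=<cc node's IP or hostname>
compiler.joinmemory=536870912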
3. For the JVM setting, 1024MB is too small for the NC.
In the shared NC section in cc.conf, you can add:
jvm.args=-Xmx16G
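For example, the shared nc section could end up looking roughly like this (a sketch; keep the other entries you already have and only add or raise jvm.args). With 32GB of RAM per machine, a 16GB NC heap still leaves room for the OS and file cache:

[nc]
command=asterixnc
jvm.args=-Xmx16G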
4. For Pig and Hive, you can set the maximum mapper/reducer numbers in
the MapReduce configuration, e.g., at most 4 mappers per machine and at
most 4 reducers per machine.
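For example, if your Pig/Hive jobs run on classic MapReduce (MRv1), a snippet like this in mapred-site.xml caps the per-node task slots (these are the MRv1 property names; on YARN the equivalent control is per-container memory/vcores, so the knobs differ):

<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>4</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>4</value>
</property>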
5. I'm not super-familiar with hyper-threading, but it might be worth
trying 8 partitions per machine, i.e., 128 partitions in total.
To verify that the new settings take effect, you can double-check the
admin/cluster page (http://<master node>:19002/admin/cluster).
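For a quick check from the shell, you can also hit the same endpoint directly, e.g.:

curl http://<master node>:19002/admin/cluster

After the change, the cluster.partitions map in the returned JSON should list 64 entries instead of 16, and the NC JVM setting (reported as nc.java.opts in your dump below, currently -Xmx1024m) should reflect the larger heap.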
Please keep us updated and let us know if you run into any issues.
Thanks!
Best,
Yingyi
On Tue, Dec 20, 2016 at 9:26 PM, mingda li <[email protected]> wrote:
> Dear Yingyi,
> 1. Here is the JSON returned by http://<master node>:19002/admin/cluster:
>
> {
> "cc": {
> "configUri": "http://scai01.cs.ucla.edu:
> 19002/admin/cluster/cc/config",
> "statsUri": "http://scai01.cs.ucla.edu:
> 19002/admin/cluster/cc/stats",
> "threadDumpUri":
> "http://scai01.cs.ucla.edu:19002/admin/cluster/cc/threaddump"
> },
> "config": {
> "api.port": 19002,
> "cc.java.opts": "-Xmx1024m",
> "cluster.partitions": {
> "0": "ID:0, Original Node: red16, IODevice: 0, Active Node:
> red16",
> "1": "ID:1, Original Node: red15, IODevice: 0, Active Node:
> red15",
> "2": "ID:2, Original Node: red14, IODevice: 0, Active Node:
> red14",
> "3": "ID:3, Original Node: red13, IODevice: 0, Active Node:
> red13",
> "4": "ID:4, Original Node: red12, IODevice: 0, Active Node:
> red12",
> "5": "ID:5, Original Node: red11, IODevice: 0, Active Node:
> red11",
> "6": "ID:6, Original Node: red10, IODevice: 0, Active Node:
> red10",
> "7": "ID:7, Original Node: red9, IODevice: 0, Active Node:
> red9",
> "8": "ID:8, Original Node: red8, IODevice: 0, Active Node:
> red8",
> "9": "ID:9, Original Node: red7, IODevice: 0, Active Node:
> red7",
> "10": "ID:10, Original Node: red6, IODevice: 0, Active Node:
> red6",
> "11": "ID:11, Original Node: red5, IODevice: 0, Active Node:
> red5",
> "12": "ID:12, Original Node: red4, IODevice: 0, Active Node:
> red4",
> "13": "ID:13, Original Node: red3, IODevice: 0, Active Node:
> red3",
> "14": "ID:14, Original Node: red2, IODevice: 0, Active Node:
> red2",
> "15": "ID:15, Original Node: red, IODevice: 0, Active Node:
> red"
> },
> "compiler.framesize": 32768,
> "compiler.groupmemory": 33554432,
> "compiler.joinmemory": 33554432,
> "compiler.pregelix.home": "~/pregelix",
> "compiler.sortmemory": 33554432,
> "core.dump.paths": {
> "red": "/home/clash/asterixStorage/asterixdb5/red/coredump",
> "red10": "/home/clash/asterixStorage/
> asterixdb5/red10/coredump",
> "red11": "/home/clash/asterixStorage/
> asterixdb5/red11/coredump",
> "red12": "/home/clash/asterixStorage/
> asterixdb5/red12/coredump",
> "red13": "/home/clash/asterixStorage/
> asterixdb5/red13/coredump",
> "red14": "/home/clash/asterixStorage/
> asterixdb5/red14/coredump",
> "red15": "/home/clash/asterixStorage/
> asterixdb5/red15/coredump",
> "red16": "/home/clash/asterixStorage/
> asterixdb5/red16/coredump",
> "red2": "/home/clash/asterixStorage/asterixdb5/red2/coredump",
> "red3": "/home/clash/asterixStorage/asterixdb5/red3/coredump",
> "red4": "/home/clash/asterixStorage/asterixdb5/red4/coredump",
> "red5": "/home/clash/asterixStorage/asterixdb5/red5/coredump",
> "red6": "/home/clash/asterixStorage/asterixdb5/red6/coredump",
> "red7": "/home/clash/asterixStorage/asterixdb5/red7/coredump",
> "red8": "/home/clash/asterixStorage/asterixdb5/red8/coredump",
> "red9": "/home/clash/asterixStorage/asterixdb5/red9/coredump"
> },
> "feed.central.manager.port": 4500,
> "feed.max.threshold.period": 5,
> "feed.memory.available.wait.timeout": 10,
> "feed.memory.global.budget": 67108864,
> "feed.pending.work.threshold": 50,
> "feed.port": 19003,
> "instance.name": "DEFAULT_INSTANCE",
> "log.level": "WARNING",
> "max.wait.active.cluster": 60,
> "metadata.callback.port": 0,
> "metadata.node": "red16",
> "metadata.partition": "ID:0, Original Node: red16, IODevice:
> 0, Active Node: red16",
> "metadata.port": 0,
> "metadata.registration.timeout.secs": 60,
> "nc.java.opts": "-Xmx1024m",
> "node.partitions": {
> "red": ["ID:15, Original Node: red, IODevice: 0, Active Node:
> red"],
> "red10": ["ID:6, Original Node: red10, IODevice: 0, Active
> Node: red10"],
> "red11": ["ID:5, Original Node: red11, IODevice: 0, Active
> Node: red11"],
> "red12": ["ID:4, Original Node: red12, IODevice: 0, Active
> Node: red12"],
> "red13": ["ID:3, Original Node: red13, IODevice: 0, Active
> Node: red13"],
> "red14": ["ID:2, Original Node: red14, IODevice: 0, Active
> Node: red14"],
> "red15": ["ID:1, Original Node: red15, IODevice: 0, Active
> Node: red15"],
> "red16": ["ID:0, Original Node: red16, IODevice: 0, Active
> Node: red16"],
> "red2": ["ID:14, Original Node: red2, IODevice: 0, Active
> Node: red2"],
> "red3": ["ID:13, Original Node: red3, IODevice: 0, Active
> Node: red3"],
> "red4": ["ID:12, Original Node: red4, IODevice: 0, Active
> Node: red4"],
> "red5": ["ID:11, Original Node: red5, IODevice: 0, Active
> Node: red5"],
> "red6": ["ID:10, Original Node: red6, IODevice: 0, Active
> Node: red6"],
> "red7": ["ID:9, Original Node: red7, IODevice: 0, Active
> Node: red7"],
> "red8": ["ID:8, Original Node: red8, IODevice: 0, Active
> Node: red8"],
> "red9": ["ID:7, Original Node: red9, IODevice: 0, Active
> Node: red9"]
> },
> "node.stores": {
> "red": ["/home/clash/asterixStorage/asterixdb5/red/storage"],
> "red10": ["/home/clash/asterixStorage/
> asterixdb5/red10/storage"],
> "red11": ["/home/clash/asterixStorage/
> asterixdb5/red11/storage"],
> "red12": ["/home/clash/asterixStorage/
> asterixdb5/red12/storage"],
> "red13": ["/home/clash/asterixStorage/
> asterixdb5/red13/storage"],
> "red14": ["/home/clash/asterixStorage/
> asterixdb5/red14/storage"],
> "red15": ["/home/clash/asterixStorage/
> asterixdb5/red15/storage"],
> "red16": ["/home/clash/asterixStorage/
> asterixdb5/red16/storage"],
> "red2": ["/home/clash/asterixStorage/
> asterixdb5/red2/storage"],
> "red3": ["/home/clash/asterixStorage/
> asterixdb5/red3/storage"],
> "red4": ["/home/clash/asterixStorage/
> asterixdb5/red4/storage"],
> "red5": ["/home/clash/asterixStorage/
> asterixdb5/red5/storage"],
> "red6": ["/home/clash/asterixStorage/
> asterixdb5/red6/storage"],
> "red7": ["/home/clash/asterixStorage/
> asterixdb5/red7/storage"],
> "red8": ["/home/clash/asterixStorage/
> asterixdb5/red8/storage"],
> "red9": ["/home/clash/asterixStorage/asterixdb5/red9/storage"]
> },
> "plot.activate": false,
> "replication.enabled": false,
> "replication.factor": 2,
> "replication.log.batchsize": 4096,
> "replication.log.buffer.numpages": 8,
> "replication.log.buffer.pagesize": 131072,
> "replication.max.remote.recovery.attempts": 5,
> "replication.timeout": 30,
> "storage.buffercache.maxopenfiles": 2147483647,
> "storage.buffercache.pagesize": 131072,
> "storage.buffercache.size": 536870912,
> "storage.lsm.bloomfilter.falsepositiverate": 0.01,
> "storage.memorycomponent.globalbudget": 536870912,
> "storage.memorycomponent.numcomponents": 2,
> "storage.memorycomponent.numpages": 256,
> "storage.memorycomponent.pagesize": 131072,
> "storage.metadata.memorycomponent.numpages": 256,
> "transaction.log.dirs": {
> "red": "/home/clash/asterixStorage/asterixdb5/red/txnlog",
> "red10": "/home/clash/asterixStorage/asterixdb5/red10/txnlog",
> "red11": "/home/clash/asterixStorage/asterixdb5/red11/txnlog",
> "red12": "/home/clash/asterixStorage/asterixdb5/red12/txnlog",
> "red13": "/home/clash/asterixStorage/asterixdb5/red13/txnlog",
> "red14": "/home/clash/asterixStorage/asterixdb5/red14/txnlog",
> "red15": "/home/clash/asterixStorage/asterixdb5/red15/txnlog",
> "red16": "/home/clash/asterixStorage/asterixdb5/red16/txnlog",
> "red2": "/home/clash/asterixStorage/asterixdb5/red2/txnlog",
> "red3": "/home/clash/asterixStorage/asterixdb5/red3/txnlog",
> "red4": "/home/clash/asterixStorage/asterixdb5/red4/txnlog",
> "red5": "/home/clash/asterixStorage/asterixdb5/red5/txnlog",
> "red6": "/home/clash/asterixStorage/asterixdb5/red6/txnlog",
> "red7": "/home/clash/asterixStorage/asterixdb5/red7/txnlog",
> "red8": "/home/clash/asterixStorage/asterixdb5/red8/txnlog",
> "red9": "/home/clash/asterixStorage/asterixdb5/red9/txnlog"
> },
> "txn.commitprofiler.reportinterval": 5,
> "txn.job.recovery.memorysize": 67108864,
> "txn.lock.escalationthreshold": 1000,
> "txn.lock.shrinktimer": 5000,
> "txn.lock.timeout.sweepthreshold": 10000,
> "txn.lock.timeout.waitthreshold": 60000,
> "txn.log.buffer.numpages": 8,
> "txn.log.buffer.pagesize": 131072,
> "txn.log.checkpoint.history": 0,
> "txn.log.checkpoint.lsnthreshold": 67108864,
> "txn.log.checkpoint.pollfrequency": 120,
> "txn.log.partitionsize": 268435456,
> "web.port": 19001,
> "web.queryinterface.port": 19006,
> "web.secondary.port": 19005
> },
> "fullShutdownUri":
> "http://scai01.cs.ucla.edu:19002/admin/shutdown?all=true",
> "metadata_node": "red16",
> "ncs": [
> {
> "configUri":
> "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red15/config",
> "node_id": "red15",
> "partitions": [{
> "active": true,
> "partition_id": "partition_1"
> }],
> "state": "ACTIVE",
> "statsUri":
> "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red15/stats",
> "threadDumpUri":
> "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red15/threaddump"
> },
> {
> "configUri":
> "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red14/config",
> "node_id": "red14",
> "partitions": [{
> "active": true,
> "partition_id": "partition_2"
> }],
> "state": "ACTIVE",
> "statsUri":
> "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red14/stats",
> "threadDumpUri":
> "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red14/threaddump"
> },
> {
> "configUri":
> "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red16/config",
> "node_id": "red16",
> "partitions": [{
> "active": true,
> "partition_id": "partition_0"
> }],
> "state": "ACTIVE",
> "statsUri":
> "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red16/stats",
> "threadDumpUri":
> "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red16/threaddump"
> },
> {
> "configUri":
> "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red11/config",
> "node_id": "red11",
> "partitions": [{
> "active": true,
> "partition_id": "partition_5"
> }],
> "state": "ACTIVE",
> "statsUri":
> "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red11/stats",
> "threadDumpUri":
> "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red11/threaddump"
> },
> {
> "configUri":
> "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red10/config",
> "node_id": "red10",
> "partitions": [{
> "active": true,
> "partition_id": "partition_6"
> }],
> "state": "ACTIVE",
> "statsUri":
> "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red10/stats",
> "threadDumpUri":
> "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red10/threaddump"
> },
> {
> "configUri":
> "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red13/config",
> "node_id": "red13",
> "partitions": [{
> "active": true,
> "partition_id": "partition_3"
> }],
> "state": "ACTIVE",
> "statsUri":
> "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red13/stats",
> "threadDumpUri":
> "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red13/threaddump"
> },
> {
> "configUri":
> "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red12/config",
> "node_id": "red12",
> "partitions": [{
> "active": true,
> "partition_id": "partition_4"
> }],
> "state": "ACTIVE",
> "statsUri":
> "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red12/stats",
> "threadDumpUri":
> "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red12/threaddump"
> },
> {
> "configUri":
> "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red6/config",
> "node_id": "red6",
> "partitions": [{
> "active": true,
> "partition_id": "partition_10"
> }],
> "state": "ACTIVE",
> "statsUri":
> "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red6/stats",
> "threadDumpUri":
> "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red6/threaddump"
> },
> {
> "configUri":
> "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red/config",
> "node_id": "red",
> "partitions": [{
> "active": true,
> "partition_id": "partition_15"
> }],
> "state": "ACTIVE",
> "statsUri":
> "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red/stats",
> "threadDumpUri":
> "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red/threaddump"
> },
> {
> "configUri":
> "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red5/config",
> "node_id": "red5",
> "partitions": [{
> "active": true,
> "partition_id": "partition_11"
> }],
> "state": "ACTIVE",
> "statsUri":
> "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red5/stats",
> "threadDumpUri":
> "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red5/threaddump"
> },
> {
> "configUri":
> "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red8/config",
> "node_id": "red8",
> "partitions": [{
> "active": true,
> "partition_id": "partition_8"
> }],
> "state": "ACTIVE",
> "statsUri":
> "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red8/stats",
> "threadDumpUri":
> "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red8/threaddump"
> },
> {
> "configUri":
> "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red7/config",
> "node_id": "red7",
> "partitions": [{
> "active": true,
> "partition_id": "partition_9"
> }],
> "state": "ACTIVE",
> "statsUri":
> "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red7/stats",
> "threadDumpUri":
> "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red7/threaddump"
> },
> {
> "configUri":
> "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red2/config",
> "node_id": "red2",
> "partitions": [{
> "active": true,
> "partition_id": "partition_14"
> }],
> "state": "ACTIVE",
> "statsUri":
> "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red2/stats",
> "threadDumpUri":
> "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red2/threaddump"
> },
> {
> "configUri":
> "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red4/config",
> "node_id": "red4",
> "partitions": [{
> "active": true,
> "partition_id": "partition_12"
> }],
> "state": "ACTIVE",
> "statsUri":
> "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red4/stats",
> "threadDumpUri":
> "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red4/threaddump"
> },
> {
> "configUri":
> "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red3/config",
> "node_id": "red3",
> "partitions": [{
> "active": true,
> "partition_id": "partition_13"
> }],
> "state": "ACTIVE",
> "statsUri":
> "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red3/stats",
> "threadDumpUri":
> "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red3/threaddump"
> },
> {
> "configUri":
> "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red9/config",
> "node_id": "red9",
> "partitions": [{
> "active": true,
> "partition_id": "partition_7"
> }],
> "state": "ACTIVE",
> "statsUri":
> "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red9/stats",
> "threadDumpUri":
> "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red9/threaddump"
> }
> ],
> "shutdownUri": "http://scai01.cs.ucla.edu:19002/admin/shutdown",
> "state": "ACTIVE",
> "versionUri": "http://scai01.cs.ucla.edu:19002/admin/version"
> }
>
> 2. catalog_returns: 2.28G
>
> catalog_sales: 31.01G
>
> inventory: 8.63G
>
> 3. As for Pig and Hive, I always use the default configuration; I didn't
> set any partition parameters for them. For Spark, we use 200 partitions,
> which could probably be tuned further but works reasonably well. For
> AsterixDB, I also left the partition and JVM settings at their defaults
> (I didn't set these parameters manually).
>
>
>
> On Tue, Dec 20, 2016 at 5:58 PM, Yingyi Bu <[email protected]> wrote:
>
> > Mingda,
> >
> > 1. Can you paste the returned JSON of http://<master node>:19002/admin/cluster
> > at your side? (Pls replace <master node> with the actual master node name or IP)
> > 2. Can you list the individual size of each dataset involved in the
> > query, e.g., catalog_returns, catalog_sales, and inventory? (I assume
> > 100GB is the overall size?)
> > 3. Do Spark/Hive/Pig saturate all CPUs on all machines, i.e., how many
> > partitions are running on each machine? (It seems that your AsterixDB
> > configuration wouldn't saturate all CPUs for queries --- in the current
> > AsterixDB master, the computation parallelism is set to be the same as the
> > storage parallelism (i.e., the number of iodevices on each NC). I've
> > submitted a new patch that allows flexible computation parallelism, which
> > should be able to get merged into master very soon.)
> > Thanks!
> >
> > Best,
> > Yingyi
> >
> > On Tue, Dec 20, 2016 at 5:44 PM, mingda li <[email protected]> wrote:
> >
> > > Oh, sure. When we tested the 100G multiple join, we found AsterixDB is
> > > slower than Spark (but still faster than Pig and Hive).
> > > I can share both plots with you: 1-10G.eps and 1-100G.eps. (We will
> > > only use 1-10G.eps in our paper.)
> > > And thanks for Ian's advice: "The dev list generally strips attachments.
> > > Maybe you can just put the config inline? Or link to a pastebin/gist?"
> > > Now I know why you couldn't see the attachments, so I have moved the plots
> > > and the two documents to my Dropbox.
> > > You can find them here:
> > > 1-10G.eps: https://www.dropbox.com/s/rk3xg6gigsfcuyq/1-10G.eps?dl=0
> > > 1-100G.eps: https://www.dropbox.com/s/tyxnmt6ehau2ski/1-100G.eps?dl=0
> > > cc_conf.pdf: https://www.dropbox.com/s/y3of1s17qdstv5f/cc_conf.pdf?dl=0
> > > CompleteQuery.pdf: https://www.dropbox.com/s/lml3fzxfjcmf2c1/CompleteQuery.pdf?dl=0
> > >
> > > On Tue, Dec 20, 2016 at 4:40 PM, Tyson Condie <[email protected]> wrote:
> > >
> > > > Mingda: Please also share the numbers for 100GB, which show AsterixDB not
> > > > quite doing as well as Spark. These 100GB results will not be in our
> > > > submission version, since they’re not needed for the desired message:
> > > > picking the right join order matters. Nevertheless, I’d like to get a
> > > > better understanding of what’s going on in the larger dataset regime.
> > > >
> > > >
> > > >
> > > > -Tyson
> > > >
> > > >
> > > >
> > > > From: Yingyi Bu [mailto:[email protected]]
> > > > Sent: Tuesday, December 20, 2016 4:30 PM
> > > > To: [email protected]
> > > > Cc: Michael Carey <[email protected]>; Tyson Condie <[email protected]>
> > > > Subject: Re: Time of Multiple Joins in AsterixDB
> > > >
> > > >
> > > >
> > > > Hi Mingda,
> > > >
> > > >
> > > >
> > > > It looks like you didn't attach the PDF?
> > > >
> > > > Thanks!
> > > >
> > > >
> > > >
> > > > Best,
> > > >
> > > > Yingyi
> > > >
> > > >
> > > >
> > > > On Tue, Dec 20, 2016 at 4:15 PM, mingda li <[email protected]> wrote:
> > > >
> > > > Sorry for the wrong version of cc.conf. I have converted it to a PDF and
> > > > attached it.
> > > >
> > > >
> > > >
> > > > On Tue, Dec 20, 2016 at 4:06 PM, mingda li <[email protected]> wrote:
> > > >
> > > > Dear all,
> > > >
> > > >
> > > >
> > > > I am testing multiple joins on different systems (AsterixDB, Spark, Hive,
> > > > Pig) to see whether different join orders make a big difference. This is
> > > > the motivation for our research on multi-way joins, and the results will
> > > > appear in our paper, which will be submitted to VLDB soon. Could you help
> > > > us make sure that the test results make sense for AsterixDB?
> > > >
> > > >
> > > >
> > > > We configured AsterixDB 0.8.9 (using the asterix-server-0.8.9-SNAPSHOT
> > > > binary assembly) on our cluster of 16 machines, each with a 3.40GHz i7
> > > > processor (4 cores and 2 hyper-threads per core), 32GB of RAM, and 1TB of
> > > > disk capacity. The operating system is 64-bit Ubuntu 12.04, and the JDK
> > > > version is 1.8.0. During configuration, I followed the NCService
> > > > instructions at https://ci.apache.org/projects/asterixdb/ncservice.html
> > > > and set cc.conf as in the attachment. (Each node works as an NC, and the
> > > > first node also works as the CC.)
> > > >
> > > >
> > > >
> > > > For the experiments, we use 3 fact tables from TPC-DS: inventory,
> > > > catalog_sales, and catalog_returns, with TPC-DS scale factors of 1GB and
> > > > 10GB. The multi-way join queries we use in AsterixDB are as follows:
> > > >
> > > >
> > > >
> > > > Good Join Order:
> > > > SELECT COUNT(*) FROM
> > > >   (SELECT * FROM catalog_sales cs1 JOIN catalog_returns cr1
> > > >    ON (cs1.cs_order_number = cr1.cr_order_number AND cs1.cs_item_sk = cr1.cr_item_sk)) m1
> > > > JOIN inventory i1 ON i1.inv_item_sk = cs1.cs_item_sk;
> > > >
> > > >
> > > >
> > > > Bad Join Order:
> > > > SELECT COUNT(*) FROM
> > > >   (SELECT * FROM catalog_sales cs1 JOIN inventory i1
> > > >    ON cs1.cs_item_sk = i1.inv_item_sk) m1
> > > > JOIN catalog_returns cr1
> > > >   ON (cs1.cs_order_number = cr1.cr_order_number AND cs1.cs_item_sk = cr1.cr_item_sk);
> > > >
> > > >
> > > >
> > > > We first load the data into AsterixDB and then run the two different
> > > > queries. (The complete version of all queries for AsterixDB is in the
> > > > attachment.) We assume the data is already stored in AsterixDB and only
> > > > count the time for the multi-way join.
> > > >
> > > >
> > > >
> > > > Meanwhile, we use the same datasets and queries to test Spark, Pig, and
> > > > Hive. The results are shown in the attached figure, and you can see that
> > > > AsterixDB's time is always better than the others, for both the good and
> > > > the bad order :-) (BTW, the y-axis of the figure is time on a log scale;
> > > > you can read the exact time from the label on each bar.)
> > > >
> > > >
> > > >
> > > > Thanks for your help.
> > > >
> > > >
> > > >
> > > > Bests,
> > > >
> > > > Mingda