Hi Mingda,
I think that in your setting, a better configuration for AsterixDB
might be to use 64 partitions, i.e., 4 cores x 16 machines.
1. To achieve that, you need 4 iodevices on each NC, i.e., change:
iodevices=/home/clash/asterixStorage/asterixdb5/red16
-->
iodevices=/home/clash/asterixStorage/asterixdb5/red16-1,/home/clash/asterixStorage/asterixdb5/red16-2,/home/clash/asterixStorage/asterixdb5/red16-3,/home/clash/asterixStorage/asterixdb5/red16-4
(i.e., a single comma-separated list of four directories)
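For example, the per-NC section for red16 might then look roughly like this (just a sketch -- the red16-1 ... red16-4 directories are made-up names, and the other properties should stay whatever you already have in your cc.conf):

[nc/red16]
address=<red16's IP or hostname>
iodevices=/home/clash/asterixStorage/asterixdb5/red16-1,/home/clash/asterixStorage/asterixdb5/red16-2,/home/clash/asterixStorage/asterixdb5/red16-3,/home/clash/asterixStorage/asterixdb5/red16-4
txn.log.dir=/home/clash/asterixStorage/asterixdb5/red16/txnlog
core.dump.dir=/home/clash/asterixStorage/asterixdb5/red16/coredump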
2. Assuming 64 partitions, 31.01GB / 64 ~= 0.5GB per partition. That means
you should give each joiner a memory budget of roughly 512MB so that the
join can stay memory-resident.
To achieve that, you could add the following to the cc section of cc.conf:
compiler.joinmemory=536870912
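(536870912 bytes = 512MB.) In context, the cc section might then look something like the following sketch; keep whatever other cc settings you already have and just add the joinmemory line:

[cc]
cluster.address=<cc node's IP or hostname>
compiler.joinmemory=536870912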
3. For the JVM setting, 1024MB is too small for the NC.
In the shared NC section in cc.conf, you can add:
jvm.args=-Xmx16G
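For example, the shared nc section could end up looking roughly like this (a sketch; keep the other entries you already have and only add or raise jvm.args). With 32GB of RAM per machine, a 16GB NC heap still leaves room for the OS and file cache:

[nc]
command=asterixnc
jvm.args=-Xmx16G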
4. For Pig and Hive, you can set the maximum mapper/reducer numbers in
the MapReduce configuration, e.g., at most 4 mappers per machine and at
most 4 reducers per machine.
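For example, if your Pig/Hive jobs run on classic MapReduce (MRv1), a snippet like this in mapred-site.xml caps the per-node task slots (these are the MRv1 property names; on YARN the equivalent control is per-container memory/vcores, so the knobs differ):

<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>4</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>4</value>
</property>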
5. I'm not super-familiar with hyper-threading, but it might be worth
trying 8 partitions per machine, i.e., 128 partitions in total.
To verify that the new settings take effect, you can double-check the
admin/cluster page (http://<master node>:19002/admin/cluster).
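For a quick check from the shell, you can also hit the same endpoint directly, e.g.:

curl http://<master node>:19002/admin/cluster

After the change, the cluster.partitions map in the returned JSON should list 64 entries instead of 16, and the NC JVM setting (reported as nc.java.opts in your dump below, currently -Xmx1024m) should reflect the larger heap.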
Please keep us updated and let us know if you run into any issues.
Thanks!
Best,
Yingyi
On Tue, Dec 20, 2016 at 9:26 PM, mingda li <[email protected]> wrote:
> Dear Yingyi,
> 1. Here is the JSON returned by http://<master node>:19002/admin/cluster:
>
> {
> "cc": {
> "configUri": "http://scai01.cs.ucla.edu:
> 19002/admin/cluster/cc/config",
> "statsUri": "http://scai01.cs.ucla.edu:
> 19002/admin/cluster/cc/stats",
> "threadDumpUri":
> "http://scai01.cs.ucla.edu:19002/admin/cluster/cc/threaddump"
> },
> "config": {
> "api.port": 19002,
> "cc.java.opts": "-Xmx1024m",
> "cluster.partitions": {
> "0": "ID:0, Original Node: red16, IODevice: 0, Active Node:
> red16",
> "1": "ID:1, Original Node: red15, IODevice: 0, Active Node:
> red15",
> "2": "ID:2, Original Node: red14, IODevice: 0, Active Node:
> red14",
> "3": "ID:3, Original Node: red13, IODevice: 0, Active Node:
> red13",
> "4": "ID:4, Original Node: red12, IODevice: 0, Active Node:
> red12",
> "5": "ID:5, Original Node: red11, IODevice: 0, Active Node:
> red11",
> "6": "ID:6, Original Node: red10, IODevice: 0, Active Node:
> red10",
> "7": "ID:7, Original Node: red9, IODevice: 0, Active Node:
> red9",
> "8": "ID:8, Original Node: red8, IODevice: 0, Active Node:
> red8",
> "9": "ID:9, Original Node: red7, IODevice: 0, Active Node:
> red7",
> "10": "ID:10, Original Node: red6, IODevice: 0, Active Node:
> red6",
> "11": "ID:11, Original Node: red5, IODevice: 0, Active Node:
> red5",
> "12": "ID:12, Original Node: red4, IODevice: 0, Active Node:
> red4",
> "13": "ID:13, Original Node: red3, IODevice: 0, Active Node:
> red3",
> "14": "ID:14, Original Node: red2, IODevice: 0, Active Node:
> red2",
> "15": "ID:15, Original Node: red, IODevice: 0, Active Node:
> red"
> },
> "compiler.framesize": 32768,
> "compiler.groupmemory": 33554432,
> "compiler.joinmemory": 33554432,
> "compiler.pregelix.home": "~/pregelix",
> "compiler.sortmemory": 33554432,
> "core.dump.paths": {
> "red": "/home/clash/asterixStorage/asterixdb5/red/coredump",
> "red10": "/home/clash/asterixStorage/
> asterixdb5/red10/coredump",
> "red11": "/home/clash/asterixStorage/
> asterixdb5/red11/coredump",
> "red12": "/home/clash/asterixStorage/
> asterixdb5/red12/coredump",
> "red13": "/home/clash/asterixStorage/
> asterixdb5/red13/coredump",
> "red14": "/home/clash/asterixStorage/
> asterixdb5/red14/coredump",
> "red15": "/home/clash/asterixStorage/
> asterixdb5/red15/coredump",
> "red16": "/home/clash/asterixStorage/
> asterixdb5/red16/coredump",
> "red2": "/home/clash/asterixStorage/asterixdb5/red2/coredump",
> "red3": "/home/clash/asterixStorage/asterixdb5/red3/coredump",
> "red4": "/home/clash/asterixStorage/asterixdb5/red4/coredump",
> "red5": "/home/clash/asterixStorage/asterixdb5/red5/coredump",
> "red6": "/home/clash/asterixStorage/asterixdb5/red6/coredump",
> "red7": "/home/clash/asterixStorage/asterixdb5/red7/coredump",
> "red8": "/home/clash/asterixStorage/asterixdb5/red8/coredump",
> "red9": "/home/clash/asterixStorage/asterixdb5/red9/coredump"
> },
> "feed.central.manager.port": 4500,
> "feed.max.threshold.period": 5,
> "feed.memory.available.wait.timeout": 10,
> "feed.memory.global.budget": 67108864,
> "feed.pending.work.threshold": 50,
> "feed.port": 19003,
> "instance.name": "DEFAULT_INSTANCE",
> "log.level": "WARNING",
> "max.wait.active.cluster": 60,
> "metadata.callback.port": 0,
> "metadata.node": "red16",
> "metadata.partition": "ID:0, Original Node: red16, IODevice:
> 0, Active Node: red16",
> "metadata.port": 0,
> "metadata.registration.timeout.secs": 60,
> "nc.java.opts": "-Xmx1024m",
> "node.partitions": {
> "red": ["ID:15, Original Node: red, IODevice: 0, Active Node:
> red"],
> "red10": ["ID:6, Original Node: red10, IODevice: 0, Active
> Node: red10"],
> "red11": ["ID:5, Original Node: red11, IODevice: 0, Active
> Node: red11"],
> "red12": ["ID:4, Original Node: red12, IODevice: 0, Active
> Node: red12"],
> "red13": ["ID:3, Original Node: red13, IODevice: 0, Active
> Node: red13"],
> "red14": ["ID:2, Original Node: red14, IODevice: 0, Active
> Node: red14"],
> "red15": ["ID:1, Original Node: red15, IODevice: 0, Active
> Node: red15"],
> "red16": ["ID:0, Original Node: red16, IODevice: 0, Active
> Node: red16"],
> "red2": ["ID:14, Original Node: red2, IODevice: 0, Active
> Node: red2"],
> "red3": ["ID:13, Original Node: red3, IODevice: 0, Active
> Node: red3"],
> "red4": ["ID:12, Original Node: red4, IODevice: 0, Active
> Node: red4"],
> "red5": ["ID:11, Original Node: red5, IODevice: 0, Active
> Node: red5"],
> "red6": ["ID:10, Original Node: red6, IODevice: 0, Active
> Node: red6"],
> "red7": ["ID:9, Original Node: red7, IODevice: 0, Active
> Node: red7"],
> "red8": ["ID:8, Original Node: red8, IODevice: 0, Active
> Node: red8"],
> "red9": ["ID:7, Original Node: red9, IODevice: 0, Active
> Node: red9"]
> },
> "node.stores": {
> "red": ["/home/clash/asterixStorage/asterixdb5/red/storage"],
> "red10": ["/home/clash/asterixStorage/
> asterixdb5/red10/storage"],
> "red11": ["/home/clash/asterixStorage/
> asterixdb5/red11/storage"],
> "red12": ["/home/clash/asterixStorage/
> asterixdb5/red12/storage"],
> "red13": ["/home/clash/asterixStorage/
> asterixdb5/red13/storage"],
> "red14": ["/home/clash/asterixStorage/
> asterixdb5/red14/storage"],
> "red15": ["/home/clash/asterixStorage/
> asterixdb5/red15/storage"],
> "red16": ["/home/clash/asterixStorage/
> asterixdb5/red16/storage"],
> "red2": ["/home/clash/asterixStorage/
> asterixdb5/red2/storage"],
> "red3": ["/home/clash/asterixStorage/
> asterixdb5/red3/storage"],
> "red4": ["/home/clash/asterixStorage/
> asterixdb5/red4/storage"],
> "red5": ["/home/clash/asterixStorage/
> asterixdb5/red5/storage"],
> "red6": ["/home/clash/asterixStorage/
> asterixdb5/red6/storage"],
> "red7": ["/home/clash/asterixStorage/
> asterixdb5/red7/storage"],
> "red8": ["/home/clash/asterixStorage/
> asterixdb5/red8/storage"],
> "red9": ["/home/clash/asterixStorage/asterixdb5/red9/storage"]
> },
> "plot.activate": false,
> "replication.enabled": false,
> "replication.factor": 2,
> "replication.log.batchsize": 4096,
> "replication.log.buffer.numpages": 8,
> "replication.log.buffer.pagesize": 131072,
> "replication.max.remote.recovery.attempts": 5,
> "replication.timeout": 30,
> "storage.buffercache.maxopenfiles": 2147483647,
> "storage.buffercache.pagesize": 131072,
> "storage.buffercache.size": 536870912,
> "storage.lsm.bloomfilter.falsepositiverate": 0.01,
> "storage.memorycomponent.globalbudget": 536870912,
> "storage.memorycomponent.numcomponents": 2,
> "storage.memorycomponent.numpages": 256,
> "storage.memorycomponent.pagesize": 131072,
> "storage.metadata.memorycomponent.numpages": 256,
> "transaction.log.dirs": {
> "red": "/home/clash/asterixStorage/asterixdb5/red/txnlog",
> "red10": "/home/clash/asterixStorage/asterixdb5/red10/txnlog",
> "red11": "/home/clash/asterixStorage/asterixdb5/red11/txnlog",
> "red12": "/home/clash/asterixStorage/asterixdb5/red12/txnlog",
> "red13": "/home/clash/asterixStorage/asterixdb5/red13/txnlog",
> "red14": "/home/clash/asterixStorage/asterixdb5/red14/txnlog",
> "red15": "/home/clash/asterixStorage/asterixdb5/red15/txnlog",
> "red16": "/home/clash/asterixStorage/asterixdb5/red16/txnlog",
> "red2": "/home/clash/asterixStorage/asterixdb5/red2/txnlog",
> "red3": "/home/clash/asterixStorage/asterixdb5/red3/txnlog",
> "red4": "/home/clash/asterixStorage/asterixdb5/red4/txnlog",
> "red5": "/home/clash/asterixStorage/asterixdb5/red5/txnlog",
> "red6": "/home/clash/asterixStorage/asterixdb5/red6/txnlog",
> "red7": "/home/clash/asterixStorage/asterixdb5/red7/txnlog",
> "red8": "/home/clash/asterixStorage/asterixdb5/red8/txnlog",
> "red9": "/home/clash/asterixStorage/asterixdb5/red9/txnlog"
> },
> "txn.commitprofiler.reportinterval": 5,
> "txn.job.recovery.memorysize": 67108864,
> "txn.lock.escalationthreshold": 1000,
> "txn.lock.shrinktimer": 5000,
> "txn.lock.timeout.sweepthreshold": 10000,
> "txn.lock.timeout.waitthreshold": 60000,
> "txn.log.buffer.numpages": 8,
> "txn.log.buffer.pagesize": 131072,
> "txn.log.checkpoint.history": 0,
> "txn.log.checkpoint.lsnthreshold": 67108864,
> "txn.log.checkpoint.pollfrequency": 120,
> "txn.log.partitionsize": 268435456,
> "web.port": 19001,
> "web.queryinterface.port": 19006,
> "web.secondary.port": 19005
> },
> "fullShutdownUri":
> "http://scai01.cs.ucla.edu:19002/admin/shutdown?all=true",
> "metadata_node": "red16",
> "ncs": [
> {
> "configUri":
> "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red15/config",
> "node_id": "red15",
> "partitions": [{
> "active": true,
> "partition_id": "partition_1"
> }],
> "state": "ACTIVE",
> "statsUri":
> "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red15/stats",
> "threadDumpUri":
> "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red15/threaddump"
> },
> {
> "configUri":
> "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red14/config",
> "node_id": "red14",
> "partitions": [{
> "active": true,
> "partition_id": "partition_2"
> }],
> "state": "ACTIVE",
> "statsUri":
> "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red14/stats",
> "threadDumpUri":
> "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red14/threaddump"
> },
> {
> "configUri":
> "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red16/config",
> "node_id": "red16",
> "partitions": [{
> "active": true,
> "partition_id": "partition_0"
> }],
> "state": "ACTIVE",
> "statsUri":
> "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red16/stats",
> "threadDumpUri":
> "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red16/threaddump"
> },
> {
> "configUri":
> "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red11/config",
> "node_id": "red11",
> "partitions": [{
> "active": true,
> "partition_id": "partition_5"
> }],
> "state": "ACTIVE",
> "statsUri":
> "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red11/stats",
> "threadDumpUri":
> "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red11/threaddump"
> },
> {
> "configUri":
> "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red10/config",
> "node_id": "red10",
> "partitions": [{
> "active": true,
> "partition_id": "partition_6"
> }],
> "state": "ACTIVE",
> "statsUri":
> "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red10/stats",
> "threadDumpUri":
> "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red10/threaddump"
> },
> {
> "configUri":
> "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red13/config",
> "node_id": "red13",
> "partitions": [{
> "active": true,
> "partition_id": "partition_3"
> }],
> "state": "ACTIVE",
> "statsUri":
> "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red13/stats",
> "threadDumpUri":
> "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red13/threaddump"
> },
> {
> "configUri":
> "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red12/config",
> "node_id": "red12",
> "partitions": [{
> "active": true,
> "partition_id": "partition_4"
> }],
> "state": "ACTIVE",
> "statsUri":
> "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red12/stats",
> "threadDumpUri":
> "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red12/threaddump"
> },
> {
> "configUri":
> "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red6/config",
> "node_id": "red6",
> "partitions": [{
> "active": true,
> "partition_id": "partition_10"
> }],
> "state": "ACTIVE",
> "statsUri":
> "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red6/stats",
> "threadDumpUri":
> "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red6/threaddump"
> },
> {
> "configUri":
> "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red/config",
> "node_id": "red",
> "partitions": [{
> "active": true,
> "partition_id": "partition_15"
> }],
> "state": "ACTIVE",
> "statsUri":
> "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red/stats",
> "threadDumpUri":
> "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red/threaddump"
> },
> {
> "configUri":
> "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red5/config",
> "node_id": "red5",
> "partitions": [{
> "active": true,
> "partition_id": "partition_11"
> }],
> "state": "ACTIVE",
> "statsUri":
> "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red5/stats",
> "threadDumpUri":
> "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red5/threaddump"
> },
> {
> "configUri":
> "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red8/config",
> "node_id": "red8",
> "partitions": [{
> "active": true,
> "partition_id": "partition_8"
> }],
> "state": "ACTIVE",
> "statsUri":
> "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red8/stats",
> "threadDumpUri":
> "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red8/threaddump"
> },
> {
> "configUri":
> "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red7/config",
> "node_id": "red7",
> "partitions": [{
> "active": true,
> "partition_id": "partition_9"
> }],
> "state": "ACTIVE",
> "statsUri":
> "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red7/stats",
> "threadDumpUri":
> "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red7/threaddump"
> },
> {
> "configUri":
> "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red2/config",
> "node_id": "red2",
> "partitions": [{
> "active": true,
> "partition_id": "partition_14"
> }],
> "state": "ACTIVE",
> "statsUri":
> "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red2/stats",
> "threadDumpUri":
> "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red2/threaddump"
> },
> {
> "configUri":
> "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red4/config",
> "node_id": "red4",
> "partitions": [{
> "active": true,
> "partition_id": "partition_12"
> }],
> "state": "ACTIVE",
> "statsUri":
> "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red4/stats",
> "threadDumpUri":
> "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red4/threaddump"
> },
> {
> "configUri":
> "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red3/config",
> "node_id": "red3",
> "partitions": [{
> "active": true,
> "partition_id": "partition_13"
> }],
> "state": "ACTIVE",
> "statsUri":
> "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red3/stats",
> "threadDumpUri":
> "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red3/threaddump"
> },
> {
> "configUri":
> "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red9/config",
> "node_id": "red9",
> "partitions": [{
> "active": true,
> "partition_id": "partition_7"
> }],
> "state": "ACTIVE",
> "statsUri":
> "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red9/stats",
> "threadDumpUri":
> "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red9/threaddump"
> }
> ],
> "shutdownUri": "http://scai01.cs.ucla.edu:19002/admin/shutdown",
> "state": "ACTIVE",
> "versionUri": "http://scai01.cs.ucla.edu:19002/admin/version"
> }
>
> 2. catalog_returns: 2.28G
>
> catalog_sales: 31.01G
>
> inventory: 8.63G
>
> 3. As for Pig and Hive, I always use the default configuration; I didn't
> set any partition parameters for them. For Spark, we use 200 partitions,
> which could probably be tuned further but works reasonably well. For
> AsterixDB, I also left the partition and JVM settings at their defaults
> (I didn't set these parameters manually).
>
>
>
> On Tue, Dec 20, 2016 at 5:58 PM, Yingyi Bu <[email protected]> wrote:
>
> > Mingda,
> >
> > 1. Can you paste the returned JSON of http://<master node>:19002/admin/cluster
> > at your side? (Pls replace <master node> with the actual master node name or IP)
> > 2. Can you list the individual size of each dataset involved in the
> > query, e.g., catalog_returns, catalog_sales, and inventory? (I assume
> > 100GB is the overall size?)
> > 3. Do Spark/Hive/Pig saturate all CPUs on all machines, i.e., how many
> > partitions are running on each machine? (It seems that your AsterixDB
> > configuration wouldn't saturate all CPUs for queries --- in the current
> > AsterixDB master, the computation parallelism is set to be the same as the
> > storage parallelism (i.e., the number of iodevices on each NC). I've
> > submitted a new patch that allows flexible computation parallelism, which
> > should be able to get merged into master very soon.)
> > Thanks!
> >
> > Best,
> > Yingyi
> >
> > On Tue, Dec 20, 2016 at 5:44 PM, mingda li <[email protected]> wrote:
> >
> > > Oh, sure. When we tested the 100G multiple join, we found AsterixDB is
> > > slower than Spark (but still faster than Pig and Hive).
> > > I can share both plots with you: 1-10G.eps and 1-100G.eps. (We will
> > > only use 1-10G.eps in our paper.)
> > > And thanks for Ian's advice: "The dev list generally strips attachments.
> > > Maybe you can just put the config inline? Or link to a pastebin/gist?"
> > > Now I know why you couldn't see the attachments, so I have moved the plots
> > > and the two documents to my Dropbox.
> > > You can find them here:
> > > 1-10G.eps: https://www.dropbox.com/s/rk3xg6gigsfcuyq/1-10G.eps?dl=0
> > > 1-100G.eps: https://www.dropbox.com/s/tyxnmt6ehau2ski/1-100G.eps?dl=0
> > > cc_conf.pdf: https://www.dropbox.com/s/y3of1s17qdstv5f/cc_conf.pdf?dl=0
> > > CompleteQuery.pdf: https://www.dropbox.com/s/lml3fzxfjcmf2c1/CompleteQuery.pdf?dl=0
> > >
> > > On Tue, Dec 20, 2016 at 4:40 PM, Tyson Condie <[email protected]> wrote:
> > >
> > > > Mingda: Please also share the numbers for 100GB, which show AsterixDB not
> > > > quite doing as well as Spark. These 100GB results will not be in our
> > > > submission version, since they’re not needed for the desired message:
> > > > picking the right join order matters. Nevertheless, I’d like to get a
> > > > better understanding of what’s going on in the larger dataset regime.
> > > >
> > > >
> > > >
> > > > -Tyson
> > > >
> > > >
> > > >
> > > > From: Yingyi Bu [mailto:[email protected]]
> > > > Sent: Tuesday, December 20, 2016 4:30 PM
> > > > To: [email protected]
> > > > Cc: Michael Carey <[email protected]>; Tyson Condie <[email protected]>
> > > > Subject: Re: Time of Multiple Joins in AsterixDB
> > > >
> > > >
> > > >
> > > > Hi Mingda,
> > > >
> > > >
> > > >
> > > > It looks like you didn't attach the PDF?
> > > >
> > > > Thanks!
> > > >
> > > >
> > > >
> > > > Best,
> > > >
> > > > Yingyi
> > > >
> > > >
> > > >
> > > > On Tue, Dec 20, 2016 at 4:15 PM, mingda li <[email protected]> wrote:
> > > >
> > > > Sorry for the wrong version of cc.conf. I have converted it to a PDF and
> > > > attached it.
> > > >
> > > >
> > > >
> > > > On Tue, Dec 20, 2016 at 4:06 PM, mingda li <[email protected]> wrote:
> > > >
> > > > Dear all,
> > > >
> > > >
> > > >
> > > > I am testing multiple joins on different systems (AsterixDB, Spark, Hive,
> > > > Pig) to see whether different join orders make a big difference. This is
> > > > the motivation for our research on multi-way joins, and the results will
> > > > appear in our paper, which will be submitted to VLDB soon. Could you help
> > > > us make sure that the test results make sense for AsterixDB?
> > > >
> > > >
> > > >
> > > > We configured AsterixDB 0.8.9 (using the asterix-server-0.8.9-SNAPSHOT
> > > > binary assembly) on our cluster of 16 machines, each with a 3.40GHz i7
> > > > processor (4 cores and 2 hyper-threads per core), 32GB of RAM, and 1TB of
> > > > disk capacity. The operating system is 64-bit Ubuntu 12.04, and the JDK
> > > > version is 1.8.0. During configuration, I followed the NCService
> > > > instructions at https://ci.apache.org/projects/asterixdb/ncservice.html
> > > > and set cc.conf as in the attachment. (Each node works as an NC, and the
> > > > first node also works as the CC.)
> > > >
> > > >
> > > >
> > > > For the experiments, we use 3 fact tables from TPC-DS: inventory,
> > > > catalog_sales, and catalog_returns, with TPC-DS scale factors of 1GB and
> > > > 10GB. The multi-way join queries we use in AsterixDB are as follows:
> > > >
> > > >
> > > >
> > > > Good Join Order:
> > > > SELECT COUNT(*) FROM
> > > >   (SELECT * FROM catalog_sales cs1 JOIN catalog_returns cr1
> > > >    ON (cs1.cs_order_number = cr1.cr_order_number AND cs1.cs_item_sk = cr1.cr_item_sk)) m1
> > > > JOIN inventory i1 ON i1.inv_item_sk = cs1.cs_item_sk;
> > > >
> > > >
> > > >
> > > > Bad Join Order:
> > > > SELECT COUNT(*) FROM
> > > >   (SELECT * FROM catalog_sales cs1 JOIN inventory i1
> > > >    ON cs1.cs_item_sk = i1.inv_item_sk) m1
> > > > JOIN catalog_returns cr1
> > > >   ON (cs1.cs_order_number = cr1.cr_order_number AND cs1.cs_item_sk = cr1.cr_item_sk);
> > > >
> > > >
> > > >
> > > > We first load the data into AsterixDB and then run the two different
> > > > queries. (The complete version of all queries for AsterixDB is in the
> > > > attachment.) We assume the data is already stored in AsterixDB and only
> > > > count the time for the multi-way join.
> > > >
> > > >
> > > >
> > > > Meanwhile, we use the same datasets and queries to test Spark, Pig, and
> > > > Hive. The results are shown in the attached figure, and you can see that
> > > > AsterixDB's time is always better than the others, for both the good and
> > > > the bad order :-) (BTW, the y-axis of the figure is time on a log scale;
> > > > you can read the exact time from the label on each bar.)
> > > >
> > > >
> > > >
> > > > Thanks for your help.
> > > >
> > > >
> > > >
> > > > Bests,
> > > >
> > > > Mingda