Cool, thanks, Mingda! Look forward to the new numbers! Best, Yingyi
On Wed, Dec 21, 2016 at 7:13 PM, mingda li <[email protected]> wrote:
> Dear Yingyi,
>
> Thanks for your suggestion. I have reset AsterixDB and retested it in the
> new environment on 100G and 10G. All of the tests (good and bad order) now
> run roughly twice as fast.
> I will finish all the tests and update the results later.
>
> Bests,
> Mingda
>
> On Tue, Dec 20, 2016 at 10:20 PM, Yingyi Bu <[email protected]> wrote:
> >
> > Hi Mingda,
> >
> > I think that in your setting, a better configuration for AsterixDB might
> > be to use 64 partitions, i.e., 4 cores * 16 machines.
> >
> > 1. To achieve that, you have to have 4 iodevices on each NC, e.g.:
> >        iodevices=/home/clash/asterixStorage/asterixdb5/red16
> >    -->
> >        iodevices=/home/clash/asterixStorage/asterixdb5/red16-1,
> >        iodevices=/home/clash/asterixStorage/asterixdb5/red16-2,
> >        iodevices=/home/clash/asterixStorage/asterixdb5/red16-3,
> >        iodevices=/home/clash/asterixStorage/asterixdb5/red16-4
> >
> > 2. Assuming you have 64 partitions, 31.01G / 64 ~= 0.5G. That means you'd
> >    better have a 512MB memory budget for each joiner, so as to make the
> >    join memory-resident.
> >    To achieve that, in the cc section of cc.conf, you could add:
> >        compiler.joinmemory=536870912
> >
> > 3. For the JVM setting, 1024MB is too small for the NC.
> >    In the shared NC section of cc.conf, you can add:
> >        jvm.args=-Xmx16G
> >
> > 4. For Pig and Hive, you can set the maximum mapper/reducer numbers in
> >    the MapReduce configuration, e.g., at most 4 mappers per machine and
> >    at most 4 reducers per machine.
> >
> > 5. I'm not super-familiar with hyper-threading, but it might be worth
> >    trying 8 partitions per machine, i.e., 128 partitions in total.
> >
> > To validate that the new settings work, you can go to the admin/cluster
> > page to double-check (a consolidated sketch of points 1-3 follows below).
> > Pls keep us updated and let us know if you run into any issue.
> > Thanks!
> >
> > Best,
> > Yingyi
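To make points 1-3 concrete, here is a minimal, untested cc.conf sketch.
Only one of the sixteen NC sections is shown; the section and key names
follow the NCService documentation linked later in this thread, while the
single comma-separated iodevices line and the ';' comments are assumptions
about the file syntax rather than anything confirmed here. All paths and
host names are taken from this thread.

    ; point 1: four iodevices on NC "red16" => four partitions on this node
    [nc/red16]
    txnlogdir=/home/clash/asterixStorage/asterixdb5/red16/txnlog
    coredumpdir=/home/clash/asterixStorage/asterixdb5/red16/coredump
    iodevices=/home/clash/asterixStorage/asterixdb5/red16-1,/home/clash/asterixStorage/asterixdb5/red16-2,/home/clash/asterixStorage/asterixdb5/red16-3,/home/clash/asterixStorage/asterixdb5/red16-4

    ; point 3: shared NC section -- raise the NC heap above the 1024m default
    [nc]
    jvm.args=-Xmx16G

    ; point 2: 31.01G / 64 partitions ~= 0.5G per partition, so a 512MB
    ; (536870912-byte) budget per joiner keeps the join memory-resident
    [cc]
    address=scai01.cs.ucla.edu
    compiler.joinmemory=536870912

If the settings take effect, the admin/cluster page
(http://scai01.cs.ucla.edu:19002/admin/cluster, whose output appears below)
should report 64 entries under "cluster.partitions" instead of the 16 shown.
For point 4, the classic MRv1 knobs would be
mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum
(set to 4 each) in mapred-site.xml, assuming the MRv1 runtime; under YARN,
the per-node cap falls out of container sizing instead.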
> > On Tue, Dec 20, 2016 at 9:26 PM, mingda li <[email protected]> wrote:
> > >
> > > Dear Yingyi,
> > >
> > > 1. The returned JSON of http://<master node>:19002/admin/cluster is:
> > >
> > > {
> > >   "cc": {
> > >     "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/cc/config",
> > >     "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/cc/stats",
> > >     "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/cc/threaddump"
> > >   },
> > >   "config": {
> > >     "api.port": 19002,
> > >     "cc.java.opts": "-Xmx1024m",
> > >     "cluster.partitions": {
> > >       "0": "ID:0, Original Node: red16, IODevice: 0, Active Node: red16",
> > >       "1": "ID:1, Original Node: red15, IODevice: 0, Active Node: red15",
> > >       "2": "ID:2, Original Node: red14, IODevice: 0, Active Node: red14",
> > >       "3": "ID:3, Original Node: red13, IODevice: 0, Active Node: red13",
> > >       "4": "ID:4, Original Node: red12, IODevice: 0, Active Node: red12",
> > >       "5": "ID:5, Original Node: red11, IODevice: 0, Active Node: red11",
> > >       "6": "ID:6, Original Node: red10, IODevice: 0, Active Node: red10",
> > >       "7": "ID:7, Original Node: red9, IODevice: 0, Active Node: red9",
> > >       "8": "ID:8, Original Node: red8, IODevice: 0, Active Node: red8",
> > >       "9": "ID:9, Original Node: red7, IODevice: 0, Active Node: red7",
> > >       "10": "ID:10, Original Node: red6, IODevice: 0, Active Node: red6",
> > >       "11": "ID:11, Original Node: red5, IODevice: 0, Active Node: red5",
> > >       "12": "ID:12, Original Node: red4, IODevice: 0, Active Node: red4",
> > >       "13": "ID:13, Original Node: red3, IODevice: 0, Active Node: red3",
> > >       "14": "ID:14, Original Node: red2, IODevice: 0, Active Node: red2",
> > >       "15": "ID:15, Original Node: red, IODevice: 0, Active Node: red"
> > >     },
> > >     "compiler.framesize": 32768,
> > >     "compiler.groupmemory": 33554432,
> > >     "compiler.joinmemory": 33554432,
> > >     "compiler.pregelix.home": "~/pregelix",
> > >     "compiler.sortmemory": 33554432,
> > >     "core.dump.paths": {
> > >       "red": "/home/clash/asterixStorage/asterixdb5/red/coredump",
> > >       "red10": "/home/clash/asterixStorage/asterixdb5/red10/coredump",
> > >       "red11": "/home/clash/asterixStorage/asterixdb5/red11/coredump",
> > >       "red12": "/home/clash/asterixStorage/asterixdb5/red12/coredump",
> > >       "red13": "/home/clash/asterixStorage/asterixdb5/red13/coredump",
> > >       "red14": "/home/clash/asterixStorage/asterixdb5/red14/coredump",
> > >       "red15": "/home/clash/asterixStorage/asterixdb5/red15/coredump",
> > >       "red16": "/home/clash/asterixStorage/asterixdb5/red16/coredump",
> > >       "red2": "/home/clash/asterixStorage/asterixdb5/red2/coredump",
> > >       "red3": "/home/clash/asterixStorage/asterixdb5/red3/coredump",
> > >       "red4": "/home/clash/asterixStorage/asterixdb5/red4/coredump",
> > >       "red5": "/home/clash/asterixStorage/asterixdb5/red5/coredump",
> > >       "red6": "/home/clash/asterixStorage/asterixdb5/red6/coredump",
> > >       "red7": "/home/clash/asterixStorage/asterixdb5/red7/coredump",
> > >       "red8": "/home/clash/asterixStorage/asterixdb5/red8/coredump",
> > >       "red9": "/home/clash/asterixStorage/asterixdb5/red9/coredump"
> > >     },
> > >     "feed.central.manager.port": 4500,
> > >     "feed.max.threshold.period": 5,
> > >     "feed.memory.available.wait.timeout": 10,
> > >     "feed.memory.global.budget": 67108864,
> > >     "feed.pending.work.threshold": 50,
> > >     "feed.port": 19003,
> > >     "instance.name": "DEFAULT_INSTANCE",
> > >     "log.level": "WARNING",
> > >     "max.wait.active.cluster": 60,
> > >     "metadata.callback.port": 0,
> > >     "metadata.node": "red16",
> > >     "metadata.partition": "ID:0, Original Node: red16, IODevice: 0, Active Node: red16",
> > >     "metadata.port": 0,
> > >     "metadata.registration.timeout.secs": 60,
> > >     "nc.java.opts": "-Xmx1024m",
> > >     "node.partitions": {
> > >       "red": ["ID:15, Original Node: red, IODevice: 0, Active Node: red"],
> > >       "red10": ["ID:6, Original Node: red10, IODevice: 0, Active Node: red10"],
> > >       "red11": ["ID:5, Original Node: red11, IODevice: 0, Active Node: red11"],
> > >       "red12": ["ID:4, Original Node: red12, IODevice: 0, Active Node: red12"],
> > >       "red13": ["ID:3, Original Node: red13, IODevice: 0, Active Node: red13"],
> > >       "red14": ["ID:2, Original Node: red14, IODevice: 0, Active Node: red14"],
> > >       "red15": ["ID:1, Original Node: red15, IODevice: 0, Active Node: red15"],
> > >       "red16": ["ID:0, Original Node: red16, IODevice: 0, Active Node: red16"],
> > >       "red2": ["ID:14, Original Node: red2, IODevice: 0, Active Node: red2"],
> > >       "red3": ["ID:13, Original Node: red3, IODevice: 0, Active Node: red3"],
> > >       "red4": ["ID:12, Original Node: red4, IODevice: 0, Active Node: red4"],
> > >       "red5": ["ID:11, Original Node: red5, IODevice: 0, Active Node: red5"],
> > >       "red6": ["ID:10, Original Node: red6, IODevice: 0, Active Node: red6"],
> > >       "red7": ["ID:9, Original Node: red7, IODevice: 0, Active Node: red7"],
> > >       "red8": ["ID:8, Original Node: red8, IODevice: 0, Active Node: red8"],
> > >       "red9": ["ID:7, Original Node: red9, IODevice: 0, Active Node: red9"]
> > >     },
> > >     "node.stores": {
> > >       "red": ["/home/clash/asterixStorage/asterixdb5/red/storage"],
> > >       "red10": ["/home/clash/asterixStorage/asterixdb5/red10/storage"],
> > >       "red11": ["/home/clash/asterixStorage/asterixdb5/red11/storage"],
> > >       "red12": ["/home/clash/asterixStorage/asterixdb5/red12/storage"],
> > >       "red13": ["/home/clash/asterixStorage/asterixdb5/red13/storage"],
> > >       "red14": ["/home/clash/asterixStorage/asterixdb5/red14/storage"],
> > >       "red15": ["/home/clash/asterixStorage/asterixdb5/red15/storage"],
> > >       "red16": ["/home/clash/asterixStorage/asterixdb5/red16/storage"],
> > >       "red2": ["/home/clash/asterixStorage/asterixdb5/red2/storage"],
> > >       "red3": ["/home/clash/asterixStorage/asterixdb5/red3/storage"],
> > >       "red4": ["/home/clash/asterixStorage/asterixdb5/red4/storage"],
> > >       "red5": ["/home/clash/asterixStorage/asterixdb5/red5/storage"],
> > >       "red6": ["/home/clash/asterixStorage/asterixdb5/red6/storage"],
> > >       "red7": ["/home/clash/asterixStorage/asterixdb5/red7/storage"],
> > >       "red8": ["/home/clash/asterixStorage/asterixdb5/red8/storage"],
> > >       "red9": ["/home/clash/asterixStorage/asterixdb5/red9/storage"]
> > >     },
> > >     "plot.activate": false,
> > >     "replication.enabled": false,
> > >     "replication.factor": 2,
> > >     "replication.log.batchsize": 4096,
> > >     "replication.log.buffer.numpages": 8,
> > >     "replication.log.buffer.pagesize": 131072,
> > >     "replication.max.remote.recovery.attempts": 5,
> > >     "replication.timeout": 30,
> > >     "storage.buffercache.maxopenfiles": 2147483647,
> > >     "storage.buffercache.pagesize": 131072,
> > >     "storage.buffercache.size": 536870912,
> > >     "storage.lsm.bloomfilter.falsepositiverate": 0.01,
> > >     "storage.memorycomponent.globalbudget": 536870912,
> > >     "storage.memorycomponent.numcomponents": 2,
> > >     "storage.memorycomponent.numpages": 256,
> > >     "storage.memorycomponent.pagesize": 131072,
> > >     "storage.metadata.memorycomponent.numpages": 256,
> > >     "transaction.log.dirs": {
> > >       "red": "/home/clash/asterixStorage/asterixdb5/red/txnlog",
> > >       "red10": "/home/clash/asterixStorage/asterixdb5/red10/txnlog",
> > >       "red11": "/home/clash/asterixStorage/asterixdb5/red11/txnlog",
> > >       "red12": "/home/clash/asterixStorage/asterixdb5/red12/txnlog",
> > >       "red13": "/home/clash/asterixStorage/asterixdb5/red13/txnlog",
> > >       "red14": "/home/clash/asterixStorage/asterixdb5/red14/txnlog",
> > >       "red15": "/home/clash/asterixStorage/asterixdb5/red15/txnlog",
> > >       "red16": "/home/clash/asterixStorage/asterixdb5/red16/txnlog",
> > >       "red2": "/home/clash/asterixStorage/asterixdb5/red2/txnlog",
> > >       "red3": "/home/clash/asterixStorage/asterixdb5/red3/txnlog",
> > >       "red4": "/home/clash/asterixStorage/asterixdb5/red4/txnlog",
> > >       "red5": "/home/clash/asterixStorage/asterixdb5/red5/txnlog",
> > >       "red6": "/home/clash/asterixStorage/asterixdb5/red6/txnlog",
> > >       "red7": "/home/clash/asterixStorage/asterixdb5/red7/txnlog",
> > >       "red8": "/home/clash/asterixStorage/asterixdb5/red8/txnlog",
> > >       "red9": "/home/clash/asterixStorage/asterixdb5/red9/txnlog"
> > >     },
> > >     "txn.commitprofiler.reportinterval": 5,
> > >     "txn.job.recovery.memorysize": 67108864,
> > >     "txn.lock.escalationthreshold": 1000,
> > >     "txn.lock.shrinktimer": 5000,
> > >     "txn.lock.timeout.sweepthreshold": 10000,
> > >     "txn.lock.timeout.waitthreshold": 60000,
> > >     "txn.log.buffer.numpages": 8,
> > >     "txn.log.buffer.pagesize": 131072,
> > >     "txn.log.checkpoint.history": 0,
> > >     "txn.log.checkpoint.lsnthreshold": 67108864,
> > >     "txn.log.checkpoint.pollfrequency": 120,
> > >     "txn.log.partitionsize": 268435456,
> > >     "web.port": 19001,
> > >     "web.queryinterface.port": 19006,
> > >     "web.secondary.port": 19005
> > >   },
> > >   "fullShutdownUri": "http://scai01.cs.ucla.edu:19002/admin/shutdown?all=true",
> > >   "metadata_node": "red16",
> > >   "ncs": [
> > >     { "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red15/config",
> > >       "node_id": "red15",
> > >       "partitions": [{ "active": true, "partition_id": "partition_1" }],
> > >       "state": "ACTIVE",
> > >       "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red15/stats",
> > >       "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red15/threaddump" },
> > >     { "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red14/config",
> > >       "node_id": "red14",
> > >       "partitions": [{ "active": true, "partition_id": "partition_2" }],
> > >       "state": "ACTIVE",
> > >       "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red14/stats",
> > >       "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red14/threaddump" },
> > >     { "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red16/config",
> > >       "node_id": "red16",
> > >       "partitions": [{ "active": true, "partition_id": "partition_0" }],
> > >       "state": "ACTIVE",
> > >       "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red16/stats",
> > >       "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red16/threaddump" },
> > >     { "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red11/config",
> > >       "node_id": "red11",
> > >       "partitions": [{ "active": true, "partition_id": "partition_5" }],
> > >       "state": "ACTIVE",
> > >       "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red11/stats",
> > >       "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red11/threaddump" },
> > >     { "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red10/config",
> > >       "node_id": "red10",
> > >       "partitions": [{ "active": true, "partition_id": "partition_6" }],
> > >       "state": "ACTIVE",
> > >       "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red10/stats",
> > >       "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red10/threaddump" },
> > >     { "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red13/config",
> > >       "node_id": "red13",
> > >       "partitions": [{ "active": true, "partition_id": "partition_3" }],
> > >       "state": "ACTIVE",
> > >       "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red13/stats",
> > >       "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red13/threaddump" },
> > >     { "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red12/config",
> > >       "node_id": "red12",
> > >       "partitions": [{ "active": true, "partition_id": "partition_4" }],
> > >       "state": "ACTIVE",
> > >       "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red12/stats",
> > >       "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red12/threaddump" },
> > >     { "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red6/config",
> > >       "node_id": "red6",
> > >       "partitions": [{ "active": true, "partition_id": "partition_10" }],
> > >       "state": "ACTIVE",
> > >       "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red6/stats",
> > >       "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red6/threaddump" },
> > >     { "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red/config",
> > >       "node_id": "red",
> > >       "partitions": [{ "active": true, "partition_id": "partition_15" }],
> > >       "state": "ACTIVE",
> > >       "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red/stats",
> > >       "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red/threaddump" },
> > >     { "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red5/config",
> > >       "node_id": "red5",
> > >       "partitions": [{ "active": true, "partition_id": "partition_11" }],
> > >       "state": "ACTIVE",
> > >       "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red5/stats",
> > >       "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red5/threaddump" },
> > >     { "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red8/config",
> > >       "node_id": "red8",
> > >       "partitions": [{ "active": true, "partition_id": "partition_8" }],
> > >       "state": "ACTIVE",
> > >       "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red8/stats",
> > >       "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red8/threaddump" },
> > >     { "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red7/config",
> > >       "node_id": "red7",
> > >       "partitions": [{ "active": true,
"partition_id": "partition_9" > > > }], > > > "state": "ACTIVE", > > > "statsUri": > > > "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red7/stats", > > > "threadDumpUri": > > > "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red7/threaddump" > > > }, > > > { > > > "configUri": > > > "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red2/config", > > > "node_id": "red2", > > > "partitions": [{ > > > "active": true, > > > "partition_id": "partition_14" > > > }], > > > "state": "ACTIVE", > > > "statsUri": > > > "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red2/stats", > > > "threadDumpUri": > > > "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red2/threaddump" > > > }, > > > { > > > "configUri": > > > "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red4/config", > > > "node_id": "red4", > > > "partitions": [{ > > > "active": true, > > > "partition_id": "partition_12" > > > }], > > > "state": "ACTIVE", > > > "statsUri": > > > "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red4/stats", > > > "threadDumpUri": > > > "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red4/threaddump" > > > }, > > > { > > > "configUri": > > > "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red3/config", > > > "node_id": "red3", > > > "partitions": [{ > > > "active": true, > > > "partition_id": "partition_13" > > > }], > > > "state": "ACTIVE", > > > "statsUri": > > > "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red3/stats", > > > "threadDumpUri": > > > "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red3/threaddump" > > > }, > > > { > > > "configUri": > > > "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red9/config", > > > "node_id": "red9", > > > "partitions": [{ > > > "active": true, > > > "partition_id": "partition_7" > > > }], > > > "state": "ACTIVE", > > > "statsUri": > > > "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red9/stats", > > > "threadDumpUri": > > > "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red9/threaddump" > > > } > > > ], > > > "shutdownUri": "http://scai01.cs.ucla.edu:19002/admin/shutdown", > > > "state": "ACTIVE", > > > "versionUri": "http://scai01.cs.ucla.edu:19002/admin/version" > > > } > > > > > > 2.Catalog_return:2.28G > > > > > > catalog_sales:31.01G > > > > > > inventory:8.63G > > > > > > 3.As for Pig and Hive, I always use the default configuration. I didn't > > set > > > the partition things for them. And for Spark, we use 200 partitions, > > which > > > may be improved and just not bad. For AsterixDB, I also set the cluster > > > using default value of partition and JVM things (I didn't manually set > > > these parameters). > > > > > > > > > > > > On Tue, Dec 20, 2016 at 5:58 PM, Yingyi Bu <[email protected]> wrote: > > > > > > > Mingda, > > > > > > > > 1. Can you paste the returned JSON of http://<master > > > > node>:19002/admin/cluster at your side? (Pls replace <master node> > with > > > the > > > > actual master node name or IP) > > > > 2. Can you list the individual size of each dataset involved in > > the > > > > query, e.g., catalog_returns, catalog_sales, and inventory? (I > assume > > > > 100GB is the overall size?) > > > > 3. Do Spark/Hive/Pig saturate all CPUs on all machines, i.e., > how > > > many > > > > partitions are running on each machine? 
> > > >    (It seems that your AsterixDB configuration wouldn't saturate all
> > > >    CPUs for queries --- in the current AsterixDB master, the
> > > >    computation parallelism is set to be the same as the storage
> > > >    parallelism, i.e., the number of iodevices on each NC. I've
> > > >    submitted a new patch that allows flexible computation parallelism,
> > > >    which should be merged into master very soon.)
> > > > Thanks!
> > > >
> > > > Best,
> > > > Yingyi
> > > >
> > > > On Tue, Dec 20, 2016 at 5:44 PM, mingda li <[email protected]> wrote:
> > > > >
> > > > > Oh, sure. When we tested the 100G multiple join, we found that
> > > > > AsterixDB is slower than Spark (but still faster than Pig and Hive).
> > > > > I can share both plots with you: 1-10G.eps and 1-100G.eps. (We will
> > > > > only use 1-10G.eps in our paper.)
> > > > > And thanks for Ian's advice: *The dev list generally strips
> > > > > attachments. Maybe you can just put the config inline? Or link to a
> > > > > pastebin/gist?* Now I know why you couldn't see the attachments, so
> > > > > I have moved the plots and the two documents to my Dropbox:
> > > > > 1-10G.eps: https://www.dropbox.com/s/rk3xg6gigsfcuyq/1-10G.eps?dl=0
> > > > > 1-100G.eps: https://www.dropbox.com/s/tyxnmt6ehau2ski/1-100G.eps?dl=0
> > > > > cc_conf.pdf: https://www.dropbox.com/s/y3of1s17qdstv5f/cc_conf.pdf?dl=0
> > > > > CompleteQuery.pdf: https://www.dropbox.com/s/lml3fzxfjcmf2c1/CompleteQuery.pdf?dl=0
> > > > >
> > > > > On Tue, Dec 20, 2016 at 4:40 PM, Tyson Condie <[email protected]> wrote:
> > > > > >
> > > > > > Mingda: Please also share the numbers for 100GB, which show
> > > > > > AsterixDB not quite doing as well as Spark. These 100GB results
> > > > > > will not be in our submission version, since they're not needed
> > > > > > for the desired message: picking the right join order matters.
> > > > > > Nevertheless, I'd like to get a better understanding of what's
> > > > > > going on in the larger dataset regime.
> > > > > >
> > > > > > -Tyson
> > > > > >
> > > > > > From: Yingyi Bu [mailto:[email protected]]
> > > > > > Sent: Tuesday, December 20, 2016 4:30 PM
> > > > > > To: [email protected]
> > > > > > Cc: Michael Carey <[email protected]>; Tyson Condie <[email protected]>
> > > > > > Subject: Re: Time of Multiple Joins in AsterixDB
> > > > > >
> > > > > > Hi Mingda,
> > > > > >
> > > > > > It looks like you didn't attach the PDF?
> > > > > > Thanks!
> > > > > >
> > > > > > Best,
> > > > > > Yingyi
> > > > > >
> > > > > > On Tue, Dec 20, 2016 at 4:15 PM, mingda li <[email protected]> wrote:
> > > > > >
> > > > > > Sorry for the wrong version of cc.conf. I have converted it to a
> > > > > > PDF and attached it.
> > > > > >
> > > > > > On Tue, Dec 20, 2016 at 4:06 PM, mingda li <[email protected]> wrote:
> > > > > >
> > > > > > Dear all,
> > > > > >
> > > > > > I am testing multiple joins on different systems (AsterixDB,
> > > > > > Spark, Hive, Pig) to see if there is a big difference between
> > > > > > different join orders.
> > > > > > This is the motivation for our research on multiple joins, and
> > > > > > the results will appear in our paper, which will be submitted to
> > > > > > VLDB soon. Could you help us make sure that the test results make
> > > > > > sense for AsterixDB?
> > > > > >
> > > > > > We configured AsterixDB 0.8.9 (using the
> > > > > > asterix-server-0.8.9-SNAPSHOT-binary-assembly package) on our
> > > > > > cluster of 16 machines, each with a 3.40GHz i7 processor (4 cores
> > > > > > and 2 hyper-threads per core), 32GB of RAM, and 1TB of disk
> > > > > > capacity. The operating system is 64-bit Ubuntu 12.04, and the
> > > > > > JDK version is 1.8.0. During configuration, I followed the
> > > > > > NCService instructions at
> > > > > > https://ci.apache.org/projects/asterixdb/ncservice.html, and I
> > > > > > set cc.conf as in the attachment. (Each node works as an NC, and
> > > > > > the first node also works as the CC.)
> > > > > >
> > > > > > For the experiment, we use 3 fact tables from TPC-DS: inventory,
> > > > > > catalog_sales, and catalog_returns, with TPC-DS scale factors 1g
> > > > > > and 10g. The multiple-join queries we use in AsterixDB are as
> > > > > > follows:
> > > > > >
> > > > > > Good join order:
> > > > > > SELECT COUNT(*) FROM (SELECT * FROM catalog_sales cs1
> > > > > >   JOIN catalog_returns cr1
> > > > > >     ON (cs1.cs_order_number = cr1.cr_order_number
> > > > > >         AND cs1.cs_item_sk = cr1.cr_item_sk)) m1
> > > > > >   JOIN inventory i1 ON i1.inv_item_sk = cs1.cs_item_sk;
> > > > > >
> > > > > > Bad join order:
> > > > > > SELECT COUNT(*) FROM (SELECT * FROM catalog_sales cs1
> > > > > >   JOIN inventory i1 ON cs1.cs_item_sk = i1.inv_item_sk) m1
> > > > > >   JOIN catalog_returns cr1
> > > > > >     ON (cs1.cs_order_number = cr1.cr_order_number
> > > > > >         AND cs1.cs_item_sk = cr1.cr_item_sk);
> > > > > >
> > > > > > We load the data into AsterixDB first and then run the two
> > > > > > queries. (The complete version of all queries for AsterixDB is
> > > > > > in the attachment.) We assume the data has already been stored
> > > > > > in AsterixDB and only count the time for the multiple join.
> > > > > >
> > > > > > Meanwhile, we use the same datasets and queries to test Spark,
> > > > > > Pig, and Hive. The results are shown in the attachment's figure,
> > > > > > and you can see that AsterixDB's time is always better than the
> > > > > > others, for both the good and the bad order :-) (BTW, the y axis
> > > > > > of the figure is time on a log scale; the label on each bar gives
> > > > > > the exact time.)
> > > > > >
> > > > > > Thanks for your help.
> > > > > >
> > > > > > Bests,
> > > > > > Mingda
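Since the complete queries live in an attachment that the list stripped, a
minimal, hypothetical sketch of the DDL and load for one of the three
datasets may be useful, written in the SQL++ style of the queries above.
The dataverse name, the file path, and the two-field type are illustrative
assumptions (a real load of '|'-delimited TPC-DS flat files would need every
column declared in file order, so this sketch assumes a prior conversion to
ADM); the LOAD DATASET ... USING localfs form follows the AsterixDB
documentation, though the exact 0.8.9 syntax may differ slightly.

    -- hypothetical dataverse and path; only the join-key columns are
    -- declared, the remaining TPC-DS columns enter through the open type
    DROP DATAVERSE tpcds IF EXISTS;
    CREATE DATAVERSE tpcds;
    USE tpcds;

    CREATE TYPE CatalogSalesType AS {
      cs_item_sk: int64,
      cs_order_number: int64
    };

    -- composite primary key, matching the TPC-DS catalog_sales key
    CREATE DATASET catalog_sales(CatalogSalesType)
      PRIMARY KEY cs_item_sk, cs_order_number;

    -- assumes the flat file was converted to ADM and staged on node red16
    LOAD DATASET catalog_sales USING localfs
      (("path"="red16:///home/clash/tpcds/catalog_sales.adm"),
       ("format"="adm"));

After analogous loads for catalog_returns and inventory, the good-order and
bad-order queries above can be run as-is, timing only the join step as the
email describes.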
