No need to reload data anymore :-)

Best,
Yingyi

On Thu, Dec 22, 2016 at 11:36 AM, Yingyi Bu <[email protected]> wrote:

Indeed, the change was merged yesterday.
If you grab the latest master, the computation parallelism can be set by the parameter compiler.parallelism:
-- 0, the default, means to use the storage parallelism as the computation parallelism;
-- a negative value means to use all available cores in the cluster;
-- a positive value means to use that number of cores, but it will fall back to all available cores if the number is too large.

Best,
Yingyi


On Thu, Dec 22, 2016 at 10:59 AM, Mike Carey <[email protected]> wrote:

It would definitely also be fun to see how the newest, more flexible parallelism stuff handles this! (Where you'd specify storage parallelism based on drives and compute parallelism based on cores, both spread across all of the cluster's resources.)


On 12/22/16 10:57 AM, Yingyi Bu wrote:

Mingda,

There is one more thing that you can do without the need to reload your data:

-- Enlarge the buffer cache memory budget to 8GB so that the datasets can fit into the buffer cache:
   "storage.buffercache.size": 536870912
   ->
   "storage.buffercache.size": 8589934592

You don't need to reload data; you only need to restart the AsterixDB instance.
Thanks!

Best,
Yingyi


On Wed, Dec 21, 2016 at 9:22 PM, Mike Carey <[email protected]> wrote:

Nice!!


On Dec 21, 2016 8:43 PM, "Yingyi Bu" <[email protected]> wrote:

Cool, thanks, Mingda!
Looking forward to the new numbers!

Best,
Yingyi


On Wed, Dec 21, 2016 at 7:13 PM, mingda li <[email protected]> wrote:

Dear Yingyi,

Thanks for your suggestion. I have reset AsterixDB and retested it in the new environment on 100G and 10G. All of the tests (both good and bad join orders) are now about twice as fast.
I will finish all the tests and update the results later.

Bests,
Mingda


On Tue, Dec 20, 2016 at 10:20 PM, Yingyi Bu <[email protected]> wrote:

Hi Mingda,

I think that in your setting, a better configuration for AsterixDB might be to use 64 partitions, i.e., 4 cores x 16 machines.

1. To achieve that, you have to have 4 iodevices on each NC, e.g.:
   iodevices=/home/clash/asterixStorage/asterixdb5/red16
   -->
   iodevices=/home/clash/asterixStorage/asterixdb5/red16-1,
   iodevices=/home/clash/asterixStorage/asterixdb5/red16-2,
   iodevices=/home/clash/asterixStorage/asterixdb5/red16-3,
   iodevices=/home/clash/asterixStorage/asterixdb5/red16-4

2. Assuming you have 64 partitions, 31.01G / 64 ~= 0.5G. That means you'd better give each joiner a 512MB memory budget so as to make the join memory-resident. To achieve that, you could add the following in the cc section of cc.conf:
   compiler.joinmemory=536870912

3. For the JVM setting, 1024MB is too small for the NC. In the shared NC section of cc.conf, you can add:
   jvm.args=-Xmx16G
   (A consolidated cc.conf sketch with these settings appears after this message.)
4. For Pig and Hive, you can set the maximum mapper/reducer numbers in the MapReduce configuration, e.g., at most 4 mappers per machine and at most 4 reducers per machine.

5. I'm not super-familiar with hyper-threading, but it might be worth trying 8 partitions per machine, i.e., 128 partitions in total.

To validate whether the new settings work, you can go to the admin/cluster page to double-check.
Please keep us updated and let us know if you run into any issues.
Thanks!

Best,
Yingyi

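For reference, here is a minimal cc.conf sketch that collects the settings suggested in this thread. It is only a sketch: the section names ([cc], [nc], [nc/red16]) and the exact placement of each key are assumptions based on the "cc section" / "shared NC section" wording above, the comma-separated iodevices list may need adjusting to however your cc.conf expects multiple devices, and the per-node section has to be repeated for every NC (red, red2, ..., red16).

   ; one such section per NC (repeat for red, red2, ..., red16)
   [nc/red16]
   ; item 1: four iodevices per machine => 4 storage partitions per NC, 64 in total
   ; (for the 128-partition variant in item 5, list 8 device paths instead)
   iodevices=/home/clash/asterixStorage/asterixdb5/red16-1,/home/clash/asterixStorage/asterixdb5/red16-2,/home/clash/asterixStorage/asterixdb5/red16-3,/home/clash/asterixStorage/asterixdb5/red16-4

   ; shared NC section
   [nc]
   ; item 3: raise the NC JVM heap (the current default is -Xmx1024m)
   jvm.args=-Xmx16G
   ; 12/22 suggestion: 8GB buffer cache so the datasets fit in memory
   storage.buffercache.size=8589934592

   [cc]
   ; item 2: 512MB join memory per joiner (31.01G / 64 partitions ~= 0.5G)
   compiler.joinmemory=536870912
   ; latest master only: 0 = storage parallelism, negative = all cores, positive = that many cores
   compiler.parallelism=0

Whether storage.buffercache.size belongs in the NC or CC section may differ across versions; the admin/cluster output pasted below is the easiest way to confirm which values the instance actually picked up.
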
On Tue, Dec 20, 2016 at 9:26 PM, mingda li <[email protected]> wrote:

Dear Yingyi,

1. Here is the JSON returned by http://<master node>:19002/admin/cluster:

{
  "cc": {
    "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/cc/config",
    "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/cc/stats",
    "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/cc/threaddump"
  },
  "config": {
    "api.port": 19002,
    "cc.java.opts": "-Xmx1024m",
    "cluster.partitions": {
      "0": "ID:0, Original Node: red16, IODevice: 0, Active Node: red16",
      "1": "ID:1, Original Node: red15, IODevice: 0, Active Node: red15",
      "2": "ID:2, Original Node: red14, IODevice: 0, Active Node: red14",
      "3": "ID:3, Original Node: red13, IODevice: 0, Active Node: red13",
      "4": "ID:4, Original Node: red12, IODevice: 0, Active Node: red12",
      "5": "ID:5, Original Node: red11, IODevice: 0, Active Node: red11",
      "6": "ID:6, Original Node: red10, IODevice: 0, Active Node: red10",
      "7": "ID:7, Original Node: red9, IODevice: 0, Active Node: red9",
      "8": "ID:8, Original Node: red8, IODevice: 0, Active Node: red8",
      "9": "ID:9, Original Node: red7, IODevice: 0, Active Node: red7",
      "10": "ID:10, Original Node: red6, IODevice: 0, Active Node: red6",
      "11": "ID:11, Original Node: red5, IODevice: 0, Active Node: red5",
      "12": "ID:12, Original Node: red4, IODevice: 0, Active Node: red4",
      "13": "ID:13, Original Node: red3, IODevice: 0, Active Node: red3",
      "14": "ID:14, Original Node: red2, IODevice: 0, Active Node: red2",
      "15": "ID:15, Original Node: red, IODevice: 0, Active Node: red"
    },
    "compiler.framesize": 32768,
    "compiler.groupmemory": 33554432,
    "compiler.joinmemory": 33554432,
    "compiler.pregelix.home": "~/pregelix",
    "compiler.sortmemory": 33554432,
    "core.dump.paths": {
      "red": "/home/clash/asterixStorage/asterixdb5/red/coredump",
      "red10": "/home/clash/asterixStorage/asterixdb5/red10/coredump",
      "red11": "/home/clash/asterixStorage/asterixdb5/red11/coredump",
      "red12": "/home/clash/asterixStorage/asterixdb5/red12/coredump",
      "red13": "/home/clash/asterixStorage/asterixdb5/red13/coredump",
      "red14": "/home/clash/asterixStorage/asterixdb5/red14/coredump",
      "red15": "/home/clash/asterixStorage/asterixdb5/red15/coredump",
      "red16": "/home/clash/asterixStorage/asterixdb5/red16/coredump",
      "red2": "/home/clash/asterixStorage/asterixdb5/red2/coredump",
      "red3": "/home/clash/asterixStorage/asterixdb5/red3/coredump",
      "red4": "/home/clash/asterixStorage/asterixdb5/red4/coredump",
      "red5": "/home/clash/asterixStorage/asterixdb5/red5/coredump",
      "red6": "/home/clash/asterixStorage/asterixdb5/red6/coredump",
      "red7": "/home/clash/asterixStorage/asterixdb5/red7/coredump",
      "red8": "/home/clash/asterixStorage/asterixdb5/red8/coredump",
      "red9": "/home/clash/asterixStorage/asterixdb5/red9/coredump"
    },
    "feed.central.manager.port": 4500,
    "feed.max.threshold.period": 5,
    "feed.memory.available.wait.timeout": 10,
    "feed.memory.global.budget": 67108864,
    "feed.pending.work.threshold": 50,
    "feed.port": 19003,
    "instance.name": "DEFAULT_INSTANCE",
    "log.level": "WARNING",
    "max.wait.active.cluster": 60,
    "metadata.callback.port": 0,
    "metadata.node": "red16",
    "metadata.partition": "ID:0, Original Node: red16, IODevice: 0, Active Node: red16",
    "metadata.port": 0,
    "metadata.registration.timeout.secs": 60,
    "nc.java.opts": "-Xmx1024m",
    "node.partitions": {
      "red": ["ID:15, Original Node: red, IODevice: 0, Active Node: red"],
      "red10": ["ID:6, Original Node: red10, IODevice: 0, Active Node: red10"],
      "red11": ["ID:5, Original Node: red11, IODevice: 0, Active Node: red11"],
      "red12": ["ID:4, Original Node: red12, IODevice: 0, Active Node: red12"],
      "red13": ["ID:3, Original Node: red13, IODevice: 0, Active Node: red13"],
      "red14": ["ID:2, Original Node: red14, IODevice: 0, Active Node: red14"],
      "red15": ["ID:1, Original Node: red15, IODevice: 0, Active Node: red15"],
      "red16": ["ID:0, Original Node: red16, IODevice: 0, Active Node: red16"],
      "red2": ["ID:14, Original Node: red2, IODevice: 0, Active Node: red2"],
      "red3": ["ID:13, Original Node: red3, IODevice: 0, Active Node: red3"],
      "red4": ["ID:12, Original Node: red4, IODevice: 0, Active Node: red4"],
      "red5": ["ID:11, Original Node: red5, IODevice: 0, Active Node: red5"],
      "red6": ["ID:10, Original Node: red6, IODevice: 0, Active Node: red6"],
      "red7": ["ID:9, Original Node: red7, IODevice: 0, Active Node: red7"],
      "red8": ["ID:8, Original Node: red8, IODevice: 0, Active Node: red8"],
      "red9": ["ID:7, Original Node: red9, IODevice: 0, Active Node: red9"]
    },
    "node.stores": {
      "red": ["/home/clash/asterixStorage/asterixdb5/red/storage"],
      "red10": ["/home/clash/asterixStorage/asterixdb5/red10/storage"],
      "red11": ["/home/clash/asterixStorage/asterixdb5/red11/storage"],
      "red12": ["/home/clash/asterixStorage/asterixdb5/red12/storage"],
      "red13": ["/home/clash/asterixStorage/asterixdb5/red13/storage"],
      "red14": ["/home/clash/asterixStorage/asterixdb5/red14/storage"],
      "red15": ["/home/clash/asterixStorage/asterixdb5/red15/storage"],
      "red16": ["/home/clash/asterixStorage/asterixdb5/red16/storage"],
      "red2": ["/home/clash/asterixStorage/asterixdb5/red2/storage"],
      "red3": ["/home/clash/asterixStorage/asterixdb5/red3/storage"],
      "red4": ["/home/clash/asterixStorage/asterixdb5/red4/storage"],
      "red5": ["/home/clash/asterixStorage/asterixdb5/red5/storage"],
      "red6": ["/home/clash/asterixStorage/asterixdb5/red6/storage"],
      "red7": ["/home/clash/asterixStorage/asterixdb5/red7/storage"],
      "red8": ["/home/clash/asterixStorage/asterixdb5/red8/storage"],
      "red9": ["/home/clash/asterixStorage/asterixdb5/red9/storage"]
    },
    "plot.activate": false,
    "replication.enabled": false,
    "replication.factor": 2,
    "replication.log.batchsize": 4096,
    "replication.log.buffer.numpages": 8,
    "replication.log.buffer.pagesize": 131072,
    "replication.max.remote.recovery.attempts": 5,
    "replication.timeout": 30,
    "storage.buffercache.maxopenfiles": 2147483647,
    "storage.buffercache.pagesize": 131072,
    "storage.buffercache.size": 536870912,
    "storage.lsm.bloomfilter.falsepositiverate": 0.01,
    "storage.memorycomponent.globalbudget": 536870912,
    "storage.memorycomponent.numcomponents": 2,
    "storage.memorycomponent.numpages": 256,
    "storage.memorycomponent.pagesize": 131072,
    "storage.metadata.memorycomponent.numpages": 256,
    "transaction.log.dirs": {
      "red": "/home/clash/asterixStorage/asterixdb5/red/txnlog",
      "red10": "/home/clash/asterixStorage/asterixdb5/red10/txnlog",
      "red11": "/home/clash/asterixStorage/asterixdb5/red11/txnlog",
      "red12": "/home/clash/asterixStorage/asterixdb5/red12/txnlog",
      "red13": "/home/clash/asterixStorage/asterixdb5/red13/txnlog",
      "red14": "/home/clash/asterixStorage/asterixdb5/red14/txnlog",
      "red15": "/home/clash/asterixStorage/asterixdb5/red15/txnlog",
      "red16": "/home/clash/asterixStorage/asterixdb5/red16/txnlog",
      "red2": "/home/clash/asterixStorage/asterixdb5/red2/txnlog",
      "red3": "/home/clash/asterixStorage/asterixdb5/red3/txnlog",
      "red4": "/home/clash/asterixStorage/asterixdb5/red4/txnlog",
      "red5": "/home/clash/asterixStorage/asterixdb5/red5/txnlog",
      "red6": "/home/clash/asterixStorage/asterixdb5/red6/txnlog",
      "red7": "/home/clash/asterixStorage/asterixdb5/red7/txnlog",
      "red8": "/home/clash/asterixStorage/asterixdb5/red8/txnlog",
      "red9": "/home/clash/asterixStorage/asterixdb5/red9/txnlog"
    },
    "txn.commitprofiler.reportinterval": 5,
    "txn.job.recovery.memorysize": 67108864,
    "txn.lock.escalationthreshold": 1000,
    "txn.lock.shrinktimer": 5000,
    "txn.lock.timeout.sweepthreshold": 10000,
    "txn.lock.timeout.waitthreshold": 60000,
    "txn.log.buffer.numpages": 8,
    "txn.log.buffer.pagesize": 131072,
    "txn.log.checkpoint.history": 0,
    "txn.log.checkpoint.lsnthreshold": 67108864,
    "txn.log.checkpoint.pollfrequency": 120,
    "txn.log.partitionsize": 268435456,
    "web.port": 19001,
    "web.queryinterface.port": 19006,
    "web.secondary.port": 19005
  },
  "fullShutdownUri": "http://scai01.cs.ucla.edu:19002/admin/shutdown?all=true",
  "metadata_node": "red16",
  "ncs": [
    {
      "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red15/config",
      "node_id": "red15",
      "partitions": [{ "active": true, "partition_id": "partition_1" }],
      "state": "ACTIVE",
      "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red15/stats",
      "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red15/threaddump"
    },
    {
      "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red14/config",
      "node_id": "red14",
      "partitions": [{ "active": true, "partition_id": "partition_2" }],
      "state": "ACTIVE",
      "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red14/stats",
      "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red14/threaddump"
    },
    {
      "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red16/config",
      "node_id": "red16",
      "partitions": [{ "active": true, "partition_id": "partition_0" }],
      "state": "ACTIVE",
      "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red16/stats",
      "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red16/threaddump"
    },
    {
      "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red11/config",
      "node_id": "red11",
      "partitions": [{ "active": true, "partition_id": "partition_5" }],
      "state": "ACTIVE",
      "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red11/stats",
      "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red11/threaddump"
    },
    {
      "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red10/config",
      "node_id": "red10",
      "partitions": [{ "active": true, "partition_id": "partition_6" }],
      "state": "ACTIVE",
      "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red10/stats",
      "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red10/threaddump"
    },
    {
      "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red13/config",
      "node_id": "red13",
      "partitions": [{ "active": true, "partition_id": "partition_3" }],
      "state": "ACTIVE",
      "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red13/stats",
      "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red13/threaddump"
    },
    {
      "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red12/config",
      "node_id": "red12",
      "partitions": [{ "active": true, "partition_id": "partition_4" }],
      "state": "ACTIVE",
      "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red12/stats",
      "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red12/threaddump"
    },
    {
      "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red6/config",
      "node_id": "red6",
      "partitions": [{ "active": true, "partition_id": "partition_10" }],
      "state": "ACTIVE",
      "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red6/stats",
      "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red6/threaddump"
    },
    {
      "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red/config",
      "node_id": "red",
      "partitions": [{ "active": true, "partition_id": "partition_15" }],
      "state": "ACTIVE",
      "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red/stats",
      "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red/threaddump"
    },
    {
      "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red5/config",
      "node_id": "red5",
      "partitions": [{ "active": true, "partition_id": "partition_11" }],
      "state": "ACTIVE",
      "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red5/stats",
      "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red5/threaddump"
    },
    {
      "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red8/config",
      "node_id": "red8",
      "partitions": [{ "active": true, "partition_id": "partition_8" }],
      "state": "ACTIVE",
      "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red8/stats",
      "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red8/threaddump"
    },
    {
      "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red7/config",
      "node_id": "red7",
      "partitions": [{ "active": true, "partition_id": "partition_9" }],
      "state": "ACTIVE",
      "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red7/stats",
      "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red7/threaddump"
    },
    {
      "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red2/config",
      "node_id": "red2",
      "partitions": [{ "active": true, "partition_id": "partition_14" }],
      "state": "ACTIVE",
      "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red2/stats",
      "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red2/threaddump"
    },
    {
      "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red4/config",
      "node_id": "red4",
      "partitions": [{ "active": true, "partition_id": "partition_12" }],
      "state": "ACTIVE",
      "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red4/stats",
      "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red4/threaddump"
    },
    {
      "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red3/config",
      "node_id": "red3",
      "partitions": [{ "active": true, "partition_id": "partition_13" }],
      "state": "ACTIVE",
      "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red3/stats",
      "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red3/threaddump"
    },
    {
      "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red9/config",
      "node_id": "red9",
      "partitions": [{ "active": true, "partition_id": "partition_7" }],
      "state": "ACTIVE",
      "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red9/stats",
      "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red9/threaddump"
    }
  ],
  "shutdownUri": "http://scai01.cs.ucla.edu:19002/admin/shutdown",
  "state": "ACTIVE",
  "versionUri": "http://scai01.cs.ucla.edu:19002/admin/version"
}

2. The individual dataset sizes:

   catalog_returns: 2.28G
   catalog_sales: 31.01G
   inventory: 8.63G

3. As for Pig and Hive, I always use the default configuration; I didn't set any partition-related parameters for them. For Spark, we use 200 partitions, which could probably be improved but is not bad. For AsterixDB, I also set up the cluster with the default partition and JVM settings (I didn't manually set these parameters).

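Rather than eyeballing a paste like the one above, the same admin/cluster endpoint can be checked programmatically after changing the configuration. The following is a small Python sketch written against the JSON shape shown above; the host name and the keys of interest are taken from that paste and may differ on other versions.

   import json
   import urllib.request

   # Cluster Controller REST endpoint, as pasted above.
   CLUSTER_URL = "http://scai01.cs.ucla.edu:19002/admin/cluster"

   with urllib.request.urlopen(CLUSTER_URL) as resp:
       cluster = json.loads(resp.read().decode("utf-8"))

   config = cluster["config"]

   # One partition per iodevice across the cluster (16 in the paste above).
   print("total partitions:", len(config["cluster.partitions"]))

   # Settings discussed in this thread; compiler.parallelism only exists on newer masters.
   for key in ("nc.java.opts", "compiler.joinmemory", "storage.buffercache.size", "compiler.parallelism"):
       print(key, "=", config.get(key))

   # Per-NC view: node id, state, and number of partitions it hosts.
   for nc in cluster["ncs"]:
       print(nc["node_id"], nc["state"], "partitions:", len(nc["partitions"]))

Against the paste above, this would report 16 partitions and the default values (-Xmx1024m for the NC JVM, 32MB join memory, 512MB buffer cache) that motivated the suggestions earlier in the thread.
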
On Tue, Dec 20, 2016 at 5:58 PM, Yingyi Bu <[email protected]> wrote:

Mingda,

1. Can you paste the JSON returned by http://<master node>:19002/admin/cluster at your side? (Please replace <master node> with the actual master node name or IP.)

2. Can you list the individual size of each dataset involved in the query, e.g., catalog_returns, catalog_sales, and inventory? (I assume 100GB is the overall size?)

3. Do Spark/Hive/Pig saturate all CPUs on all machines, i.e., how many partitions are running on each machine? (It seems that your AsterixDB configuration wouldn't saturate all CPUs for queries --- in the current AsterixDB master, the computation parallelism is set to be the same as the storage parallelism, i.e., the number of iodevices on each NC. I've submitted a new patch that allows flexible computation parallelism, which should be merged into master very soon.)

Thanks!

Best,
Yingyi

On Tue, Dec 20, 2016 at 5:44 PM, mingda li <[email protected]> wrote:

Oh, sure. When we tested the 100G multi-way join, we found AsterixDB is slower than Spark (but still faster than Pig and Hive).
I can share both plots with you: 1-10G.eps and 1-100G.eps. (We will only use 1-10G.eps in our paper.)
And thanks for Ian's advice: "The dev list generally strips attachments. Maybe you can just put the config inline? Or link to a pastebin/gist?" Now I know why you couldn't see the attachments, so I have moved the plots and the two documents to my Dropbox.
You can find them here:

   1-10G.eps: https://www.dropbox.com/s/rk3xg6gigsfcuyq/1-10G.eps?dl=0
   1-100G.eps: https://www.dropbox.com/s/tyxnmt6ehau2ski/1-100G.eps?dl=0
   cc_conf.pdf: https://www.dropbox.com/s/y3of1s17qdstv5f/cc_conf.pdf?dl=0
   CompleteQuery.pdf: https://www.dropbox.com/s/lml3fzxfjcmf2c1/CompleteQuery.pdf?dl=0


On Tue, Dec 20, 2016 at 4:40 PM, Tyson Condie <[email protected]> wrote:

Mingda: Please also share the numbers for 100GB, which show AsterixDB not quite doing as well as Spark. These 100GB results will not be in our submission version, since they're not needed for the desired message: picking the right join order matters. Nevertheless, I'd like to get a
