It would definitely also be fun to see how the newest more flexible parallelism stuff handles this! (Where you'd specify storage parallelism based on drives, and compute parallelism based on cores, both spread across all of the cluster's resources.)

On 12/22/16 10:57 AM, Yingyi Bu wrote:
Mingda,


      There is one more thing you can do without the need to reload
your data:

      -- Enlarge the buffercache memory budget to 8GB so that the datasets
can fit into the buffer cache:
          "storage.buffercache.size": 536870912
          ->
          "storage.buffercache.size": 8589934592

      You don't need to reload data but only need to restart the AsterixDB
instance.
      Thanks!

Best,
Yingyi



On Wed, Dec 21, 2016 at 9:22 PM, Mike Carey <[email protected]> wrote:

Nice!!

On Dec 21, 2016 8:43 PM, "Yingyi Bu" <[email protected]> wrote:

Cool, thanks, Mingda!
Look forward to the new numbers!

Best,
Yingyi

On Wed, Dec 21, 2016 at 7:13 PM, mingda li <[email protected]>
wrote:
Dear Yingyi,

Thanks for your suggestion. I have reset AsterixDB and retested it
in the new environment on 100G and 10G. The times for all the tests (good
and bad order) have improved by about 2x.
I will finish all the tests and update the results later.

Bests,
Mingda

On Tue, Dec 20, 2016 at 10:20 PM, Yingyi Bu <[email protected]>
wrote:
Hi Mingda,

      I think that in your setting, a better configuration for AsterixDB
might be to use 64 partitions, i.e., 4 cores × 16 machines.
      1. To achieve that, you have to have 4 iodevices on each NC,
e.g.:
          iodevices=/home/clash/asterixStorage/asterixdb5/red16
          -->
          iodevices=/home/clash/asterixStorage/asterixdb5/red16-1,
iodevices=/home/clash/asterixStorage/asterixdb5/red16-2,
iodevices=/home/clash/asterixStorage/asterixdb5/red16-3,
iodevices=/home/clash/asterixStorage/asterixdb5/red16-4

      2. Assume you have 64 partitions: 31.01G/64 ≈ 0.5G.  That means
you'd better have a 512MB memory budget for each joiner so as to make the
join memory-resident.
          To achieve that,  in the cc section in the cc.conf, you
could
add:
          compiler.joinmemory=536870912

      3. For the JVM setting, 1024MB is too small for the NC.
          In the shared NC section in cc.conf, you can add:
          jvm.args=-Xmx16G

      4. For Pig and Hive, you can set the maximum mapper/reducer
numbers
in
the MapReduce configuration, e.g., at most 4 mappers per machine and
at
most 4 reducers per machine.

      5. I'm not super-familiar with hyper-threading, but it might be
worth trying 8 partitions per machine, i.e., 128 partitions in total.

      To validate that the new settings work, you can go to the
admin/cluster page to double-check.
      Pls keep us updated and let us know if you run into any issues.
      Thanks!
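Putting points 1–3 together, the relevant cc.conf fragments might look roughly like the following (a sketch only: section names follow the NCService docs and the email above, the red16-1..4 directory names are made up for illustration, and every node's [nc/...] section would need the same iodevices change):

```ini
[nc/red16]
iodevices=/home/clash/asterixStorage/asterixdb5/red16-1,/home/clash/asterixStorage/asterixdb5/red16-2,/home/clash/asterixStorage/asterixdb5/red16-3,/home/clash/asterixStorage/asterixdb5/red16-4

[nc]
; shared NC section: raise the JVM heap from the default 1024MB
jvm.args=-Xmx16G

[cc]
; 512MB per joiner, per point 2 above
compiler.joinmemory=536870912
```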

Best,
Yingyi


On Tue, Dec 20, 2016 at 9:26 PM, mingda li <[email protected]>
wrote:
Dear Yingyi,
1. Here is the JSON returned by http://<master node>:19002/admin/cluster :

{
    "cc": {
        "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/cc/config",
        "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/cc/stats",
        "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/cc/threaddump"
    },
    "config": {
        "api.port": 19002,
        "cc.java.opts": "-Xmx1024m",
        "cluster.partitions": {
            "0": "ID:0, Original Node: red16, IODevice: 0, Active Node: red16",
            "1": "ID:1, Original Node: red15, IODevice: 0, Active Node: red15",
            "2": "ID:2, Original Node: red14, IODevice: 0, Active Node: red14",
            "3": "ID:3, Original Node: red13, IODevice: 0, Active Node: red13",
            "4": "ID:4, Original Node: red12, IODevice: 0, Active Node: red12",
            "5": "ID:5, Original Node: red11, IODevice: 0, Active Node: red11",
            "6": "ID:6, Original Node: red10, IODevice: 0, Active Node: red10",
            "7": "ID:7, Original Node: red9, IODevice: 0, Active Node: red9",
            "8": "ID:8, Original Node: red8, IODevice: 0, Active Node: red8",
            "9": "ID:9, Original Node: red7, IODevice: 0, Active Node: red7",
            "10": "ID:10, Original Node: red6, IODevice: 0, Active Node: red6",
            "11": "ID:11, Original Node: red5, IODevice: 0, Active Node: red5",
            "12": "ID:12, Original Node: red4, IODevice: 0, Active Node: red4",
            "13": "ID:13, Original Node: red3, IODevice: 0, Active Node: red3",
            "14": "ID:14, Original Node: red2, IODevice: 0, Active Node: red2",
            "15": "ID:15, Original Node: red, IODevice: 0, Active Node: red"
        },
        "compiler.framesize": 32768,
        "compiler.groupmemory": 33554432,
        "compiler.joinmemory": 33554432,
        "compiler.pregelix.home": "~/pregelix",
        "compiler.sortmemory": 33554432,
        "core.dump.paths": {
            "red": "/home/clash/asterixStorage/asterixdb5/red/coredump",
            "red10": "/home/clash/asterixStorage/asterixdb5/red10/coredump",
            "red11": "/home/clash/asterixStorage/asterixdb5/red11/coredump",
            "red12": "/home/clash/asterixStorage/asterixdb5/red12/coredump",
            "red13": "/home/clash/asterixStorage/asterixdb5/red13/coredump",
            "red14": "/home/clash/asterixStorage/asterixdb5/red14/coredump",
            "red15": "/home/clash/asterixStorage/asterixdb5/red15/coredump",
            "red16": "/home/clash/asterixStorage/asterixdb5/red16/coredump",
            "red2": "/home/clash/asterixStorage/asterixdb5/red2/coredump",
            "red3": "/home/clash/asterixStorage/asterixdb5/red3/coredump",
            "red4": "/home/clash/asterixStorage/asterixdb5/red4/coredump",
            "red5": "/home/clash/asterixStorage/asterixdb5/red5/coredump",
            "red6": "/home/clash/asterixStorage/asterixdb5/red6/coredump",
            "red7": "/home/clash/asterixStorage/asterixdb5/red7/coredump",
            "red8": "/home/clash/asterixStorage/asterixdb5/red8/coredump",
            "red9": "/home/clash/asterixStorage/asterixdb5/red9/coredump"
        },
        "feed.central.manager.port": 4500,
        "feed.max.threshold.period": 5,
        "feed.memory.available.wait.timeout": 10,
        "feed.memory.global.budget": 67108864,
        "feed.pending.work.threshold": 50,
        "feed.port": 19003,
        "instance.name": "DEFAULT_INSTANCE",
        "log.level": "WARNING",
        "max.wait.active.cluster": 60,
        "metadata.callback.port": 0,
        "metadata.node": "red16",
        "metadata.partition": "ID:0, Original Node: red16, IODevice: 0, Active Node: red16",
        "metadata.port": 0,
        "metadata.registration.timeout.secs": 60,
        "nc.java.opts": "-Xmx1024m",
        "node.partitions": {
            "red": ["ID:15, Original Node: red, IODevice: 0, Active Node: red"],
            "red10": ["ID:6, Original Node: red10, IODevice: 0, Active Node: red10"],
            "red11": ["ID:5, Original Node: red11, IODevice: 0, Active Node: red11"],
            "red12": ["ID:4, Original Node: red12, IODevice: 0, Active Node: red12"],
            "red13": ["ID:3, Original Node: red13, IODevice: 0, Active Node: red13"],
            "red14": ["ID:2, Original Node: red14, IODevice: 0, Active Node: red14"],
            "red15": ["ID:1, Original Node: red15, IODevice: 0, Active Node: red15"],
            "red16": ["ID:0, Original Node: red16, IODevice: 0, Active Node: red16"],
            "red2": ["ID:14, Original Node: red2, IODevice: 0, Active Node: red2"],
            "red3": ["ID:13, Original Node: red3, IODevice: 0, Active Node: red3"],
            "red4": ["ID:12, Original Node: red4, IODevice: 0, Active Node: red4"],
            "red5": ["ID:11, Original Node: red5, IODevice: 0, Active Node: red5"],
            "red6": ["ID:10, Original Node: red6, IODevice: 0, Active Node: red6"],
            "red7": ["ID:9, Original Node: red7, IODevice: 0, Active Node: red7"],
            "red8": ["ID:8, Original Node: red8, IODevice: 0, Active Node: red8"],
            "red9": ["ID:7, Original Node: red9, IODevice: 0, Active Node: red9"]
        },
        "node.stores": {
            "red": ["/home/clash/asterixStorage/asterixdb5/red/storage"],
            "red10": ["/home/clash/asterixStorage/asterixdb5/red10/storage"],
            "red11": ["/home/clash/asterixStorage/asterixdb5/red11/storage"],
            "red12": ["/home/clash/asterixStorage/asterixdb5/red12/storage"],
            "red13": ["/home/clash/asterixStorage/asterixdb5/red13/storage"],
            "red14": ["/home/clash/asterixStorage/asterixdb5/red14/storage"],
            "red15": ["/home/clash/asterixStorage/asterixdb5/red15/storage"],
            "red16": ["/home/clash/asterixStorage/asterixdb5/red16/storage"],
            "red2": ["/home/clash/asterixStorage/asterixdb5/red2/storage"],
            "red3": ["/home/clash/asterixStorage/asterixdb5/red3/storage"],
            "red4": ["/home/clash/asterixStorage/asterixdb5/red4/storage"],
            "red5": ["/home/clash/asterixStorage/asterixdb5/red5/storage"],
            "red6": ["/home/clash/asterixStorage/asterixdb5/red6/storage"],
            "red7": ["/home/clash/asterixStorage/asterixdb5/red7/storage"],
            "red8": ["/home/clash/asterixStorage/asterixdb5/red8/storage"],
            "red9": ["/home/clash/asterixStorage/asterixdb5/red9/storage"]
        },
        "plot.activate": false,
        "replication.enabled": false,
        "replication.factor": 2,
        "replication.log.batchsize": 4096,
        "replication.log.buffer.numpages": 8,
        "replication.log.buffer.pagesize": 131072,
        "replication.max.remote.recovery.attempts": 5,
        "replication.timeout": 30,
        "storage.buffercache.maxopenfiles": 2147483647,
        "storage.buffercache.pagesize": 131072,
        "storage.buffercache.size": 536870912,
        "storage.lsm.bloomfilter.falsepositiverate": 0.01,
        "storage.memorycomponent.globalbudget": 536870912,
        "storage.memorycomponent.numcomponents": 2,
        "storage.memorycomponent.numpages": 256,
        "storage.memorycomponent.pagesize": 131072,
        "storage.metadata.memorycomponent.numpages": 256,
        "transaction.log.dirs": {
            "red": "/home/clash/asterixStorage/asterixdb5/red/txnlog",
            "red10": "/home/clash/asterixStorage/asterixdb5/red10/txnlog",
            "red11": "/home/clash/asterixStorage/asterixdb5/red11/txnlog",
            "red12": "/home/clash/asterixStorage/asterixdb5/red12/txnlog",
            "red13": "/home/clash/asterixStorage/asterixdb5/red13/txnlog",
            "red14": "/home/clash/asterixStorage/asterixdb5/red14/txnlog",
            "red15": "/home/clash/asterixStorage/asterixdb5/red15/txnlog",
            "red16": "/home/clash/asterixStorage/asterixdb5/red16/txnlog",
            "red2": "/home/clash/asterixStorage/asterixdb5/red2/txnlog",
            "red3": "/home/clash/asterixStorage/asterixdb5/red3/txnlog",
            "red4": "/home/clash/asterixStorage/asterixdb5/red4/txnlog",
            "red5": "/home/clash/asterixStorage/asterixdb5/red5/txnlog",
            "red6": "/home/clash/asterixStorage/asterixdb5/red6/txnlog",
            "red7": "/home/clash/asterixStorage/asterixdb5/red7/txnlog",
            "red8": "/home/clash/asterixStorage/asterixdb5/red8/txnlog",
            "red9": "/home/clash/asterixStorage/asterixdb5/red9/txnlog"
        },
        "txn.commitprofiler.reportinterval": 5,
        "txn.job.recovery.memorysize": 67108864,
        "txn.lock.escalationthreshold": 1000,
        "txn.lock.shrinktimer": 5000,
        "txn.lock.timeout.sweepthreshold": 10000,
        "txn.lock.timeout.waitthreshold": 60000,
        "txn.log.buffer.numpages": 8,
        "txn.log.buffer.pagesize": 131072,
        "txn.log.checkpoint.history": 0,
        "txn.log.checkpoint.lsnthreshold": 67108864,
        "txn.log.checkpoint.pollfrequency": 120,
        "txn.log.partitionsize": 268435456,
        "web.port": 19001,
        "web.queryinterface.port": 19006,
        "web.secondary.port": 19005
    },
    "fullShutdownUri": "http://scai01.cs.ucla.edu:19002/admin/shutdown?all=true",
    "metadata_node": "red16",
    "ncs": [
        {
            "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red15/config",
            "node_id": "red15",
            "partitions": [{"active": true, "partition_id": "partition_1"}],
            "state": "ACTIVE",
            "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red15/stats",
            "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red15/threaddump"
        },
        {
            "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red14/config",
            "node_id": "red14",
            "partitions": [{"active": true, "partition_id": "partition_2"}],
            "state": "ACTIVE",
            "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red14/stats",
            "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red14/threaddump"
        },
        {
            "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red16/config",
            "node_id": "red16",
            "partitions": [{"active": true, "partition_id": "partition_0"}],
            "state": "ACTIVE",
            "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red16/stats",
            "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red16/threaddump"
        },
        {
            "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red11/config",
            "node_id": "red11",
            "partitions": [{"active": true, "partition_id": "partition_5"}],
            "state": "ACTIVE",
            "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red11/stats",
            "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red11/threaddump"
        },
        {
            "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red10/config",
            "node_id": "red10",
            "partitions": [{"active": true, "partition_id": "partition_6"}],
            "state": "ACTIVE",
            "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red10/stats",
            "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red10/threaddump"
        },
        {
            "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red13/config",
            "node_id": "red13",
            "partitions": [{"active": true, "partition_id": "partition_3"}],
            "state": "ACTIVE",
            "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red13/stats",
            "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red13/threaddump"
        },
        {
            "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red12/config",
            "node_id": "red12",
            "partitions": [{"active": true, "partition_id": "partition_4"}],
            "state": "ACTIVE",
            "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red12/stats",
            "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red12/threaddump"
        },
        {
            "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red6/config",
            "node_id": "red6",
            "partitions": [{"active": true, "partition_id": "partition_10"}],
            "state": "ACTIVE",
            "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red6/stats",
            "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red6/threaddump"
        },
        {
            "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red/config",
            "node_id": "red",
            "partitions": [{"active": true, "partition_id": "partition_15"}],
            "state": "ACTIVE",
            "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red/stats",
            "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red/threaddump"
        },
        {
            "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red5/config",
            "node_id": "red5",
            "partitions": [{"active": true, "partition_id": "partition_11"}],
            "state": "ACTIVE",
            "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red5/stats",
            "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red5/threaddump"
        },
        {
            "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red8/config",
            "node_id": "red8",
            "partitions": [{"active": true, "partition_id": "partition_8"}],
            "state": "ACTIVE",
            "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red8/stats",
            "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red8/threaddump"
        },
        {
            "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red7/config",
            "node_id": "red7",
            "partitions": [{"active": true, "partition_id": "partition_9"}],
            "state": "ACTIVE",
            "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red7/stats",
            "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red7/threaddump"
        },
        {
            "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red2/config",
            "node_id": "red2",
            "partitions": [{"active": true, "partition_id": "partition_14"}],
            "state": "ACTIVE",
            "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red2/stats",
            "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red2/threaddump"
        },
        {
            "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red4/config",
            "node_id": "red4",
            "partitions": [{"active": true, "partition_id": "partition_12"}],
            "state": "ACTIVE",
            "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red4/stats",
            "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red4/threaddump"
        },
        {
            "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red3/config",
            "node_id": "red3",
            "partitions": [{"active": true, "partition_id": "partition_13"}],
            "state": "ACTIVE",
            "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red3/stats",
            "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red3/threaddump"
        },
        {
            "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red9/config",
            "node_id": "red9",
            "partitions": [{"active": true, "partition_id": "partition_7"}],
            "state": "ACTIVE",
            "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red9/stats",
            "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red9/threaddump"
        }
    ],
    "shutdownUri": "http://scai01.cs.ucla.edu:19002/admin/shutdown",
    "state": "ACTIVE",
    "versionUri": "http://scai01.cs.ucla.edu:19002/admin/version"
}
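If you prefer scripting the validation over eyeballing the admin page, the same response can be inspected programmatically. Here a trimmed copy of the JSON above stands in for an actual HTTP fetch (in practice you'd GET http://<master node>:19002/admin/cluster):

```python
import json

# Trimmed stand-in for the /admin/cluster response; a real check would
# fetch the document over HTTP instead of using a literal string.
response = json.loads("""
{"config": {"storage.buffercache.size": 536870912,
            "compiler.joinmemory": 33554432,
            "nc.java.opts": "-Xmx1024m"}}
""")
cfg = response["config"]
print(cfg["storage.buffercache.size"], cfg["compiler.joinmemory"])
# prints: 536870912 33554432
```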

2. The individual dataset sizes:

catalog_returns: 2.28G

catalog_sales: 31.01G

inventory: 8.63G
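Yingyi's sizing suggestion from earlier in the thread checks out against these numbers (a quick sketch; 64 partitions assumes 16 machines with 4 iodevices each):

```python
# Per-partition share of the largest dataset (catalog_sales) and the
# join-memory budget that would keep a hash join memory-resident.
dataset_gib = 31.01                      # catalog_sales
partitions = 16 * 4                      # 16 machines x 4 iodevices
per_partition_gib = dataset_gib / partitions
join_memory = 512 * 1024 * 1024          # round ~0.48 GiB up to 512 MiB
print(partitions, round(per_partition_gib, 2), join_memory)
# prints: 64 0.48 536870912
```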

3. As for Pig and Hive, I always use the default configuration; I didn't
set any partitioning for them. For Spark, we use 200 partitions, which
may be improvable but is not bad. For AsterixDB, I also left the cluster
at the default values for the partition and JVM settings (I didn't
manually set these parameters).
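For what it's worth, 200 matches Spark SQL's default shuffle fan-out; assuming the Spark runs go through Spark SQL, it is controlled by a standard setting, e.g. in spark-defaults.conf:

```
# spark-defaults.conf (sketch): lower the shuffle fan-out from the
# default 200 to 64 to mirror a 4-cores-x-16-machines layout.
spark.sql.shuffle.partitions    64
```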



On Tue, Dec 20, 2016 at 5:58 PM, Yingyi Bu <[email protected]>
wrote:
Mingda,

      1. Can you paste the returned JSON of http://<master
node>:19002/admin/cluster at your side? (Pls replace <master
node>
with
the
actual master node name or IP)
      2. Can you list the individual size of each dataset involved
in
the
query, e.g., catalog_returns, catalog_sales, and inventory?  (I
assume
100GB is the overall size?)
      3. Do Spark/Hive/Pig saturate all CPUs on all machines,
i.e.,
how
many
partitions are running on each machine?  (It seems that your
AsterixDB
configuration wouldn't saturate all CPUs for queries --- in the
current
AsterixDB master, the computation parallelism is set to be the
same
as
the
storage parallelism (i.e., the number of iodevices on each NC).
I've
submitted a new patch that allows flexible computation
parallelism,
which
should be able to get merged into master very soon.)
      Thanks!

Best,
Yingyi

On Tue, Dec 20, 2016 at 5:44 PM, mingda li <
[email protected]
wrote:
Oh, sure. When we test the 100G multi-way join, we find AsterixDB is
slower than Spark (but still faster than Pig and Hive).
I can share both plots with you: 1-10G.eps and 1-100G.eps. (We will
only use 1-10G.eps in our paper.)
And thanks for Ian's advice: "The dev list generally strips attachments.
Maybe you can just put the config inline? Or link to a pastebin/gist?"
Now I see why you couldn't see the attachments, so I have moved the plots
and the two documents to my Dropbox.
You can find:
1-10G.eps here: https://www.dropbox.com/s/rk3xg6gigsfcuyq/1-10G.eps?dl=0
1-100G.eps here: https://www.dropbox.com/s/tyxnmt6ehau2ski/1-100G.eps?dl=0
cc_conf.pdf here: https://www.dropbox.com/s/y3of1s17qdstv5f/cc_conf.pdf?dl=0
CompleteQuery.pdf here: https://www.dropbox.com/s/lml3fzxfjcmf2c1/CompleteQuery.pdf?dl=0
On Tue, Dec 20, 2016 at 4:40 PM, Tyson Condie <
[email protected]>
wrote:

Mingda: Please also share the numbers for 100GB, which show
AsterixDB
not
quite doing as well as Spark. These 100GB results will not be
in
our
submission version, since they’re not needed for the desired
message:
picking the right join order matters. Nevertheless, I’d like
to
get a
better understanding of what’s going on in the larger dataset
regime.


-Tyson



From: Yingyi Bu [mailto:[email protected]]
Sent: Tuesday, December 20, 2016 4:30 PM
To: [email protected]
Cc: Michael Carey <[email protected]>; Tyson Condie <
[email protected]>
Subject: Re: Time of Multiple Joins in AsterixDB



Hi Mingda,



      It looks like you didn't attach the PDF?

      Thanks!



Best,

Yingyi



On Tue, Dec 20, 2016 at 4:15 PM, mingda li <
[email protected]
<mailto:[email protected]> > wrote:

Sorry for the wrong version of cc.conf. I converted it to a PDF version,
attached.



On Tue, Dec 20, 2016 at 4:06 PM, mingda li <
[email protected]
<mailto:[email protected]> > wrote:

Dear all,



I am testing different systems' (AsterixDB, Spark, Hive, Pig) multi-way
joins to see if there is a big difference between different join orders.
This is the focus of our research on multi-way joins, and the results
will appear in our paper, which is to be submitted to VLDB soon. Could you
help us make sure that the test results make sense for AsterixDB?



We configured AsterixDB 0.8.9 (using
asterix-server-0.8.9-SNAPSHOT-binary-assembly)
on our cluster of 16 machines, each with a 3.40GHz i7 processor (4 cores
and 2 hyper-threads per core), 32GB of RAM, and 1TB of disk capacity. The
operating system is 64-bit Ubuntu 12.04, and the JDK version is 1.8.0.
During configuration, I followed the NCService instructions here:
https://ci.apache.org/projects/asterixdb/ncservice.html. And I set
the cc.conf as in the attachment. (Each node works as an NC, and the first
node also works as the CC.)



For the experiments, we use 3 fact tables from TPC-DS: inventory,
catalog_sales, and catalog_returns, with TPC-DS scale factors 1g and 10g.
The multi-way join queries we use in AsterixDB are as follows:



Good Join Order:

SELECT COUNT(*)
FROM (SELECT * FROM catalog_sales cs1
      JOIN catalog_returns cr1
        ON (cs1.cs_order_number = cr1.cr_order_number
            AND cs1.cs_item_sk = cr1.cr_item_sk)) m1
JOIN inventory i1 ON i1.inv_item_sk = cs1.cs_item_sk;


Bad Join Order:

SELECT COUNT(*)
FROM (SELECT * FROM catalog_sales cs1
      JOIN inventory i1 ON cs1.cs_item_sk = i1.inv_item_sk) m1
JOIN catalog_returns cr1
  ON (cs1.cs_order_number = cr1.cr_order_number
      AND cs1.cs_item_sk = cr1.cr_item_sk);



We first load the data into AsterixDB and then run the two different
queries. (The complete version of all queries for AsterixDB is in the
attachment.) We assume the data has already been stored in AsterixDB and
only count the time for the multi-way join.



Meanwhile, we use the same datasets and queries to test Spark, Pig, and
Hive. The results are shown in the attachment's figure, and you can see
that AsterixDB's time is always better than the others', no matter good
or bad order :-) (BTW, the y-axis of the figure is time in log scale; you
can see the time from the label on each bar.)



Thanks for your help.



Bests,

Mingda










