Github user sgururajshetty commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2199#discussion_r183030674

--- Diff: integration/presto/Presto_Cluster_setup_for_Carbondata.md ---
@@ -0,0 +1,135 @@
+#Presto Multinode Cluster setup For Carbondata
+
+### Install Presto
+
+ * Download the 0.187 version of presto using:
+
+ ``wget https://repo1.maven.org/maven2/com/facebook/presto/presto-server/0.187/presto-server-0.187.tar.gz
+ ``
+ * Extract presto tar file
+ ``tar zxvf presto-server-0.187.tar.gz``
+
+ * Download the presto CLI for the coordinator and name it presto.
+
+ ```
+ wget https://repo1.maven.org/maven2/com/facebook/presto/presto-cli/0.187/presto-cli-0.187-executable.jar
+
+ mv presto-cli-0.187-executable.jar presto
+
+ chmod +x presto
+ ```
+
+ ### Create configuration Files
+
+ * Create etc folder in presto-server-0.187 directory.
+ * Create config.properties, jvm.config, log.properties, and node.properties files.
+ * Install uuid to generate a node.id
+
+ ```
+ sudo apt-get install uuid
+
+ uuid
+ ```
+
+
+##### Contents of your node.properties file
+
+ ```
+ node.environment=production
+ node.id=<generated uuid>
+ node.data-dir=/home/ubuntu/data
+ ```
+
+##### Contents of your jvm.config file
+
+ ```
+ -server
+ -Xmx16G
+ -XX:+UseG1GC
+ -XX:G1HeapRegionSize=32M
+ -XX:+UseGCOverheadLimit
+ -XX:+ExplicitGCInvokesConcurrent
+ -XX:+HeapDumpOnOutOfMemoryError
+ -XX:OnOutOfMemoryError=kill -9 %p
+ ```
+
+##### Contents of your log.properties file
+ ```
+ com.facebook.presto=INFO
+ ```
+
+ The default minimum level is `INFO`. There are four levels: `DEBUG`, `INFO`, `WARN` and `ERROR`.
+
+### Coordinator Configurations
+
+ ##### Contents of your config.properties
+```
+coordinator=true
+node-scheduler.include-coordinator=false
+http-server.http.port=8080
+query.max-memory=50GB
+query.max-memory-per-node=2GB
+discovery-server.enabled=true
+discovery.uri=<coordinator_ip>:8080
+```
+The options `node-scheduler.include-coordinator=false` and `coordinator=true` indicate that the node is the coordinator and tells the coordinator not to do any of the computation work itself and to use the workers.
+
+**Note**: We recommend setting `query.max-memory-per-node` to half of the JVM config max memory, though if your workload is highly concurrent, you may want to use a lower value for `query.max-memory-per-node`.
+
+Also relation between below two configuration-properties should be like:
+If, `query.max-memory-per-node=30GB`
+Then, `query.max-memory=<30GB * number of nodes>`
+
+### Worker Configurations
+
+##### Contents of your config.properties
+
+```
+coordinator=false
+http-server.http.port=8080
+query.max-memory=50GB
+query.max-memory-per-node=2GB
+discovery.uri=<coordinator_ip>:8080
+```
+
+**Note**: `jvm.config`, `node.properties` file is same for all the nodes (worker + coordinator). All the nodes should have different `node.id`

--- End diff --

`jvm.config` and `node.properties` files are the same for all the nodes (worker + coordinator), except that each node must have a unique `node.id`.
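The memory guidance quoted in the diff (set `query.max-memory-per-node` to about half of the JVM `-Xmx`, and `query.max-memory` to the per-node value times the number of nodes) can be sketched as a small shell function. This is only an illustration: `presto_memory_settings` is a hypothetical helper, not part of Presto or CarbonData, and it assumes the node count passed in is the number of workers, since the quoted coordinator config sets `node-scheduler.include-coordinator=false`.

```shell
#!/bin/sh
# Hypothetical helper (not part of Presto/CarbonData): derive the two memory
# properties from the JVM heap size and the node count, following the rule
# quoted in the diff:
#   query.max-memory-per-node ~= half of -Xmx
#   query.max-memory           = query.max-memory-per-node * number of nodes
# Assumption: "number of nodes" means worker nodes, because the coordinator
# is configured with node-scheduler.include-coordinator=false.
presto_memory_settings() {
  heap_gb=$1                       # -Xmx from jvm.config, in GB (16 for -Xmx16G)
  nodes=$2                         # number of worker nodes
  per_node_gb=$((heap_gb / 2))     # integer division, rounds down for odd sizes
  total_gb=$((per_node_gb * nodes))
  echo "query.max-memory-per-node=${per_node_gb}GB"
  echo "query.max-memory=${total_gb}GB"
}

# Example: the -Xmx16G heap from the quoted jvm.config, three workers.
presto_memory_settings 16 3
```

With `-Xmx16G` and three workers this prints `query.max-memory-per-node=8GB` and `query.max-memory=24GB`. Per the quoted note, a highly concurrent workload may call for a lower per-node value than this half-of-heap starting point.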
---