Hi, there is no big improvement; sometimes it is even slower than before. I also tried increasing the worker's heap size and parallelism, but nothing improved.
default> select count(distinct auid) from test_pl_00_0;
Progress: 0%, response time: 0.963 sec
Progress: 0%, response time: 0.964 sec
Progress: 0%, response time: 1.366 sec
Progress: 0%, response time: 2.168 sec
Progress: 0%, response time: 3.17 sec
Progress: 0%, response time: 4.172 sec
Progress: 16%, response time: 5.174 sec
Progress: 16%, response time: 6.176 sec
Progress: 16%, response time: 7.178 sec
Progress: 33%, response time: 8.18 sec
Progress: 50%, response time: 9.181 sec
Progress: 50%, response time: 10.183 sec
Progress: 50%, response time: 11.185 sec
Progress: 50%, response time: 12.187 sec
Progress: 66%, response time: 13.189 sec
Progress: 66%, response time: 14.19 sec
Progress: 100%, response time: 15.003 sec

2015-01-16T17:00:56.410+0800: [GC2015-01-16T17:00:56.410+0800: [ParNew: 26473K->6582K(31488K), 0.0105030 secs] 26473K->6582K(115456K), 0.0105720 secs] [Times: user=0.04 sys=0.00, real=0.01 secs]
2015-01-16T17:00:56.593+0800: [GC2015-01-16T17:00:56.593+0800: [ParNew: 27574K->6469K(31488K), 0.0086300 secs] 27574K->6469K(115456K), 0.0086940 secs] [Times: user=0.02 sys=0.00, real=0.01 secs]
2015-01-16T17:00:56.800+0800: [GC2015-01-16T17:00:56.800+0800: [ParNew: 27461K->5664K(31488K), 0.0122560 secs] 27461K->6591K(115456K), 0.0123210 secs] [Times: user=0.02 sys=0.01, real=0.01 secs]
2015-01-16T17:00:57.065+0800: [GC2015-01-16T17:00:57.065+0800: [ParNew: 26656K->6906K(31488K), 0.0070520 secs] 27583K->7833K(115456K), 0.0071470 secs] [Times: user=0.03 sys=0.00, real=0.01 secs]

?count
-------------------------------
1222356
(1 rows, 15.003 sec, 8 B selected)

On Fri, Jan 16, 2015 at 4:09 PM, Azuryy Yu <[email protected]> wrote:

> Thanks Kim, I'll try and post back.
>
> On Fri, Jan 16, 2015 at 4:02 PM, Jinho Kim <[email protected]> wrote:
>
>> Thanks Azuryy Yu
>>
>> Your tajo-worker runs 10 parallel tasks but its heap memory is only 3GB.
>> That causes long JVM pauses.
>> I recommend the following:
>>
>> tajo-env.sh
>> TAJO_WORKER_HEAPSIZE=3000 or more
>>
>> tajo-site.xml
>> <!-- worker -->
>> <property>
>>   <name>tajo.worker.resource.memory-mb</name>
>>   <value>3512</value> <!-- 3 tasks + 1 qm task -->
>> </property>
>> <property>
>>   <name>tajo.task.memory-slot-mb.default</name>
>>   <value>1000</value> <!-- default 512 -->
>> </property>
>> <property>
>>   <name>tajo.worker.resource.dfs-dir-aware</name>
>>   <value>true</value>
>> </property>
>> <!-- end -->
>> http://tajo.apache.org/docs/0.9.0/configuration/worker_configuration.html
>>
>> -Jinho
>> Best regards
>>
>> 2015-01-16 16:02 GMT+09:00 Azuryy Yu <[email protected]>:
>>
>> > Thanks Kim.
>> >
>> > The following is my tajo-env and tajo-site:
>> >
>> > *tajo-env.sh:*
>> > export HADOOP_HOME=/usr/local/hadoop
>> > export JAVA_HOME=/usr/local/java
>> > _TAJO_OPTS="-server -verbose:gc
>> >   -XX:+PrintGCDateStamps
>> >   -XX:+PrintGCDetails
>> >   -XX:+UseGCLogFileRotation
>> >   -XX:NumberOfGCLogFiles=9
>> >   -XX:GCLogFileSize=256m
>> >   -XX:+DisableExplicitGC
>> >   -XX:+UseCompressedOops
>> >   -XX:SoftRefLRUPolicyMSPerMB=0
>> >   -XX:+UseFastAccessorMethods
>> >   -XX:+UseParNewGC
>> >   -XX:+UseConcMarkSweepGC
>> >   -XX:+CMSParallelRemarkEnabled
>> >   -XX:CMSInitiatingOccupancyFraction=70
>> >   -XX:+UseCMSCompactAtFullCollection
>> >   -XX:CMSFullGCsBeforeCompaction=0
>> >   -XX:+CMSClassUnloadingEnabled
>> >   -XX:CMSMaxAbortablePrecleanTime=300
>> >   -XX:+CMSScavengeBeforeRemark
>> >   -XX:PermSize=160m
>> >   -XX:GCTimeRatio=19
>> >   -XX:SurvivorRatio=2
>> >   -XX:MaxTenuringThreshold=60"
>> > _TAJO_MASTER_OPTS="$_TAJO_OPTS -Xmx512m -Xms512m -Xmn256m"
>> > _TAJO_WORKER_OPTS="$_TAJO_OPTS -Xmx3g -Xms3g -Xmn1g"
>> > _TAJO_QUERYMASTER_OPTS="$_TAJO_OPTS -Xmx512m -Xms512m -Xmn256m"
>> > export TAJO_OPTS=$_TAJO_OPTS
>> > export TAJO_MASTER_OPTS=$_TAJO_MASTER_OPTS
>> > export TAJO_WORKER_OPTS=$_TAJO_WORKER_OPTS
>> > export TAJO_QUERYMASTER_OPTS=$_TAJO_QUERYMASTER_OPTS
>> > export TAJO_LOG_DIR=${TAJO_HOME}/logs
>> > export TAJO_PID_DIR=${TAJO_HOME}/pids
>> > export TAJO_WORKER_STANDBY_MODE=true
>> >
>> > *tajo-site.xml:*
>> >
>> > <configuration>
>> >   <property>
>> >     <name>tajo.rootdir</name>
>> >     <value>hdfs://test-cluster/tajo</value>
>> >   </property>
>> >   <property>
>> >     <name>tajo.master.umbilical-rpc.address</name>
>> >     <value>10-0-86-51:26001</value>
>> >   </property>
>> >   <property>
>> >     <name>tajo.master.client-rpc.address</name>
>> >     <value>10-0-86-51:26002</value>
>> >   </property>
>> >   <property>
>> >     <name>tajo.resource-tracker.rpc.address</name>
>> >     <value>10-0-86-51:26003</value>
>> >   </property>
>> >   <property>
>> >     <name>tajo.catalog.client-rpc.address</name>
>> >     <value>10-0-86-51:26005</value>
>> >   </property>
>> >   <property>
>> >     <name>tajo.worker.tmpdir.locations</name>
>> >     <value>/test/tajo1,/test/tajo2,/test/tajo3</value>
>> >   </property>
>> >   <!-- worker -->
>> >   <property>
>> >     <name>tajo.worker.resource.tajo.worker.resource.cpu-cores</name>
>> >     <value>4</value>
>> >   </property>
>> >   <property>
>> >     <name>tajo.worker.resource.memory-mb</name>
>> >     <value>5120</value>
>> >   </property>
>> >   <property>
>> >     <name>tajo.worker.resource.dfs-dir-aware</name>
>> >     <value>true</value>
>> >   </property>
>> >   <property>
>> >     <name>tajo.worker.resource.dedicated</name>
>> >     <value>true</value>
>> >   </property>
>> >   <property>
>> >     <name>tajo.worker.resource.dedicated-memory-ratio</name>
>> >     <value>0.6</value>
>> >   </property>
>> > </configuration>
>> >
>> > On Fri, Jan 16, 2015 at 2:50 PM, Jinho Kim <[email protected]> wrote:
>> >
>> > > Hello Azuryy Yu
>> > >
>> > > I left some comments.
>> > >
>> > > -Jinho
>> > > Best regards
>> > >
>> > > 2015-01-16 14:37 GMT+09:00 Azuryy Yu <[email protected]>:
>> > >
>> > > > Hi,
>> > > >
>> > > > I tested Tajo half a year ago, then stopped focusing on it because of some other work.
>> > > >
>> > > > Then I set up a small dev Tajo cluster this week (six nodes, VMs) based on Hadoop-2.6.0.
>> > > >
>> > > > So my questions are:
>> > > >
>> > > > 1) From what I knew half a year ago, Tajo worked on Yarn, using the Yarn scheduler to manage job resources. But now I found it doesn't rely on Yarn, because I only started the HDFS daemons, no Yarn daemons. So Tajo has its own job scheduler?
>> > >
>> > > Yes, Tajo now uses its own task scheduler, and you can start Tajo without the Yarn daemons.
>> > > Please refer to http://tajo.apache.org/docs/0.9.0/configuration.html
>> > >
>> > > > 2) Do we need to put file replicas on every node of the Tajo cluster?
>> > >
>> > > No, Tajo does not need more replicas, though if you set a higher replication factor, data locality can be increased.
>> > >
>> > > > For example, with a six-node Tajo cluster, should I set the HDFS block replication to six? Because:
>> > > >
>> > > > I noticed that when I run a Tajo query, some nodes are busy but some are idle, because the file's blocks are located only on those nodes, not the others.
>> > >
>> > > In my opinion, you need to run the balancer:
>> > > http://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/HDFSCommands.html#balancer
>> > >
>> > > > 3) The test data set is 4 million rows, nearly several GB, but it's very slow when I run: select count(distinct ID) from ****;
>> > > > Any possible problems here?
>> > >
>> > > Could you share your tajo-env.sh and tajo-site.xml?
>> > >
>> > > > Thanks
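For reference, the arithmetic behind the suggested worker settings above can be sketched as follows (Python, illustrative only; the assumption that each task occupies one default memory slot and the query-master task takes the remaining memory is mine, inferred from the "3 tasks + 1 qm task" comment in the recommended tajo-site.xml):

```python
# Illustrative sketch: how many concurrent tasks a Tajo worker can schedule
# from the memory settings discussed in this thread.
# Assumptions (not from Tajo docs): one task per default memory slot, and
# the query-master (qm) task reserves a fixed chunk (512 MB here).

def worker_task_capacity(memory_mb: int, slot_mb: int, qm_mb: int = 512) -> int:
    """Number of task slots left after reserving memory for the qm task."""
    return (memory_mb - qm_mb) // slot_mb

# Suggested values: tajo.worker.resource.memory-mb = 3512,
# tajo.task.memory-slot-mb.default = 1000 -> 3 tasks + 1 qm task
print(worker_task_capacity(3512, 1000))  # 3
```

With the original settings (memory-mb = 5120 and the default 512 MB slot), the same arithmetic yields far more parallel tasks than a 3 GB heap can comfortably serve, which is consistent with the long GC pauses reported above.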
