Thanks Kim, I'll try it and post back.

On Fri, Jan 16, 2015 at 4:02 PM, Jinho Kim <[email protected]> wrote:
> Thanks Azuryy Yu
>
> Your tajo-worker runs 10 tasks in parallel, but its heap is only 3 GB.
> That can cause long JVM pauses.
> I recommend the following:
>
> tajo-env.sh
> TAJO_WORKER_HEAPSIZE=3000 or more
>
> tajo-site.xml
> <!-- worker -->
> <property>
>   <name>tajo.worker.resource.memory-mb</name>
>   <value>3512</value> <!-- 3 tasks + 1 qm task -->
> </property>
> <property>
>   <name>tajo.task.memory-slot-mb.default</name>
>   <value>1000</value> <!-- default 512 -->
> </property>
> <property>
>   <name>tajo.worker.resource.dfs-dir-aware</name>
>   <value>true</value>
> </property>
> <!-- end -->
> http://tajo.apache.org/docs/0.9.0/configuration/worker_configuration.html
>
> -Jinho
> Best regards
>
> 2015-01-16 16:02 GMT+09:00 Azuryy Yu <[email protected]>:
>
> > Thanks, Kim.
> >
> > The following are my tajo-env.sh and tajo-site.xml:
> >
> > *tajo-env.sh:*
> > export HADOOP_HOME=/usr/local/hadoop
> > export JAVA_HOME=/usr/local/java
> > _TAJO_OPTS="-server -verbose:gc
> > -XX:+PrintGCDateStamps
> > -XX:+PrintGCDetails
> > -XX:+UseGCLogFileRotation
> > -XX:NumberOfGCLogFiles=9
> > -XX:GCLogFileSize=256m
> > -XX:+DisableExplicitGC
> > -XX:+UseCompressedOops
> > -XX:SoftRefLRUPolicyMSPerMB=0
> > -XX:+UseFastAccessorMethods
> > -XX:+UseParNewGC
> > -XX:+UseConcMarkSweepGC
> > -XX:+CMSParallelRemarkEnabled
> > -XX:CMSInitiatingOccupancyFraction=70
> > -XX:+UseCMSCompactAtFullCollection
> > -XX:CMSFullGCsBeforeCompaction=0
> > -XX:+CMSClassUnloadingEnabled
> > -XX:CMSMaxAbortablePrecleanTime=300
> > -XX:+CMSScavengeBeforeRemark
> > -XX:PermSize=160m
> > -XX:GCTimeRatio=19
> > -XX:SurvivorRatio=2
> > -XX:MaxTenuringThreshold=60"
> > _TAJO_MASTER_OPTS="$_TAJO_OPTS -Xmx512m -Xms512m -Xmn256m"
> > _TAJO_WORKER_OPTS="$_TAJO_OPTS -Xmx3g -Xms3g -Xmn1g"
> > _TAJO_QUERYMASTER_OPTS="$_TAJO_OPTS -Xmx512m -Xms512m -Xmn256m"
> > export TAJO_OPTS=$_TAJO_OPTS
> > export TAJO_MASTER_OPTS=$_TAJO_MASTER_OPTS
> > export TAJO_WORKER_OPTS=$_TAJO_WORKER_OPTS
> > export TAJO_QUERYMASTER_OPTS=$_TAJO_QUERYMASTER_OPTS
> > export TAJO_LOG_DIR=${TAJO_HOME}/logs
> > export TAJO_PID_DIR=${TAJO_HOME}/pids
> > export TAJO_WORKER_STANDBY_MODE=true
> >
> > *tajo-site.xml:*
> >
> > <configuration>
> >   <property>
> >     <name>tajo.rootdir</name>
> >     <value>hdfs://test-cluster/tajo</value>
> >   </property>
> >   <property>
> >     <name>tajo.master.umbilical-rpc.address</name>
> >     <value>10-0-86-51:26001</value>
> >   </property>
> >   <property>
> >     <name>tajo.master.client-rpc.address</name>
> >     <value>10-0-86-51:26002</value>
> >   </property>
> >   <property>
> >     <name>tajo.resource-tracker.rpc.address</name>
> >     <value>10-0-86-51:26003</value>
> >   </property>
> >   <property>
> >     <name>tajo.catalog.client-rpc.address</name>
> >     <value>10-0-86-51:26005</value>
> >   </property>
> >   <property>
> >     <name>tajo.worker.tmpdir.locations</name>
> >     <value>/test/tajo1,/test/tajo2,/test/tajo3</value>
> >   </property>
> >   <!-- worker -->
> >   <property>
> >     <name>tajo.worker.resource.cpu-cores</name>
> >     <value>4</value>
> >   </property>
> >   <property>
> >     <name>tajo.worker.resource.memory-mb</name>
> >     <value>5120</value>
> >   </property>
> >   <property>
> >     <name>tajo.worker.resource.dfs-dir-aware</name>
> >     <value>true</value>
> >   </property>
> >   <property>
> >     <name>tajo.worker.resource.dedicated</name>
> >     <value>true</value>
> >   </property>
> >   <property>
> >     <name>tajo.worker.resource.dedicated-memory-ratio</name>
> >     <value>0.6</value>
> >   </property>
> > </configuration>
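As a rough sketch of how the numbers in Jinho's suggestion fit together (the slot-based sizing follows the "3 tasks + 1 qm task" annotation above; the arithmetic itself is only illustrative):

    # worker memory budget vs. concurrent task slots
    MEMORY_MB=3512   # tajo.worker.resource.memory-mb
    SLOT_MB=1000     # tajo.task.memory-slot-mb.default
    echo $(( MEMORY_MB / SLOT_MB ))   # -> 3 concurrent task slots
    # the leftover 3512 - 3*1000 = 512 MB is headroom for the query
    # master task, and TAJO_WORKER_HEAPSIZE (3000 or more) should be
    # sized so the JVM heap can actually back those slots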
> > On Fri, Jan 16, 2015 at 2:50 PM, Jinho Kim <[email protected]> wrote:
> >
> > > Hello Azuryy Yu,
> > >
> > > I left some comments inline.
> > >
> > > -Jinho
> > > Best regards
> > >
> > > 2015-01-16 14:37 GMT+09:00 Azuryy Yu <[email protected]>:
> > >
> > > > Hi,
> > > >
> > > > I tested Tajo about half a year ago, then stopped focusing on it
> > > > because of other work.
> > > >
> > > > This week I set up a small dev Tajo cluster (six nodes, VMs) based
> > > > on Hadoop 2.6.0.
> > > >
> > > > So my questions are:
> > > >
> > > > 1) As far as I knew half a year ago, Tajo ran on YARN, using the
> > > > YARN scheduler to manage job resources. But now I see it doesn't
> > > > rely on YARN, because I only started the HDFS daemons, no YARN
> > > > daemons. So does Tajo have its own job scheduler?
> > > >
> > >
> > > Yes, Tajo now uses its own task scheduler, and you can start Tajo
> > > without the YARN daemons.
> > > Please refer to http://tajo.apache.org/docs/0.9.0/configuration.html
> > >
> > >
> > > > 2) Do we need to put file replicas on every node of the Tajo
> > > > cluster?
> > > >
> > >
> > > No, Tajo does not need extra replication. If you set a higher
> > > replication factor, data locality can increase, though.
> > >
> > > > For example, I have a six-node Tajo cluster; should I set the HDFS
> > > > block replication to six? I ask because I noticed that when I run
> > > > a Tajo query, some nodes are busy while others are idle, since the
> > > > file's blocks are located only on those nodes and not the others.
> > > >
> > >
> > > In my opinion, you need to run the balancer:
> > > http://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/HDFSCommands.html#balancer
> > >
> > >
> > > > 3) The test data set is 4 million rows, several GB in total, but
> > > > it's very slow when I run: select count(distinct ID) from ****;
> > > > Any possible problems here?
> > > >
> > >
> > > Could you share your tajo-env.sh and tajo-site.xml?
> > >
> > >
> > > >
> > > > Thanks
> > > >
> > >
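A minimal balancer run in the spirit of the HDFS docs linked above might look like the following (the 10% threshold is an assumption, not something stated in the thread, and the warehouse path in the second command is hypothetical):

    # rebalance blocks across DataNodes; stop once every node is within
    # 10% of the cluster's average utilization
    $ hdfs balancer -threshold 10

    # optionally raise the replication factor on existing data to give
    # more nodes a local copy (path is hypothetical; -w waits until the
    # new replication level is reached)
    $ hdfs dfs -setrep -w 3 /tajo/warehouse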
