Thanks Kim.
The following are my tajo-env.sh and tajo-site.xml:
*tajo-env.sh:*
export HADOOP_HOME=/usr/local/hadoop
export JAVA_HOME=/usr/local/java
_TAJO_OPTS="-server -verbose:gc
-XX:+PrintGCDateStamps
-XX:+PrintGCDetails
-XX:+UseGCLogFileRotation
-XX:NumberOfGCLogFiles=9
-XX:GCLogFileSize=256m
-XX:+DisableExplicitGC
-XX:+UseCompressedOops
-XX:SoftRefLRUPolicyMSPerMB=0
-XX:+UseFastAccessorMethods
-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled
-XX:CMSInitiatingOccupancyFraction=70
-XX:+UseCMSCompactAtFullCollection
-XX:CMSFullGCsBeforeCompaction=0
-XX:+CMSClassUnloadingEnabled
-XX:CMSMaxAbortablePrecleanTime=300
-XX:+CMSScavengeBeforeRemark
-XX:PermSize=160m
-XX:GCTimeRatio=19
-XX:SurvivorRatio=2
-XX:MaxTenuringThreshold=60"
_TAJO_MASTER_OPTS="$_TAJO_OPTS -Xmx512m -Xms512m -Xmn256m"
_TAJO_WORKER_OPTS="$_TAJO_OPTS -Xmx3g -Xms3g -Xmn1g"
_TAJO_QUERYMASTER_OPTS="$_TAJO_OPTS -Xmx512m -Xms512m -Xmn256m"
export TAJO_OPTS=$_TAJO_OPTS
export TAJO_MASTER_OPTS=$_TAJO_MASTER_OPTS
export TAJO_WORKER_OPTS=$_TAJO_WORKER_OPTS
export TAJO_QUERYMASTER_OPTS=$_TAJO_QUERYMASTER_OPTS
export TAJO_LOG_DIR=${TAJO_HOME}/logs
export TAJO_PID_DIR=${TAJO_HOME}/pids
export TAJO_WORKER_STANDBY_MODE=true
*tajo-site.xml:*
<configuration>
<property>
<name>tajo.rootdir</name>
<value>hdfs://test-cluster/tajo</value>
</property>
<property>
<name>tajo.master.umbilical-rpc.address</name>
<value>10-0-86-51:26001</value>
</property>
<property>
<name>tajo.master.client-rpc.address</name>
<value>10-0-86-51:26002</value>
</property>
<property>
<name>tajo.resource-tracker.rpc.address</name>
<value>10-0-86-51:26003</value>
</property>
<property>
<name>tajo.catalog.client-rpc.address</name>
<value>10-0-86-51:26005</value>
</property>
<property>
<name>tajo.worker.tmpdir.locations</name>
<value>/test/tajo1,/test/tajo2,/test/tajo3</value>
</property>
<!-- worker -->
<property>
<name>tajo.worker.resource.cpu-cores</name>
<value>4</value>
</property>
<property>
<name>tajo.worker.resource.memory-mb</name>
<value>5120</value>
</property>
<property>
<name>tajo.worker.resource.dfs-dir-aware</name>
<value>true</value>
</property>
<property>
<name>tajo.worker.resource.dedicated</name>
<value>true</value>
</property>
<property>
<name>tajo.worker.resource.dedicated-memory-ratio</name>
<value>0.6</value>
</property>
</configuration>
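As a quick sanity check on the worker settings above, here is a small Python sketch (my own, not part of Tajo) that parses the relevant fragment of tajo-site.xml and cross-checks the dedicated worker memory against the -Xmx3g given to the worker JVM in tajo-env.sh; it assumes dedicated-memory-ratio is applied as a simple fraction of memory-mb:

```python
# Sanity-check sketch (not Tajo code): parse the worker resource settings
# and confirm the dedicated memory matches the -Xmx3g worker heap.
import xml.etree.ElementTree as ET

FRAGMENT = """<configuration>
  <property>
    <name>tajo.worker.resource.memory-mb</name>
    <value>5120</value>
  </property>
  <property>
    <name>tajo.worker.resource.dedicated-memory-ratio</name>
    <value>0.6</value>
  </property>
</configuration>"""

# Build a {name: value} dict from the <property> elements.
conf = {p.findtext("name"): p.findtext("value")
        for p in ET.fromstring(FRAGMENT).iter("property")}

dedicated_mb = int(int(conf["tajo.worker.resource.memory-mb"])
                   * float(conf["tajo.worker.resource.dedicated-memory-ratio"]))
print(dedicated_mb)  # 3072, i.e. the 3g worker heap in _TAJO_WORKER_OPTS
```

If either file is edited later, this makes it easy to notice when the advertised worker resource and the actual JVM heap drift apart.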
On Fri, Jan 16, 2015 at 2:50 PM, Jinho Kim <[email protected]> wrote:
> Hello Azuryy Yu
>
> I left some comments.
>
> -Jinho
> Best regards
>
> 2015-01-16 14:37 GMT+09:00 Azuryy Yu <[email protected]>:
>
> > Hi,
> >
> > I tested Tajo about half a year ago, then stopped focusing on it because
> > of some other work.
> >
> > Then I set up a small dev Tajo cluster this week (six nodes, VMs) based
> > on Hadoop 2.6.0.
> >
> > So my questions are:
> >
> > 1) From what I knew half a year ago, Tajo ran on YARN, using the YARN
> > scheduler to manage job resources. But now I found that it doesn't rely
> > on YARN, because I only started the HDFS daemons, no YARN daemons. So
> > does Tajo have its own job scheduler?
> >
> >
> Yes, Tajo now uses its own task scheduler, and you can start Tajo without
> the YARN daemons.
> Please refer to http://tajo.apache.org/docs/0.9.0/configuration.html
>
>
> >
> > 2) Do we need to put file replicas on every node of the Tajo cluster?
> >
>
> No, Tajo does not need more replication, although a higher replication
> factor can increase data locality.
>
> > For example, I have a six-node Tajo cluster; should I set the HDFS block
> > replication factor to six? Because:
> >
> > I noticed that when I run a Tajo query, some nodes are busy but some are
> > idle, because the file's blocks are located only on those nodes, not the
> > others.
> >
> >
> In my opinion, you need to run the HDFS balancer:
>
> http://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/HDFSCommands.html#balancer
>
>
> > 3) The test data set is 4 million rows, several GB in total. But it's
> > very slow when I run: select count(distinct ID) from ****;
> > Any possible problems here?
> >
>
> Could you share tajo-env.sh, tajo-site.xml ?
>
>
> >
> >
> > Thanks
> >
>