Hello Azuryy Yu,

I left some comments inline.

-Jinho
Best regards

2015-01-16 14:37 GMT+09:00 Azuryy Yu <[email protected]>:

> Hi,
>
> I tested Tajo half a year ago, then stopped focusing on it because of
> other work.
>
> This week I set up a small dev Tajo cluster (six nodes, VMs) based on
> Hadoop-2.6.0.
>
> So my questions are:
>
> 1) As far as I knew half a year ago, Tajo ran on YARN and used the YARN
> scheduler to manage job resources. But now I find it doesn't rely on YARN,
> because I only started the HDFS daemons, no YARN daemons. So does Tajo have
> its own job scheduler?
>

Yes, Tajo now uses its own task scheduler, and you can start Tajo without the YARN daemons.
Please refer to http://tajo.apache.org/docs/0.9.0/configuration.html

> 2) Do we need to put file replicas on every node in the Tajo cluster?
>

No, Tajo does not need more replication. If you set a higher replication factor, data locality can increase.

> For example, I have a six-node Tajo cluster; should I set HDFS block
> replication to six? Because:
>
> I noticed that when I run a Tajo query, some nodes are busy but others are
> idle, because the file's blocks are located only on those nodes and not on
> the others.
>

In my opinion, you need to run the balancer:
http://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/HDFSCommands.html#balancer

> 3) The test data set is 4 million rows, several GB, but it's very slow
> when I run: select count(distinct ID) from ****;
> Any possible problems here?
>

Could you share tajo-env.sh and tajo-site.xml?

> Thanks
