Hi,

I tested Tajo half a year ago, but then set it aside because of other
work.

This week I set up a small Tajo dev cluster (six nodes, VMs) based on
Hadoop-2.6.0.

So my questions are:

1) As far as I knew half a year ago, Tajo ran on YARN and used the YARN
scheduler to manage job resources. But now I find it doesn't rely on YARN,
because I only started the HDFS daemons and no YARN daemons. So does Tajo
have its own job scheduler?

2) Do we need to put file replicas on every node in the Tajo cluster?
For example, I have a six-node Tajo cluster; should I set the HDFS block
replication to six? I ask because:

I noticed that when I run a Tajo query, some nodes are busy while others
are idle, because the file's blocks are located only on the busy nodes and
not on the others.

3) The test data set is 4 million rows, several GB in size, but it's very
slow when I run: select count(distinct ID) from ****;
Are there any possible problems here?


Thanks
