Hi guys,

I have run a brief performance evaluation of Tajo on Swift and would like to share the results.
I conducted two kinds of experiments. The first compared the performance of Tajo on Swift with its performance on another distributed storage system, namely HDFS. The second was a scalability test of Swift.

Interestingly, scans on Swift are more than two times slower than on HDFS. In addition, task scheduling takes much longer on Swift than on HDFS, which means the query initialization cost is very high. You can find the detailed results at the following link.

http://www.slideshare.net/jihoonson/apache-on-tajo-bringing-sql-to-the-openstack-world

Based on these results, I would like to add some new features to improve the performance of Tajo on Swift. For example, progressive task scheduling can mitigate the query initialization cost on Swift. In addition, we need to support location-aware computing for segmented Swift objects. I will create the corresponding issues on Jira soon.

Best Regards,
Jihoon
