up↑
haha~~~

------------------ Original ------------------
From: "ﻬ.贝壳里的海";<251469...@qq.com>;
Date: Mon, Feb 20, 2017 09:52 AM
To: "dev"<dev@carbondata.incubator.apache.org>;
Subject: carbondata performance test under benchmark tpc-ds

Hi all,

I've run a simple performance test with the TPC-DS benchmark using Spark 2.1.0 + CarbonData 1.0.0, and the results seem unsatisfactory. The details are as follows:

About Env:
Hadoop 2.7.2 + Spark 2.1.0 + CarbonData 1.0.0
Cluster: 5 nodes, 32G mem per node

About TPC-DS:
Data size: 1G (test data generation script: ./dsdgen -scale 1 -suffix '.csv' -dir /data/tpc-ds/data/)
Largest table: inventory, 11,745,000 records

About Performance Tuning:
Spark: SPARK_WORKER_MEMORY=4g, SPARK_WORKER_INSTANCES=4
CarbonData: defaults left unchanged, to avoid configuration differences.

About Performance Test Result:
SQL that can execute without modification: 70% (using the netezza SQL template)
Max duration: 39.00s
Min duration: 2.18s
Average duration: 9.99s

I'd like to raise a discussion on the following topics:
1. Is the cluster hardware reasonable? (What is a common per-node hardware configuration for a Spark/CarbonData cluster?)
2. Is the result of the performance test reasonable and explicable?
3. For interactive queries, is Spark + CarbonData an acceptable solution?
4. For interactive queries, what other solutions might work well? (Ideally the average query duration should be less than 5s.)

Thx very much ~