Re: carbondata performance test under benchmark tpc-ds

Yinwei Li Tue, 21 Feb 2017 00:19:28 -0800

up↑


haha~~~




------------------ Original ------------------
From:  "ﻬ.贝壳里的海";<251469...@qq.com>;
Date:  Mon, Feb 20, 2017 09:52 AM
To:  "dev"<dev@carbondata.incubator.apache.org>; 

Subject:  carbondata performance test under benchmark tpc-ds



Hi all,


  I ran a simple performance test with the TPC-DS benchmark on 
Spark 2.1.0 + CarbonData 1.0.0, and the results seem unsatisfactory. The 
details are as follows:


  About Env:
    Hadoop 2.7.2 + Spark 2.1.0 + CarbonData 1.0.0
    Cluster: 5 nodes, 32G mem per node
  About TPC-DS:
    Data size: 1 GB (test data generation command: 
./dsdgen -scale 1 -suffix '.csv' -dir /data/tpc-ds/data/)
    Largest table: inventory, 11,745,000 records
  About Performance Tuning:
    Spark: 
      SPARK_WORKER_MEMORY=4g
      SPARK_WORKER_INSTANCES=4
    Carbondata:
      Default settings were kept, to avoid configuration differences.
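
As a quick check of the Spark settings above against the 32G nodes, here is a minimal sketch of the memory arithmetic. The function name is hypothetical, and it assumes every node runs the same SPARK_WORKER_INSTANCES and SPARK_WORKER_MEMORY values listed above.

```python
def worker_memory_per_node_gb(worker_memory_gb, worker_instances):
    """Total memory (GB) that Spark workers may hand out on one node."""
    return worker_memory_gb * worker_instances


# Values taken from the settings above: SPARK_WORKER_MEMORY=4g,
# SPARK_WORKER_INSTANCES=4, and 32G of physical memory per node.
per_node_gb = worker_memory_per_node_gb(4, 4)
node_memory_gb = 32

print(f"Worker memory per node: {per_node_gb}G of {node_memory_gb}G")
print(f"Share of node memory: {per_node_gb / node_memory_gb:.0%}")
```

Under that assumption the workers can use 16G per node, i.e. half of the physical memory.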
  About Performance Test Result:
    SQL queries that run without modification: 70% (using the netezza SQL template)
    Max duration: 39.00s
    Min duration: 2.18s
    Average duration: 9.99s
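
For anyone who wants to reproduce this kind of summary, here is a minimal sketch of how max/min/average durations can be derived from per-query wall-clock timings. The function name and the sample values are hypothetical; they are not the measured TPC-DS timings.

```python
from statistics import mean


def summarize_durations(durations_s):
    """Return (max, min, average) of per-query durations in seconds."""
    if not durations_s:
        raise ValueError("no query durations recorded")
    return max(durations_s), min(durations_s), round(mean(durations_s), 2)


# Hypothetical sample timings in seconds, NOT the results reported above.
sample = [2.0, 4.0, 12.0]
longest, shortest, average = summarize_durations(sample)

print(f"Max duration: {longest:.2f}s")
print(f"Min duration: {shortest:.2f}s")
print(f"Average duration: {average:.2f}s")
```

Reporting the median or a percentile alongside the average would also show whether a few slow queries dominate the mean.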


  I'd like to raise the following topics for discussion:
    1. Is the cluster hardware reasonable? (What is a common per-node hardware 
configuration for a Spark/CarbonData cluster?)
    2. Is the performance test result reasonable and explainable?
    3. For interactive queries, is Spark + CarbonData an acceptable solution?
    4. For interactive queries, what other solutions might work well? (The 
average query duration should probably be less than 5s, or even lower.)


  Thx very much ~
