suyash yadav created CARBONDATA-4085:
----------------------------------------
Summary: How to improve query execution time further
Key: CARBONDATA-4085
URL: https://issues.apache.org/jira/browse/CARBONDATA-4085
Project: CarbonData
Issue Type: Improvement
Components: sql
Affects Versions: 2.0.1
Reporter: suyash yadav
Fix For: 2.0.1
Hi Team,
We are doing a POC in which we would like our queries to run faster,
ideally in the range of 3 to 4 seconds.
We have read the CarbonData documentation, which claims that CarbonData can
scan petabytes of data and return results in 3 to 4 seconds, but that does
not match what we are observing.
Our fact table holds 1.6 billion rows and the query returns only about 4K
records, yet it still takes around 22 to 25 seconds to execute.
Below is the query we are running:
==============================
spark.sql("""select ts, resource, metric, value from fact_timestamp_global
  left join tags_10_days_test on fact_timestamp_global.tags_id = tags_10_days_test.id
  where metric in ('Outbound Utilization (percent)', 'Inbound Utilization (percent)')
  and resource = '10.212.7.98_if:<0001>'
  and ts >= '2020-09-28 00:00:00' and ts <= '2020-09-28 23:55:55'""").show(false)
=================================
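One thing worth checking: the fact table is partitioned by ts2, but the query filters only on ts, so Spark cannot prune partitions from that predicate alone. The sketch below assumes (hypothetically; the issue does not say how ts2 is populated) that ts2 holds ts truncated to the day. Under that assumption, start <= ts <= end implies trunc(start) <= ts2 <= end, so an extra ts2 predicate drops no matching rows while letting the planner skip other partitions. The helper name buildQuery is illustrative only.

```scala
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter
import java.time.temporal.ChronoUnit

// Hypothetical assumption (not stated in the issue): ts2 = ts truncated to the day.
val fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")

// Illustrative helper: the original query plus a partition predicate on ts2.
def buildQuery(start: String, end: String, resource: String): String = {
  // If ts2 = trunc(ts), then start <= ts <= end implies trunc(start) <= ts2 <= end,
  // so this extra predicate drops no matching rows but enables partition pruning.
  val partStart = LocalDateTime.parse(start, fmt).truncatedTo(ChronoUnit.DAYS).format(fmt)
  s"""select ts, resource, metric, value
     |from fact_timestamp_global
     |left join tags_10_days_test on fact_timestamp_global.tags_id = tags_10_days_test.id
     |where metric in ('Outbound Utilization (percent)', 'Inbound Utilization (percent)')
     |  and resource = '$resource'
     |  and ts >= '$start' and ts <= '$end'
     |  and ts2 >= '$partStart' and ts2 <= '$end'""".stripMargin
}

val q = buildQuery("2020-09-28 00:00:00", "2020-09-28 23:55:55", "10.212.7.98_if:<0001>")
println(q)
// In spark-shell the generated SQL would then be run as: spark.sql(q).show(false)
```

If ts2 is populated at a different granularity (for example hourly), only the truncation unit in partStart would need to change.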
The definition of fact_timestamp_global is as follows:
========================
spark.sql("""create table Fact_timestamp_GLOBAL(ts timestamp, metric string,
  tags_id string, value double) partitioned by (ts2 timestamp) stored as carbondata
  TBLPROPERTIES ('SORT_COLUMNS'='ts,metric', 'SORT_SCOPE'='GLOBAL_SORT')""").show()
==========================
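With SORT_COLUMNS 'ts,metric' and GLOBAL_SORT, each block or blocklet should cover a narrow ts range, and, as we understand it, CarbonData keeps min/max statistics per block and blocklet so that a range filter on ts can skip data that cannot match. The following is a simplified, hypothetical model of that min/max (zone-map) pruning, not CarbonData's actual code, with made-up blocklet ranges, showing how many blocklets a time-range filter must still read for sorted versus unsorted data.

```scala
// Simplified zone-map model: each blocklet records the min and max ts it contains.
final case class Blocklet(id: Int, minTs: Long, maxTs: Long)

// A blocklet must be scanned only if its [minTs, maxTs] range overlaps [lo, hi].
def prune(blocklets: Seq[Blocklet], lo: Long, hi: Long): Seq[Blocklet] =
  blocklets.filter(b => b.maxTs >= lo && b.minTs <= hi)

// Globally sorted data: blocklet ranges are narrow and non-overlapping.
val sorted = (0 until 10).map(i => Blocklet(i, i * 100L, i * 100L + 99))
// Unsorted data: every blocklet spans almost the full ts range.
val unsorted = (0 until 10).map(i => Blocklet(i, i.toLong, 900L + i))

val (lo, hi) = (250L, 349L)
println(s"sorted: scan ${prune(sorted, lo, hi).size} of ${sorted.size}")
println(s"unsorted: scan ${prune(unsorted, lo, hi).size} of ${unsorted.size}")
```

In this model the sorted layout reads 2 of 10 blocklets while the unsorted one reads all 10, which is why the ts range predicate is the filter most likely to benefit from the sort order.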
The definition of tags_10_days_test is as follows:
====================
spark.sql("""create table tags_10_days_test(id string, resource string) stored as carbondata
  TBLPROPERTIES ('SORT_COLUMNS'='id,resource')""").show()
======================
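The resource filter applies to the small dimension table, so the selective step is resolving which id values match resource = '10.212.7.98_if:<0001>' and then keeping only fact rows whose tags_id is in that set. The sketch below models that idea with plain Scala collections, roughly what a broadcast hash join does when the filtered dimension side is small. The case classes and sample rows are hypothetical stand-ins for the two tables.

```scala
// Hypothetical in-memory rows standing in for the two tables.
final case class Tag(id: String, resource: String)
final case class Fact(ts: String, metric: String, tagsId: String, value: Double)

val tags = Seq(Tag("t1", "10.212.7.98_if:<0001>"), Tag("t2", "10.0.0.1_if:<0002>"))
val facts = Seq(
  Fact("2020-09-28 10:00:00", "Inbound Utilization (percent)", "t1", 12.5),
  Fact("2020-09-28 10:00:00", "Inbound Utilization (percent)", "t2", 40.0),
  Fact("2020-09-28 10:05:00", "CPU (percent)", "t1", 7.0)
)

// Step 1: filter the small side first and build a hash map keyed on the join column.
val matchingTags: Map[String, Tag] =
  tags.filter(_.resource == "10.212.7.98_if:<0001>").map(t => t.id -> t).toMap

// Step 2: stream the fact rows, apply the metric predicate, and probe the map.
val wanted = Set("Outbound Utilization (percent)", "Inbound Utilization (percent)")
val result = for {
  f   <- facts
  if wanted.contains(f.metric)
  tag <- matchingTags.get(f.tagsId)
} yield (f.ts, tag.resource, f.metric, f.value)

result.foreach(println)
```

In Spark itself, whether the planner broadcasts the filtered tags_10_days_test side depends on its estimated size relative to spark.sql.autoBroadcastJoinThreshold; the /*+ BROADCAST(tags_10_days_test) */ query hint requests it explicitly.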
Kindly go through the points above and help us improve the query performance further.