suyash yadav created CARBONDATA-4106:
----------------------------------------

             Summary: Compaction is not working properly
                 Key: CARBONDATA-4106
                 URL: https://issues.apache.org/jira/browse/CARBONDATA-4106
             Project: CarbonData
          Issue Type: Improvement
          Components: core
    Affects Versions: 2.0.1
         Environment: Apache spark 2.4.5, carbonData 2.0.1
            Reporter: suyash yadav
             Fix For: 2.0.1
         Attachments: describe_fact_probe_1

Hi Team,

We are using Apache CarbonData 2.0.1 for one of our POCs, and we observed that we 
are not getting the expected benefit from compaction (both major and minor).

Please find below details for the issue we are facing:

*Name of the table used:* fact_365_1_probe_1

*Number of rows:*

select count(*) from fact_365_1_probe_1
+--------+
|count(1)|
+--------+
|76963753|
+--------+

*Sample data from the table:*
======================

+-------------------+--------------------------+------------------------------------+------------------+-------------+-------------------+
|                 ts|                    metric|                             tags_id|             value|        epoch|                ts2|
+-------------------+--------------------------+------------------------------------+------------------+-------------+-------------------+
|2021-01-07 21:05:00|Probe.Duplicate.Poll.Count|c8dead9b-87ae-46ae-8703-bc2b7bfba5d4|39.611356797970274|1610033757768|2021-01-07 00:00:00|
|2021-01-07 23:50:00|Probe.Duplicate.Poll.Count|62351ef2-f2ce-49d1-a2fd-a0d1e5f6a1b9| 72.70658115131307|1610043742516|2021-01-07 00:00:00|
+-------------------+--------------------------+------------------------------------+------------------+-------------+-------------------+
 
[^describe_fact_probe_1]
 
I have attached the DESCRIBE output, which shows the other details of the 
table.

The table size is 3.24 GB, and even after running minor or major compaction 
the size remains almost the same.

So we are not getting any benefit from running compaction. Could you please 
review the shared details and help us identify whether we are missing 
something here or whether this is a bug?


We also need answers to the following questions about CarbonData storage:

1. For decimal values, how does the storage behave? For example, if one row 
has 20 digits after the decimal point and a second row has only 5 digits after 
the decimal point, how much storage does each take, and what is the difference?



2. If I have two tables, where one table has the same value in all 100 rows 
and the other has 100 different values, how will CarbonData behave as far as 
storage is concerned? Which table will take less storage, or will both take the 
same storage?

3. For the string data type, could you please describe how string values are 
stored?
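To make questions 1 and 2 concrete, here is a minimal Python sketch of two generic encodings we have in mind. Both are assumptions for illustration only, not CarbonData's documented format. The helpers `unscaled_byte_length` and `run_length_encode` are hypothetical. The first assumes a decimal is stored as its unscaled integer in variable-length two's-complement form, like Java's `BigDecimal.unscaledValue().toByteArray()`. The second is a plain run-length encoding.

```python
from decimal import Decimal

# ASSUMPTION (illustration only): a decimal is stored as its unscaled integer
# in variable-length two's-complement form, like Java's
# BigDecimal.unscaledValue().toByteArray(). CarbonData may encode differently.
def unscaled_byte_length(d: Decimal) -> int:
    sign, digits, _exponent = d.as_tuple()
    n = int("".join(map(str, digits)))
    if sign:
        n = -n
    # Bits needed excluding the sign bit; negative values are measured via ~n.
    bits = n.bit_length() if n >= 0 else (~n).bit_length()
    return bits // 8 + 1  # one extra byte's worth of room for the sign bit

# HYPOTHETICAL generic run-length encoding: consecutive equal values
# collapse into a single (value, count) pair.
def run_length_encode(values):
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return [(v, n) for v, n in runs]

long_frac = Decimal("39.61135679797027412345")  # 20 digits after the point
short_frac = Decimal("39.61136")                 # 5 digits after the point

# Question 1: each value kept at its own scale...
print(unscaled_byte_length(long_frac), unscaled_byte_length(short_frac))  # 10 3
# ...versus the short value rescaled to a fixed column scale of 20.
rescaled = short_frac.quantize(Decimal(1).scaleb(-20))
print(unscaled_byte_length(rescaled))  # 10

# Question 2: 100 identical values vs 100 distinct values.
same = [72.5] * 100
distinct = [float(i) for i in range(100)]
print(len(run_length_encode(same)), len(run_length_encode(distinct)))  # 1 100
```

Under these assumptions, the 20-digit value needs 10 bytes and the 5-digit value needs 3 bytes. The gap disappears once both values are stored at a common column scale. Likewise, 100 identical values collapse to a single run, while 100 distinct values stay as 100 runs. We would like to know which of these behaviours, if any, CarbonData actually follows.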
 


================



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
