[
https://issues.apache.org/jira/browse/CARBONDATA-464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jihong MA updated CARBONDATA-464:
---------------------------------
Description:
Other columnar file formats fetch 1 million rows (a row group) at a time; the
data is divided into column chunks in columnar format, and each column chunk
consists of many pages. A page (default size 1 MB) can be independently
uncompressed and processed.
In the current Carbon, since we use a larger blocklet, larger processing memory
is required because all projected column chunks within a blocklet are
decompressed at once, which consumes a large amount of memory in total. We
should consider an alternative approach that balances I/O and processing, in
order to reduce GC pressure.
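The memory gap between the two layouts can be sketched as follows. This is a minimal back-of-the-envelope model, not CarbonData code: the class and method names are hypothetical, and the pages-per-chunk count is an assumed example value.

```java
// Hypothetical sketch (illustrative names, not CarbonData APIs): compare the
// peak working memory of decompressing a whole blocklet's projected column
// chunks at once vs. decompressing one page per column at a time.
public class PageVsBlockletMemory {

    static final long PAGE_SIZE_BYTES = 1L << 20; // 1 MB page, the default page size
    static final int PAGES_PER_CHUNK = 64;        // pages per column chunk (assumed)

    // Peak memory when all projected column chunks are decompressed together.
    static long blockletAtOncePeak(int projectedColumns) {
        return projectedColumns * PAGES_PER_CHUNK * PAGE_SIZE_BYTES;
    }

    // Peak memory when each page is decompressed, processed, then released:
    // only one live page per projected column is needed at any moment.
    static long pageAtATimePeak(int projectedColumns) {
        return projectedColumns * PAGE_SIZE_BYTES;
    }

    public static void main(String[] args) {
        int projected = 10; // e.g. a query projecting 10 columns
        System.out.println("blocklet-at-once peak MB: "
                + (blockletAtOncePeak(projected) >> 20)); // prints 640
        System.out.println("page-at-a-time peak MB: "
                + (pageAtATimePeak(projected) >> 20));    // prints 10
    }
}
```

Under these assumed numbers the all-at-once path holds 640 MB of decompressed data live, versus 10 MB for page-at-a-time, which is the kind of resident-heap difference that shows up as GC pressure.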
was:
Parquet might fetch 1 million rows (a row group) from I/O at one time; its data
is divided into column chunks in columnar format, and each column chunk
consists of many pages. A page (default size 1 MB) can be independently
uncompressed and processed.
In the current Carbon, since we use a larger blocklet, larger processing memory
is required as well, because all projected column chunks within a blocklet are
decompressed, which consumes a large amount of memory. We should consider a
similar approach to balance I/O and processing, but such a change requires
Carbon format-level changes.
Summary: Frequent GC occurs when Carbon's blocklet size is enlarged
from the default (was: Big GC occurs frequently when Carbon's blocklet size is
enlarged from the default)
> Frequent GC occurs when Carbon's blocklet size is enlarged from the default
> ---------------------------------------------------------------------------
>
> Key: CARBONDATA-464
> URL: https://issues.apache.org/jira/browse/CARBONDATA-464
> Project: CarbonData
> Issue Type: Sub-task
> Reporter: suo tong
>
> Other columnar file formats fetch 1 million rows (a row group) at a time; the
> data is divided into column chunks in columnar format, and each column chunk
> consists of many pages. A page (default size 1 MB) can be independently
> uncompressed and processed.
> In the current Carbon, since we use a larger blocklet, larger processing
> memory is required because all projected column chunks within a blocklet are
> decompressed at once, which consumes a large amount of memory in total. We
> should consider an alternative approach that balances I/O and processing, in
> order to reduce GC pressure.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)