[ https://issues.apache.org/jira/browse/CARBONDATA-2584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

kumar vishal resolved CARBONDATA-2584.
--------------------------------------
    Resolution: Fixed
      Assignee: kumar vishal

> CarbonData Local Dictionary Support
> -----------------------------------
>
>                 Key: CARBONDATA-2584
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-2584
>             Project: CarbonData
>          Issue Type: New Feature
>            Reporter: kumar vishal
>            Assignee: kumar vishal
>            Priority: Major
>         Attachments: CarbonData Local Dictionary Support Design Doc.docx
>
>
> Currently CarbonData supports global dictionary or No-Dictionary (plain text
> stored in LV format) for storing dimension column data.
>
> *Bottleneck with Global Dictionary*
> It is difficult for the user to determine whether a column should be a
> dictionary column when the table has a large number of columns.
> Global dictionary generation generally slows down the load process.
> Multiple IO operations are made during load even though the dictionary
> already exists.
> During query, multiple IO operations are done to read the dictionary files
> and the carbondata files.
>
> *Bottleneck with No-Dictionary*
> Storage size is high because the data is stored in LV format.
> Queries on No-Dictionary columns are slower because more data is read and
> processed.
> Filtering is slower on No-Dictionary columns because the number of
> comparisons is high.
> Memory footprint is high.
>
> *The above bottlenecks can be solved by generating a dictionary for
> low-cardinality columns at each blocklet level, which helps to achieve the
> benefits below:*
> It reduces the extra IO operations (read/write) on the dictionary files
> generated in the case of global dictionary.
> It eliminates the need for the user to identify the dictionary columns
> when the table has a large number of columns.
> It helps in getting more compression on dimension columns with low
> cardinality.
> Filter queries and full scan queries on No-Dictionary columns with local
> dictionary will be faster, as filtering is done on encoded data.
> It will help in reducing the store size and memory footprint, as only unique
> values will be stored as part of the local dictionary and the corresponding
> data will be stored as encoded data.
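
The blocklet-level dictionary idea described in the issue can be illustrated with a minimal sketch. This is not CarbonData's implementation or API: the function names `encode_blocklet` and `filter_equals`, the `max_distinct` cutoff, and the sample data are all hypothetical, chosen only to show how one blocklet's column values could be replaced by a small list of unique values plus integer codes, and how an equality filter could then compare codes instead of strings.

```python
from typing import List, Optional, Tuple


def encode_blocklet(
    values: List[str], max_distinct: int
) -> Optional[Tuple[List[str], List[int]]]:
    """Dictionary-encode the column values of a single blocklet.

    Returns (dictionary, codes). Returns None when the blocklet holds more
    than max_distinct unique values, meaning it would stay in plain
    (no-dictionary) form. max_distinct is a hypothetical cutoff.
    """
    index = {}   # value -> integer code, in first-seen order
    codes = []
    for value in values:
        code = index.get(value)
        if code is None:
            if len(index) >= max_distinct:
                return None  # cardinality too high for a local dictionary
            code = len(index)
            index[value] = code
        codes.append(code)
    # dicts preserve insertion order, so position i holds the value for code i
    return list(index), codes


def filter_equals(dictionary: List[str], codes: List[int], literal: str) -> List[int]:
    """Return row positions equal to literal, comparing integer codes only."""
    if literal not in dictionary:
        return []  # value absent from this blocklet, nothing to scan
    target = dictionary.index(literal)
    return [row for row, code in enumerate(codes) if code == target]


# Hypothetical low-cardinality column within one blocklet.
blocklet = ["IN", "US", "IN", "CN", "US", "IN"]
dictionary, codes = encode_blocklet(blocklet, max_distinct=4)
rows = filter_equals(dictionary, codes, "US")
print(dictionary, codes, rows)
```

Each unique value is stored once and the rows carry only small integers, which is where the storage and memory savings come from. The literal is looked up once in the dictionary, so the per-row work of the filter is an integer comparison. When a blocklet exceeds the cutoff, the sketch simply signals a fallback to the plain representation.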