[ 
https://issues.apache.org/jira/browse/CARBONDATA-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15361927#comment-15361927
 ] 

ASF GitHub Bot commented on CARBONDATA-35:
------------------------------------------

Github user QiangCai commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/16#discussion_r69502005
  
    --- Diff: 
integration/spark/src/main/scala/org/carbondata/spark/rdd/CarbonGlobalDictionaryRDD.scala
 ---
    @@ -321,3 +327,108 @@ class CarbonGlobalDictionaryGenerateRDD(
         iter
       }
     }
    +
    +/**
    + *  Set column dictionry patition format
    +  *
    +  * @param id  partition id
    + * @param dimension  current carbon dimension
    + */
    +class CarbonColumnDictPatition(id: Int, dimension: CarbonDimension)
    +  extends Partition {
    +  override val index: Int = id
    +  val preDefDictDimension = dimension
    +}
    +
    +/**
    + * Use external column dict to generate global dictionary
    +  *
    +  * @param carbonLoadModel  carbon load model
    + * @param sparkContext  spark context
    + * @param table  carbon table identifier
    + * @param dimensions  carbon dimenisons having predefined dict
    + * @param hdfsLocation  carbon base store path
    + * @param dictFolderPath  path of dictionary folder
    + */
    +class CarbonColumnDictGenerateRDD(carbonLoadModel: CarbonLoadModel,
    --- End diff --
    
    please try to use DictionaryLoadModel


> generate global dict using pre-defined dict from external column file
> ---------------------------------------------------------------------
>
>                 Key: CARBONDATA-35
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-35
>             Project: CarbonData
>          Issue Type: New Feature
>            Reporter: Jay
>            Priority: Minor
>
> user can set colName:columnfilePath in load DML, which can provide small 
> amount of distinct values, then carbon can use these distinct values to 
> generate dictionary and avoid reading from large raw csv file. this is a new 
> feature and can improve the performance.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to