[ https://issues.apache.org/jira/browse/CARBONDATA-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15361080#comment-15361080 ]

ASF GitHub Bot commented on CARBONDATA-35:
------------------------------------------

Github user QiangCai commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/16#discussion_r69431460
  
    --- Diff: integration/spark/src/main/scala/org/carbondata/spark/util/GlobalDictionaryUtil.scala ---
    @@ -470,12 +593,22 @@ object GlobalDictionaryUtil extends Logging {
           else {
             carbonLoadModel.getCsvHeader.split("" + CSVWriter.DEFAULT_SEPARATOR)
           }
    -      val (requireDimension, requireColumnNames) = pruneDimensions(dimensions, headers, df.columns)
    +      // generate global dict from pre defined column dict file
    +      val colDictFilePath = carbonLoadModel.getColDictFilePath
    +      carbonLoadModel.initPredefDictMap()
    --- End diff --
    
    Please put this inside the next line.


> generate global dict using pre-defined dict from external column file
> ---------------------------------------------------------------------
>
>                 Key: CARBONDATA-35
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-35
>             Project: CarbonData
>          Issue Type: New Feature
>            Reporter: Jay
>            Priority: Minor
>
> Users can set colName:columnfilePath in the load DML, where each column file 
> provides a small number of distinct values. Carbon can then use these distinct 
> values to generate the dictionary and avoid reading the large raw CSV file. 
> This is a new feature and can improve performance.
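
The idea in the description above can be sketched roughly as follows. This is only an illustration and not the code from the pull request: the object name ColDictSketch, the three helper functions, the comma used to separate several colName:columnfilePath pairs, and the choice to number dictionary keys from 1 are all assumptions made for the example.

```scala
import scala.io.Source

// Illustrative sketch only; none of these names come from CarbonData.
object ColDictSketch {

  // Parse "col1:/path/a,col2:/path/b" into a column -> file path map.
  // Assumption: pairs are comma separated. Only the first ':' splits a pair,
  // so paths that contain ':' themselves are kept intact.
  def parseColDictSpec(spec: String): Map[String, String] =
    spec.split(",").map(_.trim).filter(_.nonEmpty).map { pair =>
      val idx = pair.indexOf(':')
      require(idx > 0 && idx < pair.length - 1,
        s"expected colName:filePath, got '$pair'")
      pair.substring(0, idx).trim -> pair.substring(idx + 1).trim
    }.toMap

  // Assign consecutive integer keys (starting at 1, an assumption) to the
  // distinct non-empty values, in first-seen order.
  def buildDictionary(values: Iterator[String]): Map[String, Int] =
    values.map(_.trim).filter(_.nonEmpty).toList.distinct.zipWithIndex
      .map { case (value, index) => value -> (index + 1) }.toMap

  // Read one pre-defined column file (one value per line) and build its
  // dictionary, so the raw CSV does not have to be scanned for this column.
  def loadDictionary(path: String): Map[String, Int] = {
    val source = Source.fromFile(path)
    try buildDictionary(source.getLines()) finally source.close()
  }
}
```

With this sketch, parseColDictSpec would turn the load option into a column-to-file map, and loadDictionary would be called once per entry. Columns without a pre-defined file would still need the existing path that derives distinct values from the raw data.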



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
