[jira] [Updated] (CARBONDATA-2747) Lucene build wrong DataMapDistributable for all datamaps with same DataMapSchema

jiangmanhua (JIRA) Tue, 17 Jul 2018 18:10:37 -0700


     [ 
https://issues.apache.org/jira/browse/CARBONDATA-2747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


jiangmanhua updated CARBONDATA-2747:
------------------------------------
    Description: 
A similar problem in the bloom datamap is tracked in CARBONDATA-2746.

 

Analysis:

`DataMapChooser#extractColumnExpression` does not handle `MatchExpression`, 
so no column name is extracted that could be used to filter the datamaps.
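
A minimal sketch of what extracting the column from a text-match filter could 
look like. It assumes the query string uses Lucene-style `column:term` syntax; 
the class `TextMatchColumns` and its `extract` method are hypothetical and are 
not part of CarbonData.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not CarbonData code.
final class TextMatchColumns {
    // Matches a field prefix such as "name:" in a Lucene-style query string.
    // Limitation: a colon inside a quoted phrase would be misread as a field.
    private static final Pattern FIELD =
        Pattern.compile("([A-Za-z_][A-Za-z0-9_]*)\\s*:");

    // Returns the column names referenced by the query, in order of appearance.
    static Set<String> extract(String query) {
        Set<String> columns = new LinkedHashSet<>();
        Matcher m = FIELD.matcher(query);
        while (m.find()) {
            columns.add(m.group(1).toLowerCase(Locale.ROOT));
        }
        return columns;
    }

    public static void main(String[] args) {
        System.out.println(extract("name:bob"));              // [name]
        System.out.println(extract("name:bob AND city:sh*")); // [name, city]
    }
}
```

With the column names available, the chooser would have something to compare 
against each datamap's index columns.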

 

In `DataMapChooser#contains`, all datamaps are marked as useful if a lucene 
datamap is hit (`ExpressionType.TEXT_MATCH`). The first datamap is then chosen 
after the sort step (sorted by number of index columns).
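
The selection behaviour described above can be modelled in a few lines. This is 
a simplified sketch, not the CarbonData implementation: `ChooserSketch`, 
`DataMap` and both methods are hypothetical, and the ascending sort order and 
the registration order of the datamaps are assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical model, not CarbonData code.
final class ChooserSketch {
    static final class DataMap {
        final String name;
        final List<String> indexColumns;

        DataMap(String name, String... indexColumns) {
            this.name = name;
            this.indexColumns = Arrays.asList(indexColumns);
        }
    }

    // Described behaviour: every datamap counts as useful for TEXT_MATCH,
    // so only the sort (assumed ascending, stable) decides which one wins.
    static DataMap chooseIgnoringColumns(List<DataMap> dataMaps) {
        List<DataMap> sorted = new ArrayList<>(dataMaps);
        sorted.sort(Comparator.comparingInt((DataMap d) -> d.indexColumns.size()));
        return sorted.get(0);
    }

    // Column-aware variant: keep only datamaps that index every filter column.
    // Returns null when no datamap qualifies.
    static DataMap chooseByColumns(List<DataMap> dataMaps, List<String> filterColumns) {
        List<DataMap> useful = new ArrayList<>();
        for (DataMap d : dataMaps) {
            if (d.indexColumns.containsAll(filterColumns)) {
                useful.add(d);
            }
        }
        useful.sort(Comparator.comparingInt((DataMap d) -> d.indexColumns.size()));
        return useful.isEmpty() ? null : useful.get(0);
    }

    public static void main(String[] args) {
        List<DataMap> dataMaps = Arrays.asList(
            new DataMap("dm_city", "city"),
            new DataMap("dm_name", "name"));
        System.out.println(chooseIgnoringColumns(dataMaps).name);                 // dm_city
        System.out.println(chooseByColumns(dataMaps, Arrays.asList("name")).name); // dm_name
    }
}
```

Because both datamaps have one index column, the sort cannot separate them and 
the result depends only on list order, which is how a filter on `name` can end 
up with the datamap on `city`.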

 

In `LuceneDataMapFactoryBase#toDistributable`, carbon calls getAllIndexDirs 
and builds a DataMapDistributable for each index in the same segment. This 
means that one segment will be pruned by different index datamaps (lucene uses 
`indexPath` in `LuceneDataMapDistributable` to initialize its datamap object 
and build the `indexSearcherMap`).
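
To make the effect concrete, here is a small model of the two ways the 
distributables could be built for a segment. It is only a sketch: 
`DistributableSketch`, `Distributable` and the directory layout (index 
directory named after its datamap) are assumptions made for illustration, not 
the actual CarbonData classes or store layout.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model, not CarbonData code.
final class DistributableSketch {
    static final class Distributable {
        final String segmentId;
        final String indexPath;

        Distributable(String segmentId, String indexPath) {
            this.segmentId = segmentId;
            this.indexPath = indexPath;
        }
    }

    // Described behaviour: one distributable per index directory in the
    // segment, regardless of which datamap the directory belongs to.
    static List<Distributable> forAllIndexDirs(String segmentId, List<String> indexDirs) {
        List<Distributable> result = new ArrayList<>();
        for (String dir : indexDirs) {
            result.add(new Distributable(segmentId, dir));
        }
        return result;
    }

    // Alternative: keep only the directories that belong to the chosen datamap
    // (assumes the last path component is the datamap name).
    static List<Distributable> forDataMap(String segmentId, List<String> indexDirs,
                                          String dataMapName) {
        List<Distributable> result = new ArrayList<>();
        for (String dir : indexDirs) {
            String dirName = dir.substring(dir.lastIndexOf('/') + 1);
            if (dirName.equals(dataMapName)) {
                result.add(new Distributable(segmentId, dir));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> dirs = Arrays.asList(
            "/store/t/Segment_0/dm_name",
            "/store/t/Segment_0/dm_city");
        System.out.println(forAllIndexDirs("0", dirs).size());       // 2
        System.out.println(forDataMap("0", dirs, "dm_name").size()); // 1
    }
}
```

In the first variant the same segment is pruned once per index directory, which 
matches the behaviour reported here.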

 

In our test case, we build two datamaps, one on column `name` and one on 
column `city`.

The query uses column `name` as its filter. Unfortunately, the 
`DataMapChooser` chooses the datamap on `city`.

So, 

  was:similar problem in bloom datamap is in issue CARBONDATA-2746


> Lucene build wrong DataMapDistributable for all datamaps with same 
> DataMapSchema
> --------------------------------------------------------------------------------
>
>                 Key: CARBONDATA-2747
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-2747
>             Project: CarbonData
>          Issue Type: Bug
>            Reporter: jiangmanhua
>            Priority: Major



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
