[ 
https://issues.apache.org/jira/browse/ORC-1583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated ORC-1583:
-------------------------------
    Affects Version/s: 2.1.0
                           (was: 1.9.2)

> BloomFilterColumns inside list/map
> ----------------------------------
>
>                 Key: ORC-1583
>                 URL: https://issues.apache.org/jira/browse/ORC-1583
>             Project: ORC
>          Issue Type: Improvement
>    Affects Versions: 2.1.0
>            Reporter: Alexander Petrossian (PAF)
>            Priority: Major
>
> Currently, when specifying the names of columns to index, we can use the
> syntax:
> * field.nestedField1.nestedField2
> (org.apache.orc.OrcUtils#findColumn is used to resolve it.)
> But when specifying a SearchArgument, we can use the extended syntax:
> * field.nestedField1.nestedField2._elem.nestedField3._value.nestedField4
> (org.apache.orc.TypeDescription#findSubtype is used to resolve it.)
> This is unbalanced and does not allow indexing just one column inside a
> list/map.
> Currently, when there is a list/map in the expression, the writer activates
> bloom filters for all columns contained inside it. In my case that is
> hundreds of columns we never search on, so we do not need them indexed.
> Maybe the findSubtype approach could be used in both cases (indexing and
> searching), so that the code is balanced?
> Offhand, nothing seems to break if we simply replace the findColumn call
> with a findSubtype call when finding the column to create a bloom filter
> for in the writer.
> Thanks for your attention!
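
The difference between the two lookup styles described above can be sketched with a small toy model. This is not ORC code: the classes and methods below (PathSketch, Node, resolveStructOnly, resolveExtended, countLeaves, sampleSchema) are hypothetical stand-ins, and the sample schema is invented for illustration. The only things taken from the issue are the "_elem" and "_value" path tokens and the reported behaviour that naming a list/map column indexes everything inside it; "_key" is assumed here as the matching token for map keys.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model only. This is NOT ORC's TypeDescription, OrcUtils#findColumn or
// TypeDescription#findSubtype; every name in this class is hypothetical.
class PathSketch {
    enum Kind { STRUCT, LIST, MAP, PRIMITIVE }

    static final class Node {
        final Kind kind;
        final Map<String, Node> children = new LinkedHashMap<>();
        Node(Kind kind) { this.kind = kind; }
        Node with(String name, Node child) { children.put(name, child); return this; }
    }

    static Node struct() { return new Node(Kind.STRUCT); }
    static Node primitive() { return new Node(Kind.PRIMITIVE); }
    // A list exposes its element type under the "_elem" token.
    static Node list(Node elem) { return new Node(Kind.LIST).with("_elem", elem); }
    // A map exposes its key and value types under "_key" and "_value".
    static Node map(Node key, Node value) {
        return new Node(Kind.MAP).with("_key", key).with("_value", value);
    }

    // Struct-only lookup: can step through struct fields, but cannot step
    // into a list or map, so the deepest nameable column there is the
    // list/map itself.
    static Node resolveStructOnly(Node root, String path) {
        Node current = root;
        for (String part : path.split("\\.")) {
            if (current.kind != Kind.STRUCT) return null;
            current = current.children.get(part);
            if (current == null) return null;
        }
        return current;
    }

    // Extended lookup: also accepts _elem / _key / _value steps.
    static Node resolveExtended(Node root, String path) {
        Node current = root;
        for (String part : path.split("\\.")) {
            current = current.children.get(part);
            if (current == null) return null;
        }
        return current;
    }

    // Number of leaf columns that would be indexed if this node were selected.
    static int countLeaves(Node node) {
        if (node.children.isEmpty()) return 1;
        int total = 0;
        for (Node child : node.children.values()) total += countLeaves(child);
        return total;
    }

    // Invented schema:
    // struct<id, events:list<struct<name, attrs:map<key, struct<code,label>>>>>
    static Node sampleSchema() {
        Node attrValue = struct().with("code", primitive()).with("label", primitive());
        Node event = struct().with("name", primitive())
                             .with("attrs", map(primitive(), attrValue));
        return struct().with("id", primitive()).with("events", list(event));
    }

    public static void main(String[] args) {
        Node schema = sampleSchema();
        // Struct-only syntax: the whole list has to be selected.
        System.out.println(countLeaves(resolveStructOnly(schema, "events")));
        // Extended syntax: a single nested column can be selected.
        System.out.println(countLeaves(
            resolveExtended(schema, "events._elem.attrs._value.code")));
    }
}
```

In this toy schema, selecting "events" with the struct-only lookup covers four leaf columns, while the extended path "events._elem.attrs._value.code" selects exactly one. That is the imbalance the issue asks to remove.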



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
