Alexander Petrossian (PAF) created ORC-1583:
-----------------------------------------------

             Summary: BloomFilterColumns inside list/map
                 Key: ORC-1583
                 URL: https://issues.apache.org/jira/browse/ORC-1583
             Project: ORC
          Issue Type: Improvement
    Affects Versions: 1.9.2
            Reporter: Alexander Petrossian (PAF)


Currently, when specifying the names of columns to index with bloom filters, we can use the syntax:
* field.nestedField1.nestedField2

which is resolved by org.apache.orc.OrcUtils#findColumn.

But when specifying a SearchArgument, we can use an extended syntax:
* field.nestedField1.nestedField2._elem.nestedField3._value.nestedField4

which is resolved by org.apache.orc.TypeDescription#findSubtype.
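
Here is a minimal Python sketch of the two path syntaxes. It uses a toy type tree modeled on the example path above. Node, struct, list_of, map_of, find_struct_path and find_extended_path are hypothetical names, not ORC's API. The lookups model only the syntaxes described here: dotted struct fields versus dotted fields plus _elem and _value steps. The _key step for map keys is an assumption by analogy with _value. This is not the actual OrcUtils#findColumn or TypeDescription#findSubtype code.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Toy type-tree node; a hypothetical stand-in for TypeDescription."""
    category: str                                          # "struct", "list", "map" or a primitive
    children: list[Node] = field(default_factory=list)
    field_names: list[str] = field(default_factory=list)   # struct field names only


def struct(**fields: Node) -> Node:
    return Node("struct", list(fields.values()), list(fields.keys()))


def list_of(elem: Node) -> Node:
    return Node("list", [elem])


def map_of(key: Node, value: Node) -> Node:
    return Node("map", [key, value])


def find_struct_path(root: Node, path: str) -> Optional[Node]:
    """Dotted struct-field names only (the findColumn-style syntax)."""
    node = root
    for part in path.split("."):
        if node.category != "struct" or part not in node.field_names:
            return None  # cannot step into a list or map
        node = node.children[node.field_names.index(part)]
    return node


# Pseudo-field names for container children. _key is assumed by analogy with _value.
CONTAINER_CHILD = {("list", "_elem"): 0, ("map", "_key"): 0, ("map", "_value"): 1}


def find_extended_path(root: Node, path: str) -> Optional[Node]:
    """Struct fields plus _elem/_key/_value steps (the findSubtype-style syntax)."""
    node = root
    for part in path.split("."):
        if node.category == "struct" and part in node.field_names:
            node = node.children[node.field_names.index(part)]
        elif (node.category, part) in CONTAINER_CHILD:
            node = node.children[CONTAINER_CHILD[(node.category, part)]]
        else:
            return None
    return node


schema = struct(
    field=struct(
        nestedField1=struct(
            nestedField2=list_of(
                struct(
                    nestedField3=map_of(
                        Node("string"),
                        struct(nestedField4=Node("int"), other=Node("string")),
                    )
                )
            )
        )
    )
)

short_path = "field.nestedField1.nestedField2"
long_path = short_path + "._elem.nestedField3._value.nestedField4"
print(find_struct_path(schema, short_path).category)   # list
print(find_struct_path(schema, long_path))             # None
print(find_extended_path(schema, long_path).category)  # int
```

The struct-only lookup stops as soon as it reaches the list, so no column below nestedField2 can be named at all. The extended lookup walks through the list element and the map value down to a single leaf.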

This asymmetry means there is no way to index just one column inside a list or map.

Currently, when a list or map appears in the expression, the writer enables bloom filters for every column contained inside it. In my case that is hundreds of columns we never search on, so we do not need them indexed.


Could the findSubtype approach be used in both cases (indexing and searching), so that the two code paths behave consistently?

Offhand, simply replacing the findColumn call with a findSubtype call does not seem to break anything.
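
A similar toy model can illustrate both the current writer behavior and the proposed swap. The sketch assumes that columns are numbered in pre-order, as ORC's TypeDescription ids are, so a column and all of its descendants form one contiguous id range. For every name listed, it marks that whole range. Node, struct, assign_ids, step, bloom_filter_include, selected and the extended flag are hypothetical names. extended=False stands in for a findColumn-style lookup and extended=True for a findSubtype-style lookup. This is not the ORC writer's code.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Toy type-tree node; a hypothetical stand-in for TypeDescription."""
    category: str                                          # "struct", "list", "map" or a primitive
    children: list[Node] = field(default_factory=list)
    field_names: list[str] = field(default_factory=list)   # struct field names only
    id: int = -1                                           # pre-order column id, root = 0
    max_id: int = -1                                       # largest id in this subtree


def struct(**fields: Node) -> Node:
    return Node("struct", list(fields.values()), list(fields.keys()))


def assign_ids(node: Node, next_id: int = 0) -> int:
    """Number the tree in pre-order so each subtree is one contiguous id range."""
    node.id = next_id
    next_id += 1
    for child in node.children:
        next_id = assign_ids(child, next_id)
    node.max_id = next_id - 1
    return next_id


def step(node: Node, part: str, extended: bool) -> Optional[Node]:
    """Resolve one path segment; _elem/_key/_value are honoured only when extended."""
    if node.category == "struct" and part in node.field_names:
        return node.children[node.field_names.index(part)]
    if extended and (node.category, part) in {("list", "_elem"), ("map", "_key")}:
        return node.children[0]
    if extended and (node.category, part) == ("map", "_value"):
        return node.children[1]
    return None


def bloom_filter_include(root: Node, columns: str, extended: bool) -> list[bool]:
    """Mark every named column together with its whole subtree."""
    include = [False] * (root.max_id + 1)
    for name in columns.split(","):
        node = root
        for part in name.strip().split("."):
            node = step(node, part, extended)
            if node is None:
                break
        if node is not None:
            for i in range(node.id, node.max_id + 1):
                include[i] = True
    return include


def selected(include: list[bool]) -> list[int]:
    """Return the ids of the marked columns."""
    return [i for i, on in enumerate(include) if on]


schema = struct(
    user=Node("string"),
    events=Node("list", [struct(name=Node("string"), ts=Node("timestamp"),
                                payload=Node("string"), source=Node("string"))]),
)
assign_ids(schema)

# Current style: naming the list pulls in the list and everything under it.
print(selected(bloom_filter_include(schema, "events", extended=False)))            # [2, 3, 4, 5, 6, 7]
# Current style cannot step into the list at all.
print(selected(bloom_filter_include(schema, "events.name", extended=False)))       # []
# Proposed style: only the one nested column is marked.
print(selected(bloom_filter_include(schema, "events._elem.name", extended=True)))  # [4]
```

In this model, names without _elem, _key or _value steps resolve identically in both modes (for example "user,events"). That is the intuition behind the swap being non-breaking. Whether it holds for every name ORC accepts today would need checking against the real code.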

Thanks for your attention!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
