[
https://issues.apache.org/jira/browse/ORC-1583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Alexander Petrossian (PAF) updated ORC-1583:
--------------------------------------------
Description:
Currently, when specifying the names of columns to index, we can use this
syntax:
* field.nestedField1.nestedField2
org.apache.orc.OrcUtils#findColumn is used to resolve it.
But when specifying a SearchArgument, we can use an extended syntax:
* field.nestedField1.nestedField2._elem.nestedField3._value.nestedField4
org.apache.orc.TypeDescription#findSubtype is used to resolve it.
This is inconsistent, and it makes it impossible to index only one column
inside a list/map.
Currently, when the expression contains a list/map, the writer activates bloom
filters for all columns contained inside it. In my case that means hundreds of
columns that we never search on, so they do not need to be indexed.
Maybe the findSubtype approach could be used in both cases (indexing and
searching), so that the code is consistent?
Offhand, nothing seems to break if we just replace the findColumn call with a
findSubtype call when finding the column to create a bloom filter for in the
writer.
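To illustrate the difference between the two path syntaxes, here is a minimal, self-contained sketch. It is a toy model, not ORC code: the Node class, resolve, leafCount and sampleSchema are hypothetical names invented for this example, the "plain" mode only approximates the struct-only behaviour described above, and the "_key" step name is an assumption (only _elem and _value appear in this report).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of a nested schema; NOT the ORC TypeDescription class.
class BloomPathSketch {
    enum Kind { STRUCT, LIST, MAP, PRIMITIVE }

    static final class Node {
        final Kind kind;
        final Map<String, Node> children = new LinkedHashMap<>();

        Node(Kind kind) { this.kind = kind; }

        Node with(String name, Node child) {
            children.put(name, child);
            return this;
        }
    }

    static Node struct() { return new Node(Kind.STRUCT); }
    static Node primitive() { return new Node(Kind.PRIMITIVE); }

    // A list exposes its element type under the step name "_elem".
    static Node listOf(Node elem) {
        return new Node(Kind.LIST).with("_elem", elem);
    }

    // A map exposes "_value"; "_key" is an assumed name for the key step.
    static Node mapOf(Node key, Node value) {
        return new Node(Kind.MAP).with("_key", key).with("_value", value);
    }

    // extended == false: only struct fields may be stepped into
    // (a stand-in for the plain dotted syntax).
    // extended == true: list/map steps are allowed as well.
    // Returns null when the path cannot be resolved.
    static Node resolve(Node root, String path, boolean extended) {
        Node cur = root;
        for (String part : path.split("\\.")) {
            if (!extended && cur.kind != Kind.STRUCT) {
                return null;
            }
            cur = cur.children.get(part);
            if (cur == null) {
                return null;
            }
        }
        return cur;
    }

    // Number of primitive columns at or below a node, i.e. how many
    // columns would be indexed if this node were selected as a whole.
    static int leafCount(Node node) {
        if (node.kind == Kind.PRIMITIVE) {
            return 1;
        }
        int count = 0;
        for (Node child : node.children.values()) {
            count += leafCount(child);
        }
        return count;
    }

    // Hypothetical schema:
    // struct<user:struct<name>, events:list<struct<id, payload, tags:map>>>
    static Node sampleSchema() {
        Node event = struct()
            .with("id", primitive())
            .with("payload", primitive())
            .with("tags", mapOf(primitive(), primitive()));
        return struct()
            .with("user", struct().with("name", primitive()))
            .with("events", listOf(event));
    }

    public static void main(String[] args) {
        Node root = sampleSchema();
        // Plain syntax can only name the whole list: 4 leaf columns.
        System.out.println(leafCount(resolve(root, "events", false)));
        // Plain syntax cannot reach inside the list.
        System.out.println(resolve(root, "events._elem.id", false));
        // Extended syntax selects exactly one leaf column.
        System.out.println(leafCount(resolve(root, "events._elem.id", true)));
    }
}
```

In this model, naming the list selects every leaf beneath it, while the extended path narrows the selection to a single leaf, which is the behaviour requested here for bloom filter columns.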
Thanks for your attention!
> BloomFilterColumns inside list/map
> ----------------------------------
>
> Key: ORC-1583
> URL: https://issues.apache.org/jira/browse/ORC-1583
> Project: ORC
> Issue Type: Improvement
> Affects Versions: 1.9.2
> Reporter: Alexander Petrossian (PAF)
> Priority: Major