[jira] [Updated] (ORC-1583) BloomFilterColumns inside list/map

Alexander Petrossian (PAF) (Jira) Thu, 11 Jan 2024 04:07:03 -0800


     [ 
https://issues.apache.org/jira/browse/ORC-1583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Alexander Petrossian (PAF) updated ORC-1583:
--------------------------------------------
    Description: 
Currently, when specifying the names of columns to index, we can use this syntax:
* field.nestedField1.nestedField2

org.apache.orc.OrcUtils#findColumn is used to resolve these names.

But when specifying a SearchArgument, we can use an extended syntax:
* field.nestedField1.nestedField2._elem.nestedField3._value.nestedField4

org.apache.orc.TypeDescription#findSubtype is used to resolve these names.
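
To make the difference between the two syntaxes concrete, here is a minimal sketch over a toy schema tree. It does not use or reproduce the ORC classes: PathSketch, Node, findSimple and findExtended are hypothetical names, and the sketch assumes the simple lookup follows struct field names only, while the extended lookup also steps through a list via _elem and a map via _key / _value (_key is an assumption; only _elem and _value appear above).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model only; this is NOT org.apache.orc.TypeDescription.
class PathSketch {
    enum Kind { STRUCT, LIST, MAP, PRIMITIVE }

    static final class Node {
        final Kind kind;
        final Map<String, Node> children = new LinkedHashMap<>();
        Node(Kind kind) { this.kind = kind; }
        Node with(String name, Node child) { children.put(name, child); return this; }
    }

    static Node struct() { return new Node(Kind.STRUCT); }
    static Node primitive() { return new Node(Kind.PRIMITIVE); }
    // A list exposes its element type under the pseudo-name "_elem".
    static Node list(Node elem) { return new Node(Kind.LIST).with("_elem", elem); }
    // A map exposes its key and value types under "_key" and "_value".
    static Node map(Node key, Node value) {
        return new Node(Kind.MAP).with("_key", key).with("_value", value);
    }

    // Simple syntax (assumed): only struct field names can be followed.
    static Node findSimple(Node root, String path) {
        Node current = root;
        for (String part : path.split("\\.")) {
            if (current.kind != Kind.STRUCT) return null;
            current = current.children.get(part);
            if (current == null) return null;
        }
        return current;
    }

    // Extended syntax: the same walk, but list/map pseudo-names are allowed too.
    static Node findExtended(Node root, String path) {
        Node current = root;
        for (String part : path.split("\\.")) {
            current = current.children.get(part);
            if (current == null) return null;
        }
        return current;
    }

    public static void main(String[] args) {
        Node leaf = primitive();
        Node element = struct().with("nestedField3", leaf).with("unused", primitive());
        Node root = struct().with("field", struct().with("items", list(element)));

        // The list itself is reachable with the simple syntax...
        System.out.println(findSimple(root, "field.items") != null);
        // ...but a single column inside it is not.
        System.out.println(findSimple(root, "field.items._elem.nestedField3") == null);
        // The extended syntax reaches exactly that column.
        System.out.println(findExtended(root, "field.items._elem.nestedField3") == leaf);
    }
}
```

In this toy model all three checks print true: the simple walk stops at the list, while the extended walk can name one leaf beneath it.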

This is inconsistent and makes it impossible to index only one column inside a 
list/map.

Currently, when a list/map is named in the expression, the writer activates 
bloom filters for all columns contained inside it. In my case that means 
hundreds of columns that we never search on, so we do not need them indexed.
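
The effect can be illustrated with a small sketch. It assumes that columns are numbered in preorder and that selecting a column enables bloom filters for the whole id range of that column and its descendants; this is my reading of the behavior described above, not a copy of the writer code. BloomSelectionSketch, Col, assignIds and markSubtree are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only; the names here are not part of the ORC API.
class BloomSelectionSketch {
    static final class Col {
        final String name;
        final List<Col> children = new ArrayList<>();
        int id;     // preorder id of this column
        int maxId;  // largest id among this column and its descendants
        Col(String name, Col... kids) {
            this.name = name;
            for (Col kid : kids) children.add(kid);
        }
    }

    // Assigns preorder ids starting at 'next' and returns the next free id.
    static int assignIds(Col col, int next) {
        col.id = next++;
        for (Col child : col.children) next = assignIds(child, next);
        col.maxId = next - 1;
        return next;
    }

    // Flags the selected column together with everything beneath it.
    static boolean[] markSubtree(Col col, int totalColumns) {
        boolean[] flags = new boolean[totalColumns];
        for (int i = col.id; i <= col.maxId; i++) flags[i] = true;
        return flags;
    }

    static int count(boolean[] flags) {
        int n = 0;
        for (boolean f : flags) if (f) n++;
        return n;
    }

    public static void main(String[] args) {
        Col wanted = new Col("nestedField3");
        Col elem = new Col("_elem", wanted, new Col("unused1"), new Col("unused2"));
        Col list = new Col("events", elem);
        Col root = new Col("root", list);
        int total = assignIds(root, 0);

        // Naming the list flags the list, its element and all three leaves.
        System.out.println(count(markSubtree(list, total)));   // prints 5
        // Naming the single nested column flags only that column.
        System.out.println(count(markSubtree(wanted, total))); // prints 1
    }
}
```

With hundreds of leaves under the list instead of three, the first count grows accordingly while the second stays at one.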


Maybe the findSubtype approach could be used in both cases, indexing and 
searching, so that the code is consistent?

Offhand, nothing seems to break if we simply replace the findColumn call with a 
findSubtype call when finding the columns to create bloom filters for in the 
writer.

Thanks for your attention!

  was:
Currently when specifying names of columns to index we can use syntax:
* field.nestedField1.nestedField2

org.apache.orc.OrcUtils#findColumn is being used

But when specifying SearchArgument we can use extended syntax:
* field.nestedField1.nestedField2._elem.nestedField3._value.nestdField4

org.apache.orc.TypeDescription#findSubtype is used

This is unbalanced and does not allow to index only one column inside a 
list/map.

Currently when there is a list/map in expression writer will activate bloom 
filter for all columns, contained inside it, in my case -- hundreds of columns 
we do not ever use to search = we do not need those be indexed.


Maybe findSubtype approach can be used in both cases: indexing+searching, this 
way code will be balanced?

Offhand there seems to be nothing breaking in to just findColumn call with 
findSubtype call.

Thanks for your attention!


> BloomFilterColumns inside list/map
> ----------------------------------
>
>                 Key: ORC-1583
>                 URL: https://issues.apache.org/jira/browse/ORC-1583
>             Project: ORC
>          Issue Type: Improvement
>    Affects Versions: 1.9.2
>            Reporter: Alexander Petrossian (PAF)
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
