Alexander Petrossian (PAF) created ORC-1583:
-----------------------------------------------

Summary: BloomFilterColumns inside list/map
Key: ORC-1583
URL: https://issues.apache.org/jira/browse/ORC-1583
Project: ORC
Issue Type: Improvement
Affects Versions: 1.9.2
Reporter: Alexander Petrossian (PAF)

Currently, when specifying the names of columns to index, we can use the syntax:
* field.nestedField1.nestedField2

org.apache.orc.OrcUtils#findColumn is used to resolve it.

But when specifying a SearchArgument, we can use an extended syntax:
* field.nestedField1.nestedField2._elem.nestedField3._value.nestedField4

org.apache.orc.TypeDescription#findSubtype is used to resolve it.

This is unbalanced and does not allow indexing only one column inside a list/map. Currently, when there is a list/map in the expression, the writer activates the bloom filter for all columns contained inside it. In my case that means hundreds of columns that we never search on, so we do not need them indexed.

Maybe the findSubtype approach could be used in both cases (indexing and searching), so that the code is balanced? Offhand, there seems to be nothing breaking in just replacing the findColumn call with a findSubtype call.

Thanks for your attention!
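
To illustrate the difference between the two syntaxes, here is a minimal, self-contained sketch. It does not use ORC's classes: BloomPathSketch, Node, resolveStructOnly, resolveExtended and columnsCovered are hypothetical names invented for this example, and the schema is made up. The sketch only assumes what is described above: one resolver walks struct fields only, the other also accepts _elem, _key and _value, and naming a list/map causes everything inside it to be indexed.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class BloomPathSketch {
    enum Kind { PRIMITIVE, STRUCT, LIST, MAP }

    /** Hypothetical stand-in for a schema node; this is not ORC's TypeDescription. */
    static final class Node {
        final Kind kind;
        final Map<String, Node> children = new LinkedHashMap<>();
        Node(Kind kind) { this.kind = kind; }
        Node with(String name, Node child) { children.put(name, child); return this; }
    }

    static Node primitive() { return new Node(Kind.PRIMITIVE); }
    static Node struct() { return new Node(Kind.STRUCT); }
    // A list exposes its element type under the pseudo-field "_elem".
    static Node list(Node elem) { return new Node(Kind.LIST).with("_elem", elem); }
    // A map exposes its key and value types under "_key" and "_value".
    static Node map(Node key, Node value) {
        return new Node(Kind.MAP).with("_key", key).with("_value", value);
    }

    /** Models the struct-only syntax: a path may not descend below a list/map. */
    static Node resolveStructOnly(Node root, String path) {
        Node current = root;
        for (String part : path.split("\\.")) {
            if (current.kind != Kind.STRUCT) {
                throw new IllegalArgumentException("Cannot descend into " + current.kind + " at '" + part + "'");
            }
            current = child(current, part);
        }
        return current;
    }

    /** Models the extended syntax: "_elem", "_key" and "_value" are valid path segments. */
    static Node resolveExtended(Node root, String path) {
        Node current = root;
        for (String part : path.split("\\.")) {
            current = child(current, part);
        }
        return current;
    }

    private static Node child(Node parent, String name) {
        Node found = parent.children.get(name);
        if (found == null) {
            throw new IllegalArgumentException("No such field: " + name);
        }
        return found;
    }

    /** Counts the node itself plus every column nested inside it. */
    static int columnsCovered(Node node) {
        int count = 1;
        for (Node c : node.children.values()) {
            count += columnsCovered(c);
        }
        return count;
    }

    /** Made-up schema: struct<field:struct<items:array<struct<a,b,c>>>>. */
    static Node sampleSchema() {
        Node element = struct().with("a", primitive()).with("b", primitive()).with("c", primitive());
        Node field = struct().with("items", list(element));
        return struct().with("field", field);
    }

    public static void main(String[] args) {
        Node root = sampleSchema();
        // Struct-only syntax can only name the list, so every nested column is covered.
        System.out.println(columnsCovered(resolveStructOnly(root, "field.items")));
        // Extended syntax can name a single column inside the list.
        System.out.println(columnsCovered(resolveExtended(root, "field.items._elem.b")));
    }
}
```

In this toy schema the struct-only path covers the list, its element struct and all three leaf columns, while the extended path selects just one leaf. With hundreds of nested columns the gap grows accordingly.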