Alexander Petrossian (PAF) created ORC-1554:
-----------------------------------------------
Summary: Filtering by columns, nested in LISTs
Key: ORC-1554
URL: https://issues.apache.org/jira/browse/ORC-1554
Project: ORC
Issue Type: Improvement
Affects Versions: 1.9.2
Reporter: Alexander Petrossian (PAF)
Currently, searchArgument supports fields nested inside arrays, and that works.
We use even deeply nested columns, and it works fine: row groups are correctly
included:
{noformat}
data.request.eventItem._elem.UsageEventItem.usage.CustomerFacingServiceUsage.relatedParty._elem.resource._elem.value
{noformat}
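To make the nesting explicit: in ORC's dotted column names, `_elem` marks the element of a LIST, as in the name above. The helper below is hypothetical (not an ORC API). It only splits such a name and reports where the path steps into a list element. For the column above it finds three LIST levels, so the leaf `value` vector is three list hops away from the row index.

{code:java}
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of ORC): finds the "_elem" segments,
 *  i.e. the LIST-element steps, in an ORC-style dotted column name. */
class NestedPathDemo {
  /** Returns the position, within the split path, of every "_elem" segment. */
  static List<Integer> listLevels(String columnName) {
    String[] parts = columnName.split("\\.");
    List<Integer> levels = new ArrayList<>();
    for (int i = 0; i < parts.length; i++) {
      if (parts[i].equals("_elem")) {
        levels.add(i);
      }
    }
    return levels;
  }

  public static void main(String[] args) {
    String col = "data.request.eventItem._elem.UsageEventItem.usage"
        + ".CustomerFacingServiceUsage.relatedParty._elem.resource._elem.value";
    List<Integer> levels = listLevels(col);
    // Prints: 3 list levels at segments [3, 8, 10]
    System.out.println(levels.size() + " list levels at segments " + levels);
  }
}
{code}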
Alas, the [allowSARGToFilter mechanism|ORC-743] does not handle values inside
arrays.
There are two show-stoppers here.
Small one, in
https://github.com/apache/orc/blob/v1.9.2/java/core/src/java/org/apache/orc/OrcFilterContext.java#L80:
{code:java}
static boolean isNull(ColumnVector[] vectorBranch, int idx) throws IllegalArgumentException {
  for (ColumnVector v : vectorBranch) {
    if (v instanceof ListColumnVector || v instanceof MapColumnVector) {
      throw new IllegalArgumentException(String.format(
          "Found vector: %s in branch. List and Map vectors are not supported in isNull "
          + "determination", v));
    }
    ...
{code}
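The simplified model below illustrates why this check is only meaningful when no LIST or MAP vector sits on the branch. It is a sketch under stated assumptions: `SimpleVector` and `BranchNullDemo` are hypothetical stand-ins (not the ORC class hierarchy) that keep only `noNulls`, `isNull` and a LIST flag, and the null rule used ("a value is null if any vector on its branch is null at `idx`") is an assumption about the elided part of the method. Using one `idx` at every level works for STRUCT nesting because STRUCT children share their parent's row index.

{code:java}
/** Hypothetical, simplified stand-in for an ORC ColumnVector: only the
 *  null bookkeeping is modelled, plus a flag saying whether it is a LIST. */
class SimpleVector {
  final boolean isList;
  final boolean noNulls;
  final boolean[] isNull;

  SimpleVector(boolean isList, boolean[] isNull) {
    this.isList = isList;
    this.isNull = isNull;
    boolean any = false;
    for (boolean b : isNull) {
      any |= b;
    }
    this.noNulls = !any;
  }
}

class BranchNullDemo {
  /** Assumed rule: a value is null if any vector on the root-to-leaf branch
   *  is null at idx. Sound only if every vector is indexed by the row index. */
  static boolean isNull(SimpleVector[] branch, int idx) {
    for (SimpleVector v : branch) {
      if (v.isList) {
        // Below a LIST, slot idx of the child vector is not row idx.
        throw new IllegalArgumentException("LIST vector in branch");
      }
      if (!v.noNulls && v.isNull[idx]) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    // STRUCT parent (null at row 1) -> leaf (null at row 2); both indexed by row.
    SimpleVector struct = new SimpleVector(false, new boolean[] {false, true, false});
    SimpleVector leaf = new SimpleVector(false, new boolean[] {false, false, true});
    SimpleVector[] branch = {struct, leaf};
    for (int row = 0; row < 3; row++) {
      System.out.println("row " + row + " null? " + isNull(branch, row));
    }
  }
}
{code}

With only STRUCT-like vectors the rows come out as false, true, true; put a LIST on the branch and there is no longer a single index that is valid at every level, hence the exception.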
Big one, in
https://github.com/apache/orc/blob/v1.9.2/java/core/src/java/org/apache/orc/impl/filter/LeafFilter.java#L70:
{code:java}
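// branch: the vectors on the path from the root column down to colName
ColumnVector[] branch = fc.findColumnVector(colName);
// v: the leaf vector holding the values being filtered
ColumnVector v = branch[branch.length - 1];
...
// rowIdx: index of a row (record) in the batch
if (!OrcFilterContext.isNull(branch, rowIdx) && allowWithNegation(v, rowIdx)) {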
{code}
Here the code indexes *v* with *rowIdx*, which is wrong whenever *v* is nested
inside a LIST (or MAP).
The row index iterates over records.
But *v* holds the nested values, of which there may be fewer or more than there
are records: an empty list contributes no values, a three-element list contributes three.
The two indexes address different things.
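A minimal sketch of the mismatch, assuming a standalone model of one batch of a LIST column laid out like ORC's ListColumnVector (per-row `offsets` and `lengths` pointing into a child vector). `ListIndexDemo` and its methods are hypothetical, and `anyMatches` shows only one possible semantics ("some element of the row's list satisfies the predicate"); nothing here is ORC's own API or its chosen fix.

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Standalone model of one LIST column batch (hypothetical, not ORC code).
 *  Three rows: row 0 = [a, b, c], row 1 = [], row 2 = [d]. */
class ListIndexDemo {
  static final long[] OFFSETS = {0, 3, 3};          // start of each row's list in CHILD
  static final long[] LENGTHS = {3, 0, 1};          // number of elements per row
  static final String[] CHILD = {"a", "b", "c", "d"}; // 4 values for 3 rows

  /** Indexes the child vector with the row index, as the quoted code does. */
  static String naiveValue(int rowIdx) {
    return CHILD[rowIdx];
  }

  /** Row r owns child slots [offsets[r], offsets[r] + lengths[r]). */
  static List<String> valuesOfRow(int rowIdx) {
    List<String> out = new ArrayList<>();
    int start = (int) OFFSETS[rowIdx];
    int end = start + (int) LENGTHS[rowIdx];
    for (int i = start; i < end; i++) {
      out.add(CHILD[i]);
    }
    return out;
  }

  /** Hypothetical "any element matches" evaluation for a leaf inside a list. */
  static boolean anyMatches(int rowIdx, Predicate<String> p) {
    for (String s : valuesOfRow(rowIdx)) {
      if (p.test(s)) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    for (int r = 0; r < OFFSETS.length; r++) {
      System.out.println("row " + r + ": naive=" + naiveValue(r) + " actual=" + valuesOfRow(r));
    }
  }
}
{code}

Running `main` prints `row 0: naive=a actual=[a, b, c]`, `row 1: naive=b actual=[]` and `row 2: naive=c actual=[d]`: the naive lookup attributes `b` to a row whose list is empty and never visits `d` at all. For a column under several LIST levels, as in the example column above, the offsets/lengths mapping would have to be applied once per level.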