[jira] [Updated] (ASTERIXDB-3324) Stabilize columnar datasets

Ian Maxon (Jira) Fri, 08 Dec 2023 10:09:20 -0800


     [ 
https://issues.apache.org/jira/browse/ASTERIXDB-3324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Ian Maxon updated ASTERIXDB-3324:
---------------------------------
    Labels: triaged  (was: )

> Stabilize columnar datasets
> ---------------------------
>
>                 Key: ASTERIXDB-3324
>                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-3324
>             Project: Apache AsterixDB
>          Issue Type: Bug
>          Components: COMP - Compiler, RT - Runtime, STO - Storage
>    Affects Versions: 0.9.9
>            Reporter: Wail Y. Alkowaileet
>            Assignee: Wail Y. Alkowaileet
>            Priority: Major
>              Labels: triaged
>             Fix For: 0.9.9
>
>
> Multiple issues were found while running SQLPPExecutionTest with columnar as
> the default storage format.
> h1. Filter and project pushdowns
>  * A NullPointerException could be thrown by PushdownUtil.getFieldName(...)
> when accessing a join expression. The type computer of the join doesn't
> include any typing information, hence the exception.
>  * The same applies to other operators: we need to pick the right type
> computer (input vs. output) depending on the operator.
>  * Change the pushdown scope when a UNION ALL operator is encountered, to
> avoid (incorrectly) pushing SELECT conditions after UNION ALL.
>  * Avoid re-registering record variables when computing the expected schema. 
> Such variables should be marked as irreplaceable
>  * Nested functions' arguments should not be assigned to their produced
> variables (if any).
>  * UNION ALL is a special case: it contains LogicalVariable triplets (not
> variable expressions). When computing the defUse chains, the computer should
> account for the variables used by the UNION ALL.
>  * Disallow pushing SELECT conditions of CASE WHEN expressions
>  * Place a NoOpAccessor for PKs in columnar filters to avoid incorrectly
> advancing the PKs
>  * The current way of providing FilterAccessorProvider to the filter's 
> IScalarEvaluatorFactories has a race condition as IHyracksTaskContext can be 
> shared. Instead, FilterAccessorProvider should be provided by a dedicated 
> IEvaluatorContext (namely ColumnFilterEvaluatorContext).
>  * Disable filters against fields with heterogeneous numerical values (e.g.,
> double and bigint)
>  * Avoid advancing ColumnarAssembler readers if the mega-leaf node is
> filtered out. Otherwise, we can overrun the reader (a "no more values"
> exception) or read incorrect data.
>  * ColumnLeafFrame should duplicate the page buffer (a shared buffer) to
> avoid a race condition when a dataset is being scanned twice at the same time.
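
The shared page buffer item above can be illustrated with plain java.nio. This is a minimal sketch, not the actual ColumnLeafFrame code: FrameSketch and its methods are hypothetical names, and it only assumes that the page is exposed as a java.nio.ByteBuffer. ByteBuffer.duplicate() shares the underlying bytes but gives each scan its own position, limit, and mark, so one scan cannot move another scan's cursor.

```java
import java.nio.ByteBuffer;

public class SharedPageBufferSketch {
    // Hypothetical stand-in for a leaf frame; not the real ColumnLeafFrame API.
    static final class FrameSketch {
        private final ByteBuffer view;

        FrameSketch(ByteBuffer sharedPage) {
            // duplicate() shares the content of the page but creates an
            // independent position/limit/mark for this frame.
            this.view = sharedPage.duplicate();
        }

        int readNextInt() {
            // Relative read: advances only this frame's private cursor.
            return view.getInt();
        }
    }

    public static void main(String[] args) {
        ByteBuffer page = ByteBuffer.allocate(8);
        // Absolute puts: they do not move the shared buffer's position.
        page.putInt(0, 7).putInt(4, 11);

        FrameSketch scanA = new FrameSketch(page);
        FrameSketch scanB = new FrameSketch(page);

        scanA.readNextInt();                     // moves scanA only
        System.out.println(scanB.readNextInt()); // prints 7
        System.out.println(page.position());     // prints 0
    }
}
```

Note that duplicate() only isolates the cursors. It does not copy the bytes, so it does not protect readers from a concurrent writer to the same page.
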
> h1. Storage and record assembly
>  * Retain empty objects
>  * Preserve the type of declared fields during record assembly (currently, we
> only produce bigint and double values, which could be interpreted incorrectly
> in closed fields if smaller precision types are used)
>  * Ensure the PKs' column indexes are [0, N-1] (where N = the number of PKs),
> whether the PKs are in the root or nested in one or more objects
>  * Ensure there's always a "delegate" when assembling objects, especially
> when accessing closed and open fields at the same time. Otherwise, we can
> end up with empty objects.
>  * Ensure PK readers created by *PathExtractorVisitor* have max def-level =
> 1 even if they're nested
>  * Use correct items for array and multiset declared items when 
> LazyVisitablePointable is used
>  * Process actual types instead of union when using LazyVisitablePointable
>  * Ensure key uniqueness on LOAD 
>  * Avoid accessing closed fields in empty objects (resulting from the column
> assembler)
>  * Disallow LSM filters on columnar datasets
>  * Disallow correlated-prefix merge-policy (optimized for LSM-filters) in 
> columnar datasets
> h1. Misc.
>  * If the storage format is specified incorrectly (i.e., it is neither row
> nor column), a NullPointerException is thrown
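
For the storage format item, one way to fail with a descriptive error instead of a NullPointerException is to validate the value up front. The following is a sketch under stated assumptions: StorageFormatSketch, Format, and parse are hypothetical names for illustration, not AsterixDB classes, and the accepted values "row" and "column" are taken from the description above.

```java
import java.util.Locale;

public class StorageFormatSketch {
    // Hypothetical enum; the real AsterixDB type may differ.
    enum Format { ROW, COLUMN }

    static Format parse(String value) {
        if (value == null) {
            throw new IllegalArgumentException(
                    "Storage format is missing; expected 'row' or 'column'");
        }
        // Normalize case and whitespace before matching.
        switch (value.trim().toLowerCase(Locale.ROOT)) {
            case "row":
                return Format.ROW;
            case "column":
                return Format.COLUMN;
            default:
                // Reject unknown values explicitly rather than returning null,
                // which would surface later as a NullPointerException.
                throw new IllegalArgumentException(
                        "Unknown storage format '" + value
                                + "'; expected 'row' or 'column'");
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("column")); // prints COLUMN
        try {
            parse("columnar");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```
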



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
