[jira] [Updated] (ASTERIXDB-3324) Stabilize columnar datasets

Wail Y. Alkowaileet (Jira) Tue, 05 Dec 2023 10:30:03 -0800


     [ 
https://issues.apache.org/jira/browse/ASTERIXDB-3324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Wail Y. Alkowaileet updated ASTERIXDB-3324:
-------------------------------------------
    Description: 
Multiple issues were found while running SQLPPExecutionTest while columnar is 
the default storage format.
h1. Filter and project pushdowns
 * A NullPointerException could be thrown by PushdownUtil.getFieldName(...) when 
accessing a join expression. The type computer of the join doesn't include any 
typing information; hence the thrown exception
 * The same can happen in other operators. We need to pick the right type 
computer (input vs. output) depending on the operator
 * Change the pushdown scope when a UNION ALL operator is encountered to avoid 
pushing SELECT conditions (incorrectly) after UNION ALL. 
 * Avoid re-registering record variables when computing the expected schema. 
Such variables should be marked as irreplaceable
 * Nested functions' arguments should not be assigned to their produced 
variables (if any).
 * UNION ALL is a special case: it contains LogicalVariable triplets (not 
variable expressions). When computing the defUse chains, the computer should 
account for the variables used by the UNION ALL
 * Disallow pushing SELECT conditions of CASE WHEN expressions
 * Place NoOpAccessor for PKs in columnar filters to avoid advancing 
(incorrectly) the PKs
 * The current way of providing FilterAccessorProvider to the filter's 
IScalarEvaluatorFactories has a race condition as IHyracksTaskContext can be 
shared. Instead, FilterAccessorProvider should be provided by a dedicated 
IEvaluatorContext (namely ColumnFilterEvaluatorContext).
 * Disable filter against fields with heterogeneous numerical values (e.g., 
double and bigint)
 * Avoid advancing ColumnarAssembler readers if the mega-leaf node is filtered 
out (otherwise, we can overrun the reader – no more values exception – or we 
can read incorrect data)
 * ColumnLeafFrame should duplicate the page buffer (a shared buffer) to avoid 
a race condition when a dataset is being scanned twice at the same time.
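
The shared-buffer bullet above can be illustrated with plain java.nio. This is only a minimal sketch, assuming the shared page is a java.nio.ByteBuffer; the class and method names are hypothetical and are not taken from the AsterixDB code base. ByteBuffer.duplicate() returns a view over the same bytes with its own position, limit, and mark, so two scans do not move each other's cursor.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration only; this is not the actual ColumnLeafFrame code.
final class SharedPageScan {
    // Sums every int in the page through a private view of the shared buffer.
    static int sumWithDuplicate(ByteBuffer sharedPage) {
        // Same underlying bytes, but an independent position/limit/mark.
        ByteBuffer view = sharedPage.duplicate();
        view.rewind();
        int sum = 0;
        while (view.remaining() >= Integer.BYTES) {
            sum += view.getInt();
        }
        return sum;
    }

    public static void main(String[] args) {
        ByteBuffer page = ByteBuffer.allocate(4 * Integer.BYTES);
        for (int i = 1; i <= 4; i++) {
            page.putInt(i);
        }
        page.flip();

        // Two "scans" of the same page, each with its own cursor.
        ByteBuffer scanA = page.duplicate();
        ByteBuffer scanB = page.duplicate();
        scanA.getInt();
        scanA.getInt();
        System.out.println(scanB.getInt());          // 1: scanB was not advanced by scanA
        System.out.println(page.position());         // 0: the shared buffer's cursor is untouched
        System.out.println(sumWithDuplicate(page));  // 10
    }
}
```

Duplicating only isolates the cursors. The bytes are still shared, so this pattern presumably relies on the page contents not being modified while the scans are in progress.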

h1. Storage and record assembly:
 * Retain empty objects
 * Preserve the type of declared fields during record assembly (currently, we 
only produce bigint and doubles, which could be interpreted incorrectly in 
closed fields if smaller precision types are used)
 * Ensure PKs column indexes are [0 - N-1] (where N = the number of PKs), 
whether the PKs are in the root or nested in one or more objects
 * Ensure there's always a "delegate" when assembling objects, especially when 
accessing closed and open fields at the same time. Otherwise, we can 
end up with empty objects.
 * Ensure PK readers created by *PathExtractorVisitor* have max def-level = 1 
even if they're nested
 * Use correct items for array and multiset declared items when 
LazyVisitablePointable is used
 * Ensure key uniqueness on LOAD 
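
The PK column-index bullet above can be sketched as follows. This is a hypothetical illustration, not the AsterixDB implementation: it assumes every column is identified by a dotted field path and that the primary-key paths are known up front. The class name, method name, and path strings are all made up for the example.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; names and the path representation are assumptions.
final class ColumnIndexSketch {
    // Assigns indexes 0..N-1 to the N primary-key paths first, then numbers
    // the remaining leaf paths in encounter order.
    static Map<String, Integer> assignColumnIndexes(List<String> primaryKeyPaths,
                                                    List<String> allLeafPaths) {
        Map<String, Integer> indexes = new LinkedHashMap<>();
        for (String pk : primaryKeyPaths) {
            indexes.putIfAbsent(pk, indexes.size());
        }
        for (String path : allLeafPaths) {
            // PK paths are already present, so they keep their low indexes.
            indexes.putIfAbsent(path, indexes.size());
        }
        return indexes;
    }

    public static void main(String[] args) {
        List<String> pks = Arrays.asList("meta.id", "meta.region.code");
        List<String> leaves = Arrays.asList("name", "meta.id", "meta.region.code", "age");
        // Nested PKs still receive indexes 0 and 1.
        System.out.println(assignColumnIndexes(pks, leaves));
        // {meta.id=0, meta.region.code=1, name=2, age=3}
    }
}
```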

  was:
Multiple issues were found while running SQLPPExecutionTest while columnar is 
the default storage format.
h1. Filter and project pushdowns
 * A NullPointerException could be thrown by PushdownUtil.getFieldName(...) when 
accessing a join expression. The type computer of the join doesn't include any 
typing information; hence the thrown exception
 * The same can happen in other operators. We need to pick the right type 
computer (input vs. output) depending on the operator
 * Change the pushdown scope when a UNION ALL operator is encountered to avoid 
pushing SELECT conditions (incorrectly) after UNION ALL. 
 * Avoid re-registering record variables when computing the expected schema. 
Such variables should be marked as irreplaceable
 * Nested functions' arguments should not be assigned to their produced 
variables (if any).
 * UNION ALL is a special case: it contains LogicalVariable triplets (not 
variable expressions). When computing the defUse chains, the computer should 
account for the variables used by the UNION ALL
 * Disallow pushing SELECT conditions of CASE WHEN expressions
 * Place NoOpAccessor for PKs in columnar filters to avoid advancing 
(incorrectly) the PKs
 * The current way of providing FilterAccessorProvider to the filter's 
IScalarEvaluatorFactories has a race condition as IHyracksTaskContext can be 
shared. Instead, FilterAccessorProvider should be provided by a dedicated 
IEvaluatorContext (namely ColumnFilterEvaluatorContext).
 * Disable filter against fields with heterogeneous numerical values (e.g., 
double and bigint)
 * Avoid advancing ColumnarAssembler readers if the mega-leaf node is filtered 
out (otherwise, we can overrun the reader – no more values exception – or we 
can read incorrect data)
 * ColumnLeafFrame should duplicate the page buffer (a shared buffer) to avoid 
a race condition when a dataset is being scanned twice at the same time.

h1. Storage and record assembly:
 * Retain empty objects
 * Preserve the type of declared fields during record assembly (currently, we 
only produce bigint and doubles, which could be interpreted incorrectly in 
closed fields if smaller precision types are used)
 * Ensure PKs column indexes are [0 - N-1] (where N = the number of PKs), 
whether the PKs are in the root or nested in one or more objects
 * Ensure there's always a "delegate" when assembling objects, especially when 
accessing closed and open fields at the same time. Otherwise, we can 
end up with empty objects.
 * Ensure PK readers created by *PathExtractorVisitor* have max def-level = 1 
even if they're nested
 * Use correct items for array and multiset declared items when 
LazyVisitablePointable is used


> Stabilize columnar datasets
> ---------------------------
>
>                 Key: ASTERIXDB-3324
>                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-3324
>             Project: Apache AsterixDB
>          Issue Type: Bug
>          Components: COMP - Compiler, RT - Runtime, STO - Storage
>    Affects Versions: 0.9.9
>            Reporter: Wail Y. Alkowaileet
>            Assignee: Wail Y. Alkowaileet
>            Priority: Major
>             Fix For: 0.9.9
>
>
> Multiple issues were found while running SQLPPExecutionTest while columnar is 
> the default storage format.
> h1. Filter and project pushdowns
>  * A NullPointerException could be thrown by PushdownUtil.getFieldName(...) when 
> accessing a join expression. The type computer of the join doesn't include any 
> typing information; hence the thrown exception
>  * The same can happen in other operators. We need to pick the right type 
> computer (input vs. output) depending on the operator
>  * Change the pushdown scope when a UNION ALL operator is encountered to 
> avoid pushing SELECT conditions (incorrectly) after UNION ALL. 
>  * Avoid re-registering record variables when computing the expected schema. 
> Such variables should be marked as irreplaceable
>  * Nested functions' arguments should not be assigned to their produced 
> variables (if any).
>  * UNION ALL is a special case: it contains LogicalVariable triplets (not 
> variable expressions). When computing the defUse chains, the computer should 
> account for the variables used by the UNION ALL
>  * Disallow pushing SELECT conditions of CASE WHEN expressions
>  * Place NoOpAccessor for PKs in columnar filters to avoid advancing 
> (incorrectly) the PKs
>  * The current way of providing FilterAccessorProvider to the filter's 
> IScalarEvaluatorFactories has a race condition as IHyracksTaskContext can be 
> shared. Instead, FilterAccessorProvider should be provided by a dedicated 
> IEvaluatorContext (namely ColumnFilterEvaluatorContext).
>  * Disable filter against fields with heterogeneous numerical values (e.g., 
> double and bigint)
>  * Avoid advancing ColumnarAssembler readers if the mega-leaf node is 
> filtered out (otherwise, we can overrun the reader – no more values exception 
> – or we can read incorrect data)
>  * ColumnLeafFrame should duplicate the page buffer (a shared buffer) to 
> avoid a race condition when a dataset is being scanned twice at the same time.
> h1. Storage and record assembly:
>  * Retain empty objects
>  * Preserve the type of declared fields during record assembly (currently, we 
> only produce bigint and doubles, which could be interpreted incorrectly in 
> closed fields if smaller precision types are used)
>  * Ensure PKs column indexes are [0 - N-1] (where N = the number of PKs), 
> whether the PKs are in the root or nested in one or more objects
>  * Ensure there's always a "delegate" when assembling objects, especially when 
> accessing closed and open fields at the same time. Otherwise, we can 
> end up with empty objects.
>  * Ensure PK readers created by *PathExtractorVisitor* have max def-level = 
> 1 even if they're nested
>  * Use correct items for array and multiset declared items when 
> LazyVisitablePointable is used
>  * Ensure key uniqueness on LOAD 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
