[jira] [Commented] (ASTERIXDB-2286) Parallel Sort Optimization

ASF subversion and git services (JIRA) Mon, 15 Oct 2018 21:19:31 -0700


    [ https://issues.apache.org/jira/browse/ASTERIXDB-2286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16651096#comment-16651096 ]


ASF subversion and git services commented on ASTERIXDB-2286:
------------------------------------------------------------

Commit 80225e2c27d77514ecaa774235951187ef524193 in asterixdb's branch 
refs/heads/master from [~alsuliman]
[ https://git-wip-us.apache.org/repos/asf?p=asterixdb.git;h=80225e2 ]

[ASTERIXDB-2286][COMP][FUN][HYR] Parallel Sort Optimization

- user model changes: yes
- storage format changes: no
- interface changes: yes

details:
- new plan for the sort operation, which includes sampling and
replicating the stream of data to be sorted. The sort-merge connector
is removed from the plan, and the sorted result is now spread across multiple partitions.
- new optimization rule to check whether full parallel sort is applicable.
- new Forward operator to read the replicated sort input stream and
to receive the output of the sampling.
- new sequential merge connector to merge a globally ordered result residing
in multiple partitions (in addition to the connector's partition computer).
- "asterix-lang-aql/pom.xml" is changed as a result of refactoring
code related to the range map handling.
- new private sampling functions (local & global) to generate the range map
object, along with their type computers.
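The sampling, range map, and range-partitioning pipeline described in the list above can be sketched in a single process. This is a minimal illustration under stated assumptions, not AsterixDB code: the class and method names (ParallelSortSketch, localSample, globalRangeMap, partitionOf, sequentialMerge) are hypothetical, keys are plain ints, partitions are in-memory lists, and the "replicated" input is simply iterated a second time instead of being fed through a Forward operator. It shows why no final sort-merge is needed: once each range partition is sorted locally, reading the partitions in order (the role the commit gives the sequential merge connector) yields a globally ordered result.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical single-process sketch of sample-based parallel sort (not AsterixDB code).
public class ParallelSortSketch {

    // "Local" sampling: draw sampleSize random keys (with replacement) from one partition.
    static List<Integer> localSample(List<Integer> partition, int sampleSize, Random rnd) {
        List<Integer> sample = new ArrayList<>();
        for (int i = 0; i < sampleSize && !partition.isEmpty(); i++) {
            sample.add(partition.get(rnd.nextInt(partition.size())));
        }
        return sample;
    }

    // "Global" step: merge all samples and pick nParts - 1 evenly spaced split points (the range map).
    static int[] globalRangeMap(List<List<Integer>> samples, int nParts) {
        List<Integer> all = new ArrayList<>();
        for (List<Integer> s : samples) all.addAll(s);
        if (all.isEmpty()) return new int[0]; // no sample: everything goes to partition 0
        Collections.sort(all);
        int[] splits = new int[nParts - 1];
        for (int i = 1; i < nParts; i++) {
            splits[i - 1] = all.get(i * all.size() / nParts);
        }
        return splits;
    }

    // Range partitioning: partition p receives keys in [splits[p-1], splits[p]).
    static int partitionOf(int key, int[] splits) {
        int p = 0;
        while (p < splits.length && key >= splits[p]) p++;
        return p;
    }

    static List<List<Integer>> parallelSort(List<List<Integer>> inputs, int nParts, int sampleSize, long seed) {
        Random rnd = new Random(seed);
        List<List<Integer>> samples = new ArrayList<>();
        for (List<Integer> in : inputs) samples.add(localSample(in, sampleSize, rnd));
        int[] splits = globalRangeMap(samples, nParts);

        List<List<Integer>> outputs = new ArrayList<>();
        for (int p = 0; p < nParts; p++) outputs.add(new ArrayList<>());
        // Second pass over the input once the range map is known (stands in for the replicated stream).
        for (List<Integer> in : inputs) {
            for (int key : in) outputs.get(partitionOf(key, splits)).add(key);
        }
        // Each range partition is sorted independently.
        for (List<Integer> out : outputs) Collections.sort(out);
        return outputs;
    }

    // Every key in partition p is <= every key in partition p+1, so concatenation is globally sorted.
    static List<Integer> sequentialMerge(List<List<Integer>> sortedPartitions) {
        List<Integer> result = new ArrayList<>();
        for (List<Integer> part : sortedPartitions) result.addAll(part);
        return result;
    }

    public static void main(String[] args) {
        List<List<Integer>> inputs = List.of(
                List.of(42, 7, 19, 88, 3),
                List.of(56, 23, 91, 11, 64),
                List.of(30, 75, 5, 48, 99));
        List<List<Integer>> parts = parallelSort(inputs, 3, 4, 1L);
        System.out.println(parts);
        // prints [3, 5, 7, 11, 19, 23, 30, 42, 48, 56, 64, 75, 88, 91, 99]
        System.out.println(sequentialMerge(parts));
    }
}
```

Sample quality only affects load balance, not correctness: a skewed range map produces uneven partitions, but the concatenated output is still sorted because the split points are non-decreasing.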

user model changes:
- a new compiler property is added to enable or disable parallel sort.

interface changes:
- "ILogicalOperatorVisitor.java" now includes the Forward operator.
- "ITuplePartitionComputer.java" now includes initialize() so that a partitioner
can perform setup before partitioning. FieldRangePartitionComputerFactory uses it to
pick a range map.
- "ITuplePartitionComputerFactory.java": createPartitioner() is changed to
createPartitioner(IHyracksTaskContext hyracksTaskContext). The context is needed
for transferring the range map to the partitioner.
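The interface changes can be illustrated with simplified stand-ins. The sketch below assumes hypothetical TaskContext, PartitionComputer and PartitionComputerFactory interfaces that only mirror the shape described above (an initialize() hook and a factory method that takes a task context); they are not the real Hyracks signatures, and the "rangeMap" key is made up. What it shows is why the factory needs the context: the range map only exists at runtime, after sampling, so the partitioner fetches it in initialize() instead of receiving it when the job is built.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a task context that can carry runtime objects (not IHyracksTaskContext).
interface TaskContext {
    Object getSharedObject(String key);
}

// Hypothetical stand-in for a partition computer with an initialization hook.
interface PartitionComputer {
    void initialize();                 // called once per task, before any key is partitioned
    int partition(int key, int nParts);
}

// Hypothetical stand-in for a factory whose createPartitioner receives the task context.
interface PartitionComputerFactory {
    PartitionComputer createPartitioner(TaskContext ctx);
}

class RangePartitionComputerFactory implements PartitionComputerFactory {
    private final String rangeMapKey;

    RangePartitionComputerFactory(String rangeMapKey) {
        this.rangeMapKey = rangeMapKey;
    }

    @Override
    public PartitionComputer createPartitioner(TaskContext ctx) {
        return new PartitionComputer() {
            private int[] splits;

            @Override
            public void initialize() {
                // The range map is produced at runtime by sampling, so it is read from the context here.
                splits = (int[]) ctx.getSharedObject(rangeMapKey);
                if (splits == null) throw new IllegalStateException("range map not available");
            }

            @Override
            public int partition(int key, int nParts) {
                // Keys in [splits[p-1], splits[p]) go to partition p.
                int p = 0;
                while (p < splits.length && key >= splits[p]) p++;
                return Math.min(p, nParts - 1);
            }
        };
    }
}

public class RangePartitionerSketch {
    public static void main(String[] args) {
        Map<String, Object> shared = new HashMap<>();
        shared.put("rangeMap", new int[] {20, 40});
        TaskContext ctx = shared::get;
        PartitionComputer pc = new RangePartitionComputerFactory("rangeMap").createPartitioner(ctx);
        pc.initialize();
        System.out.println(pc.partition(5, 3));   // prints 0
        System.out.println(pc.partition(25, 3));  // prints 1
        System.out.println(pc.partition(99, 3));  // prints 2
    }
}
```

Deferring the lookup to initialize() keeps the factory serializable and plan-time-only, while the per-task partitioner binds to data that is produced during execution.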

Change-Id: I73e128029a46f45e6b68c23dfb9310d5de10582f
Reviewed-on: https://asterix-gerrit.ics.uci.edu/2393
Tested-by: Jenkins
Contrib: Jenkins
Integration-Tests: Jenkins
Reviewed-by: Dmitry Lychagin


> Parallel Sort Optimization
> --------------------------
>
>                 Key: ASTERIXDB-2286
>                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-2286
>             Project: Apache AsterixDB
>          Issue Type: Improvement
>          Components: COMP - Compiler, FUN - Functions, HYR - Hyracks
>            Reporter: Ali Alsuliman
>            Assignee: Ali Alsuliman
>            Priority: Major
>              Labels: triaged
>
> The current plan for queries with ORDER BY clauses consists of two phases: 
> sorting the data locally in each partition and then sort-merging the data in 
> one single partition. Even though the local sort happens in parallel, much of 
> this benefit is lost because the merge happens in a single partition. The goal 
> is to remove the merge step and do a true parallel sort in which the data is 
> range-partitioned across the cluster.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
