GitHub user yjhjstz created a discussion: [Proposal] Enhanced ORCA Parallel 
Planning to Align with PostgreSQL Planner

### Proposers

@yjhjstz 

### Proposal Status

Under Discussion

### Abstract

ORCA (the Pivotal Query Optimizer) currently has limited parallel planning 
capabilities. This creates an inconsistency:
  - The PostgreSQL planner can generate comprehensive parallel plans
  - ORCA lacks equivalent parallel planning sophistication
  - Users must disable ORCA (set optimizer=off) to fully utilize parallel 
features

### Motivation

Extend ORCA to generate parallel execution plans that align with PostgreSQL's 
parallel planning approach, while maintaining compatibility with Cloudberry's 
MPP architecture.

### Implementation

  1. Parallel-Aware Physical Operators

  Extend ORCA's path generation to create parallel-aware operators:
  - CPhysicalParallelSeqScan - Parallel sequential scans
  - CPhysicalParallelIndexScan - Parallel index scans
  - CPhysicalParallelBitmapHeapScan - Parallel bitmap heap scans
  - CPhysicalParallelHash - Parallel hash operations
  - CPhysicalParallelHashJoin - Parallel hash joins
  - CPhysicalParallelAgg - Parallel aggregation
  - CPhysicalParallelSort - Parallel sorting

  2. Parallel-Aware Join Planning

  Implement parallel join strategies similar to PostgreSQL:
  // Parallel hash join with shared hash table
  class CPhysicalParallelHashJoin : public CPhysicalHashJoin {
      // Enable parallel-aware hash table sharing
      // Handle worker coordination for hash table building
      // Manage locus for HashedWorkers distribution
  };
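
To make the shared-hash-table idea concrete, here is a minimal, self-contained sketch. It is a toy model, not ORCA or executor code: the function name `ParallelHashJoinCount`, the use of `std::thread`, and the single mutex are all assumptions made for illustration. Workers first build one shared table together, wait at a barrier, then probe it without locking.

```cpp
#include <cstddef>
#include <mutex>
#include <thread>
#include <unordered_map>
#include <vector>

// Toy model only (hypothetical helper, not part of ORCA): counts matching
// (inner, outer) pairs of an equi-join using one hash table shared by all workers.
std::size_t ParallelHashJoinCount(const std::vector<int>& inner,
                                  const std::vector<int>& outer,
                                  unsigned workers) {
    if (workers == 0) workers = 1;
    std::unordered_multimap<int, std::size_t> shared_table;  // key -> inner row index
    std::mutex table_mutex;

    // Build phase: worker w inserts inner rows w, w + workers, ... into the shared table.
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w) {
        pool.emplace_back([&, w] {
            for (std::size_t i = w; i < inner.size(); i += workers) {
                std::lock_guard<std::mutex> guard(table_mutex);
                shared_table.emplace(inner[i], i);
            }
        });
    }
    for (auto& t : pool) t.join();  // barrier: the table is complete before probing
    pool.clear();

    // Probe phase: the table is read-only now, so concurrent lookups need no lock.
    std::vector<std::size_t> matches(workers, 0);  // one counter per worker
    for (unsigned w = 0; w < workers; ++w) {
        pool.emplace_back([&, w] {
            for (std::size_t i = w; i < outer.size(); i += workers) {
                matches[w] += shared_table.count(outer[i]);
            }
        });
    }
    for (auto& t : pool) t.join();

    std::size_t total = 0;
    for (std::size_t m : matches) total += m;
    return total;
}
```

Because every worker probes the same complete table, the outer side can be split arbitrarily among workers. The optimizer has to model exactly this property when it assigns a locus such as HashedWorkers to the join inputs.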


  3. Parallel Cost Model Integration

  Enhance ORCA's cost model to account for parallel execution:
  - CPU cost reduction based on parallel_workers
  - Memory cost adjustments for shared resources
  - I/O cost distribution across workers
  - Startup cost penalties for worker coordination

4. Parallel Motion Nodes



### Rollout/Adoption Plan

Benefits

  1. Performance Consistency - Users get parallel execution regardless of 
optimizer choice
  2. Feature Parity - ORCA matches PostgreSQL planner capabilities
  3. Enhanced Scalability - Better utilization of multi-core systems
  4. Simplified Configuration - No need to disable ORCA for parallel workloads
  5. Benchmark Gains - Faster TPC-DS and TPC-H runs

### Are you willing to submit a PR?

- [X] Yes I am willing to submit a PR!

GitHub link: https://github.com/apache/cloudberry/discussions/1316

----
This is an automatically sent email for dev@cloudberry.apache.org.
To unsubscribe, please send an email to: dev-unsubscr...@cloudberry.apache.org

