alamb opened a new pull request #1748:
URL: https://github.com/apache/arrow-datafusion/pull/1748


   # Which issue does this PR close?
   
   Along with https://github.com/apache/arrow-datafusion/pull/1732, Fixes 
https://github.com/apache/arrow-datafusion/issues/423 (the last part).
   
    # Rationale for this change
   Repartitioning the input to an operator that relies on its input to be 
sorted is incorrect as the repartitioning will intermix the partitions and 
effectively "unsort" the input stream
   
   We found this in IOx here 
https://github.com/influxdata/influxdb_iox/pull/3633#issuecomment-1030126757
   
   Here is a picture showing the problem:
   
   ```
       ┌─────────────────────────────────┐
       │                                 │
       │     SortPreservingMergeExec     │
       │                                 │
       └─────────────────────────────────┘
                 ▲      ▲       ▲            Input may not
                 │      │       │             be sorted!
         ┌───────┘      │       └───────┐
         │              │               │
         │              │               │
   ┌───────────┐  ┌───────────┐   ┌───────────┐
   │           │  │           │   │           │
   │ batch A1  │  │ batch A3  │   │ batch B3  │
   │           │  │           │   │           │
   ├───────────┤  ├───────────┤   ├───────────┤
   │           │  │           │   │           │
   │ batch B2  │  │ batch B1  │   │ batch A2  │
   │           │  │           │   │           │
   └───────────┘  └───────────┘   └───────────┘
         ▲              ▲               ▲
         │              │               │
         └─────────┐    │    ┌──────────┘
                   │    │    │                  Outputs
                   │    │    │                batches are
       ┌─────────────────────────────────┐   repartitioned
       │       RepartitionExec(3)        │    and may not
       │           RoundRobin            │   remain sorted
       │                                 │
       └─────────────────────────────────┘
                   ▲         ▲
                   │         │                Inputs are
             ┌─────┘         └─────┐            sorted
             │                     │
             │                     │
             │                     │
       ┌───────────┐         ┌───────────┐
       │           │         │           │
       │ batch A1  │         │ batch B1  │
       │           │         │           │
       ├───────────┤         ├───────────┤
       │           │         │           │
       │ batch A2  │         │ batch B2  │
       │           │         │           │
       ├───────────┤         ├───────────┤
       │           │         │           │
       │ batch A3  │         │ batch B3  │
       │           │         │           │
       └───────────┘         └───────────┘
   
   
        Sorted Input          Sorted Input
              A                     B
   ```
   
   The streams need to remain the way they were
   
   ```
   ┌─────────────────────────────────┐
   │                                 │
   │     SortPreservingMergeExec     │
   │                                 │
   └─────────────────────────────────┘
               ▲         ▲
               │         │         Inputs are
         ┌─────┘         └─────┐   sorted, as
         │                     │    required
         │                     │
         │                     │
   ┌───────────┐         ┌───────────┐
   │           │         │           │
   │ batch A1  │         │ batch B1  │
   │           │         │           │
   ├───────────┤         ├───────────┤
   │           │         │           │
   │ batch A2  │         │ batch B2  │
   │           │         │           │
   ├───────────┤         ├───────────┤
   │           │         │           │
   │ batch A3  │         │ batch B3  │
   │           │         │           │
   └───────────┘         └───────────┘
   
   
    Sorted Input          Sorted Input
          A                     B
   ```
   
   # What changes are included in this PR?
   1. use flag introduced in 
https://github.com/apache/arrow-datafusion/pull/1732 to not repartition the 
input to `SortPreservingMerge` operator
   2. Test to ensure it is not repartitioned
   
   # Are there any user-facing changes?
   Probably not (yet) as `SortPreservingMerge` isn't used as part of SQL 
planning
   
   \


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to