Fly-a-Kite opened a new issue, #22554:
URL: https://github.com/apache/datafusion/issues/22554

   ### Describe the bug
   
   `SELECT DISTINCT v FROM t0 ORDER BY v ASC NULLS FIRST LIMIT 1` should return
   the first row of the same ordered `DISTINCT` query without `LIMIT`. On
   DataFusion `53.0.0`, the full ordered query puts `NULL` first, but adding
   `LIMIT 1` returns the first non-null value for string, integer, and float
   columns.
   
   ### To Reproduce
   
   ## Environment
   ```text
   OS: Ubuntu 24.04.1 x86_64
   Python: 3.12.3
   datafusion: 53.0.0
   pyarrow: 24.0.0
   ```
   ## Reproduce
   ```python
   import pyarrow as pa
   from datafusion import SessionContext
   
   ctx = SessionContext()
   batch = pa.RecordBatch.from_pylist(
       [{"v": None}, {"v": ""}, {"v": "a"}],
       schema=pa.schema([pa.field("v", pa.string(), nullable=True)]),
   )
   ctx.register_record_batches("t0", [[batch]])
   
   full_sql = "SELECT DISTINCT v FROM t0 ORDER BY v ASC NULLS FIRST"
   top1_sql = full_sql + " LIMIT 1"
   
   full = ctx.sql(full_sql).collect()[0].to_pydict()["v"]
   top1 = ctx.sql(top1_sql).collect()[0].to_pydict()["v"]
   
   print("full:", full)
   print("top1:", top1)
   assert top1 == full[:1]
   ```
   
   ## Output
   ```text
   full: [None, '', 'a']
   top1: ['']
   ```
   
   ### Expected behavior
   
   ```text
   full: [None, '', 'a']
   top1: [None]
   ```
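
The expected semantics can also be written down without DataFusion, which may be useful as a test oracle. The following is a minimal sketch in pure Python, not DataFusion code: `distinct_nulls_first` is a hypothetical helper, and the integer and float sample values are illustrative assumptions (the report above only lists the string case).

```python
from typing import Any, List, Optional, Sequence


def distinct_nulls_first(values: Sequence[Any], limit: Optional[int] = None) -> List[Any]:
    """Hypothetical reference model of
    SELECT DISTINCT v ... ORDER BY v ASC NULLS FIRST [LIMIT n]."""
    distinct = set(values)  # DISTINCT collapses all NULLs into a single row
    non_null = sorted(v for v in distinct if v is not None)
    # NULLS FIRST: the single NULL row (if any) precedes every non-null value.
    ordered = ([None] if None in distinct else []) + non_null
    # LIMIT only truncates the already ordered result.
    return ordered if limit is None else ordered[:limit]


# Illustrative columns; only the string case comes from the report.
cases = {
    "string": [None, "", "a"],
    "integer": [None, 0, 1],
    "float": [None, 0.0, 1.5],
}

for name, column in cases.items():
    full = distinct_nulls_first(column)
    top1 = distinct_nulls_first(column, limit=1)
    # The invariant the report expects: LIMIT 1 equals the first ordered row.
    assert top1 == full[:1]
    print(name, "full:", full, "top1:", top1)
```

Under this model `LIMIT 1` yields `[None]` for every column that contains a null, which is the result the reproduction script asserts.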
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

