ike560 opened a new issue, #2938:
URL: https://github.com/apache/arrow-datafusion/issues/2938

   **Describe the bug**
   
   Case 1... Exception: Internal error: Impossibly got empty window expression. This was likely caused by a bug in DataFusion's code.
   Case 2... Exception: Schema error (possibly the same cause as Case 1?)
   
   
   **To Reproduce**
   ```py
   import datafusion
   ctx = datafusion.SessionContext()
   datafusion.__version__
   ```
   '0.6.0'  
   ```py
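   # Write the sample CSV from Python; the notebook-only `!echo` depends on the shell's handling of "\n"
   with open('example.csv', 'w') as f:
       f.write("a,b\n1,4\n2,5\n3,6\n")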
   ctx.register_csv('example', 'example.csv')
   ctx.sql('SELECT * from example').show()
   ```
   +---+---+
   | a | b |
   +---+---+
   | 1 | 4 |
   | 2 | 5 |
   | 3 | 6 |
   +---+---+
   
   
   Case 1... Exception: Internal error
   ```py
   sql = '''
   SELECT AVG(b) AS median_value  
     FROM ( 
       SELECT 
         b, 
         COUNT(b) OVER () AS row_count, 
         ROW_NUMBER() OVER (ORDER BY b) AS row_number 
       FROM example 
     )
     ORDER BY median_value
     '''
   df = ctx.sql(sql)
   df.show()
   ```
   Exception                                 Traceback (most recent call last)
   Input In [65], in <module>
     1 sql = '''
     2 SELECT AVG(b) AS median_value  
     3   FROM ( 
    (...)
     10   ORDER BY median_value
     11   '''
     12 df = ctx.sql(sql)
   ---> 13 df.show()
   
   Exception: Internal error: Impossibly got empty window expression. This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker
   
   
   Case 2... Exception: Schema error
   ```py
   sql = '''
   SELECT AVG(b) AS median_value  
     FROM ( 
       SELECT 
         b, 
         COUNT(b) OVER () AS row_count, 
         ROW_NUMBER() OVER (ORDER BY b) AS row_number 
       FROM example 
     ) 
     WHERE row_number IN ((row_count + 1) / 2, (row_count + 2) / 2)  
     '''
   df = ctx.sql(sql)
   df.show()
   ```
   Exception                                 Traceback (most recent call last)
   Input In [57], in <module>
     1 sql = '''
     2 SELECT AVG(b) AS median_value  
     3   FROM ( 
     (...)
     10   WHERE row_number IN ((row_count + 1) / 2, (row_count + 2) / 2)  
     11   '''
     12 df = ctx.sql(sql)
   ---> 13 df.show()
   
   Exception: Schema error: No field named 'row_number'. Valid fields are 'example.b'.
   
   **Additional context**
   
   The SQL above is only meant to reproduce the errors; the computation itself is not meaningful.
   I believe the SQL is valid. Is there a workaround?
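
For reference, here is a plain-Python sketch of the result the Case 2 query is expected to produce once it runs. It does not use DataFusion at all. The helper name `median_via_row_numbers` is hypothetical, and the sketch assumes that `/` on integer operands truncates like integer division and that `AVG` over no rows yields NULL.

```python
def median_via_row_numbers(values):
    """Mimic the Case 2 query: average the middle row(s) after ordering by b."""
    ordered = sorted(values)      # ROW_NUMBER() OVER (ORDER BY b)
    row_count = len(ordered)      # COUNT(b) OVER ()
    if row_count == 0:
        return None               # assumption: AVG over no rows is NULL
    # Assumption: integer operands, so `/` behaves like integer division.
    wanted = {(row_count + 1) // 2, (row_count + 2) // 2}
    # Row numbers are 1-based, matching ROW_NUMBER().
    picked = [v for i, v in enumerate(ordered, start=1) if i in wanted]
    return sum(picked) / len(picked)   # AVG(b)


# Column b from example.csv above.
print(median_via_row_numbers([4, 5, 6]))
```

With the three-row example table only row 2 is selected, so the expected `median_value` is 5.0. With an even number of rows, the two middle rows are averaged.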
   
   


