hknlof opened a new issue, #16478:
URL: https://github.com/apache/datafusion/issues/16478

   ### Describe the bug
   
   Hi team,
   
   I am analyzing some CVE data. The data is read from an NDJSON file and, as far as I can tell, all records have the same nullable fields and no variation in field-name casing.
   
   ```console
   ❯ grep AssignedTo all.json | wc -l
   57029
   ❯ grep -i assignedto all.json | wc -l
   57029
   ❯ grep assignedto all.json | wc -l
   0
   ```
   
   Hence, I would expect `df.describe()` to run without any conversion problems.
   Instead, I get:
   
   ```console
   Error: Custom { kind: Other, error: Execution("Schema error: No field named assignedto. Did you mean '?table?.AssignedTo'?.") }
   ```
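
The error text shows a lowercased name (`assignedto`) being looked up against a schema that stores `AssignedTo`, with a case-insensitive hint offered afterwards. The following is a minimal stdlib-only sketch of that behaviour, not DataFusion's actual code: `find_exact`, `suggest`, and the lowercasing step are hypothetical names and assumptions made purely for illustration.

```rust
/// Exact, case-sensitive lookup of a field name in a list of schema names.
fn find_exact<'a>(schema: &'a [&'a str], name: &str) -> Option<&'a str> {
    schema.iter().copied().find(|f| *f == name)
}

/// Case-insensitive lookup, used here only to build a "Did you mean" hint.
fn suggest<'a>(schema: &'a [&'a str], name: &str) -> Option<&'a str> {
    schema.iter().copied().find(|f| f.eq_ignore_ascii_case(name))
}

fn main() {
    // Field names taken from the output in this report.
    let schema = ["AssignedTo", "Bugs", "CRD"];

    // Assumption: somewhere the requested name is normalized to lowercase
    // before the lookup, which is what the error message suggests.
    let requested = "AssignedTo".to_lowercase();

    match find_exact(&schema, &requested) {
        Some(field) => println!("found {field}"),
        None => match suggest(&schema, &requested) {
            Some(hint) => println!("No field named {requested}. Did you mean '{hint}'?"),
            None => println!("No field named {requested}."),
        },
    }
}
```

If this reading is right, the exact lookup misses only because of the lowercasing, while the hint still finds the original mixed-case field.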
   
   <details>
   <summary>Code and more elaborate Output</summary>
   
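   ```rust
   use datafusion::prelude::*;

   #[tokio::main]
   async fn main() -> Result<(), std::io::Error> {
       let ctx = SessionContext::new();
       // The schema is inferred from the newline-delimited JSON file.
       let df = ctx
           .read_json("./all.json", NdJsonReadOptions::default())
           .await?;

       // Print the inferred field names to confirm their casing.
       for field in df.schema().fields().iter() {
           println!("{:?}", field.name());
       }
       // This call fails with the schema error shown in the output.
       df.describe().await?;

       Ok(())
   }
   ```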
   **Output**:
   ```console
   "AssignedTo"
   "Bugs"
   "CRD"
   "Candidate"
   "Description"
   "DiscoveredBy"
   "Notes"
   "Patches"
   "Priority"
   "PublicDate"
   "PublicDateAtUSN"
   "References"
   "UbuntuDescription"
   "UpstreamLinks"
   Error: Custom { kind: Other, error: Execution("Schema error: No field named assignedto. Did you mean '?table?.AssignedTo'?.") }
   ```
   </details>
   
   ### To Reproduce
   
   Read an NDJSON file and call `describe()` on the resulting DataFrame.
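
Since `all.json` is not attached, a small stand-in input may help when trying to reproduce this. The sketch below writes a few NDJSON records with mixed-case keys using only the standard library. The helper `write_sample`, the file name `describe_repro.json`, and the field values are hypothetical placeholders, and it has not been verified that such a small file triggers the same error.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Writes `rows` NDJSON records with mixed-case keys to `path`.
fn write_sample(path: &Path, rows: usize) -> io::Result<()> {
    let mut out = String::new();
    for i in 0..rows {
        // Placeholder values; only the casing of the keys matters here.
        out.push_str(&format!(
            "{{\"AssignedTo\":\"user{i}\",\"Priority\":\"low\"}}\n"
        ));
    }
    fs::write(path, out)
}

fn main() -> io::Result<()> {
    // Hypothetical output location in the system temp directory.
    let path = std::env::temp_dir().join("describe_repro.json");
    write_sample(&path, 3)?;
    println!("wrote {}", path.display());
    Ok(())
}
```

The resulting file can then be passed to `read_json` in place of `./all.json`.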
   
   ### Expected behavior
   
   `describe()` should work when the JSON field names are consistently cased.
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

