alamb opened a new issue, #11388:
URL: https://github.com/apache/datafusion/issues/11388

   First of all, I'm not sure we need the distinction between "user guide" and 
"library user guide" when it comes to data frames. Isn't the only way to use a 
data frame to use DataFusion as a library? I'm unsure which of the two sections 
I should be reading.
   
   Second, I think you lose a lot of context by removing the table. The 
`SessionContext` and `DataFrame` structs both expose large API surfaces. I 
think they become much easier to digest once you understand that there is 
actually a fairly small number of categories of things being exposed. However, 
the API documentation doesn't provide any way of seeing this structure. 
Ideally, there would be a way to tag the methods with different categories. 
   
   But I think the important part is simply to note that there are 
transformations, methods that execute the frame, and administrative methods. I 
might further break down the methods that execute the frame into those that 
return a new frame in some way and those that write to a data sink. I'm not 
sure it's necessary to list every method in each of these categories, but it 
is helpful to identify the categories. That being said, I think a table, 
perhaps more granular, with links to the API documentation for each method and 
possibly even links to the SQL equivalent where appropriate, would be a good 
long-term goal. Is there some tooling (macros, perhaps) we could build to 
support this in a sustainable way?
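   To make the tooling idea concrete, here is a minimal, hypothetical sketch in plain Rust (standard library only) of one sustainable approach: keep a single hand-maintained list of `(method, category)` pairs and generate a Markdown summary table from it, linking each method to its docs.rs anchor (`#method.<name>`). The names `Category`, `METHODS`, and `render_table` are invented for illustration, the method list is deliberately incomplete, and the category assignments are an assumption following the grouping proposed above, not an official DataFusion classification.

```rust
use std::fmt::Write;

/// Hypothetical categories, following the grouping suggested in the issue.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Category {
    Transformation,
    ExecuteToResults,
    ExecuteToSink,
    Administrative,
}

impl Category {
    fn label(self) -> &'static str {
        match self {
            Category::Transformation => "Transformation",
            Category::ExecuteToResults => "Execute (return results)",
            Category::ExecuteToSink => "Execute (write to sink)",
            Category::Administrative => "Administrative",
        }
    }
}

/// Rendered API docs for `DataFrame`; docs.rs anchors methods as `#method.<name>`.
const DOCS_BASE: &str =
    "https://docs.rs/datafusion/latest/datafusion/dataframe/struct.DataFrame.html";

/// Illustrative (not exhaustive) list of DataFrame methods and an assumed category.
const METHODS: &[(&str, Category)] = &[
    ("select", Category::Transformation),
    ("filter", Category::Transformation),
    ("aggregate", Category::Transformation),
    ("join", Category::Transformation),
    ("sort", Category::Transformation),
    ("limit", Category::Transformation),
    ("collect", Category::ExecuteToResults),
    ("show", Category::ExecuteToResults),
    ("count", Category::ExecuteToResults),
    ("write_csv", Category::ExecuteToSink),
    ("write_json", Category::ExecuteToSink),
    ("write_parquet", Category::ExecuteToSink),
    ("schema", Category::Administrative),
    ("logical_plan", Category::Administrative),
];

/// Render a Markdown table with one row per non-empty category,
/// each method linked to its API documentation anchor.
fn render_table(methods: &[(&str, Category)]) -> String {
    let order = [
        Category::Transformation,
        Category::ExecuteToResults,
        Category::ExecuteToSink,
        Category::Administrative,
    ];
    let mut out = String::from("| Category | Methods |\n| --- | --- |\n");
    for cat in order {
        let links: Vec<String> = methods
            .iter()
            .filter(|(_, c)| *c == cat)
            .map(|(name, _)| format!("[`{}`]({}#method.{})", name, DOCS_BASE, name))
            .collect();
        // Skip categories with no methods so the table has no empty rows.
        if !links.is_empty() {
            writeln!(out, "| {} | {} |", cat.label(), links.join(", ")).unwrap();
        }
    }
    out
}

fn main() {
    print!("{}", render_table(METHODS));
}
```

   Because the table is generated from one list, adding a method means adding one entry, and a doc build step could regenerate the page so it never drifts from the list. Linking to the SQL equivalents would be a matter of adding another column to the same data.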
   
   Also, is it the case that I can only create a data frame via SessionContext? 
The _typically_ in the introduction suggests there are other ways of doing it. 
I wonder if it would be better to be more precise and just enumerate the 
different ways you can create a data frame. I think it's something like: read 
from a file, read from a table (which really covers a lot of possibilities), or 
execute a SQL statement.
   
   So, to make this actionable within the scope of this PR, perhaps we could 
reduce the tables to more of a summary? But I'm also curious to hear from 
others.
   
   Finally, not for this PR, I wonder if `SessionContext` warrants its own 
section. As with `DataFrame`, I think it would benefit from a discussion of the 
different categories of things it can be used for. Relatedly, from poking 
around the documentation and methods, it's becoming clear to me that there is a 
great deal of flexibility in mixing and matching SQL and data frames if you 
want to, but I'm not sure that's coming across in the guides. When I have 
time, I can try drafting something to see how it might fit.
   
   _Originally posted by @efredine in 
https://github.com/apache/datafusion/issues/11324#issuecomment-2214357564_
               


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

