alamb opened a new issue, #11388: URL: https://github.com/apache/datafusion/issues/11388
First of all, I'm not sure we need the distinction between "user guide" and "library user guide" when it comes to data frames. The only way you can use a data frame is if you are using it as a library, so I'm unsure why I should be reading one section or the other.

Second, I think you lose a lot of context by removing the table. The `SessionContext` and `DataFrame` structs both expose large API surfaces. I think they become much easier to digest once you understand that there is actually a fairly small number of categories of things being exposed. However, the API documentation doesn't provide any way of seeing this structure. Ideally, there would be some way to tag the methods into different categories. But I think the important part is simply to note that there are transformations, methods that execute the frame, and administrative methods. I might further break down the methods that execute the frame into those that return a new frame in some way and those that write to a data sink. That is, I'm not sure it's necessary to list every method in each of these categories, but it is helpful to identify the categories.

That being said, I think a table, perhaps more granular, with links to the API documentation for each method and possibly even links to the SQL equivalent where appropriate, would be a good long-term goal. Is there some tooling / macros we could build to support this in a sustainable way?

Also, is it the case that I can only create a data frame via `SessionContext`? The _typically_ in the introduction suggests there are other ways of doing it. I wonder if it would be better to be more precise and just enumerate the different ways you can create a data frame. I think it's something like: read from a file, read from a table (which really covers a lot of possibilities), or execute SQL statements.

So, to make this actionable within the context of this PR, perhaps reduce the tables to more of a summary?
But I'm also curious to hear from others.

Finally, not for this PR, I wonder if `SessionContext` warrants its own section. As with `DataFrame`, I think it would benefit from a discussion of the different categories of things it can be used for. Relatedly, it's becoming clear to me from poking around the documentation and methods that there is a great deal of flexibility in mixing and matching SQL and data frames if you want to, but I'm not sure that's coming across in the guides. When I have time I can try drafting something to see how it might fit.

_Originally posted by @efredine in https://github.com/apache/datafusion/issues/11324#issuecomment-2214357564_

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.