sm4rtm4art commented on issue #11388: URL: https://github.com/apache/datafusion/issues/11388#issuecomment-3468698283
Hi everyone,

I've been making significant progress on the DataFrame API documentation for issue #11388. As I delved into the topic (and fell down a rabbit hole 🌀), I found that a more comprehensive guide would be highly beneficial. To keep the content well-structured and easy to navigate, I've split it into the following interconnected files:

1. **index.md** (~100 lines) - _Entry Point_
2. **concepts.md** (~650 lines) - _Unofficial Topic: DataFrames and where they live_
3. **creating-dataframes.md** (~1900 lines, which I'm working to condense) - _Unofficial Topic: Birth of a DataFrame_
4. **transformations.md** (~1900 lines, in progress; focuses on the DataFrame API methods, with few comparable SQL queries) - _Unofficial Topic: Life of a DataFrame_
5. **writing-dataframes.md** (~300 lines) - _Unofficial Topic: Death of a DataFrame and its memorials_
6. **best-practices.md** (~200 lines) - _Unofficial Topic: How to treat DataFrames right_

Given the total size of this contribution, I want to make the review process as smooth as possible for everyone.

Proposal for the review process:

**My suggestion is to submit this as a series of smaller, sequential PRs rather than one massive one. I would start with concepts.md as the first PR and then follow up with the others.**

Does this approach work for you, or would you prefer a single large pull request? My goal is to create a thorough and welcoming resource. Thanks for your guidance!

P.S. @alamb: On a related note, in your podcast with [Developer Voices](https://www.youtube.com/watch?v=8QNNCr8WfDM&t=2105s), you were asked about the RecordBatch size of ~8000 rows and explained that research papers show why 4k-16k rows is optimal. I'm very interested in this! If you could share a link to that research, I'd appreciate it. I don't plan to put it in the docs (yet), but I'm curious.

P.P.S. The unofficial topics (which won't be in the docs) are loosely borrowed from the interaction of light with matter :)

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
