sm4rtm4art commented on issue #11388:
URL: https://github.com/apache/datafusion/issues/11388#issuecomment-3468698283

   Hi everyone,
   
   I've been making significant progress on the DataFrame API documentation for 
issue #11388. As I delved into the topic (and fell down a rabbit hole 🌀), I 
found that a more comprehensive guide would be highly beneficial. To keep the 
content well-structured and easy to navigate, I've split it into the following 
interconnected files:
   
   1. **index.md** (~100 lines) _- Entry Point_
   2. **concepts.md** (~650 lines) - _Unofficial Topic: DataFrames and where 
they live_
   3. **creating-dataframes.md** (~1900 lines, which I'm working to condense) 
_- Unofficial Topic: Birth of a DataFrame_
   4. **transformations.md** (in progress; focuses on the DataFrame API methods, 
with few comparable SQL queries; ~1900 lines) _- Unofficial Topic: Life of a 
DataFrame_
   5. **writing-dataframes.md** (~300 lines) _- Unofficial Topic: Death of a 
DataFrame and its memorials_
   6. **best-practices.md** (~200 lines) _- Unofficial Topic: How to treat 
DataFrames right_
   
   Given the total size of this contribution, I want to make the review process 
as smooth as possible for everyone.
   
   Proposal for the review process:
   
   **My suggestion is to submit this as a series of smaller, sequential PRs 
rather than one massive one. I would start with concepts.md as the first PR and 
then follow up with the others.**
   
   Does this approach work for you, or would you prefer a single large pull 
request?
   
   My goal is to create a thorough and welcoming resource. 
   
   Thanks for your guidance!
   
   P.S. @alamb: On a related note, in your podcast with [Developer 
Voices](https://www.youtube.com/watch?v=8QNNCr8WfDM&t=2105s), you were asked 
about the RecordBatch size of ~8000 rows and explained that research papers 
show why 4k-16k is optimal. I'm very interested in this! If you could share a 
link to that research, I'd appreciate it. I don't plan to put it in the docs 
(yet), but I'm curious. 
   
   P.P.S.: The unofficial topics (which won't be in the docs) are loosely 
borrowed from the interaction of light with matter :) 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


