sm4rtm4art commented on PR #18051: URL: https://github.com/apache/datafusion/pull/18051#issuecomment-3411080902
Thank you both for your valuable feedback! @Jefffrey @comphead

My intention was to create a gentle introduction that gives users just enough context about Arrow to understand DataFusion better, without diving into implementation details.

I hear your point about it feeling disjointed. To address this while keeping the gentle introduction approach, I propose:

1. **Tighten the narrative flow**: Focus on a single journey - "Why DataFusion uses Arrow" → "What is a RecordBatch conceptually" → "When you'll encounter it"
2. **Move technical details to footnotes or links**: Keep implementation details (like Arc, offset arrays) as brief notes or external links
3. **Clarify assumed knowledge upfront**: Add a brief "Who this guide is for" section stating we assume basic DataFusion knowledge but no Arrow background

The goal is to give users mental models, not implementation knowledge. Would this approach address your concerns about focus?

@comphead: I understand your maintenance concerns. My approach would be to provide:

1. **Conceptual overview** - Brief explanation of what RecordBatch is and why it matters to DataFusion users
2. **Practical code example** - The simplest possible example showing how it looks in practice (like the current "build a RecordBatch" example)
3. **Direct links to Arrow docs** - For readers who want deeper technical details

This way, we give users enough context to understand DataFusion without duplicating Arrow's technical documentation. The guide would serve as a bridge - explaining the "why" and showing the "what it looks like", while Arrow's docs handle the detailed "how it works internally".

Would this three-part approach (concept → example → link to details) work for you? It keeps our maintenance burden low while still providing value to users who encounter RecordBatch in DataFusion code.

I hope I didn't go overboard with text.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
For additional commands, e-mail: [email protected]
