[ 
https://issues.apache.org/jira/browse/ARROW-17887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-17887:
-----------------------------------
    Labels: pull-request-available  (was: )

> [R] [Doc] Improve readability of the Get Started and README pages
> -----------------------------------------------------------------
>
>                 Key: ARROW-17887
>                 URL: https://issues.apache.org/jira/browse/ARROW-17887
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: R
>            Reporter: Danielle Navarro
>            Assignee: Danielle Navarro
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> In their current form, the pkgdown Get Started and README pages are a little 
> hard for new users to follow. I would argue that both pages are written in a 
> way that makes sense to someone who is already familiar with core Arrow 
> concepts, but is potentially intimidating to an R user who is curious about 
> Arrow but has never used it. The issue is perhaps most severe on the main 
> [README page](https://arrow.apache.org/docs/r/index.html) and the [Get 
> Started](https://arrow.apache.org/docs/r/articles/arrow.html) page. A few 
> examples:
> - The README page opens with the sentence **"Apache Arrow is a cross-language 
> development platform for in-memory data".** This is a problem for multiple 
> reasons. Firstly, it's not really true anymore, because we encourage users to 
> rely on `Dataset` for on-disk datasets. Secondly, the sentence simply 
> *assumes* the user has a clear mental model of the difference between 
> in-memory and on-disk data. I don't think that's true for data scientists in 
> general. A data engineer likely has a more precise mental model here, but R 
> users are typically focused on analytics. Unless they have extensive 
> experience working with large data sets this isn't something we can assume. 
> Thirdly, and maybe most importantly, it doesn't explain to the user why they 
> should care about arrow: it doesn't say what the arrow package *does*. It's 
> too vague.
> - There are (IMO) too many boldfaced sections in the README page, and it's 
> very cluttered. It gives the page an intensity and feeling of "denseness" 
> that I think we should avoid at all costs. Arrow already has a reputation for 
> being a complicated project (because it is!) but we don't want our 
> documentation to have that feeling. I think we ought to be aiming for 
> something gentler and welcoming. If that means pushing more details into 
> vignettes, that's totally okay. Readers don't need to be told all the things 
> on the very first page: it's probably better to give a simpler description 
> and then push the details onto additional vignettes.
> - The "get started" page has some of the same problems as the main README. 
> The "object hierarchy" and "data object" tables only make sense once you 
> already understand core Arrow concepts. What needs to happen in both cases is 
> the tables need to be wrapped with some explanatory text that provides the 
> missing context for users, and then additional details are pushed out to 
> vignettes that explain them in more detail. 
> - The data types mapping section on the get started page has the same issue. 
> A novice user doesn't necessarily even have a clear understanding of how 
> fundamental types are represented in R, much less how they are represented in 
> Arrow. A section that simply assumes that these types are meaningful concepts 
> and gives a lookup table with various footnotes isn't at all helpful to that 
> kind of user. I think it makes more sense to again split the work: on the 
> "get started" page we should have something simple, and a longer discussion 
> of these mappings should be pushed to a vignette.
>
> The concrete proposal here is to restructure the content of these two pages 
> to be more novice-friendly: specifically, to add more "Arrow 101" explanatory 
> notes to these pages, and to move more of the technical information to new 
> vignettes (e.g., there should be a new "data types" vignette).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
