geeksam opened a new issue, #49352:
URL: https://github.com/apache/arrow/issues/49352

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   _**I apologize in advance for any snark here.**_  I'll try to filter and/or 
edit it out, but I'm frustrated, and experience suggests that I won't be 
completely successful.  I appreciate the effort involved in publishing and 
maintaining this code—I just wish it were more accessible, and I'm hoping that 
this "stream of consciousness" report may help build some empathy for newbies.
   
   So.
   
   I'm working on a project that involves using a SAX parser to transform data 
and store it in Parquet.  I have 20 (not a typo: two full decades) years of 
experience writing Ruby, 18 of those full time, but I'm completely new to both 
Parquet and Arrow as of yesterday.
   
   Yesterday, I searched rubygems.org for 'parquet' and found two gems:  one 
named `parquet` and one named `red-parquet`.  I decided to start with the 
`parquet` gem, and had TDD'd a working example in about three hours (including 
time spent extracting the SAX parser from its monolith and removing the 
extraneous bits).  However, that gem is pre-1.0 and the repo shows signs of 
neglect, so I decided to try the `red-parquet` gem instead.
   
   Two hours later, I've managed to get the gem to build (the README mentions 
`rubygems-requirements-system`, which I've never heard of, but `brew install 
apache-arrow-glib` did the job), and am trying to piece together something that 
writes to a file, starting with a single field.
   
   The README for `red-arrow` has no useful examples.  The README for 
`red-arrow-format` contains enough information for me to tell that it's 
not what I want, so that's actually helpful.  The README for 
`red-parquet` has a few examples of operations on data that seem like I 
might want to check them out later, but first I need to be able to write the 
thing...
   
   Eventually, I notice an `example` directory in `red-arrow`, and I open 
https://github.com/apache/arrow/blob/main/ruby/red-arrow/example/write-file.rb. 
 The bit with `fields` and `schema` isn't especially Rubyish, but it seems 
straightforward enough for now (I figure I'll circle back around and try to 
figure out structs—not to be confused with Ruby's native `Struct` class—once 
I'm able to write some strings and integers).
   
   Reading down, I skip over the two nested blocks, and see... arrays for each 
column?  And then an array called `columns` that contains a lot of typed 
containers.  So, given that my source data is in rows, it looks like I may need 
to manually transpose it for this API?  Well, that's a problem for Future Sam.  
This code doesn't seem to use many Ruby idioms—but that's an editorial 
complaint, not a functional one, so I keep skimming down.  Next I see a 
RecordBatch that gets initialized with a schema, `4`, and the `columns` array.
   
   Wait, `4`?  What does that magic number signify?  NO IDEA.  My ADHD brain 
simply MUST know, so...
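
For readers following along, the row-to-column transposition described above can be sketched in plain Ruby. This is only an illustration under stated assumptions: the `rows` data and the `rows_to_columns` helper are hypothetical and not part of any Arrow gem, and reading the `4` as the row count (the shared length of every column) is an inference from the linked example, not something the documentation confirms.

```ruby
# Hypothetical row-oriented input, e.g. records emitted by a SAX handler.
rows = [
  { id: 1, name: "alpha" },
  { id: 2, name: "beta" },
  { id: 3, name: "gamma" },
  { id: 4, name: "delta" },
]

# Hypothetical helper (not an Arrow API): pivot an array of row hashes
# into a hash that maps each field name to an array of that column's values.
def rows_to_columns(rows, field_names)
  field_names.to_h do |name|
    # fetch raises KeyError if a row is missing the field, so gaps surface early.
    [name, rows.map { |row| row.fetch(name) }]
  end
end

columns = rows_to_columns(rows, [:id, :name])

# Assumption: a record batch wants the number of rows alongside its columns,
# which is simply the length shared by every column array.
n_rows = rows.size
```

With the data in this shape, each array in `columns` would presumably be wrapped in one of the typed containers from the linked example, and `n_rows` would take the place of the literal `4`.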
   
   I search for documentation, end up at 
https://rubydoc.info/gems/red-arrow/Arrow/RecordBatch, and see that the 
initializer has... no documentation.  And no viewable source.  Cool cool cool.
   
   I try searching the web for usage examples.  I don't find any.  What I do 
find is a gem called `parqueteur` that's... actually, hang on a minute: it's 
rather lovely.  The examples are clear, they showcase an API that was clearly 
designed by a Rubyist, it has some features that I'd probably end up writing if 
I went the DIY route, and... HOLY CATS, THERE ARE ACTUAL COMMENTS IN THE CODE 
EXAMPLES.  `:fainting-goat:`
   
   The narrative portion ends here, because I've been burning glucose at a 
furious rate, and I. am. done.
   
   `--`
   
   In the strictest sense, these are accessibility/ergonomics/UX issues, not 
literal bugs.  If one defines a "bug" as "unexpected behavior of the code at 
runtime," I can't possibly have experienced bugs with this project, because the 
onboarding experience was so unnecessarily confusing that I never even achieved 
runtime.  But the sparse documentation is absolutely a barrier to adoption, and 
frankly, for my own projects, I'd consider that a bug.  Y'all may not, and 
that's fair; recategorize or close this issue as you see fit.
   
   I'd offer to contribute better examples, but I'm standing at the bottom end 
of a steep learning curve, staring up, and I still have a lot more unknown 
unknowns than anything else.  The limited documentation and examples that do 
exist in these projects were clearly written by someone(s) suffering from the 
curse of knowledge [[1](https://en.wikipedia.org/wiki/Curse_of_knowledge), 
[2](https://xkcd.com/2501/)], and offer me no purchase.  I _might_ end up 
taking a tour through the `parqueteur` codebase, and that _might_ help me 
understand your object model—if so, I'd be happy to circle back around and see 
what I can add to make things at least slightly less painful for the next 
person to try this out.
   
   But if nothing else, I hope this can at least provide a gentle reminder that 
survivorship bias is a thing.  Going with the original example that led to the 
coining of the term, think of me as a plane that didn't even get a chance to 
return from combat, because it crashed at the end of the runway on launch.  :)
   
   ### Component(s)
   
   Ruby


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
