tustvold commented on issue #1745: URL: https://github.com/apache/arrow-rs/issues/1745#issuecomment-1140999321
> I think that having two or three examples increasing in complexity and involving optionality and some amount of nesting would be good.

Yes, if you're happy to contribute such documentation that would be amazing :+1:

> Do you think that it would make sense to go through the Arrow API even if I'm only looking to write Parquet files?

I think this really depends on what the source of your data is, and if it can be cheaply read into arrow. The selling point of arrow is as a columnar interchange format, allowing different systems to pass around buffers in a way that they can efficiently process. Assuming you can cheaply convert your input data to arrow, it should be faster...

That being said, currently the arrow writer has not had nearly as much attention paid to it as the reader side, and so will be slower in some cases than the row APIs. I've created a high level ticket #1764, but I'm not sure when I'll have time to get to it.

> The main gripe I have/had is around the whole Dremel logic that is hard to grasp

Bit of an understatement here :laughing:, FWIW I've found this to be one of the more useful guides - https://akshays-blog.medium.com/wrapping-head-around-repetition-and-definition-levels-in-dremel-powering-bigquery-c1a33c9695da

My point still stands that in theory the promise of arrow is someone else will have handled this for you, but your mileage may vary.

> Well I originally had a very nested schema, involving maps, nullable lists, required lists with nullable elements, etc. I'm not yet fixed on a format since I want to measure performance for a set of usecases, so I'll experiment on the format.

My 2 cents is that even if tooling supports nested schemas, it often comes with unexpected caveats. For example Presto/Trino has had bugs in projection pushdown for nested schemas for years.
I would strongly advise flattening your schemas if you can; it will save you a lot of headaches down the line :sweat_smile:
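
For anyone trying to get a feel for the Dremel levels mentioned above, here is a rough, standalone sketch of how definition and repetition levels could be derived for a single nullable list of nullable `i32` values. It assumes the usual three-level Parquet list layout (optional list, repeated group, optional element), so the maximum definition level is 3 and the maximum repetition level is 1. The `Levels` struct and the `shred` function are made-up names for illustration only, not part of the parquet crate, and this is not how the crate's writers are implemented.

```rust
/// Levels and values for one leaf column.
/// Hypothetical type for illustration; not part of the parquet crate.
#[derive(Debug, Default, PartialEq)]
struct Levels {
    def: Vec<u8>,
    rep: Vec<u8>,
    values: Vec<i32>,
}

/// Compute levels for a column shaped like:
///   optional group my_list (LIST) {
///     repeated group list { optional int32 element; }
///   }
/// Assumed definition levels: 0 = list is null, 1 = list is empty,
/// 2 = element is null, 3 = element is present.
fn shred(rows: &[Option<Vec<Option<i32>>>]) -> Levels {
    let mut out = Levels::default();
    for row in rows {
        match row {
            // A null list still produces one (def, rep) pair, but no value.
            None => {
                out.def.push(0);
                out.rep.push(0);
            }
            // An empty list is defined down to the list itself, no further.
            Some(items) if items.is_empty() => {
                out.def.push(1);
                out.rep.push(0);
            }
            Some(items) => {
                for (i, item) in items.iter().enumerate() {
                    // rep = 0 starts a new row, rep = 1 continues the current list.
                    out.rep.push(if i == 0 { 0 } else { 1 });
                    match item {
                        None => out.def.push(2),
                        Some(v) => {
                            out.def.push(3);
                            out.values.push(*v);
                        }
                    }
                }
            }
        }
    }
    out
}

fn main() {
    let rows = vec![
        Some(vec![Some(1), None, Some(3)]),
        None,
        Some(vec![]),
        Some(vec![Some(4)]),
    ];
    println!("{:?}", shred(&rows));
}
```

Note that only non-null leaves end up in `values`; nulls and empty lists exist purely in the levels. Every additional layer of optionality or repetition adds another level to track, which is a large part of why flatter schemas are easier to work with.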