eerhardt commented on pull request #10562: URL: https://github.com/apache/arrow/pull/10562#issuecomment-875768799
Take a look at all the changes we've been making in the dotnet/runtime libraries to reduce allocations: https://github.com/dotnet/runtime/pulls?q=is%3Apr+is%3Aclosed+allocation+. Even the article I linked says:

> Lots of effort goes into reducing allocation, not because the act of allocating is itself particularly expensive, but because of the follow-on costs in cleaning up after those allocations via the garbage collector (GC).

If you allocate fewer objects, the GC has less work to do.

> I still value API "unsurpriseness"

I agree. Looking at the rest of the APIs that copy, they all take `IEnumerable<T>` as the parameter and then call `ToList` or `ToArray`. Here are some examples:

https://github.com/apache/arrow/blob/8e43f23dcc6a9e630516228f110c48b64d13cec6/csharp/src/Apache.Arrow/Arrays/ArrayData.cs#L34-L44

https://github.com/apache/arrow/blob/8e43f23dcc6a9e630516228f110c48b64d13cec6/csharp/src/Apache.Arrow/RecordBatch.cs#L63-L70

https://github.com/apache/arrow/blob/8e43f23dcc6a9e630516228f110c48b64d13cec6/csharp/src/Apache.Arrow/Schema.cs#L39-L48

But we also have `internal` APIs that take a `List<T>` and take ownership of that list without copying it. This reduces allocations internally while keeping the public API "unsurprising". So how about following the same pattern here?

* Change these APIs to take `IEnumerable<T>` instead of `IList<T>`.
* When we create a Table, Column, or ChunkedArray internally, call the internal API that takes ownership of the list.

Thoughts?
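A minimal C# sketch of the proposed pattern, assuming a hypothetical `ChunkedSketch<T>` type that stands in for Table, Column, or ChunkedArray (the names and signatures below are illustrative, not the actual Apache.Arrow API). The public constructor accepts any `IEnumerable<T>` and copies it with `ToList`, while a hypothetical internal `TakeOwnership` factory stores an already-built `List<T>` without copying.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for Table/Column/ChunkedArray; not the real Apache.Arrow API.
public sealed class ChunkedSketch<T>
{
    private readonly List<T> _chunks;

    // Public API: accept any IEnumerable<T> and make a defensive copy, so later
    // changes to the caller's collection never leak into this object.
    public ChunkedSketch(IEnumerable<T> chunks)
    {
        if (chunks is null) throw new ArgumentNullException(nameof(chunks));
        _chunks = chunks.ToList();
    }

    // Non-copying path: stores the given list as-is. Because it is private, code
    // outside this class cannot bind to it; a List<T> passed from elsewhere still
    // resolves to the copying IEnumerable<T> constructor above.
    private ChunkedSketch(List<T> ownedChunks)
    {
        _chunks = ownedChunks ?? throw new ArgumentNullException(nameof(ownedChunks));
    }

    // Hypothetical internal factory for library code that has just built a fresh
    // list and hands it over. No copy is made, so callers must not mutate the list
    // afterwards. Inside this class, List<T> is the better conversion, so this call
    // binds to the private constructor.
    internal static ChunkedSketch<T> TakeOwnership(List<T> chunks) => new ChunkedSketch<T>(chunks);

    public IReadOnlyList<T> Chunks => _chunks;
}
```

The design choice here is that ownership transfer is only reachable from inside the library, so public callers always get copy semantics regardless of which collection type they pass, while internal code that builds a list and immediately hands it off avoids the extra allocation.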
