[
https://issues.apache.org/jira/browse/ARROW-2712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16515358#comment-16515358
]
Jamie Elliott edited comment on ARROW-2712 at 6/18/18 9:50 AM:
---------------------------------------------------------------
I've given a little more thought to the idea of a native C# implementation. I
found the C++ implementation the easiest to understand.
I'm considering a narrow proof of concept that would replicate the classes in
arrow/cpp/src/arrow/ but not its subfolders, hopefully enough to reproduce the
row-wise conversion example
[https://arrow.apache.org/docs/cpp/md_tutorials_row_wise_conversion.html]
The scope of that seems manageable to me, and there are some more or less
ready-made components in corefx.
*MemoryPool*
[C++ MemoryPool|https://github.com/apache/arrow/blob/master/cpp/src/arrow/memory_pool.h]
can be replicated via
[C# MemoryPool|https://github.com/dotnet/corefx/blob/master/src/System.Memory/src/System/Buffers/MemoryPool.cs].
Maybe start with a built-in memory pool that allocates a large block of
managed memory and pins it:
[https://github.com/aspnet/Common/tree/dev/shared/Microsoft.Extensions.Buffers.MemoryPool.Sources]
Alternatively, we could P/Invoke the Arrow C++ allocator.
Another interesting point of reference is
[https://github.com/allisterb/jemalloc.NET]
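
To make the "one large block" idea concrete, here is a minimal C++ sketch of a pool that reserves a single slab up front and hands out consecutive pieces of it. SlabPool, Allocate and bytes_allocated are hypothetical names for illustration only; this is not the arrow::MemoryPool interface, and it ignores alignment, freeing and thread safety.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical illustration, NOT the real arrow::MemoryPool API.
// Reserves one large block up front and hands out consecutive pieces of it.
class SlabPool {
 public:
  explicit SlabPool(std::size_t capacity) : slab_(capacity), used_(0) {}

  // Returns a pointer to `size` bytes inside the slab,
  // or nullptr when the remaining space is too small.
  std::uint8_t* Allocate(std::size_t size) {
    if (size > slab_.size() - used_) return nullptr;
    std::uint8_t* out = slab_.data() + used_;
    used_ += size;
    return out;
  }

  // Total number of bytes handed out so far.
  std::size_t bytes_allocated() const { return used_; }

 private:
  std::vector<std::uint8_t> slab_;  // the single large block
  std::size_t used_;                // offset of the next free byte
};
```

A C# version would presumably keep the same shape, with the slab being a pinned managed array instead of a std::vector.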
*Buffer*
[C++ Buffer|https://github.com/apache/arrow/blob/master/cpp/src/arrow/buffer.h]
can likely be replicated by something built on top of Memory<T>. Span<T> and
Memory<T> are used for zero-copy slicing:
[https://msdn.microsoft.com/en-us/magazine/mt814808.aspx]
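
As a rough picture of what zero-copy slicing means here, the C++ sketch below keeps shared ownership of the underlying storage and only adjusts an offset and a length when slicing. BufferView and its members are hypothetical names chosen for this example; the real arrow::Buffer differs in detail.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical view type, loosely modelled on the idea behind arrow::Buffer.
class BufferView {
 public:
  // `storage` must be non-null; the view initially covers all of it.
  explicit BufferView(std::shared_ptr<std::vector<std::uint8_t>> storage)
      : storage_(std::move(storage)), offset_(0), size_(storage_->size()) {}

  // Returns a view over [offset, offset + length) of this view.
  // No bytes are copied; the slice shares ownership of the storage.
  BufferView Slice(std::size_t offset, std::size_t length) const {
    assert(offset <= size_ && length <= size_ - offset);
    BufferView out(*this);
    out.offset_ = offset_ + offset;
    out.size_ = length;
    return out;
  }

  const std::uint8_t* data() const { return storage_->data() + offset_; }
  std::size_t size() const { return size_; }

 private:
  std::shared_ptr<std::vector<std::uint8_t>> storage_;  // shared owner
  std::size_t offset_;  // start of this view within the storage
  std::size_t size_;    // number of bytes visible through this view
};
```

Slicing a five-byte view with Slice(1, 3) gives a three-byte view whose data() points one byte past the parent's data(). In C#, Memory<T>.Slice plays the same role.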
*Array*
[https://github.com/apache/arrow/blob/master/cpp/src/arrow/array.h]
Builds naturally from Buffer.
Note that ArrayVector = std::vector<std::shared_ptr<Array>>;
*ChunkedArray*
A data structure managing a list of primitive Arrow arrays logically as one
large array
[https://github.com/apache/arrow/blob/master/cpp/src/arrow/table.h]
Compare to
[https://github.com/dotnet/corefx/blob/master/src/System.IO.Pipelines/src/System/IO/Pipelines/BufferSegment.cs]
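
A small C++ sketch of the "list of arrays seen as one large array" idea follows. ChunkedInt32 and Int32Chunk are hypothetical stand-ins (a plain std::vector plays the role of a primitive Arrow array), and the lookup is a simple linear scan over the chunks rather than anything the real arrow::ChunkedArray is known to do.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a primitive Arrow array of int32 values.
using Int32Chunk = std::vector<std::int32_t>;

// Presents several chunks as one logical array.
class ChunkedInt32 {
 public:
  explicit ChunkedInt32(std::vector<std::shared_ptr<Int32Chunk>> chunks)
      : chunks_(std::move(chunks)), length_(0) {
    for (const auto& c : chunks_) {
      length_ += static_cast<std::int64_t>(c->size());
    }
  }

  // Total number of values across all chunks.
  std::int64_t length() const { return length_; }

  // Maps a logical index to (chunk, offset) by walking the chunks in order.
  std::int32_t Value(std::int64_t i) const {
    assert(i >= 0 && i < length_);
    for (const auto& c : chunks_) {
      const auto n = static_cast<std::int64_t>(c->size());
      if (i < n) return (*c)[static_cast<std::size_t>(i)];
      i -= n;  // skip this chunk and continue in the next one
    }
    assert(false && "unreachable when i < length_");
    return 0;
  }

 private:
  std::vector<std::shared_ptr<Int32Chunk>> chunks_;
  std::int64_t length_;
};
```

With chunks {1, 2, 3}, {} and {4, 5}, logical index 3 lands in the third chunk at offset 0. A production version would likely precompute chunk offsets so lookups do not need a scan.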
Note: one assumption is that, in general, std::shared_ptr<T> can be replaced by
just T in C# managed classes.
Gotta run now, more to follow...
> .NET Language Binding for Arrow
> -------------------------------
>
> Key: ARROW-2712
> URL: https://issues.apache.org/jira/browse/ARROW-2712
> Project: Apache Arrow
> Issue Type: New Feature
> Components: GLib
> Reporter: Jamie Elliott
> Priority: Major
> Labels: features, newbie
>
> A feature request. I've seen this pop up in a few places. Want to have a
> record of discussion on this topic.
> I may be open to contributing this, but first need some general guidance on
> approach so I can understand effort level.
> It looks like there is not a good tool available for GObject Introspection
> binding to .NET so the easy pathway via Arrow Glib C API appears to be
> closed.
> The only GObject integration for .NET appears to be Mono GAPI
> [http://www.mono-project.com/docs/gui/gtksharp/gapi/]
> From what I can see this produces a GIR or similar XML, then generates C#
> code directly from that. Likely involves many manual fix ups of the XML.
> Worth a try?
>
> Alternatively I could look at generating some other direct binding from .NET
> to C/C++. Where I work we use Swig [http://www.swig.org/]. Good for vanilla
> cases, requires hand crafting of the .i files and specialized marshalling
> strategies for optimizing performance critical cases.
> Haven't tried CppSharp but it looks more appealing than Swig in some ways
> [https://github.com/mono/CppSharp/wiki/Users-Manual]
> In either case, not sure if better to use Glib C API or C++ API directly.
> What would be pros/cons?