[ 
https://issues.apache.org/jira/browse/ARROW-2712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16515358#comment-16515358
 ] 

Jamie Elliott edited comment on ARROW-2712 at 6/18/18 9:50 AM:
---------------------------------------------------------------

I've given a little more thought to the idea of a native C# implementation. I 
found the C++ implementation the easiest to understand. 

I'm considering a narrow proof of concept that would replicate the classes in 
arrow/cpp/src/arrow/ but not its subfolders. 

Hopefully that would be enough to replicate the row-wise conversion example: 
[https://arrow.apache.org/docs/cpp/md_tutorials_row_wise_conversion.html]

It seems to me that the scope of that is manageable, and there are some more 
or less ready-made components in corefx. 

*MemoryPool*

[C++ MemoryPool|https://github.com/apache/arrow/blob/master/cpp/src/arrow/memory_pool.h]
can be replicated via 
[C# MemoryPool|https://github.com/dotnet/corefx/blob/master/src/System.Memory/src/System/Buffers/MemoryPool.cs].

Maybe start with a built-in memory pool that allocates a large block of 
managed memory and pins it, as in 
[https://github.com/aspnet/Common/tree/dev/shared/Microsoft.Extensions.Buffers.MemoryPool.Sources]
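A minimal sketch of the slab idea, written in C++ purely for illustration: 
one large allocation is carved into fixed-size blocks that are rented and 
returned. The names SlabPool, Rent and Return are hypothetical and are not 
taken from Arrow or corefx. In C# the slab would be a pinned managed array; 
here a std::vector stands in for it.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical slab-style pool: one large allocation carved into
// fixed-size blocks. Not the Arrow or corefx API.
class SlabPool {
 public:
  SlabPool(std::size_t block_size, std::size_t block_count)
      : block_size_(block_size), slab_(block_size * block_count) {
    // Every block starts out free. Push offsets in reverse so that
    // block 0 is the first one handed out.
    for (std::size_t i = block_count; i > 0; --i) {
      free_offsets_.push_back((i - 1) * block_size);
    }
  }

  // Returns a pointer to a free block, or nullptr if the slab is exhausted.
  std::uint8_t* Rent() {
    if (free_offsets_.empty()) return nullptr;
    const std::size_t offset = free_offsets_.back();
    free_offsets_.pop_back();
    return slab_.data() + offset;
  }

  // Gives a block previously obtained from Rent() back to the pool.
  void Return(std::uint8_t* block) {
    free_offsets_.push_back(static_cast<std::size_t>(block - slab_.data()));
  }

  std::size_t block_size() const { return block_size_; }
  std::size_t free_blocks() const { return free_offsets_.size(); }

 private:
  std::size_t block_size_;
  std::vector<std::uint8_t> slab_;          // the single large allocation
  std::vector<std::size_t> free_offsets_;   // offsets of blocks not in use
};
```

The slab is never resized after construction, so pointers handed out by 
Rent() stay valid for the lifetime of the pool.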

Alternatively, we could P/Invoke the Arrow C++ allocator. 

Another interesting point of reference is 
[https://github.com/allisterb/jemalloc.NET]

*Buffer*

[C++ Buffer|https://github.com/apache/arrow/blob/master/cpp/src/arrow/buffer.h] 
can likely be replicated by something built on top of Memory<T>. Span<T> and 
Memory<T> are used for zero-copy slicing: 

[https://msdn.microsoft.com/en-us/magazine/mt814808.aspx]
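To make "zero-copy slicing" concrete, here is a rough C++ sketch under my own 
assumptions. SimpleBuffer is a simplified stand-in, not the real arrow::Buffer 
and not Memory<T>: a slice shares the parent's storage and only narrows the 
(offset, size) window, so no bytes are copied.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Simplified stand-in for a buffer; not the real arrow::Buffer.
class SimpleBuffer {
 public:
  // Owning buffer: takes ownership of the given bytes.
  explicit SimpleBuffer(std::vector<std::uint8_t> bytes)
      : storage_(std::make_shared<std::vector<std::uint8_t>>(std::move(bytes))),
        offset_(0),
        size_(storage_->size()) {}

  // Zero-copy slice: shares storage with the parent and narrows the window.
  // The caller must ensure offset + length <= size().
  SimpleBuffer Slice(std::size_t offset, std::size_t length) const {
    return SimpleBuffer(storage_, offset_ + offset, length);
  }

  const std::uint8_t* data() const { return storage_->data() + offset_; }
  std::size_t size() const { return size_; }

 private:
  SimpleBuffer(std::shared_ptr<std::vector<std::uint8_t>> storage,
               std::size_t offset, std::size_t size)
      : storage_(std::move(storage)), offset_(offset), size_(size) {}

  std::shared_ptr<std::vector<std::uint8_t>> storage_;  // shared by all slices
  std::size_t offset_;  // start of this view within the shared storage
  std::size_t size_;    // number of bytes visible through this view
};
```

Because every slice holds a reference to the shared storage, the bytes stay 
alive as long as any slice does.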

*Array*

[https://github.com/apache/arrow/blob/master/cpp/src/arrow/array.h]

Builds naturally from Buffer.

Note that ArrayVector = std::vector<std::shared_ptr<Array>>;
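As a rough illustration of how an array can sit on top of a buffer, here is a 
C++ sketch with simplified stand-ins. ByteBuffer, Int32Array and 
MakeInt32Array are hypothetical names, not the classes in arrow/array.h, and 
the sketch leaves out the validity (null) bitmap that real Arrow arrays carry.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>
#include <utility>
#include <vector>

// Simplified stand-ins; the real classes live in arrow/buffer.h and
// arrow/array.h. Null handling is deliberately omitted.
using ByteBuffer = std::vector<std::uint8_t>;

class Int32Array {
 public:
  Int32Array(std::shared_ptr<ByteBuffer> values, std::size_t length)
      : values_(std::move(values)), length_(length) {}

  std::size_t length() const { return length_; }

  // Reads element i out of the raw value bytes.
  std::int32_t Value(std::size_t i) const {
    std::int32_t v;
    std::memcpy(&v, values_->data() + i * sizeof(v), sizeof(v));
    return v;
  }

 private:
  std::shared_ptr<ByteBuffer> values_;  // raw bytes backing the array
  std::size_t length_;                  // number of int32 elements
};

// Same shape as the alias quoted above, specialised to the stand-in type.
using ArrayVector = std::vector<std::shared_ptr<Int32Array>>;

// Hypothetical helper that packs int32 values into a fresh buffer.
std::shared_ptr<Int32Array> MakeInt32Array(
    const std::vector<std::int32_t>& values) {
  auto bytes =
      std::make_shared<ByteBuffer>(values.size() * sizeof(std::int32_t));
  if (!values.empty()) {
    std::memcpy(bytes->data(), values.data(), bytes->size());
  }
  return std::make_shared<Int32Array>(std::move(bytes), values.size());
}
```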

*ChunkedArray* 

A data structure that manages a list of primitive Arrow arrays logically as 
one large array.

[https://github.com/apache/arrow/blob/master/cpp/src/arrow/table.h]

Compare to 
[https://github.com/dotnet/corefx/blob/master/src/System.IO.Pipelines/src/System/IO/Pipelines/BufferSegment.cs]
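A small C++ sketch of the "logically one large array" idea, under my own 
assumptions. Chunk and SimpleChunkedArray are simplified stand-ins rather 
than the real arrow::ChunkedArray, and the linear walk in Value is only the 
simplest way to map a logical index onto a chunk.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <stdexcept>
#include <utility>
#include <vector>

// Stand-in for a primitive Arrow array.
using Chunk = std::vector<std::int32_t>;

// Simplified stand-in; not the real arrow::ChunkedArray.
class SimpleChunkedArray {
 public:
  explicit SimpleChunkedArray(std::vector<std::shared_ptr<Chunk>> chunks)
      : chunks_(std::move(chunks)), length_(0) {
    // The logical length is the sum of the chunk lengths.
    for (const auto& chunk : chunks_) length_ += chunk->size();
  }

  std::size_t length() const { return length_; }
  std::size_t num_chunks() const { return chunks_.size(); }

  // Maps a logical index onto (chunk, offset within chunk) by walking
  // the chunks in order.
  std::int32_t Value(std::size_t index) const {
    for (const auto& chunk : chunks_) {
      if (index < chunk->size()) return (*chunk)[index];
      index -= chunk->size();
    }
    throw std::out_of_range("index past end of chunked array");
  }

 private:
  std::vector<std::shared_ptr<Chunk>> chunks_;
  std::size_t length_;
};
```

The chunks are held by reference, so building the chunked view does not copy 
any values.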

Note: one assumption is that, in general, std::shared_ptr<T> can be replaced 
by just T in C# managed classes, since instances of C# classes are already 
garbage-collected references. 

Gotta run now, more to follow...



> .NET Language Binding for Arrow
> -------------------------------
>
>                 Key: ARROW-2712
>                 URL: https://issues.apache.org/jira/browse/ARROW-2712
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: GLib
>            Reporter: Jamie Elliott
>            Priority: Major
>              Labels: features, newbie
>
> A feature request. I've seen this pop up in a few places. Want to have a 
> record of discussion on this topic. 
> I may be open to contributing this, but first need some general guidance on 
> approach so I can understand effort level. 
> It looks like there is not a good tool available for GObject Introspection 
> binding to .NET so the easy pathway via Arrow Glib C API appears to be 
> closed. 
> The only GObject integration for .NET appears to be Mono GAPI
> [http://www.mono-project.com/docs/gui/gtksharp/gapi/]
> From what I can see this produces a GIR or similar XML, then generates C# 
> code directly from that. Likely involves many manual fix ups of the XML. 
> Worth a try? 
>  
> Alternatively I could look at generating some other direct binding from .NET 
> to C/C++. Where I work we use Swig [http://www.swig.org/]. Good for vanilla 
> cases, requires hand crafting of the .i files and specialized marshalling 
> strategies for optimizing performance critical cases. 
> Haven't tried CppSharp but it looks more appealing than Swig in some ways 
> [https://github.com/mono/CppSharp/wiki/Users-Manual]
> In either case, not sure if better to use Glib C API or C++ API directly. 
> What would be pros/cons? 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
