Very BIG +1 for adoption of Apache Arrow. This would greatly simplify integration with other tools.

On Thu, Apr 11, 2019 at 2:21 PM Run <fan_li...@foxmail.com> wrote:

> Hi guys,
>
> Apache Arrow provides a cross-language, standardized, columnar, memory
> format for data.
> So it is highly desirable to import Arrow to Flink, and make use of its
> memory layout and memory management facilities.
> More background on this can be found in
> https://issues.apache.org/jira/browse/FLINK-10929
>
> The problem is that, incorporating Arrow to Flink is a big move, and may
> affect many modules of Flink.
> In our efforts to vectorize Blink batch jobs, we have tried to incorporate
> Arrow to Flink. Those were incremental changes, which can be easily turned
> off with a single flag, and the changes are transparent to other parts of
> the code base.
>
> This is the draft of our design document:
>
> https://docs.google.com/document/d/1LstaiGzlzTdGUmyG_-9GSWleKpLRid8SJWXQCTqcQB4/edit?usp=sharing
>
> For the first step, I suggest providing a flag which is disabled by
> default, and let the MemoryManager depend on the Arrow Buffer Allocator.
> With this change, all the MemorySegment will be based on Arrow buffers, but
> this is transparent to other components, and should never break them.
>
> After this step, we can apply the changes described in the design
> documents, incrementally.
> This is our initial thoughts. Would you please give your valuable comments?
>
> Best,
> Liya Fan