Super interesting and great news. This is the kind of thing we designed Arrow for, and it is somewhat similar to the work on GOAI. Happy to help/provide pointers. The sooner you put the framework out there, the easier it will be for us to help out.
On Tue, Nov 28, 2017 at 7:29 AM, Johan Peltenburg <[email protected]> wrote:

> Dear community,
>
> Over the last year we have been looking into the integration of FPGA
> accelerators with big data frameworks such as Spark. Before Arrow took off,
> we experienced many issues such as serialization overhead, garbage
> collection issues, and language interoperability issues with our low-level
> stack. These are all problems that Arrow now solves for us in a very nice
> manner.
>
> We see growing support from infrastructure providers such as Amazon, which
> already offer instances with FPGA resources. We also see very rapid
> advancements on the hardware technology side, where soon enough
> accelerators can be (cache-coherently) attached to host memory (for example
> in OpenCAPI), allowing accelerators to work in the same virtual address
> space as the host process.
>
> We believe that a somewhat standardized in-memory data format like Arrow
> can help us generalize big data processing on FPGAs tremendously. At the
> same time, we know that FPGAs are notorious for their long development
> times and low programmability. Therefore, to alleviate some of these
> burdens placed on an accelerator developer, we are building a generalized
> framework around Arrow that abstracts away a very cumbersome aspect of FPGA
> design: interfacing with the data.
>
> The framework takes Arrow Schemas as input and generates a layer that on
> one side interfaces with whatever the host platform provides to access host
> memory (our initial framework will target support for AXI and OpenCAPI),
> and on the other side interfaces with the user kernel.
>
> The user can express a request for access to the data in terms of row index
> ranges.
> The generated layer will then provide data streams to the user, which the
> user may read using some kernel that they designed using high-level
> synthesis (for example, they could write the kernel in OpenCL). Thus, they
> no longer need to go into the specifics of the Arrow in-memory format,
> bother with creating hardware constructs to deal with index buffers and
> validity buffers, interface with the host-side bus, implement FIFOs, and so
> on. Hopefully this will help speed up the deployment of FPGA-accelerated
> applications based on data represented in the Arrow format.
>
> Currently the framework supports schemas of primitive data types, (nested)
> lists, and structs. The major challenge here was to generate hardware
> structures from the many forms of schemas that users may provide, but this
> challenge has been solved. We are in the process of testing the framework
> in simulation, and will soon move to tests on real FPGA systems. With a bit
> of luck, we hope to make an initial release of the framework in January.
>
> We will fully open-source this framework and will attempt to make it as
> vendor-independent as possible. Initially we hope to provide some example
> applications that demonstrate the productivity benefits of using our
> framework, as well as the benefits of using FPGAs for specific problems in
> big data in general.
>
> We are reaching out for your comments, questions, suggestions, etc. Please
> give us your thoughts about this. Thank you in advance.
>
> With kind regards,
>
> Johan Peltenburg
> Computer Engineering Lab
> Delft University of Technology
