Super interesting and great news. This is the kind of thing we have
designed Arrow for, and it is somewhat similar to the work for GOAI. Happy
to help/provide pointers. The sooner you put the framework out there, the
easier it will be for us to help out.

On Tue, Nov 28, 2017 at 7:29 AM, Johan Peltenburg <[email protected]
> wrote:

> Dear community,
>
> Over the last year we have been looking into integrating FPGA accelerators
> with big data frameworks such as Spark. Before Arrow took off, we ran into
> many issues, such as serialization overhead, garbage collection pauses, and
> language interoperability problems with our low-level stack. These are all
> problems that Arrow now solves for us in a very nice manner.
>
> We see growing support from infrastructure providers such as Amazon, which
> already offer instances with FPGA resources. We also see very rapid
> advancements on the hardware technology side: soon enough, accelerators can
> be attached (cache-coherently) to host memory (for example with OpenCAPI),
> allowing accelerators to work in the same virtual address space as the host
> process.
>
> We believe that a somewhat standardized in-memory data format like Arrow
> can help us generalize big data processing on FPGAs tremendously. At the
> same time, we know that FPGAs are notorious for their long development
> times and low programmability. Therefore, to alleviate some of the burdens
> put upon an accelerator developer, we are building a generalized framework
> around Arrow that abstracts away a very cumbersome aspect of FPGA design:
> interfacing with the data.
>
> The framework takes Arrow Schemas as input and generates a layer that, on
> one side, interfaces with whatever the host platform provides to access
> host memory (our initial framework will target AXI and OpenCAPI), and, on
> the other side, interfaces with the user kernel.
>
> The user can express requests for access to the data in terms of row index
> ranges. The generated layer then provides data streams to the user, which
> the user may read using a kernel that they designed using high-level
> synthesis (for example, they could write the kernel in OpenCL). Thus, they
> no longer need to go into the specifics of the Arrow in-memory format,
> bother with creating hardware constructs to deal with index buffers and
> validity buffers, interface with the host-side bus, implement FIFOs, and so
> on. Hopefully this will enable faster deployment of FPGA-accelerated
> applications based on data represented in the Arrow format.
>
> Currently the framework supports schemas of primitive data types, (nested)
> lists, and structs. The major challenge here was being able to generate
> hardware structures from the many forms of schemas that users may provide,
> but these challenges have been solved. We are in the process of testing the
> framework in simulation, and will soon move on to tests on real FPGA
> systems. With a bit of luck, we hope to make an initial release of the
> framework in January.
>
> We will fully open-source this framework and will attempt to make it as
> vendor-independent as possible. Initially we hope to provide some example
> applications that demonstrate the productivity benefits of using our
> framework, as well as the benefits of using FPGAs for specific problems in
> big data in general.
>
> We are reaching out for your comments, questions, and suggestions. Please
> share your thoughts with us. Thank you in advance.
>
> With kind regards,
>
> Johan Peltenburg
> Computer Engineering Lab
> Delft University of Technology
>
