Dear community,

Over the last year we have been looking into the integration of FPGA accelerators with big data frameworks such as Spark. Before Arrow took off, we ran into many issues, such as serialization overhead, garbage collection problems, and language interoperability issues with our low-level stack. These are all problems that Arrow now solves for us very nicely.

We see growing support from infrastructure providers such as Amazon, which already offer instances with FPGA resources. We also see very rapid advances on the hardware side: accelerators will soon be able to attach (cache-coherently) to host memory, for example through OpenCAPI, allowing them to work in the same virtual address space as the host process.

We believe that a standardized in-memory data format like Arrow can help us generalize big data processing on FPGAs tremendously. At the same time, FPGAs are notorious for their long development times and low programmability. To relieve accelerator developers of some of this burden, we are building a generalized framework around Arrow that abstracts away one of the most cumbersome aspects of FPGA design: interfacing with the data.

The framework takes Arrow Schemas as input and generates a layer that, on one side, interfaces with whatever the host platform provides to access host memory (our initial version will target AXI and OpenCAPI), and, on the other side, interfaces with the user kernel.
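
To give an idea of the input side, here is a minimal sketch (in Python, using pyarrow) of a schema that could be fed to such a generator. The file name and hand-off mechanism below are purely illustrative, not the actual tool interface:

    import pyarrow as pa

    # An example Arrow schema of the kind the generator consumes.
    schema = pa.schema([
        pa.field("id", pa.int64(), nullable=False),
        pa.field("name", pa.utf8()),
        pa.field("score", pa.float32()),
    ])

    # Serialize the schema so an external tool can pick it up;
    # "example.schema" is a hypothetical file name.
    with open("example.schema", "wb") as f:
        f.write(schema.serialize())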

The user can express requests for access to the data in terms of row index ranges. The generated layer then provides data streams to the user, which they may read using a kernel of their own design created with high-level synthesis (for example, a kernel written in OpenCL). Thus, they no longer need to go into the specifics of the Arrow in-memory format, build hardware constructs to deal with index buffers and validity buffers, interface with the host-side bus, implement FIFOs, and so on. We hope this will speed up the deployment of FPGA-accelerated applications operating on data in the Arrow format.
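
To illustrate the idea in software (this is only a behavioral sketch in Python, not the hardware implementation itself), the generated layer acts roughly like the function below: given a record batch and a row index range, it resolves validity and offsets and presents the kernel with a plain stream of values:

    import pyarrow as pa

    def stream_column(batch, col_index, first_row, last_row):
        # Behavioral model of the generated interface layer: turn a
        # row index range on one column into a stream of
        # (valid, value) pairs. In hardware, resolving offset and
        # validity buffers is the work the layer does for the kernel.
        column = batch.column(col_index)
        for i in range(first_row, last_row):
            scalar = column[i]
            yield (scalar.is_valid,
                   scalar.as_py() if scalar.is_valid else None)

    batch = pa.RecordBatch.from_arrays(
        [pa.array([1, None, 3], type=pa.int64())], ["x"])
    for valid, value in stream_column(batch, 0, 0, 3):
        print(valid, value)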

Currently the framework supports schemas of primitive data types, (nested) lists, and structs. The major challenge here was generating hardware structures from the many forms of schemas that users may provide, but this has been solved. We are in the process of testing the framework in simulation and will soon move to tests on real FPGA systems. With a bit of luck, we hope to make an initial release of the framework in January.
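
As an example of the kinds of schemas currently supported, something like the following (again sketched with pyarrow; the field names are just for illustration) combines primitives, a nested list, and a struct:

    import pyarrow as pa

    # Primitives, a nested (list-of-list) field, and a struct field:
    # the type combinations the framework currently handles.
    schema = pa.schema([
        pa.field("timestamp", pa.int64()),
        pa.field("samples", pa.list_(pa.list_(pa.float32()))),
        pa.field("info", pa.struct([
            pa.field("tag", pa.utf8()),
            pa.field("weight", pa.float64()),
        ])),
    ])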

We will fully open-source this framework and will attempt to make it as vendor-independent as possible. Initially, we hope to provide example applications that demonstrate the productivity benefits of using our framework, as well as the benefits of using FPGAs for specific big data problems in general.

We are reaching out for your comments, questions, and suggestions. Please share your thoughts with us. Thank you in advance.

With kind regards,

Johan Peltenburg
Computer Engineering Lab
Delft University of Technology
