Dear community,

Over the last year we have been looking into the integration of FPGA accelerators with big data frameworks such as Spark. Before Arrow took off, we ran into many issues, such as serialization overhead, garbage collection problems, and language interoperability issues with our low-level stack. These are all problems that Arrow now solves for us very nicely.

We see growing support from infrastructure providers such as Amazon, which already offer instances with FPGA resources. We also see very rapid advances on the hardware side: accelerators will soon be able to attach (cache-coherently) to host memory, for example through OpenCAPI, allowing them to work in the same virtual address space as the host process.

We believe that a standardized in-memory data format like Arrow can help us generalize big data processing on FPGAs tremendously. At the same time, FPGAs are notorious for their long development times and low programmability. To relieve accelerator developers of some of this burden, we are building a generalized framework around Arrow that abstracts away one of the most cumbersome aspects of FPGA design: interfacing with the data.

The framework takes Arrow Schemas as input and generates a layer that, on one side, interfaces with whatever the host platform provides to access host memory (our initial version will target AXI and OpenCAPI), and, on the other side, interfaces with the user kernel.
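
To give an idea of the input side, here is a minimal sketch (in Python, using pyarrow) of a schema that could be fed to such a generator. The file name and hand-off mechanism below are purely illustrative, not the actual tool interface:

    import pyarrow as pa

    # An example Arrow schema of the kind the generator consumes.
    schema = pa.schema([
        pa.field("id", pa.int64(), nullable=False),
        pa.field("name", pa.utf8()),
        pa.field("score", pa.float32()),
    ])

    # Serialize the schema so an external tool can pick it up;
    # "example.schema" is a hypothetical file name.
    with open("example.schema", "wb") as f:
        f.write(schema.serialize())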

The user can express requests for access to the data in terms of row index ranges. The generated layer then provides data streams to the user, which they may read using a kernel of their own design created with high-level synthesis (for example, a kernel written in OpenCL). Thus, they no longer need to go into the specifics of the Arrow in-memory format, build hardware constructs to deal with index buffers and validity buffers, interface with the host-side bus, implement FIFOs, and so on. We hope this will speed up the deployment of FPGA-accelerated applications operating on data in the Arrow format.
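
To illustrate the idea in software (this is only a behavioral sketch in Python, not the hardware implementation itself), the generated layer acts roughly like the function below: given a record batch and a row index range, it resolves validity and offsets and presents the kernel with a plain stream of values:

    import pyarrow as pa

    def stream_column(batch, col_index, first_row, last_row):
        # Behavioral model of the generated interface layer: turn a
        # row index range on one column into a stream of
        # (valid, value) pairs. In hardware, resolving offset and
        # validity buffers is the work the layer does for the kernel.
        column = batch.column(col_index)
        for i in range(first_row, last_row):
            scalar = column[i]
            yield (scalar.is_valid,
                   scalar.as_py() if scalar.is_valid else None)

    batch = pa.RecordBatch.from_arrays(
        [pa.array([1, None, 3], type=pa.int64())], ["x"])
    for valid, value in stream_column(batch, 0, 0, 3):
        print(valid, value)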

Currently the framework supports schemas of primitive data types, (nested) lists, and structs. The major challenge here was generating hardware structures from the many forms of schemas that users may provide, but this has been solved. We are in the process of testing the framework in simulation and will soon move to tests on real FPGA systems. With a bit of luck, we hope to make an initial release of the framework in January.
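
As an example of the kinds of schemas currently supported, something like the following (again sketched with pyarrow; the field names are just for illustration) combines primitives, a nested list, and a struct:

    import pyarrow as pa

    # Primitives, a nested (list-of-list) field, and a struct field:
    # the type combinations the framework currently handles.
    schema = pa.schema([
        pa.field("timestamp", pa.int64()),
        pa.field("samples", pa.list_(pa.list_(pa.float32()))),
        pa.field("info", pa.struct([
            pa.field("tag", pa.utf8()),
            pa.field("weight", pa.float64()),
        ])),
    ])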

We will fully open-source this framework and will attempt to make it as vendor-independent as possible. Initially, we hope to provide example applications that demonstrate the productivity benefits of using our framework, as well as the benefits of using FPGAs for specific big data problems in general.

We are reaching out for your comments, questions, and suggestions. Please share your thoughts with us. Thank you in advance.

With kind regards,

Johan Peltenburg
Computer Engineering Lab
Delft University of Technology
