Re: [Follow-up] Development of an FPGA Accelerator framework around Apache Arrow

Wes McKinney Sun, 11 Feb 2018 12:39:40 -0800

hi Johan,

I'm also very excited to see the possibilities of using Arrow with
FPGAs. Would you be interested in adding this project to the Powered
By page (http://arrow.apache.org/powered_by/)? If so, feel free to
submit a pull request into the site/ portion of the project.


best
Wes

On Sun, Feb 11, 2018 at 6:42 AM, Uwe L. Korn <[email protected]> wrote:
> Dear Johan,
>
> this is an exciting use case for Arrow. Nice to hear about the benefits that 
> Arrow brings to the world of FPGAs.
>
> Greetings
>
> Uwe
>
> On Fri, Feb 9, 2018, at 10:11 PM, Johan Peltenburg - EWI wrote:
>> Dear community,
>>
>> As a follow-up to the e-mail below, we have made public the repository
>> that contains our framework, called Fletcher: a framework to integrate
>> FPGA accelerators with Apache Arrow.
>>
>> https://github.com/johanpel/fletcher
>>
>> With this framework you provide an Arrow schema, from which an
>> easy-to-use hardware interface for FPGAs is generated, reaping all the
>> benefits that Arrow already offers. On top of that, it increases the
>> programmability of any acceleration project you'd want to build on top
>> of Arrow. At run time, you simply pass your Arrow table to the run-time
>> part of the framework, and your hardware can read from it using row
>> index ranges, receiving streams of data of the type you've defined
>> through the schema.
>>
>> Currently there is an example project that does regular expression
>> matching on an Arrow table with strings, running on the Amazon EC2 F1
>> platform. We are not sponsored by Amazon, but as anyone can launch an
>> instance with an FPGA there, we thought it would be a good starting
>> point to hopefully gain some interest, even if you don't have an FPGA
>> card yourself.
>>
>> FPGA accelerators can be so fast that, more often than not, serialization
>> kills a relatively large part of the performance. Our measurements in
>> this (relatively simple) example show that by using Arrow to prevent
>> serialization, we sometimes get up to a 6x improvement in performance
>> over not using Arrow, especially if we start in languages that run on
>> JVMs, for example. (Thanks everyone!)
>>
>> We are looking forward to people with a little bit of FPGA experience
>> trying it out, and to receiving their thoughts, comments, etc. Please
>> drop me an e-mail.
>>
>> With kind regards,
>>
>> Johan Peltenburg
>> Computer Engineering Lab
>> Delft University of Technology
>> ________________________________________
>> From: Johan Peltenburg [[email protected]]
>> Sent: Tuesday, November 28, 2017 16:29
>> To: [email protected]
>> Subject: Development of an FPGA Accelerator framework around Apache Arrow
>>
>> Dear community,
>>
>> Over the last year we have been looking into integration of FPGA
>> accelerators with big data frameworks such as Spark. Before Arrow took
>> off, we experienced many issues, such as serialization overhead, garbage
>> collection, and language interoperability with our low-level stack. These
>> are all problems that Arrow is now solving for us in a very nice manner.
>>
>> We see growing support from infrastructure providers such as Amazon,
>> which already offer instances with FPGA resources. We also see very rapid
>> advancements on the hardware technology side, where soon enough
>> accelerators can be attached (cache-coherently) to host memory (for
>> example with OpenCAPI), allowing accelerators to work in the same virtual
>> address space as the host process.
>>
>> We believe that a somewhat standardized in-memory data format like Arrow
>> can help us generalize big data processing in FPGAs tremendously. At the
>> same time, FPGAs are notorious for their high development time and low
>> programmability. Therefore, to alleviate some of the burden placed on an
>> accelerator developer, we are building a generalized framework around
>> Arrow that abstracts away a very cumbersome aspect of FPGA design:
>> interfacing with the data.
>>
>> The framework takes Arrow Schemas as input, and generates a layer that on
>> the one side interfaces with whatever the host platform provides to access
>> host memory (our initial framework will target support for AXI and
>> OpenCAPI), and on the other side will interface with the user kernel.
>>
>> The user can request access to the data in terms of row index ranges. The
>> generated layer will then provide data streams to the user, which the
>> user may read using a kernel that they designed using high-level
>> synthesis (for example, they could write the kernel in OpenCL). Thus, they
>> no longer need to go into the specifics of the Arrow in-memory format,
>> create hardware constructs to deal with index buffers and validity
>> buffers, interface with the host-side bus, implement FIFOs, etc.
>> Hopefully this will lead to faster deployment of FPGA-accelerated
>> applications based on data represented in the Arrow format.
>>
>> Currently the framework supports schemas of primitive data types, (nested)
>> lists, and structs. The major challenge here was to be able to generate
>> hardware structures from the many forms of schemas that users may provide,
>> but these challenges have been solved. We are in the process of testing
>> the framework in simulation, and will soon move to a test on real FPGA
>> systems. With a bit of luck, we hope to initially release our framework in
>> January.
>>
>> We will fully open-source this framework and will attempt to make it as
>> vendor-independent as possible. Initially we hope to provide some example
>> applications that demonstrate some of the benefits of using our framework
>> in terms of productivity and the benefits of using FPGAs for specific
>> problems in big data in general.
>>
>> We are reaching out for your comments, questions, suggestions, etc. Please
>> give us your thoughts about this. Thank you in advance.
>>
>> With kind regards,
>>
>> Johan Peltenburg
>> Computer Engineering Lab
>> Delft University of Technology
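
For readers unfamiliar with the "index buffers and validity buffers" mentioned in the quoted message, the following is a minimal software sketch of how a row index range maps onto the buffers of an Arrow string column. It is not Fletcher code and says nothing about how the generated hardware is built; the function names build_string_column and read_rows are hypothetical, and the buffer padding that Arrow recommends is left out for brevity. It only assumes the documented Arrow layout: a validity bitmap with least-significant-bit numbering, an offsets buffer of 32-bit integers, and a values buffer of UTF-8 bytes.

```python
import struct


def build_string_column(values):
    """Pack a list of str/None into Arrow-style buffers.

    Returns (validity bitmap, offsets buffer, values buffer).
    Hypothetical helper for illustration; padding is omitted.
    """
    validity = bytearray((len(values) + 7) // 8)
    offsets = [0]
    data = bytearray()
    for i, value in enumerate(values):
        if value is not None:
            # Bit i of the bitmap is set when row i is valid (LSB-first).
            validity[i // 8] |= 1 << (i % 8)
            data += value.encode("utf-8")
        # A null row repeats the previous offset, so its slot has length 0.
        offsets.append(len(data))
    # Offsets are stored as little-endian signed 32-bit integers.
    offsets_buf = struct.pack(f"<{len(offsets)}i", *offsets)
    return bytes(validity), offsets_buf, bytes(data)


def read_rows(validity, offsets_buf, data, first, last):
    """Yield rows in the index range [first, last) as str, or None for nulls."""
    for i in range(first, last):
        if not (validity[i // 8] >> (i % 8)) & 1:
            yield None
            continue
        # Row i spans data[offsets[i]:offsets[i + 1]].
        start, end = struct.unpack_from("<2i", offsets_buf, 4 * i)
        yield data[start:end].decode("utf-8")


column = build_string_column(["arrow", None, "fpga", "", "fletcher"])
rows = list(read_rows(*column, 1, 4))
print(rows)
```

Every row access involves a bitmap lookup and two offset reads before any string bytes are touched; this is the bookkeeping that the quoted message says the generated layer takes off the kernel developer's hands.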
