Re: [Follow-up] Development of an FPGA Accelerator framework around Apache Arrow

Uwe L. Korn Sun, 11 Feb 2018 03:42:32 -0800

Dear Johan,

This is an exciting use case for Arrow. It is nice to hear about the benefits
that Arrow brings to the world of FPGAs.


Greetings

Uwe

On Fri, Feb 9, 2018, at 10:11 PM, Johan Peltenburg - EWI wrote:
> Dear community,
> 
> As a follow-up to the e-mail below, we have made our repository public.
> It contains our framework, Fletcher: a framework to integrate FPGA
> accelerators with Apache Arrow.
> 
> https://github.com/johanpel/fletcher
> 
> With this framework, you provide an Arrow schema from which an
> easy-to-use hardware interface for FPGAs is generated, so that your
> accelerator reaps all the benefits that Arrow already offers. On top of
> that, it improves the programmability of any acceleration project you
> want to build on top of Arrow. At runtime, you simply pass your Arrow
> table to the runtime part of the framework, and your hardware can then
> read from it by requesting row index ranges, receiving streams of data
> of the types you defined in the schema.
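To make the row-index-range interface concrete, here is a minimal software model (an illustration in Python, not part of Fletcher's API) of what generated hardware does for a single string column: given a range of rows [first, last), it turns the Arrow validity, offsets and values buffers into a stream of strings. The names `StringColumn`, `from_strings` and `stream` are hypothetical, and the regex filter at the end only stands in for a kernel like the regular expression example mentioned in this message.

```python
import re
from typing import Iterator, List, Optional


class StringColumn:
    """Software model of one Arrow utf8 column: validity, offsets, values."""

    def __init__(self, validity: bytes, offsets: List[int], values: bytes):
        self.validity = validity  # bit i (LSB first) set => row i is not null
        self.offsets = offsets    # n + 1 entries; row i is values[offsets[i]:offsets[i + 1]]
        self.values = values      # UTF-8 bytes of all strings, back to back

    def is_valid(self, i: int) -> bool:
        return bool((self.validity[i // 8] >> (i % 8)) & 1)

    def stream(self, first: int, last: int) -> Iterator[Optional[str]]:
        """Yield the rows in [first, last) in order, with None for nulls."""
        for i in range(first, last):
            if not self.is_valid(i):
                yield None
            else:
                start, end = self.offsets[i], self.offsets[i + 1]
                yield self.values[start:end].decode("utf-8")


def from_strings(items: List[Optional[str]]) -> StringColumn:
    """Build the three Arrow buffers for a list of optional strings."""
    validity = bytearray((len(items) + 7) // 8)
    offsets = [0]
    values = bytearray()
    for i, s in enumerate(items):
        if s is not None:
            validity[i // 8] |= 1 << (i % 8)
            values += s.encode("utf-8")
        offsets.append(len(values))  # a null row gets a zero-length slot
    return StringColumn(bytes(validity), offsets, bytes(values))


col = from_strings(["arrow", None, "fletcher", "fpga"])
print(list(col.stream(1, 4)))  # [None, 'fletcher', 'fpga']

# A software stand-in for a regex kernel consuming the stream.
pattern = re.compile(r"^f")
count = sum(1 for s in col.stream(0, 4) if s is not None and pattern.match(s))
print(count)  # 2
```

The consumer never touches offsets or bitmaps; it only sees a range request and a stream of typed values, which is the separation the generated hardware interface is meant to provide.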
> 
> Currently, there is an example project that performs regular expression
> matching on an Arrow table of strings, running on the Amazon EC2 F1
> platform. We are not sponsored by Amazon, but since anyone can launch an
> instance with an FPGA there, we thought it would be a good starting
> point to gain some interest, even if you don't have an FPGA card
> yourself.
> 
> FPGA accelerators can be so fast that, more often than not,
> serialization takes up a relatively large part of the total run time.
> Our measurements in this (relatively simple) example show that by using
> Arrow to avoid serialization, we sometimes get up to a 6X performance
> improvement over not using Arrow, especially when starting from
> languages that run on a JVM. (Thanks, everyone!)
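Why serialization can dominate is easy to see with a simple end-to-end time model. The sketch below is an illustration only, not the methodology behind the 6X figure; the function name `speedup` and all timings are hypothetical.

```python
def speedup(t_cpu: float, t_serialize: float, t_transfer: float, t_kernel: float) -> float:
    """End-to-end speedup of an FPGA offload over a CPU-only baseline.

    t_serialize is the time spent converting data into a layout the
    accelerator can read; it is 0 when the accelerator reads Arrow
    buffers directly.
    """
    return t_cpu / (t_serialize + t_transfer + t_kernel)


# Hypothetical timings in seconds (NOT measurements from the Fletcher example).
T_CPU, T_SER, T_XFER, T_KERNEL = 10.0, 4.0, 0.5, 0.5

with_serialization = speedup(T_CPU, T_SER, T_XFER, T_KERNEL)       # 10 / 5.0
without_serialization = speedup(T_CPU, 0.0, T_XFER, T_KERNEL)     # 10 / 1.0
# Even an infinitely fast kernel is capped while serialization remains.
cap_with_serialization = speedup(T_CPU, T_SER, T_XFER, 0.0)       # 10 / 4.5

print(f"{with_serialization:.1f}x, {without_serialization:.1f}x, cap {cap_with_serialization:.2f}x")
```

With these made-up numbers the offload reaches 2.0x with serialization and 10.0x without it, and no kernel improvement can push the serializing version past about 2.22x.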
> 
> We look forward to people with a little FPGA experience trying it out
> and sharing their thoughts, comments, etc. Please drop me an e-mail.
> 
> With kind regards,
> 
> Johan Peltenburg
> Computer Engineering Lab
> Delft University of Technology
> ________________________________________
> From: Johan Peltenburg [[email protected]]
> Sent: Tuesday, November 28, 2017 16:29
> To: [email protected]
> Subject: Development of an FPGA Accelerator framework around Apache Arrow
> 
> Dear community,
> 
> Over the last year, we have been looking into the integration of FPGA
> accelerators with big data frameworks such as Spark. Before Arrow took
> off, we experienced many issues: serialization overhead, garbage
> collection issues, and language interoperability issues with our
> low-level stack. Arrow now already solves all of these problems for us
> in a very nice manner.
> 
> We see a growing number of infrastructure providers, such as Amazon,
> that already offer instances with FPGA resources. We also see very
> rapid advancements on the hardware technology side, where accelerators
> will soon be able to attach (cache-coherently) to host memory (for
> example, through OpenCAPI), allowing them to work in the same virtual
> address space as the host process.
> 
> We believe that a somewhat standardized in-memory data format like
> Arrow can help us generalize big data processing on FPGAs tremendously.
> At the same time, we are aware that FPGAs are notorious for their long
> development times and low programmability. Therefore, to relieve
> accelerator developers of some of these burdens, we are building a
> generalized framework around Arrow that abstracts away a very
> cumbersome aspect of FPGA design: interfacing with the data.
> 
> The framework takes Arrow schemas as input and generates a layer that,
> on one side, interfaces with whatever the host platform provides to
> access host memory (our initial framework will target AXI and
> OpenCAPI) and, on the other side, interfaces with the user kernel.
> 
> The user expresses requests for access to the data in terms of row
> index ranges. The generated layer then provides data streams to the
> user, who can read them with a kernel designed using high-level
> synthesis (for example, a kernel written in OpenCL). Thus, users no
> longer need to go into the specifics of the Arrow in-memory format,
> bother with creating hardware constructs to deal with index (offset)
> buffers and validity buffers, interface with the host-side bus,
> implement FIFOs, etc. Hopefully this will enable faster deployment of
> FPGA-accelerated applications based on data represented in the Arrow
> format.
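A minimal sketch, assuming a utf8 column with 32-bit offsets, of the address arithmetic such a layer takes off the developer's hands: turning a row index range into byte ranges per buffer, and then into bus requests. The split at 4 KiB reflects the AXI rule that a burst must not cross a 4 KiB address boundary. The function names and base address are hypothetical and are not taken from Fletcher.

```python
from typing import Dict, List, Tuple

OFFSET_BYTES = 4  # Arrow utf8 uses 32-bit offsets (large_utf8 would use 8)


def buffer_ranges(first: int, last: int, offsets: List[int]) -> Dict[str, Tuple[int, int]]:
    """Byte ranges [start, end) inside each buffer for rows [first, last)."""
    return {
        # One validity bit per row, packed 8 rows per byte.
        "validity": (first // 8, (last + 7) // 8),
        # Rows first..last need offsets[first] .. offsets[last] (n + 1 entries).
        "offsets": (first * OFFSET_BYTES, (last + 1) * OFFSET_BYTES),
        # Only known after the offsets themselves have been read from memory.
        "values": (offsets[first], offsets[last]),
    }


def split_4k(base: int, start: int, end: int, boundary: int = 4096) -> List[Tuple[int, int]]:
    """Split [base + start, base + end) into (address, length) requests
    that never cross a 4 KiB boundary."""
    addr, stop = base + start, base + end
    requests = []
    while addr < stop:
        next_boundary = (addr // boundary + 1) * boundary
        chunk_end = min(stop, next_boundary)
        requests.append((addr, chunk_end - addr))
        addr = chunk_end
    return requests


offsets = [4 * i for i in range(1201)]  # 1200 strings of 4 bytes each
r = buffer_ranges(1000, 1100, offsets)
OFFSETS_BASE = 0x10000  # hypothetical physical address of the offsets buffer
reqs = split_4k(OFFSETS_BASE, *r["offsets"])
print(r)
print(reqs)
```

Note the dependency the hardware has to respect: the values range cannot be requested until the offsets for the first and last row have come back from memory.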
> 
> Currently, the framework supports schemas of primitive data types,
> (nested) lists, and structs. The major challenge was to generate
> hardware structures from the many forms of schemas that users may
> provide, but these challenges have been solved. We are in the process
> of testing the framework in simulation and will soon move on to tests
> on real FPGA systems. With a bit of luck, we hope to release an initial
> version of the framework in January.
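To illustrate why arbitrary schemas make hardware generation non-trivial, the sketch below walks a simplified, hypothetical schema tree and lists the Arrow buffers each field needs; roughly, each buffer is a separate stream of memory accesses that generated hardware must handle. It assumes 32-bit offsets (Arrow's non-"large" list and string types) and, as a modelling choice, that non-nullable fields get no validity bitmap. It is not Fletcher's actual generator.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Bit widths of the fixed-width types this toy model knows about.
BIT_WIDTH = {"bool": 1, "int8": 8, "int32": 32, "int64": 64, "float64": 64}


@dataclass
class Field:
    name: str
    dtype: str  # a key of BIT_WIDTH, or "utf8", "list", "struct"
    nullable: bool = True
    children: List["Field"] = field(default_factory=list)


def buffers(f: Field, prefix: str = "") -> List[Tuple[str, int]]:
    """List the Arrow buffers (path, bits per element) needed for one field."""
    path = prefix + f.name
    out: List[Tuple[str, int]] = []
    if f.nullable:
        # Assumption of this model: only nullable fields get a validity bitmap.
        out.append((path + ".validity", 1))
    if f.dtype == "utf8":
        out += [(path + ".offsets", 32), (path + ".values", 8)]
    elif f.dtype == "list":
        out.append((path + ".offsets", 32))
        out += buffers(f.children[0], path + ".")  # recurse into the item type
    elif f.dtype == "struct":
        # A struct has no data buffers of its own, only its children's.
        for child in f.children:
            out += buffers(child, path + ".")
    else:
        out.append((path + ".values", BIT_WIDTH[f.dtype]))
    return out


schema = [
    Field("name", "utf8", nullable=False),
    Field("tags", "list", children=[Field("item", "utf8", nullable=False)]),
    Field("pos", "struct", nullable=False,
          children=[Field("x", "float64", False), Field("y", "float64", False)]),
]
all_buffers = [b for f in schema for b in buffers(f)]
for path, bits in all_buffers:
    print(f"{path:20s} {bits:2d} bit")
```

For this example schema, the nested `tags` field alone needs four buffers, two of which are offset buffers at different nesting levels.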
> 
> We will fully open-source this framework and will attempt to make it as
> vendor-independent as possible. Initially, we hope to provide some
> example applications that demonstrate the productivity benefits of
> using our framework, as well as the benefits of using FPGAs for
> specific big data problems in general.
> 
> We are reaching out for your comments, questions, suggestions, etc.
> Please give us your thoughts on this. Thank you in advance.
> 
> With kind regards,
> 
> Johan Peltenburg
> Computer Engineering Lab
> Delft University of Technology
