Hi fellow flummers,

I have been struggling with Flume for a couple of weeks. I am trying to log events to Amazon S3 so that later I can use Amazon EMR to analyze them.
The architecture I am trying to build is:

The client posts bzip2-compressed data -> an endpoint decompresses the data and attaches extra data (like HTTP headers) -> the endpoint writes the data to a file on the local file system -> a Flume agent tails that file -> the agent sends the events to a Flume collector -> the collector writes the events to S3, bzip2-compressed

After some effort I got this architecture working for small events. The problem is that the events I need to store are large (72 KB expanded), and I have no control over the client (it posts large compressed XML files and I can't change this behavior), so the architecture has to be able to handle events of this size.

So I have been considering two approaches, and I wanted to share them with you and hear what you think:

1. Flume's default maximum event size is 32 KB, but it can accept larger events if the "flume.event.max.size.bytes" property is raised. I tried that, but:
    a. I am worried about the performance impact.
    b. It didn't work well: the events it writes appear to be truncated, and it also keeps re-writing them endlessly.
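For reference, this is the kind of override I tried (assuming the Hadoop-style flume-site.xml configuration; the value is just an example sized for our 72 KB events, not a recommendation):

```xml
<!-- conf/flume-site.xml: raise the per-event size cap -->
<configuration>
  <property>
    <name>flume.event.max.size.bytes</name>
    <value>131072</value> <!-- 128 KB -->
  </property>
</configuration>
```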

2. Shipping the event through Flume still bzip2-compressed (i.e., not decompressing it at the endpoint) to S3, and decompressing it in EMR later. In that case:
    a. In what format should I store the events?
    b. How would I enrich the data with the request headers?
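To make option 2 concrete, one format I was toying with (just a sketch; the field names are made up) is one JSON line per event, with the headers as fields and the still-compressed payload base64-encoded, so the collector ships plain text lines and EMR can decode and decompress each line on its own:

```python
import base64
import bz2
import json

def encode_event(compressed_body: bytes, headers: dict) -> str:
    """Wrap a still-bzip2-compressed payload and its request headers
    into a single JSON line for line-oriented processing."""
    record = {
        "headers": headers,  # enrichment data from the endpoint
        "payload": base64.b64encode(compressed_body).decode("ascii"),
    }
    return json.dumps(record)

def decode_event(line: str) -> tuple:
    """EMR side: recover the original XML and the headers."""
    record = json.loads(line)
    xml = bz2.decompress(base64.b64decode(record["payload"]))
    return xml, record["headers"]
```

This keeps each Flume event as a single line of text, which sidesteps the newline problems of raw binary payloads, at the cost of the ~33% base64 overhead.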


Thanks for your time.



Guy Doulberg
