Hi Mike,

I think I'm still blocked on this; otherwise I'll have to move the splitting of the data up to the source, which I know will work for sure. I've been trying to avoid that because I didn't want to deploy this code to all of the web servers.
I'm looking into the EventSerializer, and I don't think it's going to work for me either. All of the examples I've seen so far write data to an output stream that appears to be the raw data file, and it looks like append() is only called once per event. This prevents me from writing multiple events as separate records in the SequenceFile on HDFS:

https://github.com/apache/flume/blob/trunk/flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/HDFSSequenceFile.java#L72

Am I off base here?

J

On Mon, Aug 13, 2012 at 8:59 PM, Mike Percy <[email protected]> wrote:
> On Mon, Aug 13, 2012 at 3:34 PM, Jeremy Custenborder
> <[email protected]> wrote:
>
>> I need to have the multiple objects available to Hive. The upstream
>> object is actually a protobuf with hierarchy, which I was planning on
>> flattening for Hive. Here is an example of what I'm collecting. The
>> actual protobuf has many more fields, but this gives you an idea:
>>
>> requestid
>> page
>> timestamp
>> useragent
>> impressions = [12345, 43212, 12344, 12345, 43122, etc.]
>>
>> I'm transforming this into one record per impression:
>>
>> requestid
>> page
>> timestamp
>> useragent
>> index
>> objectid
>>
>> This gives me one row in Hive per impression. Hopefully this adds a
>> little more context; I picked the earlier example because I didn't
>> want to get caught up in my use case. I could move this code into
>> serializers, but I would need to implement similar logic twice, since
>> I'm incrementing a counter in HBase per impression and adding a row
>> per impression in HDFS (Hive). If I transformed the event into
>> multiple events earlier in the pipe, I would only have to write code
>> to generate keys per event. At this point I'm going to implement two
>> serializers: one to handle HDFS and one for HBase.
>
> Hi Jeremy,
>
> Thanks for the extra color. It's an interesting flow. As more people
> continue to adopt Flume, I think we'll start to see patterns where the
> design or implementation of Flume is lacking, and we can work towards
> bridging those gaps; your use case provides valuable data on that. As
> for where we are now, I'm happy to hear that you have found a way
> forward.
>
> If you can keep us apprised as things progress with your Flume
> deployment, I would love to hear about it!
>
> Regards,
> Mike
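
For illustration, here is a minimal sketch of the per-impression fan-out described in the quoted message, written against the SequenceFileSerializer interface from flume-hdfs-sink (the interface that HDFSSequenceFile.append() linked above drives), whose serialize(Event) returns an Iterable<Record>, so a single event can map to several SequenceFile records. The ImpressionRequest protobuf class and its field accessors are hypothetical stand-ins for the real object; this is a sketch, not a tested implementation.

import java.util.ArrayList;
import java.util.List;

import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.sink.hdfs.SequenceFileSerializer;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;

// Sketch only: flattens one request event into one SequenceFile record
// per impression, giving Hive one row per impression. ImpressionRequest
// is a hypothetical protobuf class standing in for the real one.
public class ImpressionFlatteningSerializer implements SequenceFileSerializer {

  @Override
  public Class<LongWritable> getKeyClass() {
    return LongWritable.class;
  }

  @Override
  public Class<Text> getValueClass() {
    return Text.class;
  }

  @Override
  public Iterable<Record> serialize(Event e) {
    List<Record> records = new ArrayList<Record>();
    ImpressionRequest req;
    try {
      // Hypothetical protobuf parse of the raw event body.
      req = ImpressionRequest.parseFrom(e.getBody());
    } catch (Exception ex) {
      throw new RuntimeException("Could not parse event body", ex);
    }
    int index = 0;
    for (long objectId : req.getImpressionsList()) {
      // One flattened, tab-separated row per impression:
      // requestid, page, timestamp, useragent, index, objectid
      String row = req.getRequestId() + "\t" + req.getPage() + "\t"
          + req.getTimestamp() + "\t" + req.getUserAgent() + "\t"
          + index + "\t" + objectId;
      records.add(new Record(new LongWritable(req.getTimestamp()),
          new Text(row)));
      index++;
    }
    return records;
  }

  // Builder so the sink can construct the serializer from configuration.
  public static class Builder implements SequenceFileSerializer.Builder {
    @Override
    public SequenceFileSerializer build(Context context) {
      return new ImpressionFlatteningSerializer();
    }
  }
}

The HBase side would need a second serializer repeating the same parsing step to drive the per-impression counter increments, which is the duplicated logic the thread describes.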
