You have code that puts records in bigger blocks on S3? Plz to share? :)

Russell Jurney http://datasyndrome.com

On Mar 21, 2012, at 1:37 PM, Vaibhav Puranik <vpura...@gmail.com> wrote:

> We also have s3 files organized by date in the following fashion.
>
> yyyy/MM/dd/hh
>
> Our messages are in JSON.
>
> Regards,
> Vaibhav
>
> On Wed, Mar 21, 2012 at 1:33 PM, Russell Jurney
> <russell.jur...@gmail.com> wrote:
>
>> I want the S3 files to be organized by type and date. Folders for types,
>> subfolders for date down to the hour: year/month/day/hour. All payloads of
>> a given type get written together.
>>
>> It would be ideal if there was no integration with the end format, but in
>> practice I'm not sure if all the serialization protocols mentioned can be
>> written in this way.
>>
>> Russell Jurney http://datasyndrome.com
>>
>> On Mar 21, 2012, at 12:50 PM, Tim Lossen <t...@lossen.de> wrote:
>>
>>> another good option would be messagepack -- flexible & schemaless like
>>> json, but binary.
>>>
>>> Sent from my iPhone
>>>
>>> On 21 Mar 2012, at 20:46, Russell Jurney <russell.jur...@gmail.com> wrote:
>>>
>>>> I'm going to use thrift, avro or protobuf for serialization.
>>>>
>>>> Russell Jurney http://datasyndrome.com
>>>>
>>>> On Mar 21, 2012, at 11:59 AM, Vaibhav Puranik <vpura...@gmail.com> wrote:
>>>>
>>>>> I would use the payload. I want the message to be exactly as it is. We want
>>>>> to name the files as per topic.
>>>>> (That's how we differentiate right now).
>>>>>
>>>>> Regards,
>>>>> Vaibhav
>>>>>
>>>>> On Wed, Mar 21, 2012 at 11:01 AM, Niek Sanders <niek.sand...@gmail.com> wrote:
>>>>>
>>>>>> So what would you like the S3 files to actually look like?
>>>>>>
>>>>>> One Kafka message body per line? Should the message topic be tossed
>>>>>> in there too?
>>>>>>
>>>>>> A tricky aspect is that the Kafka message body is an opaque byte
>>>>>> array. For my own case I'm using JSON for the payload so it makes my
>>>>>> requirements simpler.
>>>>>>
>>>>>> - Niek
>>>>>>
>>>>>> On Tue, Mar 20, 2012 at 10:07 PM, Russell Jurney
>>>>>> <russell.jur...@gmail.com> wrote:
>>>>>>> I want events in S3 to process them in Hadoop. I'd like to emit them in
>>>>>>> my app, and have them magically show up in 64MB chunks on S3. Like most
>>>>>>> everyone else.
>>>>>>>
>>>>>>> Russell Jurney http://datasyndrome.com
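
For reference, the layout discussed in this thread (a folder per type, then year/month/day/hour, with payloads batched into roughly 64MB objects, one message body per line) could be sketched as below. This is only an illustration, not code from anyone on the thread. The names HourlyChunker, hour_prefix and the upload callback are hypothetical, and the one-payload-per-line framing assumes payloads with no embedded newlines (e.g. JSON); opaque binary payloads would need a different framing.

```python
from collections import defaultdict
from datetime import datetime, timezone

CHUNK_BYTES = 64 * 1024 * 1024  # ~64MB target chunk size mentioned in the thread


def hour_prefix(event_type: str, when: datetime) -> str:
    """Return a prefix like 'clicks/2012/03/21/13' (type/yyyy/MM/dd/hh)."""
    return f"{event_type}/{when.strftime('%Y/%m/%d/%H')}"


class HourlyChunker:
    """Buffer payloads per type and hour; hand off a chunk once it is big enough.

    `upload` is a hypothetical callback taking (key, body_bytes). In practice
    it might wrap an S3 client call such as boto3's put_object.
    """

    def __init__(self, upload, chunk_bytes=CHUNK_BYTES):
        self.upload = upload
        self.chunk_bytes = chunk_bytes
        self.buffers = defaultdict(list)  # prefix -> buffered payloads
        self.sizes = defaultdict(int)     # prefix -> buffered byte count
        self.parts = defaultdict(int)     # prefix -> next part number

    def add(self, event_type: str, payload: bytes, when: datetime) -> None:
        prefix = hour_prefix(event_type, when)
        self.buffers[prefix].append(payload)
        self.sizes[prefix] += len(payload) + 1  # +1 for the newline separator
        if self.sizes[prefix] >= self.chunk_bytes:
            self.flush(prefix)

    def flush(self, prefix: str) -> None:
        lines = self.buffers.pop(prefix, [])
        self.sizes.pop(prefix, None)
        if not lines:
            return
        key = f"{prefix}/part-{self.parts[prefix]:05d}"
        self.parts[prefix] += 1
        # One message body per line, written together as a single object.
        self.upload(key, b"\n".join(lines) + b"\n")
```

A real consumer would also need to call flush() for each open prefix on a timer or once the hour has passed, otherwise low-volume types would never reach the size threshold and never be written.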