Yes. TRecordStream's fundamental use case is to be a robust, self-describing file format for storing records (in our case, Thrift or ctrl-delimited log data).
This means fixed-size frames that can be skipped over in case of corruption, transparent checksums and/or compression if needed, and a way to put the serializer/deserializer information in each header. And, of course, cross-platform and cross-language support: Java, Python, Perl and C++.

It's actually not fully implemented yet :(

-- pete


On 9/4/08 11:49 AM, "Ted Dunning" <[EMAIL PROTECTED]> wrote:

> I think there is a bit of ambiguity in what you said.
>
> I think what you mean is by "can be optionally compressed..." is that the
> TRecordStream itself will do the compression if you ask, not that you can do
> it for yourself.
>
> Correct?
>
> On Thu, Sep 4, 2008 at 11:46 AM, Pete Wyckoff <[EMAIL PROTECTED]> wrote:
>
>>
>> I'll just give another plug for Thrift's TRecordStream which has fixed sized
>> frames that can be optionally compressed or checksummed; since the frames
>> are fixed sized, it can be split on frame boundaries.
>>
>> You can write whatever data you want with it - it doesn't have to be thrift,
>> it just takes whatever is written and writes it to a FD or a socket or
>> whatever.
>>
>> There is the issue of spill over between frames just like the sequence file
>> case.
>>
>> -- pete
>>
>>
>> On 9/4/08 11:32 AM, "Ted Dunning" <[EMAIL PROTECTED]> wrote:
>>
>>> On Thu, Sep 4, 2008 at 10:51 AM, Owen O'Malley <[EMAIL PROTECTED]> wrote:
>>>
>>>> ...
>>>> It is also not splittable. It would be really nice to have a codec that was
>>>> similar in compression/cpu cost to gzip that was splittable.
>>>>
>>>
>>> Indeed.
>>>
>>> What happened to the effort to build a splittable gzip codec by inserting
>>> dummy compression resets with a known pattern?
>>>
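
As a rough illustration of the fixed-size-frame idea from the top of this message (frames at known offsets, each carrying a checksum, so a reader can skip a corrupt frame and resume at the next boundary), here is a minimal Python sketch. It is not TRecordStream's actual on-disk format: the header layout, FRAME_SIZE, MAGIC, write_frames and read_frames are all hypothetical names chosen for this example, and compression and records that spill over frame boundaries are left out.

```python
import struct
import zlib

# Hypothetical layout for illustration only; not the real TRecordStream format.
FRAME_SIZE = 64                     # tiny on purpose; real frames would be far larger
MAGIC = b"FRM1"                     # made-up marker identifying a frame header
HEADER = struct.Struct(">4sHI")     # magic, payload length, CRC32 of the payload
MAX_PAYLOAD = FRAME_SIZE - HEADER.size


def write_frames(data: bytes) -> bytes:
    """Split data into fixed-size frames, zero-padding each frame to FRAME_SIZE."""
    out = bytearray()
    for start in range(0, len(data), MAX_PAYLOAD):
        payload = data[start:start + MAX_PAYLOAD]
        header = HEADER.pack(MAGIC, len(payload), zlib.crc32(payload))
        out += (header + payload).ljust(FRAME_SIZE, b"\0")
    return bytes(out)


def read_frames(blob: bytes):
    """Yield (frame_index, payload) for each frame whose header and checksum are valid."""
    for offset in range(0, len(blob) - FRAME_SIZE + 1, FRAME_SIZE):
        frame = blob[offset:offset + FRAME_SIZE]
        magic, length, crc = HEADER.unpack_from(frame)
        payload = frame[HEADER.size:HEADER.size + length]
        if magic != MAGIC or length > MAX_PAYLOAD or zlib.crc32(payload) != crc:
            continue  # corrupt frame: skip it and resume at the next frame boundary
        yield offset // FRAME_SIZE, payload
```

Because every frame starts at a multiple of FRAME_SIZE, a reader (or a splitter handing out byte ranges) can start at any frame boundary without scanning for a sync marker, which is what makes this kind of file splittable.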
