Yes. TRecordStream's fundamental use case is to be a robust, self-describing
file format for storing records (in our case Thrift or ctrl-delimited log
data).

This means fixed-size frames that can be skipped over in case of corruption,
transparent checksums and/or compression if needed, and a way to put the
serializer/deserializer information in each header.
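
To make the frame-skipping idea concrete, here is a rough Python sketch. It
is not TRecordStream's actual on-disk layout: the 64-byte frame size, the
CRC32-plus-length header and the names write_frames/read_frames are all
assumptions made up for illustration, and it ignores records that spill over
a frame boundary.

```python
import struct
import zlib

# All sizes and field layouts here are hypothetical, not TRecordStream's.
FRAME_SIZE = 64                      # every frame is exactly this many bytes
CRC = struct.Struct(">I")            # CRC32 over the rest of the frame
LEN = struct.Struct(">H")            # number of payload bytes actually used
PAYLOAD_SIZE = FRAME_SIZE - CRC.size - LEN.size


def write_frames(records):
    """Pack each record into its own zero-padded, checksummed frame."""
    out = bytearray()
    for rec in records:
        if len(rec) > PAYLOAD_SIZE:
            raise ValueError("record does not fit in one frame")
        rest = LEN.pack(len(rec)) + rec.ljust(PAYLOAD_SIZE, b"\0")
        out += CRC.pack(zlib.crc32(rest)) + rest
    return bytes(out)


def read_frames(data):
    """Yield payloads of intact frames, skipping frames that fail the CRC."""
    # Frames are fixed size, so the next frame is always FRAME_SIZE bytes
    # further on, even when the current frame is corrupt.
    for off in range(0, len(data) - FRAME_SIZE + 1, FRAME_SIZE):
        (crc,) = CRC.unpack_from(data, off)
        rest = data[off + CRC.size : off + FRAME_SIZE]
        if zlib.crc32(rest) != crc:
            continue                 # corrupt frame: skip it
        (used,) = LEN.unpack_from(rest)
        if used > PAYLOAD_SIZE:
            continue                 # length field makes no sense: skip it
        yield rest[LEN.size : LEN.size + used]
```

Because every frame starts at a multiple of FRAME_SIZE, a reader can also
start at any such offset, which is what makes splitting on frame boundaries
possible.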

And of course it should be cross-platform and cross-language: Java, Python,
Perl and C++.

It's actually not fully implemented yet :(

-- pete



On 9/4/08 11:49 AM, "Ted Dunning" <[EMAIL PROTECTED]> wrote:

> I think there is a bit of ambiguity in what you said.
> 
> I think what you mean by "can be optionally compressed..." is that the
> TRecordStream itself will do the compression if you ask, not that you can do
> it for yourself.
> 
> Correct?
> 
> On Thu, Sep 4, 2008 at 11:46 AM, Pete Wyckoff <[EMAIL PROTECTED]> wrote:
> 
>> 
>> I'll just give another plug for Thrift's TRecordStream, which has
>> fixed-size frames that can be optionally compressed or checksummed; since
>> the frames are fixed size, it can be split on frame boundaries.
>> 
>> You can write whatever data you want with it - it doesn't have to be
>> Thrift; it just takes whatever is written and writes it to a FD or a
>> socket or whatever.
>> 
>> There is the issue of spill over between frames just like the sequence file
>> case.
>> 
>> -- pete
>> 
>> 
>> On 9/4/08 11:32 AM, "Ted Dunning" <[EMAIL PROTECTED]> wrote:
>> 
>>> On Thu, Sep 4, 2008 at 10:51 AM, Owen O'Malley <[EMAIL PROTECTED]> wrote:
>>> 
>>>> ...
>>>> It is also not splittable. It would be really nice to have a codec that
>>>> was similar in compression/cpu cost to gzip that was splittable.
>>>> 
>>> 
>>> Indeed.
>>> 
>>> What happened to the effort to build a splittable gzip codec by inserting
>>> dummy compression resets with a known pattern?
>>> 
>> 
>> 
> 
